biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 11:20:54 +00:00
88719f39b4 Add single-layer trace (Phase 2.6) for detailed debugging
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 11:10:39 +00:00
8256e23aed Fix mHCContext attribute access (not tuple unpacking) and enable attention diag
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 11:07:24 +00:00
72c139a59f Enable MHC_DIAG for diagnostic run
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 11:07:19 +00:00
cd661c2e40 Add attention and Q/KV diagnostics (MHC_DIAG flag)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 10:54:42 +00:00
9584fcbc23 Fix top5_ids variable name in decode logging
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 10:49:31 +00:00
a6d56d10ca Add top-20 logging and thinking token detection in decode loop
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 10:33:44 +00:00
d891ae7e96 Fix prompt format: use DeepSeek V4 chat tokens
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 10:28:26 +00:00
f86742ef8e Cache layer weights on GPU — eliminates per-token CPU→GPU transfer
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 10:07:16 +00:00
ce3d6069cc CRITICAL FIX: mHC base/scale ordering matches fn ordering [pre, res, post]
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 10:02:59 +00:00
9a43e9aa77 CRITICAL FIX: mHC fn weight row ordering was wrong
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 09:56:20 +00:00
0346e479d4 Add system prompt, CLI args, inverse RoPE flag, minimal e2e test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 09:23:12 +00:00
429fc3db40 Fix expert weight indexing for 1D tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 09:22:28 +00:00
33004dcbf4 Fix expert weight broadcasting (wt.item() for scalar multiply)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 09:21:52 +00:00
1434b35971 Add residual diagnostic test — per-layer magnitude tracking
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 09:17:38 +00:00
1c18c16c68 Fix production rope.py: FP32 arithmetic for forward_rope_partial + inverse_rope_bf16
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 09:17:10 +00:00
970869d017 Fix mHCBlock import + relax RoPE round-trip threshold (BF16 noise expected)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 09:16:00 +00:00
a2ee78b564 Fix RoPE shape bug (interleave needs separate even/odd assembly)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 09:15:01 +00:00
9d96c2fbbf CRITICAL FIX: FP32 RoPE cache + FP32 arithmetic for inverse RoPE round-trip
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 09:14:05 +00:00
db74a887ab Add minimal e2e test + fix MoE expert loop bug (indentation)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 07:02:40 +00:00
e195d9d3a7 Add SKIP_ROUTED_MOE debug flag, re-enable sinks