biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 06:52:00 +00:00
4f28673bec debug: disable sinks in SDPA to check |X| impact
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 06:40:03 +00:00
e3db90b56c switch back to original prompt
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 06:39:34 +00:00
d2cf5ccc32 CRITICAL FIX: use SDPA for short sequences (FMHA padding bug)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 06:28:48 +00:00
5f98855141 test with simpler prompt
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 06:17:01 +00:00
152af7295a debug: compare FMHA vs SDPA output at layer 0
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 06:07:07 +00:00
59c75ca4e9 fix: cast attn_out back to BF16 after sink correction
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 06:03:14 +00:00
e5245ea34e fix: V tensor must be (B, n_h, hd, N) for FMHA — was transposed wrong
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 05:58:03 +00:00
91abf0f921 FMHA + analytic sink bias correction using LSE
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 05:55:11 +00:00
fac269c938 fix verify_attention: proper multi-head SDPA + GQA
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 05:53:50 +00:00
2333fc8b4b fix verify_attention.py: proper nvfp4_linear calls
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 05:51:37 +00:00
c09f68c867 add verify_attention.py: single-layer attention component test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:51:19 +00:00
04dd7545b3 switch to production FMHA for full run
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:51:17 +00:00
738088cf49 revert: K=V with RoPE + inverse RoPE is the correct DSV4 approach
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:46:16 +00:00
781ee43521 try separate K (RoPE'd) and V (raw) — no inverse RoPE needed
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:45:59 +00:00
889521009b re-enable inverse RoPE (confirmed necessary — without it output is garbage)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:40:43 +00:00
92e465ca04 debug: disable inverse RoPE to check impact on output
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:38:43 +00:00
c69dc51b3b switch to SDPA with sinks (better residual control)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:32:03 +00:00
3ed8f3cc44 switch back to production FMHA kernel (with FP4 LUT fix)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:25:04 +00:00
ae79bd8fce debug: add top-5 logit predictions
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-31 04:16:15 +00:00
aafe2eee12 CRITICAL FIX: FP4 LUT was 4x too large!
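Note on commit 91abf0f921 ("FMHA + analytic sink bias correction using LSE"): the repository's actual code is not shown in this feed, so the following is only a sketch of one common way such a correction can work. It assumes a "sink" is a single extra logit per head that joins the softmax denominator but contributes no value vector, and that the fused kernel returns the log-sum-exp (LSE) of the real scores. Under those assumptions, the kernel output only needs rescaling by exp(LSE) / (exp(LSE) + exp(sink)), which equals sigmoid(LSE - sink). The function names `attend`, `sink_correct`, and `attend_with_sink_reference` are hypothetical and use a single query, no masking, and plain Python lists.

```python
import math

def attend(q, keys, values):
    """Softmax attention for one query vector; returns (output, lse)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)                                  # subtract max for stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)                  # lse = log(sum(exp(scores)))

def sink_correct(out, lse, sink_logit):
    """Rescale a sink-free attention output as if the sink logit had been
    part of the softmax denominator (assumed sink semantics, see above)."""
    gate = 1.0 / (1.0 + math.exp(sink_logit - lse))  # sigmoid(lse - sink_logit)
    return [gate * x for x in out]

def attend_with_sink_reference(q, keys, values, sink_logit):
    """Direct reference: append the sink logit to the scores, then drop its
    probability mass when mixing the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    scores.append(sink_logit)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights[:-1], values)) / denom
            for d in range(dim)]

if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 0.8]
    keys = [[0.1, 0.4, -0.3, 0.9], [1.0, -0.5, 0.2, 0.0], [-0.7, 0.3, 0.6, -0.1]]
    values = [[1.0, 2.0], [-1.0, 0.5], [0.25, -3.0]]
    out, lse = attend(q, keys, values)
    print(sink_correct(out, lse, sink_logit=0.7))
    print(attend_with_sink_reference(q, keys, values, sink_logit=0.7))
```

The two printed vectors should agree up to floating-point rounding, which is the kind of check a script comparing a fused kernel against a reference attention path would make.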