biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 06:52:00 +00:00

4f28673bec debug: disable sinks in SDPA to check |X| impact

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 06:40:03 +00:00

e3db90b56c switch back to original prompt

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 06:39:34 +00:00

d2cf5ccc32 CRITICAL FIX: use SDPA for short sequences (FMHA padding bug)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 06:28:48 +00:00

5f98855141 test with simpler prompt

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 06:17:01 +00:00

152af7295a debug: compare FMHA vs SDPA output at layer 0

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 06:07:07 +00:00

59c75ca4e9 fix: cast attn_out back to BF16 after sink correction

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 06:03:14 +00:00

e5245ea34e fix: V tensor must be (B, n_h, hd, N) for FMHA — was transposed wrong

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 05:58:03 +00:00

91abf0f921 FMHA + analytic sink bias correction using LSE
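
Note on the commit above: a minimal sketch of what an analytic sink correction from the LSE can look like, assuming the sink is one extra logit per head that joins the softmax denominator but has no value row. Under that assumption, an output computed without the sink can be rescaled by sigmoid(lse - sink) instead of rerunning attention. The function names and the single-query, pure-Python layout are illustrative and are not taken from the repository.

```python
import math

def softmax_attention(scores, values):
    """Sink-free attention for one query. Returns (output, lse)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    lse = m + math.log(denom)  # log-sum-exp of the raw scores
    dim = len(values[0])
    out = [sum(e * v[d] for e, v in zip(exps, values)) / denom for d in range(dim)]
    return out, lse

def attention_with_sink_reference(scores, values, sink):
    """Reference path: the sink logit is added to the denominator only (assumption)."""
    m = max(max(scores), sink)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps) + math.exp(sink - m)
    dim = len(values[0])
    return [sum(e * v[d] for e, v in zip(exps, values)) / denom for d in range(dim)]

def apply_sink_correction(out, lse, sink):
    """Rescale a sink-free output by exp(lse) / (exp(lse) + exp(sink))."""
    scale = 1.0 / (1.0 + math.exp(sink - lse))  # equals sigmoid(lse - sink)
    return [o * scale for o in out]

# Hypothetical toy inputs: three keys, head dimension two.
scores = [0.3, -1.2, 2.0]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
sink = 0.7

out, lse = softmax_attention(scores, values)
corrected = apply_sink_correction(out, lse, sink)
reference = attention_with_sink_reference(scores, values, sink)
```

Because the correction is a per-query scalar, it can be applied to the kernel output in higher precision and cast back afterwards, which would be consistent with the later BF16 cast fix in this feed.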

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 05:55:11 +00:00

fac269c938 fix verify_attention: proper multi-head SDPA + GQA

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 05:53:50 +00:00

2333fc8b4b fix verify_attention.py: proper nvfp4_linear calls

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 05:51:37 +00:00

c09f68c867 add verify_attention.py: single-layer attention component test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 04:51:19 +00:00

04dd7545b3 switch to production FMHA for full run

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 04:51:17 +00:00

738088cf49 revert: K=V with RoPE + inverse RoPE is the correct DSV4 approach

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 04:46:16 +00:00

781ee43521 try separate K (RoPE'd) and V (raw) — no inverse RoPE needed

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 04:45:59 +00:00

889521009b re-enable inverse RoPE (confirmed necessary — without it output is garbage)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 04:40:43 +00:00

92e465ca04 debug: disable inverse RoPE to check impact on output

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 04:38:43 +00:00

c69dc51b3b switch to SDPA with sinks (better residual control)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 04:32:03 +00:00

3ed8f3cc44 switch back to production FMHA kernel (with FP4 LUT fix)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 04:25:04 +00:00

ae79bd8fce debug: add top-5 logit predictions

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-31 04:16:15 +00:00

aafe2eee12 CRITICAL FIX: FP4 LUT was 4x too large!
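
Note on the commit above: the message does not say what form the LUT error took, so the following is only a reference sketch of the standard FP4 E2M1 decode table (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1) that a lookup table can be checked against. The helper names are hypothetical. The block scales that NVFP4 applies on top of these element values are out of scope here.

```python
# Magnitudes of the eight non-negative E2M1 codes, indexed by the low 3 bits.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_e2m1_lut(code):
    """Decode a 4-bit E2M1 code through the lookup table."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]

def decode_e2m1_bits(code):
    """Decode the same code from its bit fields, as an independent check."""
    sign = -1.0 if code & 0x8 else 1.0
    exponent = (code >> 1) & 0x3
    mantissa = code & 0x1
    if exponent == 0:
        magnitude = 0.5 * mantissa  # subnormal: 0 or 0.5
    else:
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude

# Cross-check every code; a table scaled by a constant factor would fail here.
mismatches = [c for c in range(16) if decode_e2m1_lut(c) != decode_e2m1_bits(c)]
largest = max(decode_e2m1_lut(c) for c in range(16))
```

A check like this catches a table whose entries are off by a constant factor, because the largest representable magnitude must be 6.0.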