biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:03:50 +00:00
014d647ba3 fix: sink bias domain correction — add attn_sink/scale to raw logits
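The "domain correction" in the commit above can be read as follows. It assumes `attn_sink` is a per-head logit that lives in the scaled domain (the same units as `scale * q.k`) and only adds mass to the softmax denominator, as in the common attention-sink formulation. It also assumes the kernel keeps raw `q.k` logits and applies `scale` inside `exp()`. Under those assumptions, the sink must enter the raw-logit row as `attn_sink / scale`. The function names below are hypothetical and are not taken from the repository.

```python
import math

def softmax_with_sink_scaled(raw_logits, scale, sink):
    """Reference: the sink is a logit in the scaled domain (same units as scale * q.k)."""
    scaled = [scale * s for s in raw_logits]
    m = max(scaled + [sink])
    exps = [math.exp(x - m) for x in scaled]
    denom = sum(exps) + math.exp(sink - m)  # the sink adds mass to the denominator only
    return [e / denom for e in exps]

def softmax_with_sink_raw(raw_logits, scale, sink):
    """Kernel-style (assumed): keep raw logits and apply scale inside exp().
    The sink is first moved into the raw domain as sink / scale (requires scale > 0)."""
    sink_raw = sink / scale
    m = max(raw_logits + [sink_raw])
    exps = [math.exp(scale * (x - m)) for x in raw_logits]
    denom = sum(exps) + math.exp(scale * (sink_raw - m))
    return [e / denom for e in exps]

raw = [2.0, -1.0, 0.5]       # raw q.k logits for one query row
scale = 1.0 / math.sqrt(64)  # softmax scale for a head_dim of 64
sink = 1.5                   # hypothetical attn_sink value for this head

ref = softmax_with_sink_scaled(raw, scale, sink)
ker = softmax_with_sink_raw(raw, scale, sink)
```

With `scale > 0`, `scale * max(raw, sink / scale)` equals `max(scale * raw, sink)`, so both paths compute identical probabilities. Their sum stays below 1 because the sink absorbs part of the mass.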
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:02:45 +00:00
dbdbcecadc fix: sink_bias must be pre-converted to CuTe tensor before passing to compile
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 15:01:34 +00:00
04b66e0f9c fix: test_d5c use float for attn_sink in reference functions
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 14:59:56 +00:00
9d64434954 D5c: add sink bias (attn_sink) logit modification to FMHA kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 11:03:58 +00:00
60a6f2d296 update README: D5b per-row LSE, D3/D4 DONE
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 11:03:47 +00:00
865eed0d33 cleanup: remove debug test file
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 11:02:46 +00:00
5b55cf0767 fix: k_seg is already 3D from slicing, don't add extra unsqueeze(-1)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 11:01:23 +00:00
375a682206 debug: isolated KV merge test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 11:00:37 +00:00
2252d7c865 fix: make K/V segments contiguous before passing to kernel (TMA needs contiguous tensors)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 10:59:52 +00:00
5407dc768a test: minor comment fix in D5b test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 10:59:07 +00:00
bb7ec341cb fix: D5b test uses reference attn_sum for normalization, correct D5 merge formula
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 10:57:56 +00:00
6c73069cb9 D5b: Per-row LSE output + Python KV merge test
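A per-row log-sum-exp (LSE) output is what makes it possible to merge attention computed over separate KV segments. The sketch below shows the standard merge for a single query row in plain Python. It assumes each partial result is a normalized output plus its LSE, and that the segments are disjoint. `attend_row` and `merge_rows` are hypothetical names; this is not the repository's test code.

```python
import math

def attend_row(logits, values):
    """One query row: returns (softmax(logits) @ values, logsumexp(logits))."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    dim = len(values[0])
    out = [sum(e * v[d] for e, v in zip(exps, values)) / total for d in range(dim)]
    return out, m + math.log(total)

def merge_rows(o1, lse1, o2, lse2):
    """Combine partial results over two disjoint KV segments via their LSEs."""
    m = max(lse1, lse2)
    lse = m + math.log(math.exp(lse1 - m) + math.exp(lse2 - m))
    # Each weight is segment_sum / total_sum, expressed in log space.
    w1, w2 = math.exp(lse1 - lse), math.exp(lse2 - lse)
    return [w1 * a + w2 * b for a, b in zip(o1, o2)], lse

logits = [0.3, -1.2, 2.0, 0.7, -0.4]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [-1.0, 3.0]]

full_out, full_lse = attend_row(logits, values)
o1, l1 = attend_row(logits[:2], values[:2])   # first KV segment
o2, l2 = attend_row(logits[2:], values[2:])   # second KV segment
merged_out, merged_lse = merge_rows(o1, l1, o2, l2)
```

The merge works because `o1` is normalized by its segment sum `S1 = exp(lse1)`. Rescaling it by `S1 / (S1 + S2)` and adding the matching term for `o2` reproduces attention over the full key set.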
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 10:56:59 +00:00
4656fa81f9 update README: D3 and D4 status DONE
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 10:56:11 +00:00
24993428a2 fix: D4 test reference computation only applies causal mask when is_causal=True
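A reference mask that applies the causal constraint only when `is_causal=True` could look like the sketch below. The semantics are assumptions, not taken from the kernel:

- `swa_len` is treated as the number of valid KV positions, and later keys are always masked.
- The causal mask is aligned so that the last query row sees the last valid key (bottom-right alignment).

`reference_mask` is a hypothetical helper.

```python
def reference_mask(q_len, kv_len, swa_len, is_causal):
    """mask[i][j] is True when query i may attend to key j (hypothetical semantics)."""
    offset = swa_len - q_len  # bottom-right causal alignment (assumption)
    mask = []
    for i in range(q_len):
        row = []
        for j in range(kv_len):
            ok = j < swa_len  # sequence-length masking: keys past swa_len are invalid
            if is_causal:
                ok = ok and j <= i + offset  # causal constraint only when requested
            row.append(ok)
        mask.append(row)
    return mask

non_causal = reference_mask(2, 4, 3, False)
causal = reference_mask(2, 4, 3, True)
```

Keeping the length mask unconditional and gating only the causal term on `is_causal` mirrors the distinction the test fix above draws.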
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 10:54:42 +00:00
e3e01071f4 fix: swa_len as Int32 scalar instead of CuTe tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 10:53:15 +00:00
df84420414 fix: add is_causal to FmhaKernel __init__ signature
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 10:52:32 +00:00
841a3e87b2 D4: Causal mask on SWA branch
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-26 10:51:33 +00:00
b6b581777a D3: In-kernel SWA sequence length masking
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:31:01 +00:00
d6a56342cc D3: add swa_lens parameter to FmhaKernel (in-kernel masking TBD)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:28:44 +00:00
e9f476b6dc fix typo: from_dlset → from_dlpack