biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:27:37 +00:00
f278348f44 D3: SWA mask with BF16 min pre-masking approach (K[invalid]=BF16_MIN → scores≈-inf)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:23:04 +00:00
cfbeb9c454 D3: SWA mask test with zero-masking approach (pre-mask K/V in Python)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:20:55 +00:00
68cb0236b5 D3: add SWA sequence length mask test (reference oracle + full-window regression)
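The D3 commits compare the kernel against a sliding-window attention (SWA) reference oracle, but the oracle itself is not shown here. The sketch below is a plain-Python reference under stated assumptions, not the repository's test. It assumes a causal window in which query i attends to keys j with i - window < j <= i, and that rows of K/V at index >= seq_len are padding. The function name `swa_reference` and its parameters are hypothetical. Masked keys are skipped outright, which is equivalent to giving them a score of exactly -inf. When `window` is at least the sequence length, the result reduces to ordinary causal attention; this is one plausible reading of the "full-window regression" in the message above.

```python
import math

def swa_reference(q, k, v, window, seq_len):
    """Hypothetical SWA reference oracle over lists of row vectors.

    Query i attends to keys j with max(0, i - window + 1) <= j <= i and
    j < seq_len. Rows of k/v at index >= seq_len are treated as padding.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    d_v = len(v[0])
    out = []
    for i, qi in enumerate(q):
        valid = range(max(0, i - window + 1), min(i, seq_len - 1) + 1)
        # Skipping masked keys is equivalent to assigning them a score of -inf.
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in valid]
        if not scores:
            out.append([0.0] * d_v)  # fully masked row: define the output as zeros
            continue
        m = max(scores)  # subtract the row max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        row = [0.0] * d_v
        for w, j in zip(weights, valid):
            for c in range(d_v):
                row[c] += w / total * v[j][c]
        out.append(row)
    return out
```

With `window=1`, each valid query sees only its own key, so the output rows are the corresponding rows of V. This makes a cheap sanity check before the mask is compared against a kernel.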
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:18:51 +00:00
7f69979c5f D1.5: add multi-KV-tile attention test with Python KV merge
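The "Python KV merge" in the D1.5 commit is not shown. A standard way to combine attention computed separately over several KV tiles is to have each tile return its normalized output together with the log-sum-exp (LSE) of its scores, and then reweight each tile by its share of the total softmax mass. The sketch below shows that standard LSE merge, assuming this is the kind of merge the test performs. The helper names `attend` and `merge_tiles` are hypothetical, and the code is written for a single query vector.

```python
import math

def attend(q, keys, values):
    """Attention for one query over one KV tile.

    Returns (normalized output, log-sum-exp of the scaled scores).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [sum(w * v[c] for w, v in zip(weights, values)) / total
           for c in range(len(values[0]))]
    return out, m + math.log(total)

def merge_tiles(partials):
    """Combine per-tile (output, lse) pairs into the full-attention result."""
    m = max(lse for _, lse in partials)
    lse_total = m + math.log(sum(math.exp(lse - m) for _, lse in partials))
    dim = len(partials[0][0])
    merged = [0.0] * dim
    for out, lse in partials:
        w = math.exp(lse - lse_total)  # fraction of the total softmax mass in this tile
        for c in range(dim):
            merged[c] += w * out[c]
    return merged, lse_total
```

Because each tile's weight is exp(lse_t - lse_total), merging normalized per-tile outputs reproduces single-pass attention over all keys, up to floating-point rounding.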
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:16:06 +00:00
8f35b75164 D2: comprehensive head-packed test (n_h=1, 64, 128, hd=64, 128)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:15:09 +00:00
dbe2ecbd41 D2: add num_query_heads/batch_size params + batch grid dimension
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:13:41 +00:00
7c6fdd151d fix: use reference attn_sum for normalization (kernel LSE per-row may be wrong)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:12:01 +00:00
673825c242 rewrite D2 regression test: match existing Stage D1 test pattern with cute.compile + PV tiles
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:06:23 +00:00
06cb800242 fix regression test: use normalize=False + external LSE normalization
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:04:32 +00:00
13b5afc471 fully revert FmhaKernel changes to debug regression
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:03:21 +00:00
0b9f9da2f7 revert grid change to debug regression
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 17:00:59 +00:00
aa66f44ff9 add n_h=1 regression test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:54:59 +00:00
efdedab399 fix tests: use 3D tensors (M, hd, 1) matching kernel local_tile expectations
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:53:19 +00:00
a4499f5aa8 fix tests: pad Q to 128 rows (M tile size) for all configs
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:51:54 +00:00
af136eee27 fix: use CUstream instead of cuStream(0)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:50:51 +00:00
4826fa6afb D2: add num_query_heads/batch_size params + head-packed test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:42:04 +00:00
d53e0a33a9 NVFP4-3: add use_2cta_instrs conditional to gemm_runner
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:25:06 +00:00
22a2fc563e cleanup: remove diagnostic test file
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:24:13 +00:00
a064b99d3d fix test 4: use silu(gate)+swiglu interleaved (matching fused kernel output)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:23:04 +00:00
e76ea36337 fix test: use proper global_scale from quantize_to_nvfp4 for larger shape test
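`quantize_to_nvfp4` is not shown, so the following is only a sketch of why a "proper" global scale matters. It assumes the common NVFP4 convention global_scale = (448 * 6) / amax, where 448 is the FP8 E4M3 maximum and 6 is the FP4 E2M1 maximum, together with 16-element blocks. Under that convention, the largest per-block scale lands exactly at 448, so every block scale fits in E4M3. The helper names are hypothetical. The sketch also does not round block scales to E4M3, so the global scale cancels out on dequantization; real NVFP4 does perform that rounding.

```python
import math

FP8_E4M3_MAX = 448.0
FP4_E2M1_MAX = 6.0
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable E2M1 magnitudes

def global_scale_for(x):
    """Assumed convention: map the tensor amax onto 448 * 6."""
    amax = max(abs(v) for v in x)
    return 1.0 if amax == 0.0 else (FP8_E4M3_MAX * FP4_E2M1_MAX) / amax

def round_e2m1(v):
    """Round to the nearest E2M1 value, preserving sign."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(v) - g))
    return math.copysign(mag, v)

def fake_quant_nvfp4(x, block=16):
    """Quantize then dequantize x; returns (dequantized, global_scale, block_scales)."""
    gs = global_scale_for(x)
    out, scales = [], []
    for start in range(0, len(x), block):
        blk = x[start:start + block]
        bmax = max(abs(v) for v in blk)
        # Block scale in the global-scaled domain (kept as float, not rounded to E4M3).
        sf = bmax / FP4_E2M1_MAX * gs
        scales.append(sf)
        if sf == 0.0:
            out.extend(0.0 for _ in blk)
            continue
        for v in blk:
            out.append(round_e2m1(v * gs / sf) * sf / gs)
    return out, gs, scales
```

If the global scale were instead computed from the wrong tensor or shape, the block scales could overflow the E4M3 range. That could plausibly explain why a test passes on small shapes but fails on larger ones.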