biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 09:02:28 +00:00
eb5c538c9b Complete multi-PV-tile fixes: pv_n_tile, v_fmha layout, MMA construction, n_corr_tiles
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 09:02:03 +00:00
eedcfd7d21 Fix v_fmha layout to use pv_n_tile instead of head_dim for multi-PV-tile support
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 09:00:20 +00:00
fcdfc4239c D1.4: Add pv_n_tile and n_pv_tiles for multi-PV-tile support (tcgen05 MMA max N=256)
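The D1.4 commit splits the P·V product into several N-tiles because a single tcgen05 MMA is capped at N=256. Below is a minimal sketch of that tiling arithmetic. It assumes pv_n_tile is simply the head dimension clamped to the MMA N limit, and that n_pv_tiles is the ceiling division of head_dim by pv_n_tile. The helper names `pv_tiling` and `MMA_MAX_N` are hypothetical and are not taken from the repository.

```python
# Hypothetical sketch of the multi-PV-tile arithmetic; not the repository's code.
MMA_MAX_N = 256  # tcgen05 MMA max N, as stated in the commit message


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive ints."""
    return (a + b - 1) // b


def pv_tiling(head_dim: int, max_n: int = MMA_MAX_N) -> tuple[int, int]:
    """Return (pv_n_tile, n_pv_tiles) for the P·V GEMM.

    Assumption: each PV MMA covers at most `max_n` output columns, so the
    head dimension is cut into ceil(head_dim / pv_n_tile) column tiles.
    """
    pv_n_tile = min(head_dim, max_n)
    n_pv_tiles = ceil_div(head_dim, pv_n_tile)
    return pv_n_tile, n_pv_tiles
```

Under these assumptions, `pv_tiling(128)` returns `(128, 1)`, so a single PV MMA is enough. `pv_tiling(512)` returns `(256, 2)`, which matches the later note that split-PV is mandatory at hd=512.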
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 08:45:27 +00:00
b13da6b7a0 diag: add 2-CTA check + fix LayoutEnum in MMA test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 08:44:39 +00:00
c34291843b fix: remove bad import in NVFP4 diag
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 08:40:26 +00:00
8a8e0c5ed6 fix: import ceil_div in quantize.py (was NameError at runtime)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 08:39:14 +00:00
538dbb0643 fix: use quantize_activation_nvfp4 in diag
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 08:38:22 +00:00
e2f599e4af fix: use correct API for NVFP4-0 diag (sf_vec_size + mma_tiler_mn)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 08:30:44 +00:00
5572b74591 fix: use Sm100BlockScaledPersistentDenseGemmKernel in diag
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 08:29:17 +00:00
6b1330ba47 fix: use randint+view for FP4/FP8 tensors in diag
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 08:28:16 +00:00
3733927f28 fix: NVFP4-0 diag script — import SF_VEC_SIZE from quantize.py
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 08:26:59 +00:00
6d8f7db2dd diag: NVFP4-0 primitive verification script
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 07:39:11 +00:00
d9780c0a0c docs: add NVFP4 precision roadmap to STAGE_D.md (3 honest buckets + speculative bucket)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:55:26 +00:00
74d0822214 shit carmine left dangling
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:43:03 +00:00
3b167a4362 D1.2: TMEM budget verified on B200. Split-PV mandatory at hd=512 (MMA max N=256)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:41:44 +00:00
99000cba8d D1.2: fix probe for hd=512 (MMA max N=256, use pv_n_tile)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:40:57 +00:00
60824b62db D1.2: simplify TMEM budget probe, fix printf args
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:40:07 +00:00
de439bcd75 fix: cuda.CUstream import
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:39:30 +00:00
1c20b826d9 D1.2: TMEM budget probe using @cute.jit for MLIR context
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-23 06:38:09 +00:00
6575e83f6d fix: remove unused v_fmha_layout from probe