biondizzle
Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:21:55 +00:00
5290c91c35 fix quantize_nvfp4 kernel: use proven single-thread-per-CTA pattern from deinterleave_quantize.cu
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:20:31 +00:00
5508f29625 add GPU quantize diagnostic test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 16:19:08 +00:00
c2e3d15633 NVFP4-1.1 integration: GPU-only quantize kernel + MoE pipeline wiring
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 09:08:11 +00:00
6504f091ca NVFP4-1.1 Step 3: post-SWiGLU quantization test suite (all PASS)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 08:58:30 +00:00
5e8347836f NVFP4-1.1: working BF16→FP4 quantize kernel (cos 0.979)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 03:23:47 +00:00
52d11d7f92 NVFP4-1.1: standalone BF16→FP4 quantize kernel (WIP) + dequantize verification
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 03:17:15 +00:00
1f310defa0 fix: quantize_activation_nvfp4 returns 2 values, not 3
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 03:15:42 +00:00
6dac3bcaf0 NVFP4-1.1: add FP4 quantize round-trip test (step 1 of kernel fusion)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 03:07:55 +00:00
eb46e4d15e NVFP4-0.2-0.4: add FP4 primitives diagnostic test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 02:40:15 +00:00
29ad36934d cleanup: remove D2 diagnostic/experimental files, keep working codebase clean
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 02:36:48 +00:00
d5b69ac122 D2: simpler shape diagnostic using CuTe from Python (no kernel needed)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 02:34:32 +00:00
684e9a85fe fix: use utils.sm100 instead of sm100 in diagnostic
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 02:33:23 +00:00
7599801f57 D2: add flat_divide shape diagnostic kernel for multi-CTA grid
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 01:18:48 +00:00
32850f6974 Update README, STAGE_D, STAGE_D2 with D1 rescale findings and D2 status
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-25 01:09:31 +00:00
6cc151097e Revert D2 multi-CTA attempts - keeping per-head launch approach (works correctly)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 23:44:40 +00:00
34f5beb767 D2: fix gC coordinate to match 5-mode flat_divide result
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 23:43:25 +00:00
a3559538cf D2: try 6-mode coordinate for flat_divide result
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 23:42:06 +00:00
6f371d6b31 D2: add flat_divide shape print, try different coordinate order
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 23:40:39 +00:00
7007a9db79 D2: use flat_divide for runtime coordinate indexing (like CUTLASS)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-24 23:38:49 +00:00
3e340a0eee D2: fix local_tile coordinate for 4D Q (2 rest modes, not 3)