biondizzle (joined on 2025-12-10)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:08:15 +00:00
940f37fb6c NVFP4 fused router kernel: full rewrite with proper block-scaled GEMM setup
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:01:04 +00:00
8658c8eca5 fix: add sf_vec_size parameter back to Nvfp4FusedRouterKernel __init__
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:56:35 +00:00
b97f30e289 fix: store sf_vec_size as instance variable
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:54:51 +00:00
c225d195ea fix: remove tcgen05.mma.Kind (doesn't exist), use make_blockscaled_trivial_tiled_mma
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:53:18 +00:00
e6803b450d rewrite: simplified fused router test (reference + import check)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:51:49 +00:00
262cec262d fix: add shape assertions to fused router test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:50:42 +00:00
db07d17a62 fix: set activation global scale in fused router test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:49:45 +00:00
2abb4a19d9 fix: set gs and ws2 fields for Nvfp4Linear in fused router test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:48:42 +00:00
61c04f7152 fix: Nvfp4Linear field is sf not scale_b
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:47:16 +00:00
982f245c67 fix: use correct Nvfp4Linear field names (fp4, scale_b, gsb)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:46:07 +00:00
16af96380f fix: use internal fields for Nvfp4Linear weight setup in test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:43:56 +00:00
7f1f224c78 fix: quantize_weight_to_nvfp4 returns 3 values, not 4
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:42:15 +00:00
27fd847dd0 fix: correct quantize function name in fused router test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:40:48 +00:00
0873d65253 test: add fused router kernel test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:40:22 +00:00
90b2581dfe feat: NVFP4 fused router CuTeDSL kernel (WIP)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 06:00:37 +00:00
6c28c57b6a feat: Nvfp4GroupedLinear for o_a_proj (replaces BF16 grouped BMM)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 05:58:58 +00:00
cf2b7ab7ec feat: NVFP4 gate projection for router (replaces BF16 cuBLAS)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 05:55:07 +00:00
9f14cb17d1 test: add compressor position_bias unit test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 05:54:47 +00:00
84ca520bfb fix: move compressor position_bias into CUDA kernel (was Python loop)