biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 08:54:27 +00:00
b94f8d4ed8 Test: fused router kernel vs BF16 reference path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 08:49:48 +00:00
2433700a69 Fused router kernel: rewrite epilogue with proper CuTeDSL constructs
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 08:37:18 +00:00
d01b4b02de Complete NVFP4 fused router kernel: full MMA + router epilogue
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:55:31 +00:00
25b9a5f32d Fix test: use from_dlpack for c_tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:54:37 +00:00
d2819fc39c Fix test: use as_tensor instead of make_tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:53:44 +00:00
5ea71ebd78 Add NVFP4 CuTeDSL compilation test (verify MmaMXF4NVF4Op compiles)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:53:23 +00:00
fa6dbd4aa2 WIP: Rewrite NVFP4 fused router in CuTeDSL with MmaMXF4NVF4Op (sf_vec_size=16)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:51:34 +00:00
4f706b55d7 Remove raw CUDA C++ fused router and DeepGEMM (MXFP4, wrong instruction)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:44:55 +00:00
424fe6bf2c Fix: use SM100_MMA_MXF8F6F4_SS (not MXF4) to match Nvfp4Linear path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:41:45 +00:00
2e2caadf7d WIP: NVFP4 fused router kernel in raw CUDA C++ using DeepGEMM primitives
e3ea609ddd Embed DeepGEMM source (not submodule) for SM100 raw CUDA GEMM primitives
dae83723a3 Add DeepGEMM as third-party dependency for SM100 raw CUDA GEMM primitives
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:29:32 +00:00
ef4c0ad489 Fix BF16 router mma_tiler: use cutlass.Int32 for CuTe DSL compatibility
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:20:55 +00:00
79be9cb8da Fix: hardcode mma_inst_shape_k=32 for NVFP4 (avoids MLIR unpack error in JIT)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:19:21 +00:00
c3a64ceed7 Fix: mma_tiler must use CuTe Ints for static layout construction
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:16:49 +00:00
39b481e52b Ensure mma_tiler contains CuTe Ints for cute.slice_ compatibility
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:14:48 +00:00
57cc20d5ad Fix SFA/SFB SMEM: blockscaled layouts are plain Layout (no .outer/.inner swizzle)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:13:35 +00:00
fcd7680583 Fix CuTe tensor creation: use from_dlpack + mark_layout_dynamic
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:11:44 +00:00
3a8c6daeb3 Fix: cutlass_torch.make_tensor -> as_tensor
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:10:59 +00:00
0553117af6 Simplify fused router test: compare fused vs 2-kernel NVFP4 path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-06-01 07:08:58 +00:00
44a0e59808 Fix fused router test: use quantize_weight_to_nvfp4 (correct function name)