biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 08:54:27 +00:00

b94f8d4ed8 Test: fused router kernel vs BF16 reference path

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 08:49:48 +00:00

2433700a69 Fused router kernel: rewrite epilogue with proper CuTeDSL constructs

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 08:37:18 +00:00

d01b4b02de Complete NVFP4 fused router kernel: full MMA + router epilogue

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:55:31 +00:00

25b9a5f32d Fix test: use from_dlpack for c_tensor

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:54:37 +00:00

d2819fc39c Fix test: use as_tensor instead of make_tensor

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:53:44 +00:00

5ea71ebd78 Add NVFP4 CuTeDSL compilation test (verify MmaMXF4NVF4Op compiles)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:53:23 +00:00

fa6dbd4aa2 WIP: Rewrite NVFP4 fused router in CuTeDSL with MmaMXF4NVF4Op (sf_vec_size=16)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:51:34 +00:00

4f706b55d7 Remove raw CUDA C++ fused router and DeepGEMM (MXFP4, wrong instruction)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:44:55 +00:00

424fe6bf2c Fix: use SM100_MMA_MXF8F6F4_SS (not MXF4) to match Nvfp4Linear path

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:41:45 +00:00

2e2caadf7d WIP: NVFP4 fused router kernel in raw CUDA C++ using DeepGEMM primitives

e3ea609ddd Embed DeepGEMM source (not submodule) for SM100 raw CUDA GEMM primitives

dae83723a3 Add DeepGEMM as third-party dependency for SM100 raw CUDA GEMM primitives

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:29:32 +00:00

ef4c0ad489 Fix BF16 router mma_tiler: use cutlass.Int32 for CuTe DSL compatibility

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:20:55 +00:00

79be9cb8da Fix: hardcode mma_inst_shape_k=32 for NVFP4 (avoids MLIR unpack error in JIT)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:19:21 +00:00

c3a64ceed7 Fix: mma_tiler must use CuTe Ints for static layout construction

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:16:49 +00:00

39b481e52b Ensure mma_tiler contains CuTe Ints for cute.slice_ compatibility

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:14:48 +00:00

57cc20d5ad Fix SFA/SFB SMEM: blockscaled layouts are plain Layout (no .outer/.inner swizzle)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:13:35 +00:00

fcd7680583 Fix CuTe tensor creation: use from_dlpack + mark_layout_dynamic

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:11:44 +00:00

3a8c6daeb3 Fix: cutlass_torch.make_tensor -> as_tensor

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:10:59 +00:00

0553117af6 Simplify fused router test: compare fused vs 2-kernel NVFP4 path

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:08:58 +00:00

44a0e59808 Fix fused router test: use quantize_weight_to_nvfp4 (correct function name)

biondizzle pushed tag v-nvfp4-fused-router-rewrite-20260601-0715 to biondizzle/nvfp4-megamoe-kernel

2026-06-01 07:08:18 +00:00