biondizzle
  • Joined on 2025-12-10
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 23:25:22 +00:00
1eb9c43217 Rewrite CUTLASS kernel based on NVIDIA example 72b (nv_float4_t, CollectiveBuilder, OpClassBlockScaledTensorOp)
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 23:23:03 +00:00
8a9af441dc Fix includes: use cutlass/float_subbyte.h (has float_e2m1_t and float_ue4m3_t), point to latest CUTLASS
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 23:18:28 +00:00
d789f5e3e0 Add CCCL include path for CUTLASS 3.x
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 23:17:31 +00:00
12588047fd Fix setup.py: use include_dirs and extra_compile_args (correct PyTorch extension API)
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 23:14:07 +00:00
1b1c3a42fe Fix setup.py source paths
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 23:12:48 +00:00
f375c80bfe feat: CUTLASS NVFP4 block-scaled GEMM kernel (native SM100 Blackwell)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-13 23:11:17 +00:00
f375c80bfe feat: CUTLASS NVFP4 block-scaled GEMM kernel (native SM100 Blackwell)
56c7880296 Native NVFP4 TileLang kernel: tcgen05 block-scaled MMA
bf13665dbe Implement TileLang NVFP4 mega_moe L1/L2 kernels
ebc0ab0cac Fix: keep scales as float8_e4m3fn, don't pack to uint32 (min_all_cuda unsupported)
94233c4dd3 Fix __init__.py: remove private imports
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 23:02:08 +00:00
56c7880296 Native NVFP4 TileLang kernel: tcgen05 block-scaled MMA
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 22:37:00 +00:00
bf13665dbe Implement TileLang NVFP4 mega_moe L1/L2 kernels
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 21:54:42 +00:00
ebc0ab0cac Fix: keep scales as float8_e4m3fn, don't pack to uint32 (min_all_cuda unsupported)
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 21:43:48 +00:00
94233c4dd3 Fix __init__.py: remove private imports
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 21:41:45 +00:00
1a452ffabd Fix weight_transform signature to match nightly vLLM finalize_weights call
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 16:11:11 +00:00
47ca5631d8 Fix __init__.py: only import from package modules
c2b752c2fe Initial: TileLang NVFP4 mega_moe kernel package
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-13 16:08:39 +00:00
47ca5631d8 Fix __init__.py: only import from package modules
biondizzle created branch master in biondizzle/nvfp4-megamoe-kernel 2026-05-13 15:44:58 +00:00
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-13 15:44:58 +00:00
c2b752c2fe Initial: TileLang NVFP4 mega_moe kernel package
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 15:22:36 +00:00
673f67681f Add vLLM integration layer and packaging
biondizzle created branch main in biondizzle/nvfp4-megamoe-kernel 2026-05-13 14:51:14 +00:00
biondizzle pushed to main at biondizzle/nvfp4-megamoe-kernel 2026-05-13 14:51:14 +00:00
a4b90b5780 Initial commit: NVFP4 mega_moe kernel in TileLang
biondizzle created repository biondizzle/nvfp4-megamoe-kernel 2026-05-13 14:49:50 +00:00