biondizzle
  • Joined on 2025-12-10
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-18 22:38:43 +00:00
5d37674fb1 Add cutedsl to MoEBackend type in kernel config
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-18 22:33:53 +00:00
7409204d71 Use nightly's deepseek_v4.py + attention as base, add only NVFP4 mapper
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-18 22:29:26 +00:00
a19ed4a18e Remove breakable_cudagraph import (not in nightly)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-18 22:22:55 +00:00
b007937a68 Fix garbled imports in cutedsl/runner.py
biondizzle created branch proper-nvfp4-integration in biondizzle/nvfp4-megamoe-kernel 2026-05-18 22:19:34 +00:00
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-18 22:19:34 +00:00
a7ed8faec6 Proper NVFP4 integration: use ModelOptNvFp4Config + FusedMoE framework
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 21:38:30 +00:00
48386e34ad Fix torch.compile: use custom autograd Function instead of @torch.compiler.disable
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 21:08:01 +00:00
85e1cd3b69 Fix torch.compile crash: @torch.compiler.disable on all CuTeDSL run()
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:54:56 +00:00
a94011ec92 Fix torch.compile crash: remove threading.Lock from LUT cache path
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:40:49 +00:00
6326222d68 Fix: add abstract create_weights to CuTeDSLNvfp4LinearMethod
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:27:43 +00:00
450793311c Wire CuTeDSL kernels into vLLM: replace all BF16 dequant with native NVFP4
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:14:04 +00:00
6ce6a47be9 Add NVFP4 linear runner + attention projection test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:10:33 +00:00
f07643791e Fix hidden_size: shared expert uses 7168, not HC_DIM 28672
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:09:51 +00:00
70f50a1ec6 Fix scale assembly: use correctly-sized temp buffer for swizzle
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:09:20 +00:00
97bdd604e9 Fix scale assembly: reshape swizzled output to 2D
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:08:35 +00:00
c1aa4af123 Shared expert: dedicated CuTeDSL runner with proper scale assembly
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:05:04 +00:00
b3451c74f8 Update README and CURRENT_BUG.md with current state
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 20:02:21 +00:00
e8b289e30d WIP: CuTeDSL shared expert kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 19:27:50 +00:00
1836e5fdc7 Add shared experts to post-quant BF16 dequant fix
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 19:10:10 +00:00
82ac648563 Patch utils.py the standard way: copy modified file into Docker image