biondizzle

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-18 22:38:43 +00:00

5d37674fb1 Add cutedsl to MoEBackend type in kernel config

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-18 22:33:53 +00:00

7409204d71 Use nightly's deepseek_v4.py + attention as base, add only NVFP4 mapper

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-18 22:29:26 +00:00

a19ed4a18e Remove breakable_cudagraph import (not in nightly)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-18 22:22:55 +00:00

b007937a68 Fix garbled imports in cutedsl/runner.py

biondizzle created branch proper-nvfp4-integration in biondizzle/nvfp4-megamoe-kernel

2026-05-18 22:19:34 +00:00

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-18 22:19:34 +00:00

a7ed8faec6 Proper NVFP4 integration: use ModelOptNvFp4Config + FusedMoE framework

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 21:38:30 +00:00

48386e34ad Fix torch.compile: use custom autograd Function instead of @torch.compiler.disable

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 21:08:01 +00:00

85e1cd3b69 Fix torch.compile crash: @torch.compiler.disable on all CuTeDSL run()

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:54:56 +00:00

a94011ec92 Fix torch.compile crash: remove threading.Lock from LUT cache path

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:40:49 +00:00

6326222d68 Fix: add abstract create_weights to CuTeDSLNvfp4LinearMethod

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:27:43 +00:00

450793311c Wire CuTeDSL kernels into vLLM: replace all BF16 dequant with native NVFP4

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:14:04 +00:00

6ce6a47be9 Add NVFP4 linear runner + attention projection test

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:10:33 +00:00

f07643791e Fix hidden_size: shared expert uses 7168, not HC_DIM 28672

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:09:51 +00:00

70f50a1ec6 Fix scale assembly: use correctly-sized temp buffer for swizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:09:20 +00:00

97bdd604e9 Fix scale assembly: reshape swizzled output to 2D

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:08:35 +00:00

c1aa4af123 Shared expert: dedicated CuTeDSL runner with proper scale assembly

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:05:04 +00:00

b3451c74f8 Update README and CURRENT_BUG.md with current state

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 20:02:21 +00:00

e8b289e30d WIP: CuTeDSL shared expert kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 19:27:50 +00:00

1836e5fdc7 Add shared experts to post-quant BF16 dequant fix

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 19:10:10 +00:00

82ac648563 Patch utils.py the standard way: copy modified file into Docker image