
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 03:18:44 +00:00

0359215ab4 fix: compare kernel vs BF16 in slot-major layout

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 03:17:20 +00:00

ed18638a3c fix: slot-major token layout for grouped GEMM

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 03:15:31 +00:00

5385de3142 fix: layertest tests L1 GEMM only with correct output size

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 03:13:56 +00:00

0cdcc4144a refactor: add cutedsl/bridge.py, rewrite layertest to use it

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 03:04:33 +00:00

2ef71dc21a fix: B tensor K-major strides, scale_b axis swap

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 03:03:02 +00:00

6294b84213 fix: B tensor must be K-major (transpose last 2 dims)
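Commits 6294b84213 and 2ef71dc21a are about making the B operand K-major. The sketch below uses NumPy to show what "transpose last 2 dims" does to memory layout. It assumes that "K-major" means the reduction dimension K has stride 1 and that B is logically shaped (E, K, N). `to_k_major` is a hypothetical helper and is not code from the repository. The real kernel works on CuTeDSL tensors, not NumPy arrays.

```python
import numpy as np

def to_k_major(b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return b with the same logical shape (..., K, N)
    but with K as the contiguous (stride-1) axis."""
    # Materialize the transposed (..., N, K) array in C order, then view it
    # back as (..., K, N). Values are unchanged; only the memory order differs.
    return np.ascontiguousarray(np.swapaxes(b, -1, -2)).swapaxes(-1, -2)

# Hypothetical expert weight stack: E experts, each K x N, row-major (N contiguous).
E, K, N = 4, 64, 32
b = np.arange(E * K * N, dtype=np.float32).reshape(E, K, N)
bk = to_k_major(b)

itemsize = bk.itemsize
print(b.strides[-2:] == (N * itemsize, itemsize))   # True: N is contiguous
print(bk.strides[-2:] == (itemsize, K * itemsize))  # True: K is contiguous
print(np.array_equal(b, bk))                        # True: same logical values
```

Only the strides change, so any per-block scale tensor laid out along the same axes would need a matching axis swap. That is presumably the reason for the "scale_b axis swap" in 2ef71dc21a, but this is an inference and not stated in the log.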

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:58:56 +00:00

7c882fe2e0 fix: correct weight quantization for CuTeDSL kernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:57:55 +00:00

ca28f1335d refactor: copy CuTeDSL kernel into repo with local imports

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:55:27 +00:00

a3aa2d201e fix: clarify import path setup for CuTeDSL

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:55:07 +00:00

f951d284e7 test: add CuTeDSL NVFP4 GEMM test using reference ScaledGroupedGemmKernel

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:41:53 +00:00

a2ea836c74 docs: add CuTeDSL rewrite plan + reference files

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:29:44 +00:00

c4a262bd54 test: streamline layertest — kernel vs BF16 ref only, exit on fail

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:21:18 +00:00

de9b50cbe7 fix: use setup.py install for CUTLASS extension build

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:19:41 +00:00

882bff8fb7 fix: also build CUTLASS C++ extension in run_test.sh

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:18:54 +00:00

55d9a24bf6 fix: handle model. prefix normalization in checkpoint keys
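Commits bdf9f31ae2 and 55d9a24bf6 handle checkpoint keys that may or may not start with `model.`. The sketch below shows one way to normalize them. It assumes the only difference between the two spellings is a single leading `model.`. The helper names and tensor keys are hypothetical and are not taken from the repository.

```python
def normalize_key(key: str, prefix: str = "model.") -> str:
    """Hypothetical helper: strip one leading prefix so that
    'model.layers.0.x' and 'layers.0.x' map to the same key."""
    return key[len(prefix):] if key.startswith(prefix) else key

def lookup(checkpoint: dict, key: str):
    """Hypothetical helper: find a tensor whether or not the checkpoint
    key or the requested key carries the 'model.' prefix."""
    # Map normalized names back to the checkpoint's real key names.
    index = {normalize_key(k): k for k in checkpoint}
    return checkpoint[index[normalize_key(key)]]

# Hypothetical checkpoint whose keys lack the prefix (as in bdf9f31ae2).
ckpt = {"layers.0.mlp.experts.w1": "w1-tensor"}
print(lookup(ckpt, "model.layers.0.mlp.experts.w1"))  # w1-tensor
print(lookup(ckpt, "layers.0.mlp.experts.w1"))        # w1-tensor
```

Normalizing both sides means the loader works whether or not a given checkpoint uses the prefix. Hard-coding either convention would break on the other.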

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:17:15 +00:00

bdf9f31ae2 fix: checkpoint keys don't have 'model.' prefix

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:15:58 +00:00

ea5ee7c1f7 fix: remove prefix_filter from layer tensor loading

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:14:38 +00:00

303b6a8993 cleanup: move useful tests to tests/, nuke stale debug tests

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 02:13:23 +00:00

2114bd11be test: add standalone layer 0 comparison test (no vLLM, no Docker)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-16 01:55:57 +00:00

294e9f98f2 cleanup: rename _ue8m0_to_float32 → _block_scale_to_float32, remove dead code