biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:18:44 +00:00
0359215ab4 fix: compare kernel vs BF16 in slot-major layout
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:17:20 +00:00
ed18638a3c fix: slot-major token layout for grouped GEMM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:15:31 +00:00
5385de3142 fix: layertest tests L1 GEMM only with correct output size
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:13:56 +00:00
0cdcc4144a refactor: add cutedsl/bridge.py, rewrite layertest to use it
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:04:33 +00:00
2ef71dc21a fix: B tensor K-major strides, scale_b axis swap
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 03:03:02 +00:00
6294b84213 fix: B tensor must be K-major (transpose last 2 dims)
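Commit 6294b84213 makes B K-major by transposing its last two dims. "K-major" means the K dimension has unit stride. A transpose only swaps shape and strides and moves no data. The sketch below works through that bookkeeping. It assumes the weights are stored nn.Linear-style as a contiguous (E, N, K) tensor and that the kernel wants a logical (E, K, N) B operand. Both are assumptions, not facts taken from the repo, and the helper names are hypothetical.

```python
def row_major_strides(shape):
    """Element strides of a C-contiguous tensor with the given shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

def transpose_last_two(labels, shape, strides):
    """View-style transpose: permute labels, shape and strides; no data moves."""
    return (labels[:-2] + (labels[-1], labels[-2]),
            shape[:-2] + (shape[-1], shape[-2]),
            strides[:-2] + (strides[-1], strides[-2]))

def major_dim(labels, strides):
    """Name of the dimension with unit stride (the contiguous, 'major' one)."""
    return labels[strides.index(1)]

E, N, K = 8, 256, 512
# Assumed storage: per-expert weights as a contiguous (E, N, K) tensor.
labels, shape = ("E", "N", "K"), (E, N, K)
strides = row_major_strides(shape)                 # (N*K, K, 1)

# Logical (E, K, N) view (assumed kernel expectation) via transpose of last 2 dims.
t_labels, t_shape, t_strides = transpose_last_two(labels, shape, strides)
print(t_labels, t_shape, t_strides)  # ('E', 'K', 'N') (8, 512, 256) (131072, 1, 512)
print(major_dim(t_labels, t_strides))              # K: a K-major view, no copy

# A freshly allocated contiguous (E, K, N) tensor would instead be N-major.
print(major_dim(("E", "K", "N"), row_major_strides((E, K, N))))  # N
```

The follow-up commit 2ef71dc21a ("K-major strides") suggests the strides handed to the kernel also had to match this view. That reading is an inference from the commit titles.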
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:58:56 +00:00
7c882fe2e0 fix: correct weight quantization for CuTeDSL kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:57:55 +00:00
ca28f1335d refactor: copy CuTeDSL kernel into repo with local imports
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:55:27 +00:00
a3aa2d201e fix: clarify import path setup for CuTeDSL
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:55:07 +00:00
f951d284e7 test: add CuTeDSL NVFP4 GEMM test using reference ScaledGroupedGemmKernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:41:53 +00:00
a2ea836c74 docs: add CuTeDSL rewrite plan + reference files
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:29:44 +00:00
c4a262bd54 test: streamline layertest — kernel vs BF16 ref only, exit on fail
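Commit c4a262bd54 trims layertest to a single comparison, kernel against the BF16 reference, and stops at the first failure. The sketch below shows one way to structure that check. It is not the repo's code. The allclose-style tolerance rule and the default `atol`/`rtol` values are assumptions, and `compare` and `check_or_exit` are hypothetical names.

```python
import math
import sys

def compare(kernel_out, ref_out, atol=1e-2, rtol=5e-2):
    """allclose-style check: |k - r| <= atol + rtol * |r| for every element.

    Returns (ok, max_abs_err). Any NaN in the difference fails immediately.
    """
    if len(kernel_out) != len(ref_out):
        raise ValueError(f"size mismatch: {len(kernel_out)} vs {len(ref_out)}")
    ok, max_err = True, 0.0
    for k, r in zip(kernel_out, ref_out):
        err = abs(k - r)
        if math.isnan(err):          # also covers inf - inf
            return False, float("nan")
        max_err = max(max_err, err)
        if err > atol + rtol * abs(r):
            ok = False
    return ok, max_err

def check_or_exit(name, kernel_out, ref_out, **tol):
    """Print a one-line verdict and stop the test run on the first failure."""
    ok, max_err = compare(kernel_out, ref_out, **tol)
    print(f"{name}: {'PASS' if ok else 'FAIL'} (max_abs_err={max_err:.4g})")
    if not ok:
        sys.exit(1)
```

A layer test could call `check_or_exit("layer0", kernel_vals, bf16_vals)` after flattening both outputs. The nonzero exit code lets a wrapper such as run_test.sh detect the failure.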
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:21:18 +00:00
de9b50cbe7 fix: use setup.py install for CUTLASS extension build
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:19:41 +00:00
882bff8fb7 fix: also build CUTLASS C++ extension in run_test.sh
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:18:54 +00:00
55d9a24bf6 fix: handle model. prefix normalization in checkpoint keys
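Commits bdf9f31ae2 and 55d9a24bf6 deal with checkpoint keys that may or may not carry a `model.` prefix. A minimal way to tolerate both forms is to normalize keys before lookup. The helper names and example keys below are hypothetical, and the sketch assumes at most one leading `model.` needs stripping.

```python
def normalize_key(key, prefix="model."):
    """Drop one leading prefix if present, so prefixed and bare keys compare equal."""
    return key.removeprefix(prefix)  # Python 3.9+

def build_key_map(checkpoint_keys, prefix="model."):
    """Map normalized name -> key exactly as stored in the checkpoint."""
    key_map = {}
    for key in checkpoint_keys:
        norm = normalize_key(key, prefix)
        if norm in key_map and key_map[norm] != key:
            # Both 'model.x' and 'x' exist: refuse to guess which one is meant.
            raise KeyError(f"ambiguous checkpoint keys: {key_map[norm]!r} and {key!r}")
        key_map[norm] = key
    return key_map

def lookup(key_map, wanted, prefix="model."):
    """Resolve a requested name, with or without the prefix, to the stored key."""
    return key_map[normalize_key(wanted, prefix)]

# Hypothetical checkpoint without the prefix, queried with and without it.
km = build_key_map(["layers.0.w1.weight", "layers.0.w1.weight_scale"])
print(lookup(km, "model.layers.0.w1.weight"))  # layers.0.w1.weight
```

Raising on ambiguity is a design choice. Silently picking one of two colliding keys would be a quiet source of wrong weights.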
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:17:15 +00:00
bdf9f31ae2 fix: checkpoint keys don't have 'model.' prefix
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:15:58 +00:00
ea5ee7c1f7 fix: remove prefix_filter from layer tensor loading
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:14:38 +00:00
303b6a8993 cleanup: move useful tests to tests/, nuke stale debug tests
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 02:13:23 +00:00
2114bd11be test: add standalone layer 0 comparison test (no vLLM, no Docker)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-16 01:55:57 +00:00
294e9f98f2 cleanup: rename _ue8m0_to_float32 → _block_scale_to_float32, remove dead code
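For context on the rename in 294e9f98f2: NVFP4 pairs FP4 (E2M1) values with one FP8 E4M3 scale per 16-element block, plus a per-tensor FP32 scale. UE8M0 is the exponent-only, power-of-two scale used by the OCP MX formats such as MXFP4. One plausible reading is that the old name described a different scale format from the one NVFP4 actually uses, though that is an inference, not something the commit states. The sketch below decodes both byte formats. It is not the repo's `_block_scale_to_float32`, and `decode_block_scale` is a hypothetical name.

```python
def ue8m0_to_float32(byte):
    """UE8M0 (MX scale): 8 exponent bits, no sign or mantissa, bias 127; 0xFF is NaN."""
    if byte == 0xFF:
        return float("nan")
    return 2.0 ** (byte - 127)

def e4m3_to_float32(byte):
    """FP8 E4M3 ('fn' variant): 1 sign, 4 exponent (bias 7), 3 mantissa bits.

    No infinities; S.1111.111 is NaN; exponent field 0 encodes subnormals.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")
    if exp == 0:  # subnormal: (m / 8) * 2^(1 - bias)
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

def decode_block_scale(byte, fmt):
    """Hypothetical dispatcher: 'e4m3' for NVFP4 block scales, 'ue8m0' for MX formats."""
    decoders = {"e4m3": e4m3_to_float32, "ue8m0": ue8m0_to_float32}
    return decoders[fmt](byte)

print(decode_block_scale(0x38, "e4m3"))   # 1.0
print(decode_block_scale(0x7E, "e4m3"))   # 448.0 (largest finite E4M3)
print(decode_block_scale(127, "ue8m0"))   # 1.0
```

Feeding E4M3 bytes through a UE8M0 decoder, or the reverse, still produces plausible-looking finite numbers. That is one reason a mismatch here tends to show up only as wrong GEMM output rather than as an error.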