nvfp4-megamoe-kernel

biondizzle/nvfp4-megamoe-kernel

Commit Graph

Author

SHA1

Message

Date

biondizzle

ea5ee7c1f7

fix: remove prefix_filter from layer tensor loading

2026-05-16 02:15:55 +00:00

biondizzle

303b6a8993

cleanup: move useful tests to tests/, nuke stale debug tests

Kept (moved to tests/):
- test_uniform_fp4.py — proves GEMM math (72.0 = 1.5² × K)
- test_b_layout.py — proves B matrix column layout
- test_quick_rand.py — quick GEMM sanity check
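
The arithmetic behind the uniform test can be reproduced without the kernel. This is a minimal sketch, not the contents of test_uniform_fp4.py: it assumes every element of A and B is the E2M1 (FP4) code for 1.5, that all scale factors are 1.0, and that K = 32 (inferred from 72.0 = 1.5² × K). The helper names are hypothetical.

```python
# E2M1 (FP4) magnitudes indexed by the low 3 bits of the code; bit 3 is the sign.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code to a Python float."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_VALUES[code & 0x7]


def reference_gemm(a, b):
    """Plain triple-loop GEMM: a is M x K, b is K x N, result is M x N."""
    k_dim, n_dim = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][n] for k in range(k_dim)) for n in range(n_dim)]
        for row in a
    ]


# Assumption: K = 32, inferred from 72.0 = 1.5^2 * K. M and N are arbitrary.
M, N, K = 2, 3, 32
value = decode_e2m1(0b0011)  # E2M1 code 0b0011 decodes to 1.5

a = [[value] * K for _ in range(M)]
b = [[value] * N for _ in range(K)]
c = reference_gemm(a, b)

# Each output element is a sum of K products of 1.5 * 1.5.
print(c[0][0])
```

With uniform inputs, any output element that differs from 1.5² × K points at the data path (layout, scale factors, accumulation) rather than at the input values.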

Removed (stale SF remap debug artifacts):
- test_forward_map.py, test_gemm_sweep.py, test_m1_gemm.py
- test_minimal_gemm.py, test_rand_gemm.py, test_sf_check.py
- test_sf_remap.py, test_sf_signed.py, test_sf_layout_diag.cu

2026-05-16 02:14:37 +00:00

biondizzle

2114bd11be

test: add standalone layer 0 comparison test (no vLLM, no Docker)

tests/layertest.py:
- Loads layer 0 expert weights from both original (MXFP4) and NVFP4 checkpoints
- Dequantizes both to BF16 for reference comparison
- Runs MoE forward pass in pure BF16 (no kernel)
- Runs same forward pass through our NVFP4 CUTLASS kernel
- Compares cosine similarity: kernel vs BF16 reference
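
The dequantization step can be sketched in plain Python. This is an illustration under assumptions, not the code in tests/layertest.py: it uses the usual block sizes (16 elements per scale for NVFP4, 32 for MXFP4), takes already-unpacked 4-bit codes and already-decoded float scales, and ignores the checkpoint's actual packing, scale encoding, and any per-tensor scale. The function names are hypothetical.

```python
# E2M1 (FP4) magnitudes indexed by the low 3 bits of the code; bit 3 is the sign.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code to a Python float."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_VALUES[code & 0x7]


def dequantize_blocks(codes, scales, block_size):
    """Multiply each decoded FP4 value by the scale of the block it belongs to."""
    if len(codes) != len(scales) * block_size:
        raise ValueError("expected exactly one scale per block of codes")
    return [
        decode_e2m1(code) * scales[i // block_size]
        for i, code in enumerate(codes)
    ]


# 32 codes: sixteen copies of 1.5 followed by sixteen copies of -1.0.
codes = [0b0011] * 16 + [0b1010] * 16

# NVFP4-style grouping: one scale per 16 elements.
nvfp4_like = dequantize_blocks(codes, scales=[2.0, 0.5], block_size=16)

# MXFP4-style grouping: one scale per 32 elements.
mxfp4_like = dequantize_blocks(codes, scales=[2.0], block_size=32)

print(nvfp4_like[0], nvfp4_like[16])
print(mxfp4_like[0], mxfp4_like[16])
```

The same codes dequantize differently under the two groupings, which is why both checkpoints are brought to a common higher-precision format before they are compared.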

tests/run_test.sh:
- Creates venv, installs deps, builds kernel from source, runs test

Isolates our kernel completely from vLLM's weight loading, tensor
parallelism, and MoE routing. If cosine ≈ 1.0, bug is in vLLM. If
cosine ≈ 0, bug is in our kernel pipeline.
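
The comparison metric is simple to state in code. The following is a hedged sketch of cosine similarity over flattened outputs, not the implementation in tests/layertest.py. The `likely_culprit` helper and its 0.99 threshold are hypothetical and only illustrate the decision rule described above.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors of floats."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # treat an all-zero output as "no agreement"
    return dot / (norm_x * norm_y)


def likely_culprit(cosine, threshold=0.99):
    """Hypothetical decision rule; the threshold is an assumption."""
    return "vLLM integration" if cosine >= threshold else "kernel pipeline"


# Toy stand-ins for the flattened BF16 reference and kernel outputs.
reference = [0.5, -1.25, 2.0, 0.75]
kernel_close = [0.5, -1.25, 2.0, 0.8125]  # small quantization-like error
kernel_wrong = [1.0, 2.0, 0.0, 0.0]       # unrelated to wrong_ref below
wrong_ref = [0.0, 0.0, 3.0, 4.0]

print(likely_culprit(cosine_similarity(reference, kernel_close)))
print(likely_culprit(cosine_similarity(wrong_ref, kernel_wrong)))
```

Cosine similarity ignores overall magnitude, so a uniformly mis-scaled output would still score near 1.0; a norm or max-absolute-error check alongside it would catch that case.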

2026-05-16 02:13:18 +00:00

3 Commits