tests/layertest.py:
- Loads layer 0 expert weights from both original (MXFP4) and NVFP4 checkpoints
- Dequantizes both to BF16 for reference comparison
- Runs MoE forward pass in pure BF16 (no kernel)
- Runs same forward pass through our NVFP4 CUTLASS kernel
- Compares cosine similarity: kernel vs BF16 reference

tests/run_test.sh:
- Creates venv, installs deps, builds kernel from source, runs test

Isolates our kernel completely from vLLM's weight loading, tensor parallelism, and MoE routing. If cosine ≈ 1.0, bug is in vLLM. If cosine ≈ 0, bug is in our kernel pipeline.
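The comparison and the decision rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in tests/layertest.py: it uses plain Python lists so it runs without torch, and the names `cosine_similarity` and `diagnose`, as well as the `high` and `low` thresholds, are hypothetical choices made for this example. The actual test presumably computes the same quantity on flattened BF16 tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length flat sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # An all-zero output carries no directional information; report 0.0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def diagnose(cos, high=0.99, low=0.1):
    """Map a cosine value to the two outcomes named in the commit message.

    The thresholds are assumptions for illustration only; the commit
    message just says "approximately 1.0" and "approximately 0".
    """
    if cos >= high:
        return "kernel matches reference: suspect vLLM integration"
    if abs(cos) <= low:
        return "kernel output uncorrelated: suspect kernel pipeline"
    return "inconclusive"


if __name__ == "__main__":
    reference = [0.5, -1.25, 2.0, 0.75]      # stand-in for the BF16 reference output
    kernel_out = [0.51, -1.24, 1.98, 0.76]   # stand-in for the kernel output
    cos = cosine_similarity(reference, kernel_out)
    print(f"cosine = {cos:.6f} -> {diagnose(cos)}")
```

A value between the two thresholds is reported as inconclusive here, since the commit message only covers the two extreme cases.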
torch
safetensors