Commit Graph

6 Commits

Author SHA1 Message Date
a950f978d3 run_test.sh: SIGKILL all children of screen session on cleanup
Deadlocked GPU processes ignore SIGHUP from screen -X quit.
Now kills the entire process group with SIGKILL, plus a catch-all
pkill for any python test_ processes.
2026-05-22 17:08:12 +00:00
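
The cleanup approach in the commit above (SIGKILL to the whole process group, because a hung process may ignore SIGHUP) can be sketched as follows. This is a minimal Python illustration, not the contents of run_test.sh, which is a shell script driving screen. The helper names `kill_process_group` and `demo` are hypothetical, the child here is a stand-in that ignores SIGHUP, and the catch-all pkill step is not reproduced.

```python
import os
import signal
import subprocess
import sys

# Stand-in for a hung test process: ignores SIGHUP, then sleeps.
CHILD_CODE = (
    "import signal, time; "
    "signal.signal(signal.SIGHUP, signal.SIG_IGN); "
    "time.sleep(60)"
)


def kill_process_group(proc: subprocess.Popen) -> None:
    """Send SIGKILL to every process in proc's process group, then reap proc."""
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
    except ProcessLookupError:
        pass  # the group is already gone
    proc.wait()  # reap the group leader so it does not linger as a zombie


def demo() -> int:
    # start_new_session=True gives the child its own session and process
    # group, so killing that group cannot hit the calling shell.
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        start_new_session=True,
    )
    kill_process_group(proc)
    return proc.returncode  # negative signal number when killed by a signal
```

SIGKILL cannot be caught or ignored, which is why it works where SIGHUP does not. Targeting the group rather than a single PID also reaches any children the process has spawned.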
b1a37bd2dd Fix quoting in run_test.sh 2026-05-22 17:06:00 +00:00
6594e31db5 Add run_test.sh harness (screen + log) 2026-05-22 17:05:43 +00:00
de9b50cbe7 fix: use setup.py install for CUTLASS extension build 2026-05-16 02:21:17 +00:00
882bff8fb7 fix: also build CUTLASS C++ extension in run_test.sh 2026-05-16 02:19:40 +00:00
2114bd11be test: add standalone layer 0 comparison test (no vLLM, no Docker)
tests/layertest.py:
- Loads layer 0 expert weights from both original (MXFP4) and NVFP4 checkpoints
- Dequantizes both to BF16 for reference comparison
- Runs MoE forward pass in pure BF16 (no kernel)
- Runs same forward pass through our NVFP4 CUTLASS kernel
- Compares cosine similarity: kernel vs BF16 reference

tests/run_test.sh:
- Creates venv, installs deps, builds kernel from source, runs test

Isolates our kernel completely from vLLM's weight loading, tensor
parallelism, and MoE routing. If cosine ≈ 1.0, the kernel is correct
and the bug is in vLLM. If cosine ≈ 0, the bug is in our kernel
pipeline.
2026-05-16 02:13:18 +00:00
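
The comparison step described in the commit above (cosine similarity between the kernel output and the BF16 reference) can be sketched as follows. This is a standard-library illustration on flat lists of floats, not the code in tests/layertest.py, which presumably operates on tensors. The names `cosine_similarity` and `diagnose` and the 0.99 threshold are assumptions made for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)


def diagnose(cosine: float, threshold: float = 0.99) -> str:
    """Map a similarity score to a verdict (threshold is hypothetical)."""
    if cosine >= threshold:
        return "kernel matches reference"
    return "kernel output diverges"
```

Cosine similarity ignores overall scale, so a kernel output that is off by a constant factor would still score close to 1.0. Comparing magnitudes as well would catch that case.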