|
|
cc48a5715e
|
Add full layer 0 B200 test: CuTeDSL vs BF16 reference

Tests each attention/FFN projection individually against a BF16 dequantized
reference, then runs the full layer forward pass. Identifies exactly where
garbage enters the pipeline.
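
The per-projection comparison described above can be sketched roughly as follows. This is a minimal illustration, not the test added in this commit: the helper names (`cosine_similarity`, `first_divergent`), the 0.99 threshold, and the use of flat Python lists instead of GPU tensors are all assumptions made to keep the example self-contained.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length flat sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Two all-zero vectors match; a zero vector vs a non-zero one does not.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)

def first_divergent(outputs, references, min_cos=0.99):
    """Return the first projection name whose output disagrees with the reference.

    `outputs` and `references` map projection names to flat lists of floats,
    in pipeline order. `min_cos` is a hypothetical tolerance, not a value
    taken from the commit. Returns None when every projection matches.
    """
    for name, out in outputs.items():
        ref = references[name]
        # NaN or Inf in the output counts as divergence outright.
        if any(math.isnan(v) or math.isinf(v) for v in out):
            return name
        if cosine_similarity(out, ref) < min_cos:
            return name
    return None
```

Checking projections in pipeline order means the first name returned marks where bad values first appear, rather than where they are first noticed downstream.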

Key finding: the checkpoint uses different names than vLLM:
- q_a_proj, q_b_proj, kv_proj (not fused_wqa_wkv)
- q_a_norm (not q_norm)
- compressor.* (C4A layers only)
- sinks (attn_sink)
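
A naming mismatch like this can be caught before any weights are loaded by scanning the checkpoint's key list. The sketch below is only an illustration: the function `naming_report` and the dotted key layout in the usage example are hypothetical, and only the names with an explicit "not ..." pairing in the list above are covered (`compressor.*` and `sinks` are left out).

```python
# Names taken from the list above; the grouping into two sets is an assumption.
CHECKPOINT_STYLE = {"q_a_proj", "q_b_proj", "kv_proj", "q_a_norm"}
VLLM_STYLE = {"fused_wqa_wkv", "q_norm"}

def naming_report(keys):
    """Report which naming convention each known component name belongs to.

    `keys` is an iterable of dotted parameter names. Matching is done on
    whole dot-separated components, so `q_a_norm` is never mistaken for
    `q_norm`.
    """
    found = {"checkpoint": set(), "vllm": set()}
    for key in keys:
        parts = set(key.split("."))
        found["checkpoint"] |= parts & CHECKPOINT_STYLE
        found["vllm"] |= parts & VLLM_STYLE
    return found

# Usage with a hypothetical key layout:
report = naming_report([
    "layers.0.attn.q_a_proj.weight",
    "layers.0.attn.q_b_proj.weight",
    "layers.0.attn.kv_proj.weight",
    "layers.0.attn.q_a_norm.weight",
])
print(sorted(report["checkpoint"]), sorted(report["vllm"]))
```

An empty `vllm` set alongside a populated `checkpoint` set indicates that a loader expecting the vLLM names would find none of these weights.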
|
2026-05-19 07:14:58 +00:00 |
|