nvfp4-megamoe-kernel

Files

biondizzle f74447bfd0 Proper NVFP4 integration: quantized compressor/indexer + mapper fixes

Weight mapper fixes:
- Reorder substring renames: compressor renames run first, then the prefix
  rename .self_attn.compressor. → .attn.mla_attn.compressor., then the indexer
  renames (so indexer keys end up under mla_attn, because the compressor
  prefix rename has already fired by then)
- Add compressor param renames: kv_proj→wkv, gate_proj→wgate, kv_norm→norm,
  position_bias→ape (checkpoint uses NVFP4 naming, model uses internal names)
- Add indexer param renames: q_b_proj→wq_b, kv_proj→compressor.wkv,
  gate_proj→compressor.wgate, kv_norm→k_norm, position_bias→compressor.ape,
  weights_proj keeps its name (structural change only: compressor.indexer →
  indexer.compressor)
- Remove broken suffix renames (already fixed in prior commit)
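
Ordering matters here because each substring rule operates on the output of the rules before it. Below is a minimal sketch of that behaviour, assuming the mapper is a plain list of (old, new) pairs applied in sequence. Only the rename pairs come from the notes above; the key layout (`model.layers.0...`) and the helper `apply_renames` are hypothetical and are not taken from the actual mapper.

```python
# Ordered (old, new) substring pairs. Each rule sees the output of the
# rules before it, so the list order is part of the mapping's behaviour.
# The surrounding key layout is an assumption made for illustration.
COMPRESSOR_RENAMES = [
    (".compressor.kv_proj.", ".compressor.wkv."),
    (".compressor.gate_proj.", ".compressor.wgate."),
    (".compressor.kv_norm.", ".compressor.norm."),
    (".compressor.position_bias", ".compressor.ape"),
    # Structural move, applied after the param renames above.
    (".self_attn.compressor.", ".attn.mla_attn.compressor."),
]


def apply_renames(key, rules):
    """Apply every (old, new) substring rule to key, in list order."""
    for old, new in rules:
        key = key.replace(old, new)
    return key


renamed = apply_renames(
    "model.layers.0.self_attn.compressor.kv_proj.weight", COMPRESSOR_RENAMES
)
```

With this layout, any later rule that targets `.attn.mla_attn.` can only match once the structural move has run, so such rules need to sit further down the list.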

Model architecture fixes:
- Patch deepseek_compressor.py to pass quant_config (was None, but NVFP4
  checkpoint has quantized compressor weights with input_scale/weight_scale)
- Patch deepseek_v4_attention.py indexer: weights_proj now uses quant_config
  (was None, but checkpoint has quantized weights)
- Add indexer.compressor.fused_wkv_wgate stacking in load_weights
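
The stacking step can be pictured as a lookup that routes two checkpoint tensors into one fused parameter. The sketch below is hypothetical: the triple layout, the shard order (wkv as shard 0, wgate as shard 1) and the helper `resolve_stacked` are assumptions for illustration. Only the name `fused_wkv_wgate` comes from the notes above.

```python
# (fused_param, checkpoint_shard, shard_id) triples. Names and shard order
# are assumptions for illustration; only fused_wkv_wgate is from the notes.
STACKED_PARAMS = [
    (".indexer.compressor.fused_wkv_wgate.", ".indexer.compressor.wkv.", 0),
    (".indexer.compressor.fused_wkv_wgate.", ".indexer.compressor.wgate.", 1),
]


def resolve_stacked(name):
    """Return (target_param_name, shard_id); shard_id is None if not stacked."""
    for fused, shard, shard_id in STACKED_PARAMS:
        if shard in name:
            return name.replace(shard, fused), shard_id
    return name, None


target, shard_id = resolve_stacked(
    "layers.0.attn.mla_attn.indexer.compressor.wgate.weight"
)
```

A loader built this way would copy each incoming tensor into the slice of the fused parameter selected by `shard_id`, and load names that return `None` unchanged.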

Infrastructure:
- Add deepseek_compressor.py to Dockerfile
- Force MoE backend to flashinfer_cutedsl (was auto-selecting FLASHINFER_TRTLLM)
- Update unit test to 50 cases (compressor + indexer + quantization scales)

2026-05-18 23:20:13 +00:00

cudagraph_test.py

fix: test L2 weight N dim should be hidden_size, not hidden_size//2

2026-05-16 19:07:36 +00:00
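
One plausible reading of the cudagraph_test.py fix is sketched below. It assumes that L2 is the second (down-projection) MoE GEMM with N = hidden_size, and that FP4 packing (two 4-bit values per byte) halves the K dimension only. The helper `packed_fp4_weight_shape` and the sizes are made up for illustration and are not taken from the test.

```python
def packed_fp4_weight_shape(num_experts, n, k):
    """Shape of a uint8 tensor holding an (E, N, K) FP4 weight packed along K."""
    if k % 2:
        raise ValueError("K must be even to pack two FP4 values per byte")
    # Only K is halved by packing; N keeps its full size.
    return (num_experts, n, k // 2)


# Made-up sizes, not the model's real configuration.
hidden_size, intermediate_size, num_experts = 4096, 2048, 8
l2_shape = packed_fp4_weight_shape(num_experts, hidden_size, intermediate_size)
```

Under these assumptions the N entry of `l2_shape` equals `hidden_size`. Halving it as well would shrink the output dimension and not the packed one.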

debug_output.py

Update CURRENT_BUG.md: current status, outstanding garbage output issue, hypotheses

2026-05-17 16:52:40 +00:00

layertest.py

restore: new bridge/moe_pipeline/layertest

2026-05-16 19:55:19 +00:00

requirements.txt

test: add standalone layer 0 comparison test (no vLLM, no Docker)