nvfp4-megamoe-kernel/tests at 77baca668ed2a694e592b41c07fb955b2882e4d2

biondizzle/nvfp4-megamoe-kernel

biondizzle 77baca668e Patch attention forward: BF16 inv RoPE + BMM wo_a + NVFP4 wo_b

The original attention forward uses fused_inv_rope_fp8_quant +
deepseek_v4_fp8_einsum, which require wo_a to have FP8 weights
and a weight_scale_inv tensor. Our checkpoint stores wo_a in BF16,
so the original path fails (it produces empty output).

Replace O projection with:
1. _apply_inv_rope_bf16: pure PyTorch inverse RoPE (no FP8)
2. BMM grouped linear for wo_a (BF16)
3. NVFP4 wo_b via CuTeDSL

Also fixes the activation global scale (gs) bug from the previous commit:
- input_global_scale_inv IS the activation gs; don't re-invert it
- w13_input_scale_orig (after undoing the convert) IS the MoE gs

Test: tests/test_o_projection.py validates inv RoPE roundtrip
and wo_a BMM correctness.

2026-05-19 06:30:18 +00:00
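A minimal sketch of the inverse-RoPE roundtrip that tests/test_o_projection.py is described as validating. It is written in NumPy rather than PyTorch so it runs anywhere, and it is not the repository's _apply_inv_rope_bf16. Assumptions: NeoX-style half-split rotation, a RoPE base of 10000, and the helper names rope_cos_sin, apply_rope and apply_inv_rope are all hypothetical. The inverse simply rotates each pair by -theta, using cos(-t) = cos t and sin(-t) = -sin t.

```python
import numpy as np

def rope_cos_sin(positions, rot_dim, base=10000.0):
    # Hypothetical helper: standard RoPE frequencies theta_i = base^(-2i/rot_dim)
    inv_freq = base ** (-np.arange(0, rot_dim, 2) / rot_dim)
    angles = np.outer(positions, inv_freq)          # [T, rot_dim/2]
    return np.cos(angles), np.sin(angles)

def apply_rope(x, cos, sin):
    # Assumed NeoX layout: first half and second half of the last dim form pairs
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def apply_inv_rope(y, cos, sin):
    # Rotate each pair by -theta, undoing apply_rope
    y1, y2 = np.split(y, 2, axis=-1)
    return np.concatenate([y1 * cos + y2 * sin, -y1 * sin + y2 * cos], axis=-1)

# Usage: tokens x heads x rotary dims, cos/sin broadcast over the head axis
rng = np.random.default_rng(0)
T, H, D = 4, 2, 64
x = rng.standard_normal((T, H, D)).astype(np.float32)
cos, sin = rope_cos_sin(np.arange(T), D)
cos, sin = cos[:, None, :], sin[:, None, :]          # [T, 1, D/2]

y = apply_rope(x, cos, sin)
x_back = apply_inv_rope(y, cos, sin)
```

The roundtrip check is just that x_back matches x up to floating-point error. Position 0 has angle zero, so it is left unchanged by both directions.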

cudagraph_test.py

fix: test L2 weight N dim should be hidden_size, not hidden_size//2

2026-05-16 19:07:36 +00:00

debug_output.py

Update CURRENT_BUG.md: current status, outstanding garbage output issue, hypotheses

2026-05-17 16:52:40 +00:00

layertest.py

restore: new bridge/moe_pipeline/layertest

2026-05-16 19:55:19 +00:00

requirements.txt

test: add standalone layer 0 comparison test (no vLLM, no Docker)

2026-05-16 02:13:18 +00:00

run_test.sh

fix: use setup.py install for CUTLASS extension build

2026-05-16 02:21:17 +00:00

test_attention.py

Add NVFP4 linear runner + attention projection test

2026-05-18 20:14:03 +00:00

test_b_layout.py

cleanup: move useful tests to tests/, nuke stale debug tests

2026-05-16 02:14:37 +00:00

test_compile_custom_op.py

Fix compile test: add warmup for activation global scales

2026-05-19 01:57:16 +00:00

test_custom_op.py

Replace autograd.Function with torch.library.custom_op for Dynamo compat

2026-05-19 01:54:48 +00:00

test_cutedsl.py

fix: B tensor K-major strides, scale_b axis swap

2026-05-16 03:04:31 +00:00

test_inv_rope.py

Add unit tests for NVFP4 weight mapper and inverse RoPE BF16

2026-05-19 03:22:00 +00:00

test_multilayer.py

Add MoE scale ratio output

2026-05-17 22:58:27 +00:00

test_nvfp4_mapper.py

Fix hc_head mapping: checkpoint uses hc_head.hc_fn, model params are flat hc_head_fn

2026-05-19 03:58:25 +00:00

test_o_projection.py

Patch attention forward: BF16 inv RoPE + BMM wo_a + NVFP4 wo_b

2026-05-19 06:30:18 +00:00

test_pipeline_real_weights.py

Pipeline test: use max_num_tokens=8192 matching vLLM

2026-05-17 23:04:44 +00:00

test_quick_rand.py

cleanup: move useful tests to tests/, nuke stale debug tests

2026-05-16 02:14:37 +00:00

test_runner_vs_pipeline.py

test: runner vs pipeline comparison + scale assembly comparison

2026-05-17 07:33:20 +00:00

test_scale_assembly.py

fix: separate L1/L2 scale buffers (different K_sf), fix assembly calls

2026-05-17 07:43:05 +00:00

test_scale_debug.py

test: scale assembly debug

2026-05-17 07:37:47 +00:00

test_shared_expert.py

Fix hidden_size: shared expert uses 7168, not HC_DIM 28672

2026-05-18 20:10:32 +00:00

test_uniform_fp4.py

cleanup: move useful tests to tests/, nuke stale debug tests

2026-05-16 02:14:37 +00:00

test_warmup_gs.py

test: use runner's built-in warmup method

2026-05-17 08:24:27 +00:00

test_wo_a_bmm.py

Fix BF16 wo_a: per-group BMM instead of flat linear

2026-05-19 04:10:02 +00:00
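A minimal NumPy sketch of why a grouped wo_a needs a per-group BMM rather than one flat linear. It assumes a hypothetical layout that is not confirmed by the listing: the input is [T, G, d_in] and wo_a holds one [d_out, d_in] matrix per group, shaped [G, d_out, d_in]. Under that assumption, the grouped projection is equivalent to multiplying the flattened input by a block-diagonal matrix, so each group only sees its own slice of the input. grouped_linear_bmm and block_diag_reference are illustrative names.

```python
import numpy as np

def grouped_linear_bmm(x, w):
    # x: [T, G, d_in], w: [G, d_out, d_in] (assumed layouts)
    xg = np.transpose(x, (1, 0, 2))                  # [G, T, d_in]
    yg = np.matmul(xg, np.transpose(w, (0, 2, 1)))   # batched: [G, T, d_out]
    T = x.shape[0]
    return np.transpose(yg, (1, 0, 2)).reshape(T, -1)  # [T, G*d_out]

def block_diag_reference(x, w):
    # Same projection written as one dense linear with a block-diagonal weight
    G, d_out, d_in = w.shape
    dense = np.zeros((G * d_out, G * d_in), dtype=w.dtype)
    for g in range(G):
        dense[g * d_out:(g + 1) * d_out, g * d_in:(g + 1) * d_in] = w[g]
    return x.reshape(x.shape[0], -1) @ dense.T

# Usage
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3, 8))   # T=5 tokens, G=3 groups, d_in=8
w = rng.standard_normal((3, 4, 8))   # d_out=4 per group
out = grouped_linear_bmm(x, w)
```

The BMM form avoids materializing the mostly zero block-diagonal matrix; the dense reference exists only to check the result.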

test_wo_a.py

Fix test: cos_sin_cache on CUDA device

2026-05-19 02:37:50 +00:00