nvfp4-megamoe-kernel/tests at 4882d8553cc028a8d4b1a90877ec388ed58beae0

biondizzle/nvfp4-megamoe-kernel

biondizzle 1857bdedc3 chore: deprecate prepare_weights_from_dequantized and prepare_weights_direct

Verified that our NVFP4 packing convention (odd<<4|even, round-half-to-even)
matches the DeepSeek-V4 checkpoint exactly: 100% byte-identical round-trip
across all tested experts. The dequantize->requantize path is lossless in
practice but wasteful. Marked both prepare_weights_from_dequantized and
prepare_weights_direct as deprecated in favor of prepare_weights_from_stacked
which loads checkpoint FP4 bytes directly via .view().

Also added test_fp4_roundtrip.py for future reference.

2026-05-20 02:11:40 +00:00
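
The packing convention named in the commit message above (odd<<4|even, round-half-to-even) can be illustrated with a small NumPy sketch. This is not the repository's code: the function names below are hypothetical, the E2M1 value grid is the standard FP4 one, and the per-block and global scales that real NVFP4 tensors carry are left out. It only shows that, under these assumptions, unpack -> dequantize -> requantize -> pack reproduces every possible byte.

```python
import numpy as np

# Magnitudes representable by a 3-bit E2M1 code (array index == code value).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)


def quantize_e2m1(x):
    """Map floats to 4-bit codes (bit 3 = sign), rounding half to even."""
    x = np.asarray(x, dtype=np.float32)
    sign = np.signbit(x).astype(np.uint8) << 3
    mag = np.minimum(np.abs(x), E2M1_GRID[-1])  # saturate at 6.0
    lo = np.clip(np.searchsorted(E2M1_GRID, mag, side="right") - 1, 0, 6)
    hi = lo + 1
    mid = (E2M1_GRID[lo] + E2M1_GRID[hi]) / 2
    # On an exact tie, pick the neighbour whose code is even (mantissa bit 0).
    tie_pick = np.where(lo % 2 == 0, lo, hi)
    code = np.where(mag > mid, hi, np.where(mag < mid, lo, tie_pick))
    return code.astype(np.uint8) | sign


def dequantize_e2m1(codes):
    """Map 4-bit codes back to float32 values (no scales applied)."""
    codes = np.asarray(codes, dtype=np.uint8)
    sign = np.where(codes & 8, np.float32(-1.0), np.float32(1.0))
    return E2M1_GRID[codes & 7] * sign


def pack_fp4(codes):
    """Pack pairs along the last axis: odd element high nibble, even low."""
    codes = np.asarray(codes, dtype=np.uint8)
    even, odd = codes[..., 0::2], codes[..., 1::2]
    return (odd << 4) | even


def unpack_fp4(packed):
    """Inverse of pack_fp4: one byte becomes two 4-bit codes."""
    packed = np.asarray(packed, dtype=np.uint8)
    out = np.empty(packed.shape[:-1] + (packed.shape[-1] * 2,), dtype=np.uint8)
    out[..., 0::2] = packed & 0x0F
    out[..., 1::2] = packed >> 4
    return out


# Round-trip every possible packed byte through the float domain.
all_bytes = np.arange(256, dtype=np.uint8)
roundtrip = pack_fp4(quantize_e2m1(dequantize_e2m1(unpack_fp4(all_bytes))))
assert np.array_equal(roundtrip, all_bytes)
```

Because every FP4 code dequantizes to a value that sits exactly on the grid, requantizing it never hits a rounding decision, which is why such a round trip is lossless yet redundant compared with reinterpreting the stored bytes directly.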

cudagraph_test.py

fix: test L2 weight N dim should be hidden_size, not hidden_size//2

2026-05-16 19:07:36 +00:00

debug_output.py

Update CURRENT_BUG.md: current status, outstanding garbage output issue, hypotheses

2026-05-17 16:52:40 +00:00

layertest.py

restore: new bridge/moe_pipeline/layertest

2026-05-16 19:55:19 +00:00

requirements.txt

test: add standalone layer 0 comparison test (no vLLM, no Docker)

2026-05-16 02:13:18 +00:00

run_test.sh

fix: use setup.py install for CUTLASS extension build

2026-05-16 02:21:17 +00:00

test_attention_path_b200.py

Add attention path test: pinpoint FlashMLA failure

2026-05-19 07:54:01 +00:00

test_attention.py

Add NVFP4 linear runner + attention projection test

2026-05-18 20:14:03 +00:00

test_b_layout.py

cleanup: move useful tests to tests/, nuke stale debug tests

2026-05-16 02:14:37 +00:00

test_blackwell_attn_b200.py

Add blackwell_attention module and comprehensive test

2026-05-19 15:30:29 +00:00

test_compile_custom_op.py

Fix compile test: add warmup for activation global scales

2026-05-19 01:57:16 +00:00

test_csa_attention_b200.py

Add CSA/HCA attention kernel (PyTorch SDPA, Blackwell-safe)

2026-05-19 07:58:10 +00:00

test_csa_sparse_attn_b200.py

Fix N for C128A (need 128 tokens)

2026-05-19 16:04:53 +00:00

test_custom_op.py

Replace autograd.Function with torch.library.custom_op for Dynamo compat

2026-05-19 01:54:48 +00:00

test_cutedsl.py

fix: B tensor K-major strides, scale_b axis swap

2026-05-16 03:04:31 +00:00

test_decode_attention_b200.py

Fix attention for decode (1 query vs N cached KVs)

2026-05-19 15:28:52 +00:00

test_decode_vs_prefill_b200.py

Add decode vs prefill consistency test

2026-05-19 16:00:33 +00:00

test_e2e_decode_b200.py

Test with all 61 layers (shared experts only)

2026-05-19 15:55:41 +00:00

test_fp4_roundtrip.py

chore: deprecate prepare_weights_from_dequantized and prepare_weights_direct

2026-05-20 02:11:40 +00:00

test_full_layer_b200.py

Fix checkpoint keys: attn_hc.*, compressor.*, q_a_proj/q_b_proj/kv_proj

2026-05-19 07:17:37 +00:00

test_full_layer_nan_b200.py

Add full layer NaN test (attention + MoE, multi-layer chain)

2026-05-19 18:36:49 +00:00

test_full_model_b200.py

Add full model forward test (WIP), sparse attention test passes

2026-05-19 09:04:19 +00:00

test_inv_rope.py

Add unit tests for NVFP4 weight mapper and inverse RoPE BF16

2026-05-19 03:22:00 +00:00

test_kv_cache_b200.py

Fix kv_ref transpose in KV cache test

2026-05-19 08:58:46 +00:00

test_model_forward_b200.py

Rewrite test: diagnose whether warmup gs matters at inference time

2026-05-19 07:49:41 +00:00

test_moe_nan_b200.py

Fix intermediate size: 3072 not 18432

2026-05-19 18:34:12 +00:00

test_moe_runner_nan_b200.py

Use 16 experts for MoE runner test (fits in memory)

2026-05-19 18:35:40 +00:00

test_multilayer.py

Add MoE scale ratio output

2026-05-17 22:58:27 +00:00

test_nvfp4_attention_b200.py

Fix cos_sin cache shape in NVFP4 attention test

2026-05-19 08:38:55 +00:00

test_nvfp4_attn_gemm_b200.py

Fix NVFP4 attention: slice output to actual N after 128-padding

2026-05-19 08:55:31 +00:00

test_nvfp4_mapper.py

Fix hc_head mapping: checkpoint uses hc_head.hc_fn, model params are flat hc_head_fn

2026-05-19 03:58:25 +00:00

test_o_projection_b200.py

Fix dims: o_groups=16, o_lora_rank=1024 from config

2026-05-19 06:37:25 +00:00

test_o_projection.py

Patch attention forward: BF16 inv RoPE + BMM wo_a + NVFP4 wo_b

2026-05-19 06:30:18 +00:00

test_pipeline_real_weights.py

Pipeline test: use max_num_tokens=8192 matching vLLM

2026-05-17 23:04:44 +00:00

test_quick_rand.py

cleanup: move useful tests to tests/, nuke stale debug tests

2026-05-16 02:14:37 +00:00

test_rope_kv_b200.py

Fix syntax in RoPE KV test

2026-05-19 10:31:07 +00:00

test_runner_vs_pipeline.py

test: runner vs pipeline comparison + scale assembly comparison

2026-05-17 07:33:20 +00:00

test_scale_assembly.py

fix: separate L1/L2 scale buffers (different K_sf), fix assembly calls

2026-05-17 07:43:05 +00:00

test_scale_debug.py

test: scale assembly debug

2026-05-17 07:37:47 +00:00

test_shared_expert.py

Fix hidden_size: shared expert uses 7168, not HC_DIM 28672

2026-05-18 20:10:32 +00:00

test_sparse_attn_b200.py

Add CSA/HCA sparse attention kernel test

2026-05-19 09:02:12 +00:00

test_uniform_fp4.py

cleanup: move useful tests to tests/, nuke stale debug tests

2026-05-16 02:14:37 +00:00

test_v4_attention_b200.py

Add DeepSeek-V4 CSA/HCA attention pipeline test (not MLA)

2026-05-19 08:51:16 +00:00

test_vllm_codepaths_b200.py

Fix imports in vLLM codepaths test

2026-05-19 17:26:50 +00:00

test_warmup_gs.py

test: use runner's built-in warmup method

2026-05-17 08:24:27 +00:00

test_wo_a_bmm.py

Fix BF16 wo_a: per-group BMM instead of flat linear

2026-05-19 04:10:02 +00:00

test_wo_a.py

Fix test: cos_sin_cache on CUDA device

2026-05-19 02:37:50 +00:00