nvfp4-megamoe-kernel/tests at 29f836d711f0cfce345a4771e2ffff48662e54ae

biondizzle/nvfp4-megamoe-kernel

biondizzle 794ebaf7e5 P4: Fused RMSNorm + NVFP4 quantize kernel (2 launches vs 6+)

- fused_rmsnorm_quantize.cu: two-kernel approach
  Kernel 1: rmsnorm_amax_gsa — compute RMS + amax of normalized output → gsa per row
  Kernel 2: rmsnorm_quantize_nvfp4 — normalize + quantize using GPU-computed gsa
- Python bridge: rmsnorm_quantize_nvfp4() in ops/quantize.py
- Python bridge: dequantize_nvfp4() in ops/quantize.py
- Unit test: test_fused_rmsnorm_quantize.py (production shapes: 7168 hidden)
- Eliminates ~488 kernel launches per token (122 sites × 4 launches saved)
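
The two-kernel split described in this commit can be illustrated with a small CPU reference. This is only a sketch under stated assumptions, not the code in fused_rmsnorm_quantize.cu: it assumes that "gsa" is a per-row global scale derived from the amax of the normalized row, that NVFP4 uses 16-element blocks of FP4 E2M1 values with one block scale each, and that the global scale is chosen so block scales fit the FP8 E4M3 range (max 448). The function names `pass1_row_scale`, `pass2_quantize`, and `dequantize` are hypothetical and do not mirror the bridges in ops/quantize.py.

```python
import math

# Magnitudes representable in FP4 E2M1 (sign is handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
E4M3_MAX = 448.0   # largest finite FP8 E4M3 value, assumed block-scale format
BLOCK = 16         # assumed NVFP4 block size


def rmsnorm(row, weight, eps=1e-6):
    """Plain RMSNorm over one row: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(x * x for x in row) / len(row)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms * w for x, w in zip(row, weight)]


def pass1_row_scale(row, weight, eps=1e-6):
    """Pass 1 (sketch): amax of the normalized row -> per-row global scale."""
    amax = max(abs(v) for v in rmsnorm(row, weight, eps))
    # Assumed convention: the largest block scale lands at E4M3_MAX.
    return amax / (E2M1_MAX * E4M3_MAX) if amax > 0 else 1.0


def round_e2m1(v):
    """Round to the nearest E2M1 value (ties resolved toward zero here;
    real hardware rounding may differ)."""
    mag = min(E2M1_VALUES, key=lambda q: abs(q - abs(v)))
    return math.copysign(mag, v)


def pass2_quantize(row, weight, global_scale, eps=1e-6):
    """Pass 2 (sketch): normalize again, then quantize block by block."""
    normed = rmsnorm(row, weight, eps)
    codes, block_scales = [], []
    for start in range(0, len(normed), BLOCK):
        block = normed[start:start + BLOCK]
        bmax = max(abs(v) for v in block)
        # Block scale is stored relative to the global scale; a real kernel
        # would round it to FP8 E4M3, which this sketch skips.
        bscale = bmax / (E2M1_MAX * global_scale) if bmax > 0 else 1.0
        block_scales.append(bscale)
        step = bscale * global_scale
        codes.extend(round_e2m1(v / step) for v in block)
    return codes, block_scales


def dequantize(codes, block_scales, global_scale):
    """Inverse mapping: code * block scale * global scale."""
    return [c * block_scales[i // BLOCK] * global_scale
            for i, c in enumerate(codes)]
```

The second pass recomputes the normalization instead of storing the normalized row, which is one plausible reason a fused version needs only two launches: the row-wide amax must be known before any element can be quantized, so a single pass is not enough without a grid-wide reduction.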

2026-06-02 16:26:24 +00:00

E3: model construction test

2026-05-30 21:22:34 +00:00

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

P4: Fused RMSNorm + NVFP4 quantize kernel (2 launches vs 6+)

2026-06-02 16:26:24 +00:00

check_log.sh

Add check_log.sh convenience script

2026-05-22 17:07:23 +00:00

compare_hf_reference.py

Add HuggingFace reference comparison test

2026-05-31 12:05:19 +00:00

compare_layer0.py

Add HF reference test script

2026-05-31 20:11:37 +00:00

layer_compare.py

Fix remaining mHC API references: layer_compare.py, layer.py comment

2026-05-31 18:38:34 +00:00

production_values_test.py

Add production-value tests: ALL tests use Pro config (61L, HD=512, 384 experts, HCA=128, 1M context)

2026-06-02 04:10:39 +00:00

requirements.txt

test: add standalone layer 0 comparison test (no vLLM, no Docker)

2026-05-16 02:13:18 +00:00

run_test.sh

run_test.sh: SIGKILL all children of screen session on cleanup

2026-05-22 17:08:12 +00:00

test_minimal_e2e.py

Fix mHCBlock import + relax RoPE round-trip threshold (BF16 noise expected)

2026-05-31 09:17:07 +00:00

test_residual_diagnostic.py

Fix expert weight indexing for 1D tensor

2026-05-31 09:23:10 +00:00

validate_layer.py

Fix dtype mismatch in validate_layer: cast flat to float before F.linear

2026-05-31 20:23:18 +00:00

verify_attention.py

fix verify_attention: proper multi-head SDPA + GQA

2026-05-31 05:55:10 +00:00