nvfp4-megamoe-kernel/tests at eb69c3bfb90a534da7d2651ace376927bb1122af

biondizzle/nvfp4-megamoe-kernel

biondizzle a4ef6c3454 Add B1 mixed FP8 prefill FMHA kernel (T>1 support)

New files:
- fmha_mixed_fp8_prefill.cuh: kernel supporting T=1..128
  - Sub-batch processing (T_BATCH=32) to fit in 232KB SMEM
  - Multi-row QK TMEM read using tcgen05.ld.32x32b.x8
  - Per-row online softmax
  - Per-row PV MMA (correctness first; batched PV is TODO)
  - Attention sink support
- fmha_mixed_fp8_prefill_capi.cu: C API bridge
- fmha_mixed_fp8_prefill_op.py: Python ctypes loader
- test_b1_mixed_fp8_prefill.py: unit test (T=1..32, N=128..4096)

Also: fix production FMHA layer test (BF16 fallback for o_a_proj,
router gate BF16 quantize path, missing DEVICE constant)

2026-06-03 02:50:27 +00:00
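
The "per-row online softmax" and "sub-batch processing" items in the commit message above can be illustrated with a small host-side sketch. This is not the kernel's code: it is plain Python with no FP8, TMEM, or MMA involved, and every function name, the `block` key-tile size, and the way `t_batch` groups rows are assumptions made for illustration. Attention sinks are left out because the commit message does not say how they enter the softmax.

```python
import math


def online_softmax_pv(scores, values, block=4):
    """One query row: softmax(scores) @ values, streaming over key blocks.

    Hypothetical helper. Assumes at least one finite score per row.
    """
    d = len(values[0])
    m = -math.inf      # running max of the scores seen so far
    l = 0.0            # running sum of exp(score - m)
    acc = [0.0] * d    # running unnormalized output row
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        m_new = max(m, max(s_blk))
        # Rescale the old state to the new max; exp(-inf) is 0.0 on the first block.
        scale = math.exp(m - m_new)
        l *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(s_blk, v_blk):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * x for a, x in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]


def reference_softmax_pv(scores, values):
    """Two-pass reference: full softmax first, then the weighted sum."""
    m = max(scores)
    p = [math.exp(s - m) for s in scores]
    z = sum(p)
    d = len(values[0])
    return [sum(pi * v[j] for pi, v in zip(p, values)) / z for j in range(d)]


def prefill_rows(score_rows, values, t_batch=32, block=4):
    """Process T query rows in sub-batches of t_batch rows (assumed grouping)."""
    out = []
    for t0 in range(0, len(score_rows), t_batch):
        for row in score_rows[t0:t0 + t_batch]:
            out.append(online_softmax_pv(row, values, block))
    return out
```

A unit test in the spirit of `test_b1_mixed_fp8_prefill.py` would compare each row of `prefill_rows(...)` against `reference_softmax_pv(...)` within a tolerance. A real FP8 kernel would presumably need a much looser tolerance than this float64 sketch.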

Cleanup Step 2: Archive Lineage P code, fix broken imports

2026-06-02 19:27:07 +00:00

Cleanup Step 2: Archive Lineage P code, fix broken imports

2026-06-02 19:27:07 +00:00

Cleanup Step 1: Move root-level files to proper directories

2026-06-02 19:24:39 +00:00

Add B1 mixed FP8 prefill FMHA kernel (T>1 support)

2026-06-03 02:50:27 +00:00

check_log.sh

Add check_log.sh convenience script

2026-05-22 17:07:23 +00:00

compare_hf_reference.py

Add HuggingFace reference comparison test

2026-05-31 12:05:19 +00:00

compare_layer0.py

Add HF reference test script

2026-05-31 20:11:37 +00:00

layer_compare.py

Fix remaining mHC API references: layer_compare.py, layer.py comment

2026-05-31 18:38:34 +00:00

requirements.txt

test: add standalone layer 0 comparison test (no vLLM, no Docker)

2026-05-16 02:13:18 +00:00

run_test.sh

run_test.sh: SIGKILL all children of screen session on cleanup

2026-05-22 17:08:12 +00:00

test_minimal_e2e.py

Fix mHCBlock import + relax RoPE round-trip threshold (BF16 noise expected)

2026-05-31 09:17:07 +00:00

test_residual_diagnostic.py

Fix expert weight indexing for 1D tensor

2026-05-31 09:23:10 +00:00

validate_layer.py

Fix dtype mismatch in validate_layer: cast flat to float before F.linear

2026-05-31 20:23:18 +00:00

verify_attention.py

fix verify_attention: proper multi-head SDPA + GQA

2026-05-31 05:55:10 +00:00