nvfp4-megamoe-kernel

Files

biondizzle 7b3f6cb13c Fix fused router: use run_nvfp4_fused_router wrapper, correct CuTe tensor API

- kernel wrapper converts torch tensors to CuTe tensors with mark_layout_dynamic
- test uses the wrapper instead of calling kernel.run() directly
- mat_b/scale_b are now torch tensors (converted inside wrapper)

2026-06-01 09:19:48 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| e2e | E3: model construction test | 2026-05-30 21:22:34 +00:00 |
| integration | Restructure: cutedsl/ -> dsv4/ with proper layering | 2026-05-21 17:30:44 +00:00 |
| unit | Fix fused router: use run_nvfp4_fused_router wrapper, correct CuTe tensor API | 2026-06-01 09:19:48 +00:00 |
| check_log.sh | Add check_log.sh convenience script | 2026-05-22 17:07:23 +00:00 |
| compare_hf_reference.py | Add HuggingFace reference comparison test | 2026-05-31 12:05:19 +00:00 |
| compare_layer0.py | Add HF reference test script | 2026-05-31 20:11:37 +00:00 |
| layer_compare.py | Fix remaining mHC API references: layer_compare.py, layer.py comment | 2026-05-31 18:38:34 +00:00 |
| requirements.txt | test: add standalone layer 0 comparison test (no vLLM, no Docker) | 2026-05-16 02:13:18 +00:00 |
| run_test.sh | run_test.sh: SIGKILL all children of screen session on cleanup | 2026-05-22 17:08:12 +00:00 |
| test_minimal_e2e.py | Fix mHCBlock import + relax RoPE round-trip threshold (BF16 noise expected) | 2026-05-31 09:17:07 +00:00 |
| test_residual_diagnostic.py | Fix expert weight indexing for 1D tensor | 2026-05-31 09:23:10 +00:00 |
| validate_layer.py | Fix dtype mismatch in validate_layer: cast flat to float before F.linear | 2026-05-31 20:23:18 +00:00 |
| verify_attention.py | fix verify_attention: proper multi-head SDPA + GQA | 2026-05-31 05:55:10 +00:00 |