nvfp4-megamoe-kernel/tests at 56769cdbf58cd672becaea3b225a4a3e3bc47ce6 - nvfp4-megamoe-kernel - Gitea: Git with a cup of tea

biondizzle/nvfp4-megamoe-kernel

biondizzle f80f8eb38f Clean up debug prints, set kv_coord as Int32(0)

Key findings to relay to CUTLASS LLM:
- Hardcoding kv_coord=Int32(1) CHANGES the output (so TMA CAN load from different tiles)
- Initializing kv_coord=Int32(0) and doing kv_coord += 1 does NOT increment it at runtime
  (all multi-tile outputs are identical to kv_coord=0)
- kv_coord=0 (plain Python int) also doesn't work
- Pipeline handle .count doesn't work either
- The TMA GMEM tile coordinate must be dynamic at kernel runtime,
  but CuTeDSL appears to constant-fold or not propagate the increment

2026-05-22 17:39:27 +00:00
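One possible mechanism behind the "does NOT increment at runtime" symptom is offered here as an assumption, not a diagnosis. Suppose a loop body is traced only once, and the increment runs on the Python side during tracing. The emitted code then reads a baked-in tile index on every runtime iteration. The toy model below is pure Python and does not use CuTeDSL. `trace_once`, `run`, and the two body functions are hypothetical names chosen for illustration. The model does not explain why the Int32(0) variant also failed; that case may depend on how the DSL carries values through its own control flow.

```python
# Toy model (NOT CuTeDSL internals): a loop body is traced ONCE into a list of
# ops, and that recorded trace is then executed for every runtime iteration.

def trace_once(body):
    """Call `body` a single time and return the ops it recorded."""
    ops = []
    body(ops)
    return ops


def run(ops, n_iters):
    """Execute the recorded ops n_iters times; return the tile index loaded each time."""
    kv = 0  # hypothetical runtime loop-carried register, starts at tile 0
    loaded = []
    for _ in range(n_iters):
        for op, arg in ops:
            if op == "load":
                # "kv" means: read the runtime register; an int is a baked-in literal
                loaded.append(kv if arg == "kv" else arg)
            elif op == "inc":
                kv += 1
    return loaded


# Variant A: the counter is an ordinary Python variable. Its current value is
# copied into the trace as a literal, and the += runs once, at trace time.
trace_time_counter = 0


def body_trace_time_counter(ops):
    global trace_time_counter
    ops.append(("load", trace_time_counter))
    trace_time_counter += 1


# Variant B: the counter is a runtime value; the increment itself is recorded.
def body_runtime_counter(ops):
    ops.append(("load", "kv"))
    ops.append(("inc", None))


print(run(trace_once(body_trace_time_counter), 3))  # [0, 0, 0]
print(run(trace_once(body_runtime_counter), 3))     # [0, 1, 2]
```

In this model, the fix is to make the coordinate part of the state that the traced loop carries. The increment is then emitted into the trace, instead of executing once while the trace is being built.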


Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

Clean up debug prints, set kv_coord as Int32(0)

2026-05-22 17:39:27 +00:00

check_log.sh

Add check_log.sh convenience script

2026-05-22 17:07:23 +00:00

fmha_v3_stage_c_example1.py

restore tBgK to kh.count indexing (single-tile working), add TODO for multi-tile

2026-05-22 15:54:03 +00:00

fmha_v3_stage_c_example2.py

FMHA Stage-C multi-tile: combined K+V barrier, final_o_bar, acc_pipe producer

2026-05-22 16:23:36 +00:00

fmha_v3_stage_c_example3.py

Stage C: integrate example3 multi-tile fixes into unit test

2026-05-22 16:39:45 +00:00

fmha_v3_stage_c_example4.py

Add example4: manual kv_coord Int32 for GMEM tile indexing

2026-05-22 17:24:38 +00:00

native_stage_c_patch.py

more stuff

2026-05-22 08:57:38 +00:00

requirements.txt

test: add standalone layer 0 comparison test (no vLLM, no Docker)

2026-05-16 02:13:18 +00:00

run_test.sh

run_test.sh: SIGKILL all children of screen session on cleanup

2026-05-22 17:08:12 +00:00