nvfp4-megamoe-kernel/tests at ba31a1cedf2d611f8bf311a9de3387cbf2853c5a

biondizzle/nvfp4-megamoe-kernel


biondizzle ba31a1cedf Add per-tile O rescale (O *= acc_scale) to softmax loop

- Moves correction_rescale atom setup before softmax loop (needed for O rescale)
- Adds O *= acc_scale for kt > 0, before softmax_done_bar.arrive()
- Uses same paired Ld32x32bOp/St32x32bOp(corr_tile_size=16) atoms as final normalize
- Final normalize (O *= 1/row_sum) uses same atoms, no duplicate setup
- Fixes softmax loop to use self.n_kv_tiles (Python int) instead of n_kv_tiles (CuTeDSL symbolic)
- This should fix the n=256 case (cosine similarity 0.71 → 0.9999)
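
The rescale described in this commit can be illustrated outside the kernel. What follows is a minimal pure-Python sketch of the standard online-softmax recurrence for a single query row. It is not the repository's CuTeDSL code and does not use the Ld32x32bOp/St32x32bOp atoms. The names `attention_reference`, `attention_tiled`, `cosine`, and the `rescale` flag are hypothetical, chosen only for illustration. With `rescale=False` the per-tile `O *= acc_scale` step is skipped, which is one plausible way to reproduce a mismatch once there is more than one KV tile.

```python
import math


def attention_reference(q, keys, values):
    """Plain softmax attention for one query row (no tiling)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_tiled(q, keys, values, tile_size, rescale=True):
    """Online-softmax attention over KV tiles for one query row."""
    n_kv_tiles = (len(keys) + tile_size - 1) // tile_size  # plain Python int
    dim = len(values[0])
    out = [0.0] * dim          # unnormalized accumulator O
    row_max = -math.inf
    row_sum = 0.0
    for kt in range(n_kv_tiles):
        lo = kt * tile_size
        hi = min(lo + tile_size, len(keys))
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys[lo:hi]]
        new_max = max(row_max, max(scores))
        # Factor that re-expresses earlier contributions relative to new_max.
        # On the first tile this is exp(-inf) == 0.0 and O is still zero.
        acc_scale = math.exp(row_max - new_max)
        if kt > 0 and rescale:
            out = [o * acc_scale for o in out]   # per-tile O *= acc_scale
        probs = [math.exp(s - new_max) for s in scores]
        row_sum = row_sum * acc_scale + sum(probs)
        for p, v in zip(probs, values[lo:hi]):
            for d in range(dim):
                out[d] += p * v[d]
        row_max = new_max
    inv_row_sum = 1.0 / row_sum
    return [o * inv_row_sum for o in out]        # final O *= 1/row_sum


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


# Illustrative data: scores grow with the key index, so the running max
# changes on every tile and the rescale actually matters.
q = [1.0, 0.5]
keys = [[0.1 * i, -0.05 * i] for i in range(8)]
values = [[float(i), 1.0 - i] for i in range(8)]

ref = attention_reference(q, keys, values)
fixed = attention_tiled(q, keys, values, tile_size=2)
buggy = attention_tiled(q, keys, values, tile_size=2, rescale=False)
print("cos(ref, fixed):", cosine(ref, fixed))
print("cos(ref, buggy):", cosine(ref, buggy))
```

With a single KV tile the `kt > 0` branch never runs, so both variants agree. That fits the commit note, which reports the error only at the larger n=256 size.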

2026-05-23 00:22:12 +00:00

Clean up: archive diagnostics and superseded tests

2026-05-23 00:17:07 +00:00

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

Add per-tile O rescale (O *= acc_scale) to softmax loop

2026-05-23 00:22:12 +00:00

check_log.sh

Add check_log.sh convenience script

2026-05-22 17:07:23 +00:00

fmha_v3_stage_c_example9.py

auto: pre-test commit

2026-05-23 00:05:07 +00:00

fmha_v3_stage_c_example10.py

FIX: (None,0,None,0) for ALL tma_partition outputs — verified shapes on B200

2026-05-22 23:35:55 +00:00

requirements.txt

test: add standalone layer 0 comparison test (no vLLM, no Docker)

2026-05-16 02:13:18 +00:00

run_test.sh

run_test.sh: SIGKILL all children of screen session on cleanup

2026-05-22 17:08:12 +00:00

working_softmax_maybe.py

Clean up: archive diagnostics and superseded tests

2026-05-23 00:17:07 +00:00