nvfp4-megamoe-kernel

Files

biondizzle 4826fa6afb D2: add num_query_heads/batch_size params + head-packed test

- FmhaKernel.__init__: add num_query_heads=1, batch_size=1
- Grid: (ceil_div(n_h*T, 128), 1, batch) for multi-CTA
- Test: head-packed multi-head (Q reshaped to (n_h*T, hd))
- n_h=1 regression, n_h=128 Pro decode, n_h=64 Flash, hd=128
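
The grid shape and head-packing described in this commit can be sketched in plain Python. This is an illustrative sketch only, not the code in FmhaKernel: the helper names `launch_grid` and `pack_heads` are hypothetical, the 128-row tile is taken from the grid expression above, and the head-major row order used for packing is an assumption.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)


def launch_grid(num_query_heads, seq_len, batch_size, tile_m=128):
    """Return (ceil_div(n_h*T, tile_m), 1, batch) as in the commit message.

    tile_m=128 mirrors the constant in the grid expression; whether the
    kernel exposes it as a parameter is an assumption.
    """
    packed_rows = num_query_heads * seq_len  # n_h * T rows after head packing
    return (ceil_div(packed_rows, tile_m), 1, batch_size)


def pack_heads(q):
    """Flatten Q from [n_h][T][hd] to [n_h*T][hd].

    Rows are laid out head-major (all tokens of head 0, then head 1, ...).
    The real test may use a different order.
    """
    return [row for head in q for row in head]


# Tiny example: 2 heads, 3 tokens, head dim 4.
q = [[[float(h * 10 + t)] * 4 for t in range(3)] for h in range(2)]
packed = pack_heads(q)
```

Under these assumptions, single-token decode with n_h=128 or n_h=64 fits in one 128-row tile per batch element, while longer sequences spread over several CTAs along the first grid axis.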

2026-05-25 16:50:49 +00:00

e2e

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

integration

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

unit

D2: add num_query_heads/batch_size params + head-packed test

2026-05-25 16:50:49 +00:00

check_log.sh

Add check_log.sh convenience script

2026-05-22 17:07:23 +00:00

requirements.txt

test: add standalone layer 0 comparison test (no vLLM, no Docker)

2026-05-16 02:13:18 +00:00

run_test.sh

run_test.sh: SIGKILL all children of screen session on cleanup

2026-05-22 17:08:12 +00:00

working_softmax_maybe.py

Clean up: archive diagnostics and superseded tests

2026-05-23 00:17:07 +00:00