nvfp4-megamoe-kernel

Files

biondizzle b9f15c250f Stage E: head-packed MQA/GQA, batch dim, custom_op, integration API

- production.py: head-packed M dimension for MQA/GQA (q_per_kv*T rows
  in single launch per KV group, eliminating redundant K/V TMA loads)
- production.py: batch dimension support (outer Python loop)
- production.py: warmup_attention_kernels() for pre-compilation
- production.py: dsv4_attention_per_head() for exact per-head sink bias
- __init__.py: sparse_fmha_with_swa, dense_fmha_with_swa, swa_only_fmha
  integration functions bridging AttentionSubBlock → production FMHA
- custom_ops.py: dsv4::sparse_fmha_with_swa custom_op registration
- test_production.py: comprehensive tests (MHA/MQA/GQA, head-packed vs
  per-head parity, multi-segment KV, SWA+causal+sink, batch, edge cases)

2026-05-27 15:15:03 +00:00

__init__.py

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

custom_ops.py

Stage E: head-packed MQA/GQA, batch dim, custom_op, integration API

2026-05-27 15:15:03 +00:00

decode_sparse.py

Restructure: cutedsl/ -> dsv4/ with proper layering

2026-05-21 17:30:44 +00:00

decode_swa.py

Restructure: cutedsl/ -> dsv4/ with proper layering