nvfp4-megamoe-kernel

Author	SHA1	Message	Date
biondizzle	4b9eed02e1	Cleanup C1-C7: delete dead CuTeDSL FMHA, test probes, scratch files - Deleted fmha.py (CuTeDSL slow path), FmhaKernel, Python KV merge - Deleted fmha_sm100.cuh, fmha_sm100_tc.cuh, fmha_sm100_launch.cu, fmha_epilogue_sm100.cuh - Moved fmha_qk_verify.cuh to tests/unit/qk_verify_kernel.cuh - Deleted decode_sparse.py, decode_swa.py, kernels/decode/ - Deleted 46 test_d.py probes, test_smem_, test_cotiled_, test_tmem_, test_smem_p_, test_ultra_minimal, test_fmha_pv16, test_working_softmax_maybe - Deleted root scratch: debug_linear.py, test_mapping.py, run_router_tests.py - Moved archive/ to archived_plans/code_archive/ - Rewrote production.py: single fast path via 6-warp multi-tile kernel - Added STATUS.md, audit_attention_live.md - Moved NEXT_PRIORITIES.md to archived_plans/	2026-05-30 21:08:12 +00:00
biondizzle	b9f15c250f	Stage E: head-packed MQA/GQA, batch dim, custom_op, integration API - production.py: head-packed M dimension for MQA/GQA (q_per_kv*T rows in single launch per KV group, eliminating redundant K/V TMA loads) - production.py: batch dimension support (outer Python loop) - production.py: warmup_attention_kernels() for pre-compilation - production.py: dsv4_attention_per_head() for exact per-head sink bias - __init__.py: sparse_fmha_with_swa, dense_fmha_with_swa, swa_only_fmha integration functions bridging AttentionSubBlock → production FMHA - custom_ops.py: dsv4::sparse_fmha_with_swa custom_op registration - test_production.py: comprehensive tests (MHA/MQA/GQA, head-packed vs per-head parity, multi-segment KV, SWA+causal+sink, batch, edge cases)	2026-05-27 15:15:03 +00:00
biondizzle	2412a5431b	MQA/GQA: batch Q heads into kernel batch dim, shared K/V per KV group	2026-05-27 08:31:23 +00:00
biondizzle	06a895ff99	Clean test suite for production attention (1/2/4 segments, multi-head)	2026-05-27 07:12:02 +00:00
biondizzle	3a25c7feff	Test multi-KV merge (2 segments) separately from multi-head	2026-05-27 06:54:16 +00:00
biondizzle	e45b94c01b	Test: compare both normalized and un-normalized reference	2026-05-27 06:44:37 +00:00
biondizzle	98c93c1cd8	Stage E: production attention wrapper + Python KV merge, clean fmha_smem_acc	2026-05-27 06:34:10 +00:00

7 Commits