Files
nvfp4-megamoe-kernel/tests
biondizzle 3de75c4e37 Add CSA/HCA attention kernel (PyTorch SDPA, Blackwell-safe)
Replaces vLLM's broken FlashMLA sparse attention which doesn't work on
SM100 (Blackwell). Uses torch.nn.functional.scaled_dot_product_attention
which works on all GPUs.

Architecture:
- CSA (C128A): Batched sparse gather + SDPA on top-k positions
- HCA (C4A): Same with compressed KV + per-layer indexer
- SWA: Sliding window attention
- Full reference: standard SDPA for testing without compression

Also adds test_csa_attention_b200.py to verify the full attention path.
2026-05-19 07:58:10 +00:00
..
2026-05-17 22:58:27 +00:00
2026-05-17 07:37:47 +00:00