biondizzle
bbba289bd8
feat: GPU-native SWA + sparse decode attention kernels (CuTeDSL)
- native_swa_decode.py: BlackwellSWADecodeKernel
  - CTA mapping: 1 CTA per (decode_token, q_head_group)
  - Online softmax with KV tile streaming (16 tokens/tile)
  - Pre-dequantized bf16 KV: fp8 dequant happens on the host because
    MLIR cvt_fpext requires a 32-bit aligned vector and has no scalar
    fp8->bf16 support
  - Cosine similarity 0.9999+ vs PyTorch batched SDPA reference
  - Falls back to _fallback_batched_sdp when CuTeDSL is unavailable
- native_sparse_decode.py: BlackwellSparseDecodeKernel
  - Combined SWA + compressed KV in a single attention pass
  - Supports CSA (cr=4) and HCA (cr=128) layers
  - Sink weight merge on the host side
  - Cosine similarity 0.9999+ vs combined SDPA reference
- fp8_bf16.py: documents the MLIR limitation (cvt_fpext requires
  vector<4xf8>, no scalar support). Pre-dequant is the workaround.
- vLLM wiring (attention.py):
  - SWA-only layers: native_swa_decode_attention
  - CSA/HCA layers: native_sparse_decode_attention with topk + attn_sink
- csa_attention.py updated to use the native kernels
- Tests: test_decode_pipeline.py and test_sparse_decode.py both pass
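
For readers unfamiliar with the two numerical ideas named above (online softmax over 16-token KV tiles, and merging a sink weight on the host), here is a minimal pure-Python sketch. It is not the CuTeDSL kernel code from this commit. All function names are hypothetical, the data is random, and the sink merge assumes a single per-head sink logit that absorbs softmax mass without contributing a value, which may differ from what native_sparse_decode.py actually does.

```python
import math
import random

TILE = 16  # tokens per KV tile, as stated in the commit message


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def streamed_decode_attention(q, keys, values, tile=TILE):
    """Attend one decode query over a KV window, one tile at a time.

    Returns (out, lse), where lse is the log-sum-exp of the scaled scores.
    """
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf                    # running max of the scores seen so far
    l = 0.0                          # running sum of exp(score - m)
    acc = [0.0] * len(values[0])     # running unnormalized weighted sum of V
    for start in range(0, len(keys), tile):
        scores = [dot(q, k) * scale for k in keys[start:start + tile]]
        m_new = max(m, max(scores))
        correction = math.exp(m - m_new)   # rescales old state; 0.0 on the first tile
        l *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values[start:start + tile]):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * x for a, x in zip(acc, v)]
        m = m_new
    return [a / l for a in acc], m + math.log(l)


def merge_sink(out, lse, sink):
    """Host-side merge of an assumed sink logit into a normalized output.

    The sink adds exp(sink) to the softmax denominator and nothing to the
    numerator, so the output shrinks by 1 / (1 + exp(sink - lse)).
    """
    keep = 1.0 / (1.0 + math.exp(sink - lse))
    return [o * keep for o in out]


def reference_attention(q, keys, values, sink=None):
    """Unstreamed softmax attention, optionally with an extra sink logit."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    logits = scores + ([sink] if sink is not None else [])
    m = max(logits)
    denom = sum(math.exp(x - m) for x in logits)
    out = [0.0] * len(values[0])
    for s, v in zip(scores, values):
        w = math.exp(s - m) / denom
        out = [o + w * x for o, x in zip(out, v)]
    return out


def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


random.seed(0)
d, window = 32, 100   # window is not a multiple of TILE, so the last tile is ragged
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(window)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(window)]

out, lse = streamed_decode_attention(q, keys, values)
print("cosine vs reference:", cosine(out, reference_attention(q, keys, values)))
```

Returning the log-sum-exp alongside the output is what makes a host-side merge possible in this sketch: `merge_sink(out, lse, sink)` reproduces `reference_attention(q, keys, values, sink=sink)` without rerunning the streamed pass.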
2026-05-20 05:46:15 +00:00