nvfp4-megamoe-kernel/vllm
biondizzle 7d5c093c99 Fix KV cache crash: skip SWA cache write on Blackwell
The SWA KV cache uses the fp8_ds_mla packed layout (37376 bytes per
slot, not 512), so our naive FP8 quantize-and-write path hit a shape
mismatch.

Fix: skip the SWA cache write entirely. The compressor (Triton)
handles the compressed cache. For full SDPA attention, we use the
raw kv tensor directly, so the paged cache is not needed at all
during prefill.
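
The mismatch and the skip can be illustrated with a minimal sketch. This is not the actual vLLM code: it models the cache as a flat byte buffer with fixed-size slots, takes the two sizes from the message above, and uses hypothetical names (`write_slot`, `prefill_kv`, `skip_swa_cache_write`) purely for illustration.

```python
# Illustrative sketch only; the names below are hypothetical, not vLLM APIs.
SLOT_BYTES = 37376   # packed fp8_ds_mla slot size quoted in the commit message
NAIVE_BYTES = 512    # size the naive FP8 quant + write assumed


def write_slot(cache: bytearray, slot: int, payload: bytes) -> None:
    """Write one slot, refusing payloads that do not match the slot size."""
    if len(payload) != SLOT_BYTES:
        raise ValueError(
            f"payload is {len(payload)} bytes, slot expects {SLOT_BYTES}"
        )
    start = slot * SLOT_BYTES
    cache[start:start + SLOT_BYTES] = payload


def prefill_kv(raw_kv: bytes, cache: bytearray,
               skip_swa_cache_write: bool = True) -> bytes:
    """Return the KV data that attention should read during prefill.

    With the skip enabled, the cache is left untouched and the raw KV
    data is passed straight through, mirroring the fix described above.
    """
    if not skip_swa_cache_write:
        write_slot(cache, 0, raw_kv)  # raises for a mis-sized payload
    return raw_kv
```

With the skip enabled, a 512-byte payload passes through unchanged and the cache is never touched; with it disabled, the same payload is rejected by the size check, which is the analogue of the shape mismatch.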
2026-05-19 08:21:57 +00:00