biondizzle (joined on 2025-12-10)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:35:46 +00:00
2672e98e4c Remove VLLM_NVFP4_GEMM_BACKEND env var - CuTeDSL auto-selects on Blackwell
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:26:18 +00:00
914d27fee7 Update README + CURRENT_BUG: full CuTeDSL NVFP4 plan, no more PyTorch fallbacks
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:21:59 +00:00
7d5c093c99 Fix KV cache crash: skip SWA cache write on Blackwell
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:19:26 +00:00
e1a642452a Fix Blackwell: skip FlashMLA assertion + force CuTeDSL kernel
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:12:01 +00:00
2856323360 Fix torch.compile crash: move Blackwell path inside custom op boundary
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:04:09 +00:00
a782ac00ce Integrate CSA/SDPA attention into vLLM for Blackwell
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:02:02 +00:00
81931614e9 Update CURRENT_BUG: CSA kernel works, plan vLLM integration
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:01:33 +00:00
9d067add90 Fix device reference in full_attention_reference
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:01:10 +00:00
3e3e998578 Fix attention: manual causal mask for batched single-query
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:00:41 +00:00
1e675ccc9a Fix causal mask shape for SDPA: (1,1,T,T) broadcast
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 08:00:09 +00:00
57615029a4 Fix KV expand for SDPA: (T,HD) → (T*NH, T, HD)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:59:30 +00:00
dd3a12bbda Fix full_attention_reference: broadcast KV to all heads+positions
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:58:43 +00:00
910015c47e Fix kv shape: expand to (T, NH, HD) before reshape
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:58:12 +00:00
3de75c4e37 Add CSA/HCA attention kernel (PyTorch SDPA, Blackwell-safe)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:54:03 +00:00
65f48be38c Add attention path test: pinpoint FlashMLA failure
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:51:12 +00:00
90d1098935 Update CURRENT_BUG: warmup gs is irrelevant, bug is in vLLM pipeline
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:49:42 +00:00
04ad6409e5 Rewrite test: diagnose whether warmup gs matters at inference time
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:48:11 +00:00
496848e158 Fix ffn_hc.scale key name
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:47:50 +00:00
5a4e355d3a Add model forward test: reproduce vLLM empty output outside container
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 07:35:45 +00:00
f5ce728ef2 Fix OOM: add --max-model-len=876544 + revert CPU dummy weight