biondizzle

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:35:46 +00:00

2672e98e4c Remove VLLM_NVFP4_GEMM_BACKEND env var - CuTeDSL auto-selects on Blackwell

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:26:18 +00:00

914d27fee7 Update README + CURRENT_BUG: full CuTeDSL NVFP4 plan, no more PyTorch fallbacks

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:21:59 +00:00

7d5c093c99 Fix KV cache crash: skip SWA cache write on Blackwell

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:19:26 +00:00

e1a642452a Fix Blackwell: skip FlashMLA assertion + force CuTeDSL kernel

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:12:01 +00:00

2856323360 Fix torch.compile crash: move Blackwell path inside custom op boundary

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:04:09 +00:00

a782ac00ce Integrate CSA/SDPA attention into vLLM for Blackwell

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:02:02 +00:00

81931614e9 Update CURRENT_BUG: CSA kernel works, plan vLLM integration

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:01:33 +00:00

9d067add90 Fix device reference in full_attention_reference

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:01:10 +00:00

3e3e998578 Fix attention: manual causal mask for batched single-query

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:00:41 +00:00

1e675ccc9a Fix causal mask shape for SDPA: (1,1,T,T) broadcast

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 08:00:09 +00:00

57615029a4 Fix KV expand for SDPA: (T,HD) → (T*NH, T, HD)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:59:30 +00:00

dd3a12bbda Fix full_attention_reference: broadcast KV to all heads+positions

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:58:43 +00:00

910015c47e Fix kv shape: expand to (T, NH, HD) before reshape

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:58:12 +00:00

3de75c4e37 Add CSA/HCA attention kernel (PyTorch SDPA, Blackwell-safe)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:54:03 +00:00

65f48be38c Add attention path test: pinpoint FlashMLA failure

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:51:12 +00:00

90d1098935 Update CURRENT_BUG: warmup gs is irrelevant, bug is in vLLM pipeline

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:49:42 +00:00

04ad6409e5 Rewrite test: diagnose whether warmup gs matters at inference time

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:48:11 +00:00

496848e158 Fix ffn_hc.scale key name

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:47:50 +00:00

5a4e355d3a Add model forward test: reproduce vLLM empty output outside container

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 07:35:45 +00:00

f5ce728ef2 Fix OOM: add --max-model-len=876544 + revert CPU dummy weight