biondizzle
  • Joined on 2025-12-10
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 20:22:11 +00:00
02b57071be Update README.md and CURRENT_BUG.md: eliminate stale issues, document NaN investigation, clarify our kernels are clean
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 18:36:51 +00:00
7070fadf72 Add full layer NaN test (attention + MoE, multi-layer chain)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 18:35:42 +00:00
152b0749df Use 16 experts for MoE runner test (fits in memory)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 18:34:57 +00:00
daa59a7c75 Add MoE runner NaN test (grouped GEMM with real weights)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 18:34:14 +00:00
9308634e65 Fix intermediate size: 3072 not 18432
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 18:33:59 +00:00
2b91bb1b71 Rewrite MoE NaN test: per-expert format, activation quantization, grouped GEMM
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 18:32:52 +00:00
8904d409f8 Fix MoE weight key names, add fallback
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 18:32:16 +00:00
e45ceb2226 Add MoE NaN reproduction test, update CURRENT_BUG.md with NaN tracing and test plan
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 18:16:02 +00:00
22ec43e685 Add input NaN debug to trace where NaN starts
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 18:04:19 +00:00
b86d0d2dee Add prefill inputs NaN debug
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 17:55:36 +00:00
45a2d8851d Add prefill attention value debug check
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 17:37:28 +00:00
1589b79137 Use module-level Blackwell flag in compressor (works during torch.compile)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 17:27:38 +00:00
658b12cb3d CRITICAL FIX: Remove double Q normalization and fix RoPE sin slice
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 17:26:52 +00:00
facc6509e7 Fix imports in vLLM codepaths test
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 17:26:43 +00:00
835e1a0590 Fix f-string syntax
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 17:26:11 +00:00
9c30168202 Add test for exact vLLM codepaths (fused_qnorm, kv_write, decode)
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 17:09:00 +00:00
8f80991fdf CRITICAL FIX: Properly dequantize fp8 KV in decode using per-token inv_scale
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:55:46 +00:00
d67d8613af FIX: Use vLLM's decode_swa_indices for correct paged KV cache access during decode
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:43:29 +00:00
3b204c4772 Fix UnboundLocalError: move num_decode_tokens before debug print
biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel 2026-05-19 16:35:08 +00:00
30890b621d CRITICAL FIX: Skip compressor fused attention kernel on Blackwell — it bypasses our attention path