biondizzle

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 20:22:11 +00:00

02b57071be Update README.md and CURRENT_BUG.md: eliminate stale issues, document NaN investigation, clarify our kernels are clean

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 18:36:51 +00:00

7070fadf72 Add full layer NaN test (attention + MoE, multi-layer chain)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 18:35:42 +00:00

152b0749df Use 16 experts for MoE runner test (fits in memory)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 18:34:57 +00:00

daa59a7c75 Add MoE runner NaN test (grouped GEMM with real weights)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 18:34:14 +00:00

9308634e65 Fix intermediate size: 3072 not 18432

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 18:33:59 +00:00

2b91bb1b71 Rewrite MoE NaN test: per-expert format, activation quantization, grouped GEMM

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 18:32:52 +00:00

8904d409f8 Fix MoE weight key names, add fallback

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 18:32:16 +00:00

e45ceb2226 Add MoE NaN reproduction test, update CURRENT_BUG.md with NaN tracing and test plan

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 18:16:02 +00:00

22ec43e685 Add input NaN debug to trace where NaN starts

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 18:04:19 +00:00

b86d0d2dee Add prefill inputs NaN debug

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 17:55:36 +00:00

45a2d8851d Add prefill attention value debug check

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 17:37:28 +00:00

1589b79137 Use module-level Blackwell flag in compressor (works during torch.compile)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 17:27:38 +00:00

658b12cb3d CRITICAL FIX: Remove double Q normalization and fix RoPE sin slice

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 17:26:52 +00:00

facc6509e7 Fix imports in vLLM codepaths test

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 17:26:43 +00:00

835e1a0590 Fix f-string syntax

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 17:26:11 +00:00

9c30168202 Add test for exact vLLM codepaths (fused_qnorm, kv_write, decode)

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 17:09:00 +00:00

8f80991fdf CRITICAL FIX: Properly dequantize fp8 KV in decode using per-token inv_scale

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 16:55:46 +00:00

d67d8613af FIX: Use vLLM's decode_swa_indices for correct paged KV cache access during decode

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 16:43:29 +00:00

3b204c4772 Fix UnboundLocalError: move num_decode_tokens before debug print

biondizzle pushed to proper-nvfp4-integration at biondizzle/nvfp4-megamoe-kernel

2026-05-19 16:35:08 +00:00

30890b621d CRITICAL FIX: Skip compressor fused attention kernel on Blackwell — it bypasses our attention path