biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 11:13:34 +00:00
8758bc93ca crap shoot
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-18 02:20:17 +00:00
b8df4a8cc5 Fix NaN check: use os.environ gate instead of is_current_stream_capturing
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 23:37:13 +00:00
0c02d84514 Add NaN/Inf detection in DeepseekV4Model.forward layer loop
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 23:04:45 +00:00
bedcfc4dab Pipeline test: use max_num_tokens=8192 matching vLLM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 22:58:28 +00:00
c45364b3a8 Add MoE scale ratio output
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 22:56:57 +00:00
bf99ad49ec Print both MoE and residual cosine
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 22:55:42 +00:00
8637020487 Fix multi-layer test: add residual connections
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 22:53:30 +00:00
11dce13afe Add multi-layer pipeline test to check error accumulation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 22:28:35 +00:00
87582fc9f7 HOTFIX: remove NaN checks from run() — torch.isnan().any() does CPU-GPU sync, breaks cudagraph
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 22:03:49 +00:00
8717e0e411 Fix warmup: use same padded GEMM path as run(), add swiglu_limit clamping
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 22:02:26 +00:00
d332f4f900 Add NaN debug checks after L1 and L2 GEMM
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 21:36:26 +00:00
e65f2b2ba2 Update CURRENT_BUG.md with Bug 26 fix
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 21:29:17 +00:00
72628fb689 Full pipeline test: runner vs BF16 reference
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 21:28:05 +00:00
2796bd81e8 Fix: scatter FP4 as uint8 (float4 doesn't support index_put)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 21:26:48 +00:00
364f8372bb Fix FP4 buffer shapes: D//2 for packed dimensions
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 21:25:59 +00:00
5e4d674736 Test fix: quantize slot_hidden, scatter FP4, pass slot_x_sf
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 21:25:06 +00:00
803e7160d8 Fix: allocate FP4 buffers as uint8 then view-cast
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 21:24:45 +00:00
7256070dd3 FIX Bug 26: quantize slot tokens, not padded buffer
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 21:22:52 +00:00
4d0b6d889d Set runner weights before _ensure_stacked
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 21:22:32 +00:00
b7acac5e4e Call _ensure_stacked() before using runner buffers
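
Note on the FP4 buffer commits above (364f8372bb "D//2 for packed dimensions" and 803e7160d8 "allocate FP4 buffers as uint8 then view-cast"): an FP4 value occupies 4 bits, so two of them fit in one byte and a row of D values needs D//2 bytes of uint8 storage. The sketch below illustrates only that arithmetic in plain Python. The function names pack_fp4 and unpack_fp4 are hypothetical, and the nibble order (even index in the low nibble) is an assumption; the feed does not show the repository's actual layout.

```python
def pack_fp4(codes):
    """Pack 4-bit codes (ints in 0..15) into bytes, two codes per byte."""
    if len(codes) % 2:
        raise ValueError("D must be even to pack two FP4 codes per byte")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each FP4 code must fit in 4 bits")
    # Assumed layout: even-indexed code in the low nibble, odd-indexed in the high nibble.
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_fp4(packed):
    """Inverse of pack_fp4: recover the list of 4-bit codes."""
    out = []
    for b in packed:
        out.append(b & 0x0F)  # low nibble first
        out.append(b >> 4)    # then high nibble
    return out


codes = [1, 2, 15, 0, 7, 8, 3, 12]  # D = 8 logical FP4 values
packed = pack_fp4(codes)            # D // 2 = 4 bytes of storage
```

Sizing the buffer with D instead of D//2 along the packed dimension would allocate twice the bytes actually needed, which is the kind of shape mismatch the commit message describes.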