biondizzle

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 11:13:34 +00:00

8758bc93ca crap shoot

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-18 02:20:17 +00:00

b8df4a8cc5 Fix NaN check: use os.environ gate instead of is_current_stream_capturing

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 23:37:13 +00:00

0c02d84514 Add NaN/Inf detection in DeepseekV4Model.forward layer loop

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 23:04:45 +00:00

bedcfc4dab Pipeline test: use max_num_tokens=8192 matching vLLM

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 22:58:28 +00:00

c45364b3a8 Add MoE scale ratio output

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 22:56:57 +00:00

bf99ad49ec Print both MoE and residual cosine

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 22:55:42 +00:00

8637020487 Fix multi-layer test: add residual connections

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 22:53:30 +00:00

11dce13afe Add multi-layer pipeline test to check error accumulation

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 22:28:35 +00:00

87582fc9f7 HOTFIX: remove NaN checks from run() — torch.isnan().any() does CPU-GPU sync, breaks cudagraph

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 22:03:49 +00:00

8717e0e411 Fix warmup: use same padded GEMM path as run(), add swiglu_limit clamping

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 22:02:26 +00:00

d332f4f900 Add NaN debug checks after L1 and L2 GEMM

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 21:36:26 +00:00

e65f2b2ba2 Update CURRENT_BUG.md with Bug 26 fix

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 21:29:17 +00:00

72628fb689 Full pipeline test: runner vs BF16 reference

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 21:28:05 +00:00

2796bd81e8 Fix: scatter FP4 as uint8 (float4 doesn't support index_put)

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 21:26:48 +00:00

364f8372bb Fix FP4 buffer shapes: D//2 for packed dimensions

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 21:25:59 +00:00

5e4d674736 Test fix: quantize slot_hidden, scatter FP4, pass slot_x_sf

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 21:25:06 +00:00

803e7160d8 Fix: allocate FP4 buffers as uint8 then view-cast

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 21:24:45 +00:00

7256070dd3 FIX Bug 26: quantize slot tokens, not padded buffer

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 21:22:52 +00:00

4d0b6d889d Set runner weights before _ensure_stacked

biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel

2026-05-17 21:22:32 +00:00

b7acac5e4e Call _ensure_stacked() before using runner buffers