biondizzle
  • Joined on 2025-12-10
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 08:22:53 +00:00
7073daaffa fix: allocate token_indices on CPU, move to GPU AFTER JIT compilation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 08:22:14 +00:00
0e7b06b55c debug: clone + sync token indices before JIT
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 08:20:42 +00:00
70c0618361 fix: allocate token_indices before CuTeDSL JIT compilation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 08:19:47 +00:00
2bbe04efd8 debug: remove assert, test token corruption
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 08:18:38 +00:00
66627926c5 debug: int32 token indices with sync verify
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 08:16:11 +00:00
da02a5dc11 debug: assert token indices are correct after allocation
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 08:11:03 +00:00
c0d016a472 feat: compute_activation_global_scales warmup method
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 08:07:12 +00:00
8c9a51e006 fix: call _ensure_stacked in warmup test
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 08:06:34 +00:00
5ba77e355f test: warmup gs computation with safety margin sweep
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:59:02 +00:00
ae6b879d38 fix: pass expert_offsets without leading 0 to GEMM (matches pipeline)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:57:02 +00:00
a1e6f5f891 fix: searchsorted right=True for correct expert assignment
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:53:59 +00:00
ddffb7d8df docs: current bug analysis — scale_a layout vs expert_offsets mismatch
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:47:16 +00:00
ed90341ea9 fix: scatter+per-expert-swizzle scale assembly (cudagraph-safe)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:43:11 +00:00
37fecb588f fix: separate L1/L2 scale buffers (different K_sf), fix assembly calls
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:39:51 +00:00
b824b838a9 fix: 128-row-align each expert's scales in padded buffer
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:37:48 +00:00
8dadd9a723 test: scale assembly debug
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:37:05 +00:00
8642946274 fix: padded x_sf buffer for fixed-shape scale assembly
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:35:50 +00:00
418e29f7f5 fix: per-expert scale assembly (match assemble_scales_2d_side)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:33:21 +00:00
7b95e76723 test: runner vs pipeline comparison + scale assembly comparison
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel 2026-05-17 07:15:01 +00:00
366a0240a5 vllm tweaks
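
Note on a1e6f5f891 (searchsorted right=True): a minimal sketch of why the `right` side matters for expert assignment. It assumes `expert_offsets` holds cumulative end offsets with no leading 0, as the subject line of ae6b879d38 suggests. The counts, the `expert_of` helper, and the use of Python's stdlib `bisect` as a stand-in for a tensor `searchsorted` are illustrative assumptions, not code from the repository.

```python
from bisect import bisect_left, bisect_right

# Hypothetical per-expert token counts (illustrative, not from the repo).
counts = [3, 2, 4]

# Cumulative end offsets without a leading 0: [3, 5, 9].
expert_offsets = []
total = 0
for c in counts:
    total += c
    expert_offsets.append(total)

def expert_of(token_row, right=True):
    """Return the expert index that owns a row of the expert-sorted token buffer."""
    find = bisect_right if right else bisect_left
    return find(expert_offsets, token_row)

# right=True counts the offsets <= row, so the first row of each expert
# (row 3, row 5) lands in the correct expert.
right_assign = [expert_of(t, right=True) for t in range(total)]

# right=False counts the offsets < row, so rows 3 and 5 are assigned
# to the previous expert.
left_assign = [expert_of(t, right=False) for t in range(total)]

print(right_assign)
print(left_assign)
```

With end offsets, the two variants differ only on rows that sit exactly on an expert boundary, which is where an off-by-one expert assignment would show up.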