biondizzle
  • Joined on 2025-12-10
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 19:37:12 +00:00
2c09545faa diag: force block_m=128 to test UMMA_N=192 validity for mxf4nvf4
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 19:13:04 +00:00
f0652693a6 dangit again
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 19:06:31 +00:00
c1cbe488f3 diag: force a_format/b_format=5 (MXF8F6F4Format::E2M1), re-enable MMA, dump k=0+k=1
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 18:42:43 +00:00
054792c84e dangit
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 18:38:04 +00:00
3b8aa5fd4d diag: stub MMA + dump descriptors for ILLEGAL_INSTRUCTION debug
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 18:26:41 +00:00
de055b1e77 stupid clankers
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 18:05:13 +00:00
c56f5dda7e fix: use UINT8 TMA for packed FP4 instead of 16U4_ALIGN8B
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 17:40:25 +00:00
b0094175a2 fix: restore elem_size declaration for TMA desc build
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 17:39:08 +00:00
48b5b2b702 fix: TMA dimensions for packed FP4 must be in individual FP4 values (not bytes)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 17:14:46 +00:00
75f1c8544b fix: remove smem_inner_dim doubling for packed FP4 TMA — must match MMA row width (BLOCK_K/2)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 17:11:21 +00:00
b95f9eb446 revert: remove SMEM warp transpose (deadlock in elect_one_sync, not needed with transform_sf_token_idx)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 16:48:08 +00:00
54a7de03a0 fix: add UTCCP SMEM warp transpose for NVFP4 packed UE4M3 scales
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 16:20:12 +00:00
8a53228745 fix: no GPU tensor ops in crash handler (CUDA is broken after 715)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 16:06:02 +00:00
9115f83afb debug: try/catch around mega_moe kernel with data diagnostics on crash
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 15:44:51 +00:00
758389645a fix: contiguous copy for SF byte view sanity check
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 15:30:39 +00:00
cc3e3da45c debug: check for zero/NaN/Inf in weight SF values
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 15:14:42 +00:00
307574bc91 test: signal alarm timeout for kernel hang
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 15:13:17 +00:00
fcd6de0a60 test: simplify SF fill to avoid shape mismatch
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 15:12:37 +00:00
d4c557fddc test: fix float8 randn + SF int32 packing
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 15:11:44 +00:00
28afc2406b test: add random FP4 data and kernel timeout