biondizzle
  • Joined on 2025-12-10
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-13 12:17:28 +00:00
6a348d543d fix: use raw cudaDeviceSynchronize instead of DG_CUDA_CHECK macro
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-13 12:15:57 +00:00
c08a28888d debug: sync + printf before mega_moe kernel launch
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 23:16:48 +00:00
ad335c38fb tweax n shit
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 23:16:37 +00:00
f08bcd456b tweax n shit
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 22:34:03 +00:00
2bdda36bb7 fucken aye
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:58:13 +00:00
fa825c16b9 fucken ay
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:52:51 +00:00
dbf1d11f9f ayee
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:49:46 +00:00
ef3edb3481 ba fongol again4
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:48:06 +00:00
b74dc7121a ba fongol again3
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:47:40 +00:00
7bbbdbcc79 ba fongol again
2e674f87c1 ba fongol again
5d127d8294 ba fongol again
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:46:47 +00:00
52cf3f2e25 ba fongol again
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:34:11 +00:00
02decb486e ba fongol
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:30:40 +00:00
48f1f9dc5e clanker nonsense again
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:30:03 +00:00
5cabc1f7d9 clanker nonsense again
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 21:28:53 +00:00
25a2d4e6ad clanker nonsense
d88ea9842b fix: add missing staging_kernel.py to Dockerfile — BF16→E2M1+UE4M3 quantization was never in container
91d7d9bad7 fucken a
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-12 20:57:08 +00:00
d68e113af1 remove spammy shit
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 20:56:39 +00:00
8b27e85ee5 fix: advance TMEM SF start column per UMMA atom for scale_vec::4X
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 20:26:27 +00:00
74bf612771 NVFP4 mega MoE: sf_id=0 fix for scale_vec::4X + UINT8 TMA + SF pipeline + interleaving
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 20:07:22 +00:00
698634dea5 fix: sf_id must be 0 for scale_vec::4X — passing sf_id=k was ILLEGAL_INSTRUCTION root cause
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-12 20:01:40 +00:00
4442c06ba8 diag: remove format=5 override, keep block_m=128 baseline test