biondizzle
  • Joined on 2025-12-10
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 20:29:38 +00:00
7a4403fa98 feat: FP4 staging kernel - BF16 → E2M1 packed + UE4M3 block16 scales
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 20:29:09 +00:00
b3d1aae038 feat: full FP4 activations for mxf4nvf4 - E2M1 packed A side + UE4M3 scales
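The FP4 staging path in the commits above (BF16 → E2M1 packed values plus UE4M3 scales per 16-element block) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Triton kernel or the DeepGEMM code. Each block's scale is chosen so its largest magnitude maps to 6.0, the E2M1 maximum. Scales are rounded to the nearest finite E4M3 value by searching the value grid. The even-indexed element goes in the low nibble of each packed byte. The per-tensor global scale that NVFP4 also carries is omitted. All names are hypothetical.

```python
# Positive magnitudes representable by FP4 E2M1. The list index equals the
# 3-bit magnitude code; the sign goes in bit 3 of the 4-bit code.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def _ue4m3_grid():
    """All finite non-negative E4M3 values (bias 7, 3 mantissa bits, max 448), ascending."""
    vals = [m / 8 * 2.0 ** -6 for m in range(8)]  # subnormals (exponent field 0)
    for e in range(1, 16):
        for m in range(8):
            if e == 15 and m == 7:  # 1111.111 is NaN in E4M3FN, skip it
                continue
            vals.append((1 + m / 8) * 2.0 ** (e - 7))
    return vals


UE4M3_VALUES = _ue4m3_grid()


def round_to_ue4m3(x):
    """Nearest UE4M3 value to a non-negative float (ties go to the smaller value)."""
    return min(UE4M3_VALUES, key=lambda v: abs(v - x))


def quantize_block16(block):
    """Quantize 16 floats to (8 packed E2M1 bytes, UE4M3 block scale)."""
    assert len(block) == 16
    amax = max(abs(v) for v in block)
    scale = round_to_ue4m3(amax / 6.0) if amax > 0 else 0.0
    codes = []
    for v in block:
        if scale == 0.0:
            codes.append(0)
            continue
        mag = abs(v) / scale
        # Nearest E2M1 magnitude; values above 6.0 saturate to code 7.
        idx = min(range(8), key=lambda i: abs(E2M1_VALUES[i] - mag))
        codes.append((0x8 if v < 0 else 0x0) | idx)
    # Two 4-bit codes per byte: even element in the low nibble (assumed layout).
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, 16, 2))
    return packed, scale


def dequantize_block16(packed, scale):
    """Inverse of quantize_block16: unpack nibbles and multiply by the block scale."""
    out = []
    for byte in packed:
        for code in (byte & 0xF, byte >> 4):
            mag = E2M1_VALUES[code & 0x7] * scale
            out.append(-mag if code & 0x8 else mag)
    return out
```

For example, a block holding every E2M1 magnitude and its negation has `amax == 6.0`, so its scale is exactly 1.0 and it round-trips losslessly. Searching the 126-entry grid is chosen here for clarity. A kernel would compute the rounding arithmetically instead.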
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 19:55:43 +00:00
0fd2d4f078 diag: add weight_scale uint8 histogram to verify E8M0 vs E4M3 format
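A weight_scale histogram like the one in this diagnostic can be approximated with a small decoder. It counts each distinct scale byte and prints the value that byte would encode under E8M0 and under E4M3, so it is easier to see which interpretation gives plausible magnitudes. This is a hypothetical helper, not the repository's diagnostic. The names `decode_e8m0`, `decode_e4m3`, and `scale_histogram` are illustrative, and E4M3 is assumed to be the FN variant, with no infinities and only S.1111.111 as NaN.

```python
from collections import Counter


def decode_e8m0(byte):
    """UE8M0: a pure power-of-two exponent with bias 127; 0xFF is NaN."""
    return float("nan") if byte == 0xFF else 2.0 ** (byte - 127)


def decode_e4m3(byte):
    """E4M3 (FN variant, assumed): sign bit, 4 exponent bits (bias 7), 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")
    if exp == 0:
        return sign * (man / 8) * 2.0 ** -6  # subnormal range
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)


def scale_histogram(raw):
    """Count each distinct byte in `raw` and print its value under both formats."""
    hist = Counter(raw)
    for byte, count in sorted(hist.items()):
        print(f"0x{byte:02x} x{count:<6} "
              f"E8M0={decode_e8m0(byte):<12.4g} E4M3={decode_e4m3(byte):.4g}")
    return hist


scale_histogram(bytes([127, 127, 0x38]))
```

In the usage line, byte 0x7F reads as 1.0 under E8M0, while 0x38 reads as 1.0 under E4M3. This shows how the same bit pattern can encode very different magnitudes depending on the format.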
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 19:51:50 +00:00
50a945bde4 bump cache buster 29
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 19:51:47 +00:00
48b905406a diag: add CUDA sync after mega_moe finalize + forward to catch errors
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 19:40:10 +00:00
35f6b66678 fix: UE8M0 reinterpret in DeepGEMM fold_global_scale + bump cache
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 19:40:06 +00:00
2cd86ff5e7 fix: UE8M0→float32 reinterpret in fold_global_scale (Bug #7)
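The UE8M0→float32 reinterpret named in the fold_global_scale commits can be illustrated at the bit level. A UE8M0 byte is a biased exponent with bias 127, the same bias float32 uses. Shifting the byte into the float32 exponent field therefore yields 2**(byte - 127) without any arithmetic. The sketch below is minimal Python. The function name is hypothetical, and this is not the DeepGEMM implementation. As an illustration of how such a conversion can go wrong, a plain numeric cast turns byte 127 into 127.0 instead of 1.0.

```python
import struct


def ue8m0_to_float32(byte):
    """Reinterpret a UE8M0 scale byte as float32 via the exponent field.

    Exact for 1 <= byte <= 254, giving 2**(byte - 127). Edge cases differ
    from the UE8M0 definition: byte 0 yields +0.0 here (2**-127 would need
    a float32 subnormal), and 0xFF yields +inf although UE8M0 reserves it
    for NaN.
    """
    bits = (byte & 0xFF) << 23  # sign 0, exponent field = byte, mantissa 0
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# A numeric cast is not a reinterpret: float(127) is 127.0, whereas the
# UE8M0 byte 127 encodes 2**0 == 1.0.
```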
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 19:26:23 +00:00
f32d6b5b48 bump cache buster to 27
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 19:26:14 +00:00
cd24182e36 diag: add NaN/Inf + FP8-dtype checks after NVFP4 dequant
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 18:48:08 +00:00
8ae2214bad fix: reorder Dockerfile ARG before COPY for proper cache busting
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 16:38:54 +00:00
c4891e9ee2 fix: manual FP32→UE4M3 quant in Triton staging kernel
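The manual FP32→UE4M3 quantization mentioned in the staging-kernel fix can be sketched as scalar Python using `math.frexp`, standing in for the Triton code. The following choices are illustrative assumptions and are not taken from the kernel:

- Inputs above the E4M3 maximum of 448 saturate to 448 instead of producing NaN.
- Non-positive and NaN inputs map to 0.
- The 3-bit mantissa rounds to nearest even, which Python's built-in `round` provides.

The function names are hypothetical.

```python
import math

E4M3_MAX = 448.0             # 1.75 * 2**8, largest finite E4M3 value
E4M3_MIN_NORMAL = 2.0 ** -6  # smallest normal E4M3 value


def fp32_to_ue4m3_bits(x):
    """Round a float to an unsigned E4M3 byte (exponent bias 7, 3 mantissa bits)."""
    if not x > 0.0:          # negatives, zero and NaN all map to 0 in this sketch
        return 0
    x = min(x, E4M3_MAX)     # saturate instead of overflowing into NaN
    if x < E4M3_MIN_NORMAL:
        # Subnormal: value = m * 2**-9. A result of m == 8 is bit pattern 0x08,
        # which is exactly exponent 1 / mantissa 0, so no special case is needed.
        return round(x / 2.0 ** -9)
    frac, exp = math.frexp(x)              # x = frac * 2**exp, 0.5 <= frac < 1
    biased = exp + 6                       # x = (2*frac) * 2**(exp - 1), bias 7
    mant = round((2.0 * frac - 1.0) * 8)   # round half to even on 3 bits
    if mant == 8:                          # mantissa carry into the exponent
        mant, biased = 0, biased + 1
    return (biased << 3) | mant


def ue4m3_bits_to_float(bits):
    """Decode an unsigned E4M3 byte back to a Python float."""
    exp, mant = (bits >> 3) & 0xF, bits & 0x7
    if exp == 0:
        return mant / 8 * 2.0 ** -6
    return (1 + mant / 8) * 2.0 ** (exp - 7)
```

Every finite non-negative E4M3 pattern (0x00 through 0x7E) round-trips through these two functions. Halfway cases such as 1.0625 round to the even mantissa (here 1.0, byte 0x38).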
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 16:25:10 +00:00
47621bb990 add NVFP4SymmBuffer + get_symm_buffer_for_nvfp4_mega_moe Python wrapper
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 16:12:57 +00:00
436109081c bump cache buster to 24
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 16:12:40 +00:00
5faf9916eb fix: UE4M3 activation scales + group_size=16 for NVFP4 mega_moe
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 16:11:14 +00:00
86a1263f44 fix: gran_k=16 in transform_sf + sm_100a arch for NVFP4 mega_moe
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 15:04:50 +00:00
fbdddaccf4 revert: restore mxf4nvf4/block16 code (correct path for sm_100a)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 14:24:56 +00:00
e80fe9af60 docs: CORRECTED — mxf4nvf4 IS supported on sm_100a (B200)
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 14:24:14 +00:00
220649c188 docs: CORRECTED — mxf4nvf4 IS supported on sm_100a (B200)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 13:55:19 +00:00
c2f4a30780 docs: comprehensive README update through build 22
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 13:53:42 +00:00
cfead0012d docs: comprehensive README update through build 22