biondizzle
  • Joined on 2025-12-10
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 09:45:55 +00:00
57c629ed1b fix: cast to int32 before >> 23 (uint32 doesn't support right-shift)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 09:42:05 +00:00
6d7231a50e fix: reinterpret float32 bits as uint32 before >> 23 for UE8M0
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 09:30:46 +00:00
f44ff7f6ca docs: document SM100 hardware constraint and full debugging log
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 09:28:47 +00:00
03b8c99ee1 fix: use mxf8f6f4 (UE8M0) on SM100 — mxf4nvf4 requires SM103+
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 09:09:33 +00:00
b856c57ba6 fix: kGranK=32 in C++ binding (was still 16 from old block16 code)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 08:54:16 +00:00
cd7a612175 debug: add shape logging to SF packing
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 08:37:04 +00:00
dcebe033e2 fix: use scale_vec::2X (block32) for SM100 B200 compatibility
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 08:05:52 +00:00
8cb23bdb78 fix: import NVFP4 SymmBuffer from deep_gemm.mega
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 08:05:23 +00:00
deff80c9c1 fix: add Python wrapper for NVFP4 SymmBuffer allocation
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 07:49:13 +00:00
ff579c9767 fix: use NVFP4 SymmBuffer (2x SF size for group_size=16)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 07:33:03 +00:00
acbe006498 docs: update debugging log in README
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 07:32:12 +00:00
8d02eb38fa fix: transpose SF to MN-major layout before TMA stride checks
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 07:31:01 +00:00
7154500f22 fix: reshape SF to 2D before transform_sf_into_required_layout
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 07:19:16 +00:00
1da40c53da fix: add patch cache buster to Dockerfile
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 07:13:46 +00:00
b532742530 debug: add shape/dtype logging to finalize_weights
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 07:13:01 +00:00
f98c1f7fd5 fix: add gran_k=16 (NVFP4) support to transform_sf_into_required_layout
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 07:05:12 +00:00
388fd8dcfd fix: pack UE4M3 into int32 before transform_sf_into_required_layout
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 06:54:37 +00:00
acae75e109 fix: use transform_sf_into_required_layout for proper TMA-aligned SF
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM 2026-05-11 06:36:35 +00:00
5cb4fcaef3 fix: cast uint8 weights to int8 (kPackedFP4) for DeepGEMM compatibility
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant 2026-05-11 06:22:17 +00:00
b1cf4232ee feat: wire DeepGEMM NVFP4 mega_moe kernel into vLLM patch
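
Commits 6d7231a50e and 57c629ed1b above both deal with shifting the raw float32 bit pattern right by 23 to obtain a UE8M0 scale factor. The following is a minimal standard-library sketch of that idea, not the code from the repository: the helper names (float32_bits, ue8m0_from_scale, ue8m0_to_float) are hypothetical, and dropping the mantissa (rounding the scale down to a power of two) is an assumption, since the commit subjects do not say which rounding the kernel uses.

```python
import struct


def float32_bits(x: float) -> int:
    """Reinterpret a value as IEEE-754 float32 and return its 32 raw bits."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ue8m0_from_scale(scale: float) -> int:
    """Return the biased 8-bit exponent of a positive float32 scale.

    Assumption: the mantissa is discarded, so the scale is rounded
    down to the nearest power of two.
    """
    if not scale > 0.0:
        raise ValueError("UE8M0 is unsigned; the scale must be positive")
    bits = float32_bits(scale)
    # float32 layout: 1 sign bit | 8 exponent bits | 23 mantissa bits.
    # Shifting right by 23 leaves sign + exponent; the mask drops the sign.
    return (bits >> 23) & 0xFF


def ue8m0_to_float(e: int) -> float:
    """Decode a biased exponent byte back to the power of two it encodes."""
    return 2.0 ** (e - 127)


if __name__ == "__main__":
    for s in (0.5, 1.0, 3.0):
        e = ue8m0_from_scale(s)
        print(s, e, ue8m0_to_float(e))
```

For a positive scale the sign bit is zero, so the same bit pattern read as a signed int32 is non-negative and an arithmetic right shift gives the same result as a logical one. That is presumably why casting to int32 before the shift, as the later commit does, is a safe workaround when the tensor library lacks a right-shift for uint32.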