biondizzle

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 09:45:55 +00:00

57c629ed1b fix: cast to int32 before >> 23 (uint32 doesn't support right-shift)

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 09:42:05 +00:00

6d7231a50e fix: reinterpret float32 bits as uint32 before >> 23 for UE8M0
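
The two UE8M0 fixes above (6d7231a50e and the follow-up 57c629ed1b) both concern reading the biased exponent out of a float32 scale. Below is a minimal stdlib-only sketch of that bit trick, not the repository's actual code. The helper names `ue8m0_from_float32` and `float32_from_ue8m0` are hypothetical, and dropping the mantissa (rather than rounding the scale up to the next power of two) is an assumption.

```python
import struct

def ue8m0_from_float32(scale: float) -> int:
    """Return the 8-bit biased exponent of a positive float32 (hypothetical helper)."""
    if not scale > 0.0:
        raise ValueError("UE8M0 is unsigned; expected a positive scale")
    # Reinterpret the float32 bit pattern as a signed 32-bit integer.
    (bits,) = struct.unpack("<i", struct.pack("<f", scale))
    # The sign bit is 0 for positive inputs, so the signed (arithmetic)
    # shift gives the same result as an unsigned (logical) shift.
    return (bits >> 23) & 0xFF

def float32_from_ue8m0(code: int) -> float:
    """Decode a UE8M0 byte back to the power of two 2**(code - 127)."""
    return 2.0 ** (code - 127)

print(ue8m0_from_float32(1.0), ue8m0_from_float32(2.0), ue8m0_from_float32(0.5))
```

Because scale factors are positive, reinterpreting the bits as a signed int32 before shifting loses nothing, which is consistent with the workaround named in 57c629ed1b.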

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 09:30:46 +00:00

f44ff7f6ca docs: document SM100 hardware constraint and full debugging log

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 09:28:47 +00:00

03b8c99ee1 fix: use mxf8f6f4 (UE8M0) on SM100 — mxf4nvf4 requires SM103+

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 09:09:33 +00:00

b856c57ba6 fix: kGranK=32 in C++ binding (was still 16 from old block16 code)

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 08:54:16 +00:00

cd7a612175 debug: add shape logging to SF packing

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 08:37:04 +00:00

dcebe033e2 fix: use scale_vec::2X (block32) for SM100 B200 compatibility

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-11 08:05:52 +00:00

8cb23bdb78 fix: import NVFP4 SymmBuffer from deep_gemm.mega

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 08:05:23 +00:00

deff80c9c1 fix: add Python wrapper for NVFP4 SymmBuffer allocation

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-11 07:49:13 +00:00

ff579c9767 fix: use NVFP4 SymmBuffer (2x SF size for group_size=16)
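
The "2x SF size" in ff579c9767 follows from the number of scale factors per row being K divided by the group size. Here is a small sketch of that arithmetic; `num_scale_factors` is a hypothetical helper and the value of `K` is chosen only for illustration.

```python
def num_scale_factors(k: int, group_size: int) -> int:
    """Scale factors needed for one row of k elements (hypothetical helper)."""
    if k % group_size != 0:
        raise ValueError("k must be a multiple of the group size")
    return k // group_size

K = 4096  # example inner dimension, not taken from the repository
sf_block32 = num_scale_factors(K, 32)  # one scale per 32 elements
sf_block16 = num_scale_factors(K, 16)  # one scale per 16 elements
print(sf_block32, sf_block16, sf_block16 // sf_block32)
```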

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 07:33:03 +00:00

acbe006498 docs: update debugging log in README

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 07:32:12 +00:00

8d02eb38fa fix: transpose SF to MN-major layout before TMA stride checks

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 07:31:01 +00:00

7154500f22 fix: reshape SF to 2D before transform_sf_into_required_layout

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-11 07:19:16 +00:00

1da40c53da fix: add patch cache buster to Dockerfile

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-11 07:13:46 +00:00

b532742530 debug: add shape/dtype logging to finalize_weights

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 07:13:01 +00:00

f98c1f7fd5 fix: add gran_k=16 (NVFP4) support to transform_sf_into_required_layout

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 07:05:12 +00:00

388fd8dcfd fix: pack UE4M3 into int32 before transform_sf_into_required_layout
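
Commit 388fd8dcfd packs one-byte UE4M3 scale factors into 32-bit words. Below is a hedged stdlib-only sketch of packing four bytes into each int32. The name `pack_sf_bytes` is hypothetical, and the little-endian byte order and zero padding of a short tail are assumptions rather than facts about DeepGEMM.

```python
import struct
from typing import List

def pack_sf_bytes(sf: bytes) -> List[int]:
    """Pack one-byte scale factors into signed 32-bit words, four per word."""
    # Pad with zero bytes so the length is a multiple of 4 (assumed policy).
    padded = sf + b"\x00" * (-len(sf) % 4)
    # "<i" reads each 4-byte group as a little-endian signed int32.
    return [word for (word,) in struct.iter_unpack("<i", padded)]

# 0x38 is the E4M3 bit pattern for 1.0, so four unit scales pack to 0x38383838.
print([hex(w) for w in pack_sf_bytes(bytes([0x38] * 4))])
```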

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 06:54:37 +00:00

acae75e109 fix: use transform_sf_into_required_layout for proper TMA-aligned SF

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM

2026-05-11 06:36:35 +00:00

5cb4fcaef3 fix: cast uint8 weights to int8 (kPackedFP4) for DeepGEMM compatibility

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant

2026-05-11 06:22:17 +00:00

b1cf4232ee feat: wire DeepGEMM NVFP4 mega_moe kernel into vLLM patch