biondizzle
Joined on 2025-12-10
Public Activity
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 09:45:55 +00:00
57c629ed1b
fix: cast to int32 before >> 23 (uint32 doesn't support right-shift)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 09:42:05 +00:00
6d7231a50e
fix: reinterpret float32 bits as uint32 before >> 23 for UE8M0
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 09:30:46 +00:00
f44ff7f6ca
docs: document SM100 hardware constraint and full debugging log
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 09:28:47 +00:00
03b8c99ee1
fix: use mxf8f6f4 (UE8M0) on SM100 — mxf4nvf4 requires SM103+
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 09:09:33 +00:00
b856c57ba6
fix: kGranK=32 in C++ binding (was still 16 from old block16 code)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 08:54:16 +00:00
cd7a612175
debug: add shape logging to SF packing
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 08:37:04 +00:00
dcebe033e2
fix: use scale_vec::2X (block32) for SM100 B200 compatibility
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant, 2026-05-11 08:05:52 +00:00
8cb23bdb78
fix: import NVFP4 SymmBuffer from deep_gemm.mega
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 08:05:23 +00:00
deff80c9c1
fix: add Python wrapper for NVFP4 SymmBuffer allocation
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant, 2026-05-11 07:49:13 +00:00
ff579c9767
fix: use NVFP4 SymmBuffer (2x SF size for group_size=16)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 07:33:03 +00:00
acbe006498
docs: update debugging log in README
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 07:32:12 +00:00
8d02eb38fa
fix: transpose SF to MN-major layout before TMA stride checks
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 07:31:01 +00:00
7154500f22
fix: reshape SF to 2D before transform_sf_into_required_layout
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant, 2026-05-11 07:19:16 +00:00
1da40c53da
fix: add patch cache buster to Dockerfile
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant, 2026-05-11 07:13:46 +00:00
b532742530
debug: add shape/dtype logging to finalize_weights
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 07:13:01 +00:00
f98c1f7fd5
fix: add gran_k=16 (NVFP4) support to transform_sf_into_required_layout
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 07:05:12 +00:00
388fd8dcfd
fix: pack UE4M3 into int32 before transform_sf_into_required_layout
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 06:54:37 +00:00
acae75e109
fix: use transform_sf_into_required_layout for proper TMA-aligned SF
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM, 2026-05-11 06:36:35 +00:00
5cb4fcaef3
fix: cast uint8 weights to int8 (kPackedFP4) for DeepGEMM compatibility
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant, 2026-05-11 06:22:17 +00:00
b1cf4232ee
feat: wire DeepGEMM NVFP4 mega_moe kernel into vLLM patch