biondizzle
Joined on
2025-12-10
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-12 05:52:32 +00:00
74af9984f6
Bug fixes: UE4M3 scale conversion, staging kernel SF/E2M1 packing, wo_a UE4M3, README overhaul
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 23:58:09 +00:00
af092fa7ba
fix: double SMEM SF allocation for NVFP4 group=16 + clean stale comments
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 23:44:13 +00:00
aa97a3f949
fix: correct TMEM column layout for scale_vec::4X
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 23:17:53 +00:00
d6551617c0
fix: 4 kernel compilation fixes for packed FP4
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 22:55:30 +00:00
49e5646b42
fix: remove duplicate kInt8 case — kPackedFP4 is already kInt8
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 22:54:51 +00:00
80df24a641
fix: add kInt8 dtype support to TMA descriptor + change activation tensors to kInt8
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 22:40:11 +00:00
e608a20dec
docs: major README update — packed FP4 SMEM layout, L1 epilogue, TMA descriptors
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-11 22:39:40 +00:00
a36bf47f11
fix: use tl.split instead of indexing for E2M1 pair packing
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-11 22:23:12 +00:00
27dbf2850f
fix: replace nested tl.where with sum-of-comparisons for E2M1 quantization
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-11 22:08:51 +00:00
3d1f3de190
fix: syntax error — move triton imports before docstring, remove orphan @triton.jit
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-11 21:59:58 +00:00
79d866995f
bump cache buster 32 for packed FP4 mxf4nvf4 fix
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 21:59:39 +00:00
30d72e7ef5
fix: packed FP4 for mxf4nvf4 — correct SMEM layout, UMMA descriptors, L1 epilogue
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-11 21:29:35 +00:00
c85b84b0fe
fix: staging kernel outputs unpacked E2M1 (1 byte/element, not packed 2/byte)
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 21:27:36 +00:00
0ac73a82f9
fix: L1 output uses unpacked E2M1 (1 byte/element) like FP8
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-11 21:05:56 +00:00
01cfd02759
fix: same reshape fix in main patch file
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-11 21:04:55 +00:00
076d325c97
fix: use reshape instead of risky [0::2] slicing for E2M1 packing
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-11 21:02:22 +00:00
8dc917c498
fix: topk_weights_out store missing topk_offsets multiplier
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 20:57:36 +00:00
091b974736
fix: L1 epilogue uses STSM with XOR swizzle for E2M1 FP4 output
biondizzle
pushed to
nvfp4-mega-moe
at
biondizzle/DeepGEMM
2026-05-11 20:48:05 +00:00
a554de8b24
fix: dispatch TMA byte counts for FP4 (kHidden/2), rename fp8→fp4 layout refs
biondizzle
pushed to
mega-moe-nvfp4
at
biondizzle/deepseek-v4-quant
2026-05-11 20:30:16 +00:00
17ba5a9d7b
bump cache buster 30 for FP4 staging + DeepGEMM FP4 activations