biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Repositories
25
Public Activity
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant
2026-05-12 15:07:37 +00:00
787d427847 test: fix NVFP4 mega_moe test dimensions for SMEM alignment

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 15:04:26 +00:00
94b30dc2bc revert: block_n/4 was correct (SwiGLU halving × FP4 packing)

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 14:58:12 +00:00
c71fb97687 fix: L1 output TMA smem_inner_dim was block_n/4, should be block_n/2

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant
2026-05-12 14:53:49 +00:00
8737fd57c0 remove crap

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 14:31:41 +00:00
d8ae7a3225 debug: print SF shape/strides before interleave

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 14:23:04 +00:00
e498a2c729 fix: single transpose back to MN-major, don't double-transpose

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 14:22:07 +00:00
916f03d528 debug: add transform output shape/stride prints

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 14:11:56 +00:00
1f13b24354 debug: add strides to SF debug prints

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 14:02:00 +00:00
bfe612969b fix: preserve MN-major layout when interleaving L1 SF tensors

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 13:48:47 +00:00
76220ac6ee fix: force contiguous on SF tensors before C++ call

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 13:28:33 +00:00
bf5bf8d995 fix: unpack weight tuples before printing debug info

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant
2026-05-12 13:10:44 +00:00
52c3aefe73 bump cache busters to 33 for debug build

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 13:10:34 +00:00
5ac151d0a5 debug: print tensor dtypes/shapes at C++ call boundary in fp8_nvfp4_mega_moe

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant
2026-05-12 12:23:46 +00:00
ca1d306890 fix: use torch.int8 for packed FP4 tensors (kPackedFP4=kInt8, not uint8)

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant
2026-05-12 11:15:07 +00:00
b8f95ffad3 docker: add OMP_NUM_THREADS=64, remove --tool initcheck, mount cubin cache
5840291ea3 fix staging kernel packed_k_mask double-count

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 08:08:21 +00:00
26a8ab75a1 NVFP4: fix SF pipeline — 2 K-cols per BLOCK_K for group=16

biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant
2026-05-12 07:24:37 +00:00
5ea5b579c3 Trim banner, no code changes

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 07:08:09 +00:00
680874d067 NVFP4 L1 epilogue: group_size=16 SF layout

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 06:51:42 +00:00
c0850a6859 Fix weight TMA descriptors: packed E2M1 needs K/2, block_k/2, swizzle/2

biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM
2026-05-12 05:52:35 +00:00
fbfeb54c9a Fix fold_global_scale: UE4M3 scales use .to(float32), not shift-by-23