biondizzle · 0 Followers · 0 Following · Joined on 2025-12-10
Repositories: 25
Public Activity
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 19:37:12 +00:00 · 2c09545faa · diag: force block_m=128 to test UMMA_N=192 validity for mxf4nvf4
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant · 2026-05-12 19:13:04 +00:00 · f0652693a6 · dangit again
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 19:06:31 +00:00 · c1cbe488f3 · diag: force a_format/b_format=5 (MXF8F6F4Format::E2M1), re-enable MMA, dump k=0+k=1
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant · 2026-05-12 18:42:43 +00:00 · 054792c84e · dangit
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 18:38:04 +00:00 · 3b8aa5fd4d · diag: stub MMA + dump descriptors for ILLEGAL_INSTRUCTION debug
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant · 2026-05-12 18:26:41 +00:00 · de055b1e77 · syupid clankers
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 18:05:13 +00:00 · c56f5dda7e · fix: use UINT8 TMA for packed FP4 instead of 16U4_ALIGN8B
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 17:40:25 +00:00 · b0094175a2 · fix: restore elem_size declaration for TMA desc build
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 17:39:08 +00:00 · 48b5b2b702 · fix: TMA dimensions for packed FP4 must be in individual FP4 values (not bytes)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 17:14:46 +00:00 · 75f1c8544b · fix: remove smem_inner_dim doubling for packed FP4 TMA — must match MMA row width (BLOCK_K/2)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 17:11:21 +00:00 · b95f9eb446 · revert: remove SMEM warp transpose (deadlock in elect_one_sync, not needed with transform_sf_token_idx)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 16:48:08 +00:00 · 54a7de03a0 · fix: add UTCCP SMEM warp transpose for NVFP4 packed UE4M3 scales
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 16:20:12 +00:00 · 8a53228745 · fix: no GPU tensor ops in crash handler (CUDA is broken after 715)
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 16:06:02 +00:00 · 9115f83afb · debug: try/catch around mega_moe kernel with data diagnostics on crash
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 15:44:51 +00:00 · 758389645a · fix: contiguous copy for SF byte view sanity check
biondizzle pushed to nvfp4-mega-moe at biondizzle/DeepGEMM · 2026-05-12 15:30:39 +00:00 · cc3e3da45c · debug: check for zero/NaN/Inf in weight SF values
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant · 2026-05-12 15:14:42 +00:00 · 307574bc91 · test: signal alarm timeout for kernel hang
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant · 2026-05-12 15:13:17 +00:00 · fcd6de0a60 · test: simplify SF fill to avoid shape mismatch
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant · 2026-05-12 15:12:37 +00:00 · d4c557fddc · test: fix float8 randn + SF int32 packing
biondizzle pushed to mega-moe-nvfp4 at biondizzle/deepseek-v4-quant · 2026-05-12 15:11:44 +00:00 · 28afc2406b · test: add random FP4 data and kernel timeout
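Several of the commits above turn on how packed FP4 data is sized, for example whether TMA dimensions are counted in individual FP4 values or in bytes, and an MMA row width of BLOCK_K/2. As a hedged illustration of that arithmetic only, the sketch below packs two 4-bit E2M1 codes per byte and decodes them. The helper names (`fp4_bytes`, `pack_fp4`, `unpack_fp4`, `decode_e2m1`) are hypothetical, the low-nibble-first element order is an assumption, and none of this is taken from the DeepGEMM or deepseek-v4-quant code.

```python
# Hypothetical helpers illustrating packed FP4 (E2M1) storage: two 4-bit codes per byte.
# ASSUMPTION: element 2*i sits in the low nibble of byte i, element 2*i+1 in the high nibble.

# E2M1 magnitudes indexed by the low 3 bits of a code; bit 3 is the sign bit.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fp4_bytes(num_values: int) -> int:
    """Bytes needed to hold num_values packed FP4 values (two per byte, rounded up)."""
    return (num_values + 1) // 2


def pack_fp4(codes: list[int]) -> bytes:
    """Pack 4-bit codes (0..15) two per byte, low nibble first."""
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("FP4 codes must be in 0..15")
    out = bytearray(fp4_bytes(len(codes)))
    for i, code in enumerate(codes):
        shift = 4 * (i % 2)  # even index -> low nibble, odd index -> high nibble
        out[i // 2] |= code << shift
    return bytes(out)


def unpack_fp4(data: bytes, num_values: int) -> list[int]:
    """Inverse of pack_fp4: recover the first num_values 4-bit codes."""
    return [(data[i // 2] >> (4 * (i % 2))) & 0xF for i in range(num_values)]


def decode_e2m1(code: int) -> float:
    """Map a 4-bit E2M1 code to its float value."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


if __name__ == "__main__":
    BLOCK_K = 128  # illustrative value only
    print(fp4_bytes(BLOCK_K))  # -> 64, i.e. BLOCK_K // 2 bytes per row
    packed = pack_fp4([0x1, 0x2, 0xF])  # codes for 0.5, 1.0, -6.0
    print(packed.hex())  # -> 210f
    print([decode_e2m1(c) for c in unpack_fp4(packed, 3)])  # -> [0.5, 1.0, -6.0]
```

Because two values share one byte, the same row can be described as BLOCK_K elements or BLOCK_K // 2 bytes, so any size passed to a copy or layout descriptor has to agree on which unit it uses.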