biondizzle
0 Followers · 0 Following
Joined on 2025-12-10
Public Activity
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 20:19:23 +00:00)
647c03b2ee
fix: make_b_k_major must preserve shape — use double-permute trick
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 20:18:20 +00:00)
ed4f501bba
fix: make_b_k_major stride check — K-major means stride[1]==1, not stride[2]==1
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 20:16:31 +00:00)
2162cee4ad
fix: restore proper quantize_weight_to_nvfp4 — K is the packed dim, not N
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 20:09:03 +00:00)
10f1dca982
fix: import ceil_div from correct module
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 20:08:24 +00:00)
81632e2f21
fix: correct cutlass_torch import (cutlass.torch, not top-level)
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 20:06:39 +00:00)
16c4fad025
fix: remove cutlass.cute.backend import
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 20:06:00 +00:00)
44b40d41fe
fix: compile CuTeDSL kernel with real tensors, not dummy shapes
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 20:00:37 +00:00)
79281b6fda
fix: compute K_packed/N_packed before passing to _get_compiled_kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:59:45 +00:00)
caf93d6c45
fix: pass K_packed/N_packed to _get_compiled_kernel
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:58:14 +00:00)
ecc7b83334
fix: compile CuTeDSL kernel with actual tensor shapes, not dummy 256x256
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:55:21 +00:00)
cc75a55bd9
restore: new bridge/moe_pipeline/layertest
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:54:06 +00:00)
0c878b3a9e
temp: restore old layertest+bridge for cosine comparison
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:38:34 +00:00)
0069769d12
debug: print global scales
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:31:56 +00:00)
84589fe984
debug: more prints
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:29:43 +00:00)
fa2d5708c5
debug: add L1 GEMM and SiLU output debug prints
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:28:24 +00:00)
4c06c51ec3
fix: moe_pipeline.py gate/up split — L1 output is 2*intermediate, not intermediate
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:23:45 +00:00)
da31ce7e1a
allow for cuda graphs again
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:07:37 +00:00)
d15c43294b
fix: test L2 weight N dim should be hidden_size, not hidden_size//2
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 19:07:10 +00:00)
28788c6f55
fix: L1 weight N dimension is 2*intermediate (gate+up), not intermediate
biondizzle pushed to master at biondizzle/nvfp4-megamoe-kernel (2026-05-16 18:55:49 +00:00)
f7e29fdf1e
docs: update README with cudagraph compatibility work and decisions