biondizzle/nvfp4-megamoe-kernel
6f4bbc997ad762fef69bf2f2acd0c257bc7c45f6
nvfp4-megamoe-kernel/dsv4
biondizzle
5493a8727e
P7: compressor early return + decode buffering (skip GEMMs when n_complete=0); sampler SMEM fix (LK=24 fits 48KB default); topk on float not bf16
2026-06-01 22:29:56 +00:00
cache | E1: Wire LayerCacheHandle gather methods + CUDA gather kernels | 2026-05-30 21:09:21 +00:00
kernels | P7: compressor early return + decode buffering (skip GEMMs when n_complete=0); sampler SMEM fix (LK=24 fits 48KB default); topk on float not bf16 | 2026-06-01 22:29:56 +00:00
layers | P0 complete: Kill .item() in grouped_linear, reduce hot-path syncs | 2026-06-01 22:21:12 +00:00
loader | Restructure: cutedsl/ -> dsv4/ with proper layering | 2026-05-21 17:30:44 +00:00
model | P0 COMPLETE: Eliminate ALL .item() CPU-GPU syncs from NVFP4 activation path | 2026-06-01 21:05:03 +00:00
ops | Fix gsa_buffer shape mismatch for MoE (M>1 rows) | 2026-06-01 21:33:59 +00:00
reference | Restructure: cutedsl/ -> dsv4/ with proper layering | 2026-05-21 17:30:44 +00:00
__init__.py | Restructure: cutedsl/ -> dsv4/ with proper layering | 2026-05-21 17:30:44 +00:00