biondizzle/nvfp4-megamoe-kernel
9a43e9aa7747f7ac3fb52e782dc09d4b5ceed3bc
nvfp4-megamoe-kernel/dsv4
biondizzle  1c18c16c68  2026-05-31 09:17:36 +00:00  Fix production rope.py: FP32 arithmetic for forward_rope_partial + inverse_rope_bf16
Name         Last updated                Last commit message
cache        2026-05-30 21:09:21 +00:00  E1: Wire LayerCacheHandle gather methods + CUDA gather kernels
kernels      2026-05-30 21:21:02 +00:00  E5: Fold batch loop into native kernel grid (blockIdx.z)
layers       2026-05-30 21:16:54 +00:00  E2/E3: compressor bridge, indexer bridge, flush pipeline wiring
loader       2026-05-21 17:30:44 +00:00  Restructure: cutedsl/ -> dsv4/ with proper layering
model        2026-05-30 21:15:57 +00:00  E3: Implement DSV4Model — full model class
ops          2026-05-31 09:17:36 +00:00  Fix production rope.py: FP32 arithmetic for forward_rope_partial + inverse_rope_bf16
reference    2026-05-21 17:30:44 +00:00  Restructure: cutedsl/ -> dsv4/ with proper layering
__init__.py  2026-05-21 17:30:44 +00:00  Restructure: cutedsl/ -> dsv4/ with proper layering