biondizzle/nvfp4-megamoe-kernel
911a80e7217c585dd9d4e0ff591c596d37335768
nvfp4-megamoe-kernel/dsv4/kernels
biondizzle
911a80e721
D1.3: Fix tOrP0 for SMEM-P - skip make_tensor when offset is 0
CuTeDSL doesn't support OpResult + int. When offset is 0 (SMEM-P), just use tOrP directly.
2026-05-23 21:03:00 +00:00
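The commit note above describes a guard: when the offset is 0, reuse tOrP as-is instead of building a new tensor from an offset expression. The sketch below illustrates that pattern in plain Python, not CuTeDSL. The `OpResult` class, `make_offset_view`, and `select_p_operand` are hypothetical stand-ins invented for this illustration and are not part of the repository or of any CuTeDSL API.

```python
class OpResult:
    """Hypothetical stand-in for a traced IR value; it deliberately defines no `+`."""

    def __init__(self, name):
        self.name = name


def make_offset_view(base, offset):
    """Hypothetical helper: describe a view of `base` starting `offset` elements in."""
    return (base.name, offset)


def select_p_operand(tOrP, offset):
    """Return the operand to use for P, skipping view creation when offset is 0."""
    if offset == 0:
        # Nothing to shift, so reuse the original object and avoid arithmetic on it.
        return tOrP
    # Nonzero offsets go through the helper, which never evaluates `tOrP + offset`.
    return make_offset_view(tOrP, offset)


tOrP = OpResult("tOrP")
tOrP0 = select_p_operand(tOrP, 0)  # same object as tOrP, no new view built
```

In this toy version, evaluating `tOrP + 0` directly would raise a `TypeError`, which mirrors the limitation the commit message mentions. How the real kernel handles nonzero offsets is not shown in the listing, so that branch here is only a placeholder.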
Name         Last updated               Last commit
attention    2026-05-23 21:03:00 +00:00  D1.3: Fix tOrP0 for SMEM-P - skip make_tensor when offset is 0
cache        2026-05-22 00:08:38 +00:00  KV Cache: schema, allocator, pools, manager, append_swa kernel
compressor   2026-05-21 17:30:44 +00:00  Restructure: cutedsl/ -> dsv4/ with proper layering
cuda         2026-05-22 01:45:05 +00:00  Fix indexer score kernel: use static shared memory, correct FP4 head offsets
decode       2026-05-21 17:30:44 +00:00  Restructure: cutedsl/ -> dsv4/ with proper layering
gemm         2026-05-23 06:32:54 +00:00  fix: add SwiGLU clamping to fused kernel (paper §4.2.3, CG-1)
indexer      2026-05-22 01:20:39 +00:00  Indexer: score+topk kernel, gather KV, compute_valid_lens
router       2026-05-21 22:04:20 +00:00  Router: Blackwell-native fused decode kernel - real CuTeDSL implementation
__init__.py  2026-05-21 17:30:44 +00:00  Restructure: cutedsl/ -> dsv4/ with proper layering