nvfp4-megamoe-kernel/tests
biondizzle c1aa4af123 Shared expert: dedicated CuTeDSL runner with proper scale assembly
- CuTeDSLSharedExpertRunner: num_groups=1 GEMM, no scatter/routing
- _assemble_scales_single_group: pad to 128 rows + Blackwell swizzle
- All buffers pre-allocated for cudagraph compatibility
- Updated test to use dedicated runner instead of MoE runner hack
2026-05-18 20:08:34 +00:00
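
The commit message mentions padding the scale rows to 128 and applying a Blackwell swizzle, but the page does not show how `_assemble_scales_single_group` does it. The following is a minimal sketch of one plausible reading, not the repository's code. It assumes the 128x4 scale-factor tile layout commonly described for block-scaled GEMMs on Blackwell, where an element at row `r`, column `c` inside a tile lands at offset `(r % 32) * 16 + (r // 32) * 4 + c`. The function name `assemble_scales_sketch`, the helper `pad_up`, and the use of plain Python lists are all hypothetical choices for illustration.

```python
def pad_up(n, multiple):
    """Round n up to the next multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple


def assemble_scales_sketch(scales, rows, cols):
    """Pad a row-major rows x cols scale matrix and swizzle it into 128x4 tiles.

    Hypothetical stand-in for _assemble_scales_single_group; the tile layout
    used here is an assumption, not taken from the commit.
    """
    padded_rows = pad_up(rows, 128)   # rows are padded to a multiple of 128
    padded_cols = pad_up(cols, 4)     # assumed: columns padded to a multiple of 4
    col_tiles = padded_cols // 4
    out = [0] * (padded_rows * padded_cols)  # padding entries stay zero

    for r in range(rows):
        for c in range(cols):
            # Each 128x4 tile occupies 512 consecutive entries.
            tile = (r // 128) * col_tiles + (c // 4)
            rr = r % 128
            # Assumed in-tile layout: [32][4][4] over (row % 32, row // 32, col).
            inner = (rr % 32) * 16 + (rr // 32) * 4 + (c % 4)
            out[tile * 512 + inner] = scales[r * cols + c]

    return out, padded_rows, padded_cols


# A 3x2 scale matrix is padded up to a single 128x4 tile.
swizzled, padded_rows, padded_cols = assemble_scales_sketch([1, 2, 3, 4, 5, 6], 3, 2)
print(padded_rows, padded_cols, len(swizzled))
```

In a runner that must stay CUDA-graph compatible, the output buffer would presumably be allocated once at the padded size and overwritten in place on each call, rather than created per call as in this sketch.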