nvfp4-megamoe-kernel/src
biondizzle b7c7e9fb50 refactor: clean up slot_token handling in cutlass_grouped_nvfp4_gemm
- Split provided_slot_token (passed in by the caller) from slot_token_out (returned to the caller)
- Skip the gather when slot_token=None (L2 path), avoiding an unnecessary allocation
- Call .contiguous() on gathered tensors for CUTLASS alignment
- Return slot_token_out consistently
2026-05-15 10:11:40 +00:00
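
The slot_token handling described in the commit message above can be sketched roughly as follows. This is a plain-Python illustration only, not the code in cutlass_grouped_nvfp4_gemm: the helper name `prepare_rows`, the reading of slot_token as a list of source-row indices, and the choice to return None as slot_token_out when no slot_token is given are all assumptions made for the example. Copying each gathered row stands in for calling .contiguous() on a gathered tensor.

```python
from typing import List, Optional, Sequence, Tuple

Row = List[float]


def prepare_rows(
    rows: Sequence[Row],
    slot_token: Optional[Sequence[int]] = None,
) -> Tuple[Sequence[Row], Optional[List[int]]]:
    """Hypothetical helper mirroring the control flow in the commit message.

    Assumption: slot_token[i] is the index of the source row that feeds slot i.
    """
    # Keep the caller's input separate from the value handed back.
    provided_slot_token = slot_token

    if provided_slot_token is None:
        # No gather and no new buffer: the input is passed through untouched.
        # Assumption: slot_token_out is None in this case.
        return rows, None

    # Gather rows in slot order. Copying each row yields a fresh, densely
    # packed result, loosely analogous to .contiguous() on a gathered tensor.
    gathered = [list(rows[t]) for t in provided_slot_token]

    # Return a copy so later changes by the caller do not affect the output.
    slot_token_out = list(provided_slot_token)
    return gathered, slot_token_out
```

Called without slot_token, the helper returns the same `rows` object it was given; called with an index list, it returns freshly copied rows in slot order together with the indices that were used.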