Files
nvfp4-megamoe-kernel/dsv4/layers
biondizzle 518a1d3f95 CUDA graph: Fix MoE scatter_add_ index dtype + fix second bincount
1. scatter_add_ requires int64 indices — ensure sorted_ids is .long()
2. Fixed the SECOND torch.bincount call (line 590) — same scatter_add_ pattern
3. Both code paths now use pre-allocated _tokens_per_expert_buf
2026-06-03 17:53:40 +00:00
..