nvfp4-megamoe-kernel

Files

biondizzle 2687d1fc53 fix: convert global expert IDs to local before GEMM

vLLM's symm_buffer stores topk_ids as GLOBAL expert IDs (0..383).
Our weight tensors are indexed by LOCAL IDs (0..47 per rank).
Each rank r handles experts [r*48, r*48+47]. Without conversion,
topk_ids such as 137, 222, or 378 index far out of bounds in the
weight tensor (shape (48, N, K)), producing garbage.

Derive experts_start_idx from the topk_ids and subtract it to get
local IDs. The missing conversion was why all ranks except rank 0
(where global and local IDs coincide) produced zero expert
matches → zero output → garbage text.
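
A minimal sketch of the global-to-local mapping described above, in plain
Python rather than the kernel's actual tensor code. The function name
to_local_expert_ids, the -1 sentinel for experts owned by another rank,
and computing experts_start_idx as rank * 48 are all assumptions made for
illustration; the commit itself derives experts_start_idx from the
topk_ids, and that derivation is not shown here.

```python
NUM_EXPERTS = 384       # global expert IDs 0..383 (from the commit message)
EXPERTS_PER_RANK = 48   # local expert IDs 0..47 per rank (from the commit message)


def to_local_expert_ids(topk_ids, rank, experts_per_rank=EXPERTS_PER_RANK):
    """Map global expert IDs to local IDs for one rank.

    Hypothetical helper: IDs owned by other ranks are replaced with -1
    so they can never index into the (48, N, K) weight tensor.
    """
    # Assumption: start index computed from the rank, not from topk_ids.
    experts_start_idx = rank * experts_per_rank
    local_ids = []
    for row in topk_ids:
        local_row = []
        for global_id in row:
            local_id = global_id - experts_start_idx
            in_range = 0 <= local_id < experts_per_rank
            local_row.append(local_id if in_range else -1)
        local_ids.append(local_row)
    return local_ids


# Two tokens, top-2 routing, using the IDs mentioned in the commit message.
topk_ids = [[137, 222], [378, 100]]
local_on_rank0 = to_local_expert_ids(topk_ids, rank=0)
local_on_rank2 = to_local_expert_ids(topk_ids, rank=2)
local_on_rank7 = to_local_expert_ids(topk_ids, rank=7)
```

On rank 2 (experts 96..143), global IDs 137 and 100 map to local IDs 41
and 4, while 222 and 378 belong to other ranks and are masked out.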

2026-05-14 17:43:58 +00:00

nvfp4_megamoe_kernel

fix: convert global expert IDs to local before GEMM

2026-05-14 17:43:58 +00:00

nvfp4_megamoe_kernel.egg-info

Implement TileLang NVFP4 mega_moe L1/L2 kernels

2026-05-13 22:36:58 +00:00