nvfp4-megamoe-kernel

Files

biondizzle 9fec7d609e Fix gsa_buffer shape mismatch for MoE (M>1 rows)

compute_amax_gsa returns a scalar, but quantize_from_buffer expects a gsa buffer of shape (M,).
Broadcast the scalar gsa to (M,); all rows use the same gsa (the global max).
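
The shape fix described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's code. It uses NumPy so that it runs standalone (the kernel itself presumably operates on GPU tensors). The body of `compute_amax_gsa` shown here is a stand-in that simply returns the global absolute maximum, and `broadcast_gsa` is a hypothetical helper name introduced for this example.

```python
import numpy as np


def compute_amax_gsa(x: np.ndarray) -> np.ndarray:
    """Stand-in for the repo's function: returns a 0-d (scalar) array.

    Assumption: the real function derives gsa from the global abs max;
    here we just return that max so the shapes can be demonstrated.
    """
    return np.asarray(np.abs(x).max(), dtype=np.float32)


def broadcast_gsa(gsa: np.ndarray, num_rows: int) -> np.ndarray:
    """Hypothetical helper: expand a scalar gsa to shape (num_rows,).

    Every row receives the same value, since gsa is a global maximum.
    """
    gsa = np.asarray(gsa, dtype=np.float32)
    if gsa.ndim == 0:
        # Scalar case: materialize one copy of the value per row.
        return np.full((num_rows,), float(gsa), dtype=np.float32)
    if gsa.shape != (num_rows,):
        raise ValueError(f"expected shape ({num_rows},), got {gsa.shape}")
    return gsa


# M = 3 rows, as in an MoE batch with more than one token row.
x = np.array([[1.0, -4.0], [2.0, 3.0], [0.5, -0.25]], dtype=np.float32)
gsa = compute_amax_gsa(x)                    # shape ()
gsa_rows = broadcast_gsa(gsa, x.shape[0])    # shape (3,)
```

Materializing a full (M,) array rather than a broadcast view keeps the buffer contiguous and writable, which matters if a downstream kernel reads it by row index. Whether the repository copies or uses a view is not stated in the commit message.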

2026-06-01 21:33:59 +00:00

__init__.py

2026-05-21 17:30:44 +00:00

custom_ops.py

2026-05-27 15:15:03 +00:00

gemm_runner.py

2026-06-01 00:04:48 +00:00

layouts.py

2026-05-21 17:30:44 +00:00

quantize.py

2026-06-01 21:33:59 +00:00

rope.py

2026-05-31 09:17:36 +00:00

router.py

2026-06-01 00:00:07 +00:00

topk_select.py

2026-05-21 21:54:05 +00:00

topk.py

2026-05-21 17:30:44 +00:00