Files
nvfp4-megamoe-kernel/pyproject.toml
biondizzle 9ff1679064 Replace MHC TileLang kernels with pure PyTorch
TileLang kernels (mhc_pre_big_fuse_tilelang, mhc_fused_tilelang) don't
work correctly on Blackwell SM100 and cause empty model output.

Replace with pure PyTorch implementations:
- mhc_pre_torch: Sinkhorn-normalized HC residual mixing
- mhc_post_torch: HC post block (einsum residual + post layer mix)
- mhc_fused_post_pre_torch: Fused post+pre (composition of above)
- hc_head_fused_torch: RMS norm + linear + sigmoid + weighted sum

Patch both layers/mhc.py (CustomOp dispatch) and kernels/mhc/__init__.py
(no tilelang import). Also remove tilelang from pyproject.toml deps.
2026-05-19 05:07:41 +00:00

350 B