- fused_swiglu_grouped_mm.py: copypaste of torch_scaled_grouped_mm.py with class rename and fused_swiglu/swiglu_limit params added
- bridge.py: added interleave_l1_weights, deinterleave_l1_weights, warmup_fused_swiglu_compilation
- Pure-PyTorch interleave invariant passes (A@cat vs deinterleave(A@interleave))
- Standalone GEMM interleave test fails due to kernel-internal N-tiling layout (expected, skipping per plan)
- FUSED_EPILOGUE_PLAN.md updated with register layout, amax shuffle plan, 4-step implementation strategy
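
For reference, the interleave invariant (A@cat vs deinterleave(A@interleave)) can be sketched as below. This is a minimal illustration, not the code in bridge.py: it uses plain Python lists instead of torch tensors so it runs without dependencies, the helper names `interleave_cols` and `deinterleave_cols` are hypothetical, and the per-column gate/up interleave granularity is an assumption.

```python
def matmul(a, b):
    """Plain (M x K) @ (K x N) product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def interleave_cols(w):
    """[g0..g(h-1) | u0..u(h-1)] -> [g0, u0, g1, u1, ...] along the column axis.

    Assumption: the interleave granularity is a single column.
    """
    h = len(w[0]) // 2
    return [[row[(j // 2) + (j % 2) * h] for j in range(2 * h)] for row in w]


def deinterleave_cols(y):
    """Inverse of interleave_cols: even columns first, then odd columns."""
    return [row[0::2] + row[1::2] for row in y]


# Small integer example so the comparison is exact.
a = [[1, 2, 3], [4, 5, 6]]                             # activations, 2 x 3
w_cat = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # 3 x 4: gate columns, then up columns

reference = matmul(a, w_cat)
roundtrip = deinterleave_cols(matmul(a, interleave_cols(w_cat)))
invariant_holds = reference == roundtrip
```

This only checks that permuting weight columns permutes output columns the same way. It says nothing about the kernel-internal N-tiling layout, which is why the standalone GEMM test is a separate concern.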