|
|
2f053f674e
|
wip: fused SwiGLU kernel scaffold + bridge interleave + plan
- fused_swiglu_grouped_mm.py: copy of torch_scaled_grouped_mm.py with
the class renamed and fused_swiglu/swiglu_limit params added
- bridge.py: added interleave_l1_weights, deinterleave_l1_weights,
warmup_fused_swiglu_compilation
- Pure-PyTorch interleave invariant passes: A @ cat matches deinterleave(A @ interleave)
- Standalone GEMM interleave test fails due to kernel-internal N-tiling
layout (expected, skipping per plan)
- FUSED_EPILOGUE_PLAN.md updated with register layout, amax shuffle plan,
4-step implementation strategy
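
Note: a minimal sketch of the interleave invariant mentioned above, for reference. It is written in plain Python on lists of rows rather than PyTorch so it runs without dependencies. The helper names (`cat_columns`, `interleave_columns`, `deinterleave_columns`) are hypothetical and are not the bridge.py functions. The column-wise interleave with granularity 1 (gate0, up0, gate1, up1, ...) is an assumption; the real layout may interleave in larger blocks.

```python
# Sketch of the invariant: A @ cat(gate, up) == deinterleave(A @ interleave(gate, up)).
# Assumption: interleave alternates single columns (gate0, up0, gate1, up1, ...).


def matmul(a, b):
    """Naive (M x K) @ (K x N) product on lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a_row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for a_row in a
    ]


def cat_columns(gate, up):
    """Concatenate along N: [gate columns | up columns]."""
    return [g_row + u_row for g_row, u_row in zip(gate, up)]


def interleave_columns(gate, up):
    """Alternate gate and up columns along N (hypothetical granularity of 1)."""
    out = []
    for g_row, u_row in zip(gate, up):
        row = []
        for g, u in zip(g_row, u_row):
            row.extend((g, u))
        out.append(row)
    return out


def deinterleave_columns(mat):
    """Undo interleave_columns: even columns first, then odd columns."""
    return [row[0::2] + row[1::2] for row in mat]


# Small integer inputs so the comparison is exact.
A = [[1, 2], [3, 4], [5, 6]]      # M=3, K=2
GATE = [[1, 0, 2], [0, 1, 3]]     # K=2, N=3
UP = [[4, 5, 6], [7, 8, 9]]       # K=2, N=3

reference = matmul(A, cat_columns(GATE, UP))
roundtrip = deinterleave_columns(matmul(A, interleave_columns(GATE, UP)))
assert reference == roundtrip
```

The check holds because permuting the columns of the weight matrix permutes the output columns in the same way, so the inverse permutation on the output recovers the concatenated result. It says nothing about how a kernel tiles N internally, which is the layout the standalone GEMM test runs into.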
|
2026-05-20 03:04:38 +00:00 |
|