Relax Transformers modeling backend MoE experts check (#28952)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Author: Harry Mellor
Date: 2025-11-19 14:50:30 +01:00
Committed by: GitHub
Commit: 4f5299f717 (parent 09540cd918)
2 changed files with 11 additions and 2 deletions

@@ -79,7 +79,9 @@ To make your model compatible with the Transformers modeling backend, it needs:
     1. Add `is_causal = False` to `MyAttention`.
 - If your model is mixture-of-experts (MoE):
     1. Your sparse MoE block must have an attribute called `experts`.
-    2. The class of `experts` (`MyExperts`) must inherit from `nn.ModuleList`.
+    2. The class of `experts` (`MyExperts`) must either:
+        - Inherit from `nn.ModuleList` (naive), or
+        - Contain only 3D `nn.Parameter`s (packed).
     3. `MyExperts.forward` must accept `hidden_states`, `top_k_index`, `top_k_weights`.
 2. `MyAttention` must use `ALL_ATTENTION_FUNCTIONS` to call attention.
 3. `MyModel` must contain `_supports_attention_backend = True`.
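
For orientation (not part of the commit), here is a minimal sketch of a "packed" `MyExperts` module that would satisfy the relaxed check. The class layout, the parameter names `gate_up_proj`/`down_proj`, the SwiGLU-style MLP, and the token-gathering loop are illustrative assumptions; only the `forward` signature (`hidden_states`, `top_k_index`, `top_k_weights`) and the "all parameters are 3D" layout come from the requirements above.

```python
import torch
import torch.nn as nn


class MyExperts(nn.Module):
    """Hypothetical "packed" experts block: every parameter is a 3D tensor
    with the expert dimension first, instead of one sub-module per expert."""

    def __init__(self, num_experts: int, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Only 3D nn.Parameters -> recognised as the "packed" layout.
        self.gate_up_proj = nn.Parameter(
            torch.empty(num_experts, hidden_size, 2 * intermediate_size)
        )
        self.down_proj = nn.Parameter(
            torch.empty(num_experts, intermediate_size, hidden_size)
        )
        self.act_fn = nn.SiLU()

    def forward(
        self,
        hidden_states: torch.Tensor,  # (num_tokens, hidden_size)
        top_k_index: torch.Tensor,    # (num_tokens, top_k) expert ids from the router
        top_k_weights: torch.Tensor,  # (num_tokens, top_k) routing weights
    ) -> torch.Tensor:
        out = torch.zeros_like(hidden_states)
        for expert_id in range(self.gate_up_proj.shape[0]):
            # Find which tokens routed to this expert and at which top-k slot.
            token_idx, k_idx = torch.where(top_k_index == expert_id)
            if token_idx.numel() == 0:
                continue
            x = hidden_states[token_idx]
            gate, up = (x @ self.gate_up_proj[expert_id]).chunk(2, dim=-1)
            expert_out = (self.act_fn(gate) * up) @ self.down_proj[expert_id]
            # Scatter the weighted expert output back to the token positions.
            out.index_add_(0, token_idx, expert_out * top_k_weights[token_idx, k_idx, None])
        return out
```

The naive alternative instead keeps `experts` as an `nn.ModuleList` of per-expert MLP modules, which is what the check required before this change.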
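Likewise, a rough illustration of items 2 and 3 in the list above: dispatching attention through `ALL_ATTENTION_FUNCTIONS` and setting `_supports_attention_backend = True`. The eager fallback, the `config` attributes, and the projection-free `MyAttention` are assumptions that mirror the pattern used by recent Transformers models, not the exact code of any particular model.

```python
from typing import Callable

import torch
import torch.nn as nn
from transformers import PreTrainedModel
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS


def eager_attention_forward(module, query, key, value, attention_mask, scaling, dropout=0.0, **kwargs):
    # Plain-PyTorch fallback used when config._attn_implementation == "eager".
    attn_weights = torch.matmul(query, key.transpose(2, 3)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask[:, :, :, : key.shape[-2]]
    attn_weights = torch.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
    attn_weights = torch.nn.functional.dropout(attn_weights, p=dropout, training=module.training)
    attn_output = torch.matmul(attn_weights, value).transpose(1, 2).contiguous()
    return attn_output, attn_weights


class MyAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.scaling = config.head_dim ** -0.5
        # q/k/v/o projections omitted for brevity.

    def forward(self, query, key, value, attention_mask=None, **kwargs):
        # Item 2: look the implementation up in ALL_ATTENTION_FUNCTIONS so that
        # vLLM can register and select its own attention backend.
        attention_interface: Callable = eager_attention_forward
        if self.config._attn_implementation != "eager":
            attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
        attn_output, attn_weights = attention_interface(
            self, query, key, value, attention_mask, scaling=self.scaling, **kwargs
        )
        return attn_output, attn_weights


class MyModel(PreTrainedModel):
    # Item 3: declare support for pluggable attention backends.
    _supports_attention_backend = True
```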