[Feature] Enable TRITON_ATTN for Batch Invariance (#33688)

Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
This commit is contained in:
Frank Wang
2026-02-03 21:27:34 -08:00
committed by GitHub
parent 5e1e0a0fbd
commit 45f8fd6f97
4 changed files with 13 additions and 4 deletions

@@ -108,6 +108,7 @@ Batch invariance has been tested and verified on the following models:
- **Qwen3 (MoE)**: `Qwen/Qwen3-30B-A3B`, `Qwen/Qwen3-Next-80B-A3B-Instruct`
- **Qwen2.5**: `Qwen/Qwen2.5-0.5B-Instruct`, `Qwen/Qwen2.5-1.5B-Instruct`, `Qwen/Qwen2.5-3B-Instruct`, `Qwen/Qwen2.5-7B-Instruct`, `Qwen/Qwen2.5-14B-Instruct`, `Qwen/Qwen2.5-32B-Instruct`
- **Llama 3**: `meta-llama/Llama-3.1-8B-Instruct`, `meta-llama/Llama-3.2-1B-Instruct`
- **GPT-OSS**: `openai/gpt-oss-20b`, `openai/gpt-oss-120b`

Other models may also work, but these have been explicitly validated. If you encounter issues with a specific model, please report them on the [GitHub issue tracker](https://github.com/vllm-project/vllm/issues/new/choose).
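
One of the validated models can be exercised with the Triton attention backend like this. This is a sketch, not part of the diff: it assumes the `VLLM_BATCH_INVARIANT` and `VLLM_ATTENTION_BACKEND` environment variables described in the vLLM docs, and that vLLM is installed with a compatible GPU available.

```shell
# Sketch: serve a validated model in batch-invariant mode with the
# TRITON_ATTN backend (env var names assumed from the vLLM docs).
VLLM_BATCH_INVARIANT=1 \
VLLM_ATTENTION_BACKEND=TRITON_ATTN \
vllm serve openai/gpt-oss-20b
```

With batch invariance enabled, a given prompt should produce bitwise-identical outputs regardless of how requests are batched together.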