[Model] Add transcription support for Qwen3-Omni (#29828)

Signed-off-by: Muhammad Hashmi <mhashmi@berkeley.edu>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
commit 535de06cb1
parent 4292c90a2a
Author: Muhammad Hashmi
Date: 2026-02-04 13:17:47 -08:00
Committed by: GitHub

3 changed files with 104 additions and 2 deletions


@@ -251,6 +251,7 @@ No extra registration is required beyond having your model class available via t
- Whisper encoder-decoder (audio-only): [vllm/model_executor/models/whisper.py](../../../vllm/model_executor/models/whisper.py)
- Voxtral decoder-only (audio embeddings + LLM): [vllm/model_executor/models/voxtral.py](../../../vllm/model_executor/models/voxtral.py). Make sure `mistral-common[audio]` is installed.
- Gemma3n decoder-only with fixed instruction prompt: [vllm/model_executor/models/gemma3n_mm.py](../../../vllm/model_executor/models/gemma3n_mm.py)
- Qwen3-Omni multimodal with audio embeddings: [vllm/model_executor/models/qwen3_omni_moe_thinker.py](../../../vllm/model_executor/models/qwen3_omni_moe_thinker.py)
## Test with the API
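
As a minimal sketch of how the new Qwen3-Omni transcription support might be exercised, the request below targets vLLM's OpenAI-compatible `/v1/audio/transcriptions` endpoint via the `openai` Python client. The server URL, audio file name, and exact model id are illustrative assumptions, not taken from this diff:

```python
# Minimal sketch: calling vLLM's OpenAI-compatible transcription endpoint.
# Assumes a vLLM server is already running and serving a Qwen3-Omni checkpoint,
# e.g. started with:  vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct
# The model id, URL, and audio file below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local vLLM server
    api_key="EMPTY",  # vLLM ignores the key unless auth is configured
)

with open("sample.wav", "rb") as audio_file:
    # POSTs to /v1/audio/transcriptions, mirroring the OpenAI
    # transcription API shape.
    result = client.audio.transcriptions.create(
        model="Qwen/Qwen3-Omni-30B-A3B-Instruct",  # assumed model id
        file=audio_file,
    )

print(result.text)
```

Because the endpoint mirrors the OpenAI transcription API, existing client code should only need the model id swapped to point at the Qwen3-Omni checkpoint.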