[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160)

Signed-off-by: Alex Brooks <albrooks@redhat.com>
This commit is contained in:
Alex Brooks
2026-03-08 23:46:23 -06:00
committed by GitHub
parent 217f27598d
commit 65a4da1504
6 changed files with 216 additions and 25 deletions


@@ -439,6 +439,8 @@ you can use the [official OpenAI Python client](https://github.com/openai/openai
Code example: [examples/online_serving/openai_transcription_client.py](../../examples/online_serving/openai_transcription_client.py)
NOTE: beam search is currently supported in the transcriptions endpoint for encoder-decoder multimodal models (e.g., Whisper), but it is highly inefficient because work on handling the encoder/decoder cache is still ongoing; this will be optimized in the near future.
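A minimal sketch of how beam search might be requested through the official OpenAI Python client pointed at a vLLM server. The base URL, model name, and the `use_beam_search`/`best_of` fields passed via `extra_body` are assumptions for illustration, not confirmed by this commit; see the linked example client for the supported parameters.

```python
def beam_search_extra_body(beam_width: int) -> dict:
    """Build extra request fields enabling beam search.

    The field names here are assumptions; vLLM forwards additional
    sampling parameters sent through the OpenAI client's extra_body.
    """
    return {"use_beam_search": True, "best_of": beam_width}


if __name__ == "__main__":
    from openai import OpenAI

    # Assumed local vLLM server URL; vLLM ignores the API key by default.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    with open("sample.wav", "rb") as audio:
        transcription = client.audio.transcriptions.create(
            model="openai/whisper-large-v3",  # assumed model name
            file=audio,
            extra_body=beam_search_extra_body(5),
        )
    print(transcription.text)
```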
#### API Enforced Limits
Set the maximum audio file size (in MB) that vLLM will accept, via the