[Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
This commit is contained in:
@@ -27,11 +27,9 @@ LLM Class.
|
||||
|
||||
- [vllm.LLM][]
|
||||
|
||||
LLM Inputs.
|
||||
Prompt schema for LLM APIs.
|
||||
|
||||
- [vllm.inputs.PromptType][]
|
||||
- [vllm.inputs.TextPrompt][]
|
||||
- [vllm.inputs.TokensPrompt][]
|
||||
- [vllm.inputs.llm][]
|
||||
|
||||
## vLLM Engines
|
||||
|
||||
@@ -58,13 +56,7 @@ Looking to add your own multi-modal model? Please follow the instructions listed
|
||||
|
||||
- [vllm.multimodal.MULTIMODAL_REGISTRY][]
|
||||
|
||||
### Inputs
|
||||
|
||||
User-facing inputs.
|
||||
|
||||
- [vllm.multimodal.inputs.MultiModalDataDict][]
|
||||
|
||||
Internal data structures.
|
||||
### Internal data structures
|
||||
|
||||
- [vllm.multimodal.inputs.PlaceholderRange][]
|
||||
- [vllm.multimodal.inputs.NestedTensors][]
|
||||
@@ -72,7 +64,6 @@ Internal data structures.
|
||||
- [vllm.multimodal.inputs.MultiModalFieldConfig][]
|
||||
- [vllm.multimodal.inputs.MultiModalKwargsItem][]
|
||||
- [vllm.multimodal.inputs.MultiModalKwargsItems][]
|
||||
- [vllm.multimodal.inputs.MultiModalInputs][]
|
||||
|
||||
### Data Parsing
|
||||
|
||||
|
||||
@@ -23,7 +23,7 @@ Declare supported languages and capabilities:
|
||||
from torch import nn
|
||||
|
||||
from vllm.config import ModelConfig, SpeechToTextConfig
|
||||
from vllm.inputs.data import PromptType
|
||||
from vllm.inputs import PromptType
|
||||
from vllm.model_executor.models.interfaces import SupportsTranscription
|
||||
|
||||
class YourASRModel(nn.Module, SupportsTranscription):
|
||||
@@ -66,7 +66,7 @@ This is for controlling general behavior of the API when serving your model:
|
||||
|
||||
See [Audio preprocessing and chunking](#audio-preprocessing-and-chunking) for what each field controls.
|
||||
|
||||
Implement the prompt construction via [get_generation_prompt][vllm.model_executor.models.interfaces.SupportsTranscription.get_generation_prompt]. The server passes you the resampled waveform and task parameters; you return a valid [PromptType][vllm.inputs.data.PromptType]. There are two common patterns:
|
||||
Implement the prompt construction via [get_generation_prompt][vllm.model_executor.models.interfaces.SupportsTranscription.get_generation_prompt]. The server passes you the resampled waveform and task parameters; you return a valid [PromptType][vllm.inputs.llm.PromptType]. There are two common patterns:
|
||||
|
||||
#### Multimodal LLM with audio embeddings (e.g., Voxtral, Gemma3n)
|
||||
|
||||
|
||||
@@ -18,7 +18,7 @@ This page teaches you how to pass multi-modal inputs to [multi-modal models](../
|
||||
To input multi-modal data, follow this schema in [vllm.inputs.PromptType][]:
|
||||
|
||||
- `prompt`: The prompt should follow the format that is documented on HuggingFace.
|
||||
- `multi_modal_data`: This is a dictionary that follows the schema defined in [vllm.multimodal.inputs.MultiModalDataDict][].
|
||||
- `multi_modal_data`: This is a dictionary that follows the schema defined in [vllm.inputs.MultiModalDataDict][].
|
||||
|
||||
### Image Inputs
|
||||
|
||||
|
||||
Reference in New Issue
Block a user