[docs] Prompt Embedding feature support (#25288)

Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-19 19:46:23 -05:00
parent c7e713616a
commit b8a287a0a8
2 changed files with 18 additions and 19 deletions
--- a/docs/features/prompt_embeds.md
+++ b/docs/features/prompt_embeds.md
@@ -6,9 +6,6 @@ This page teaches you how to pass prompt embedding inputs to vLLM.

 The traditional flow of text data for a Large Language Model goes from text to token ids (via a tokenizer) then from token ids to prompt embeddings. For a traditional decoder-only model (such as meta-llama/Llama-3.1-8B-Instruct), this step of converting token ids to prompt embeddings happens via a look-up from a learned embedding matrix, but the model is not limited to processing only the embeddings corresponding to its token vocabulary.

-!!! note
-    Prompt embeddings are currently only supported in the v0 engine.
-
 ## Offline Inference

 To input multi-modal data, follow this schema in [vllm.inputs.EmbedsPrompt][]: