[Doc] Update docs to refer to pooling models (#11093)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung
2024-12-11 21:36:27 +08:00
committed by GitHub
parent 8f10d5e393
commit cad5c0a6ed
14 changed files with 26 additions and 21 deletions

@@ -1085,7 +1085,7 @@ class AsyncLLMEngine(EngineClient):
trace_headers: Optional[Mapping[str, str]] = None,
priority: int = 0,
) -> AsyncGenerator[PoolingRequestOutput, None]:
-"""Generate outputs for a request from an embedding model.
+"""Generate outputs for a request from a pooling model.
Generate outputs for a request. This method is a coroutine. It adds the
request into the waiting queue of the LLMEngine and streams the outputs