[Doc] Reorganize online pooling APIs (#11172)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
This commit is contained in:
@@ -1,13 +1,13 @@
|
||||
# OpenAI Compatible Server
|
||||
|
||||
vLLM provides an HTTP server that implements OpenAI's [Completions](https://platform.openai.com/docs/api-reference/completions) and [Chat](https://platform.openai.com/docs/api-reference/chat) API.
|
||||
vLLM provides an HTTP server that implements OpenAI's [Completions](https://platform.openai.com/docs/api-reference/completions) and [Chat](https://platform.openai.com/docs/api-reference/chat) API, and more!
|
||||
|
||||
You can start the server using Python, or using [Docker](deploying_with_docker.rst):
|
||||
You can start the server via the [`vllm serve`](#vllm-serve) command, or through [Docker](deploying_with_docker.rst):
|
||||
```bash
|
||||
vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
|
||||
```
|
||||
|
||||
To call the server, you can use the official OpenAI Python client library, or any other HTTP client.
|
||||
To call the server, you can use the [official OpenAI Python client](https://github.com/openai/openai-python), or any other HTTP client.
|
||||
```python
|
||||
from openai import OpenAI
|
||||
client = OpenAI(
|
||||
@@ -25,265 +25,32 @@ completion = client.chat.completions.create(
|
||||
print(completion.choices[0].message)
|
||||
```
|
||||
|
||||
## API Reference
|
||||
## Supported APIs
|
||||
|
||||
We currently support the following OpenAI APIs:
|
||||
|
||||
- [Completions API](https://platform.openai.com/docs/api-reference/completions)
|
||||
- [Completions API](#completions-api) (`/v1/completions`)
|
||||
- Only applicable to [text generation models](../models/generative_models.rst) (`--task generate`).
|
||||
- *Note: `suffix` parameter is not supported.*
|
||||
- [Chat Completions API](https://platform.openai.com/docs/api-reference/chat)
|
||||
- [Chat Completions API](#chat-api) (`/v1/chat/completions`)
|
||||
- Only applicable to [text generation models](../models/generative_models.rst) (`--task generate`) with a [chat template](#chat-template).
|
||||
- [Vision](https://platform.openai.com/docs/guides/vision)-related parameters are supported; see [Multimodal Inputs](../usage/multimodal_inputs.rst).
|
||||
- *Note: `image_url.detail` parameter is not supported.*
|
||||
- We also support `audio_url` content type for audio files.
|
||||
- Refer to [vllm.entrypoints.chat_utils](https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/chat_utils.py) for the exact schema.
|
||||
- *TODO: Support `input_audio` content type as defined [here](https://github.com/openai/openai-python/blob/v1.52.2/src/openai/types/chat/chat_completion_content_part_input_audio_param.py).*
|
||||
- *Note: `parallel_tool_calls` and `user` parameters are ignored.*
|
||||
- [Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)
|
||||
- Instead of `inputs`, you can pass in a list of `messages` (same schema as Chat Completions API),
|
||||
which will be treated as a single prompt to the model according to its chat template.
|
||||
- This enables multi-modal inputs to be passed to embedding models, see [this page](../usage/multimodal_inputs.rst) for details.
|
||||
- *Note: You should run `vllm serve` with `--task embedding` to ensure that the model is being run in embedding mode.*
|
||||
- [Embeddings API](#embeddings-api) (`/v1/embeddings`)
|
||||
- Only applicable to [embedding models](../models/pooling_models.rst) (`--task embed`).
|
||||
|
||||
## Score API for Cross Encoder Models
|
||||
In addition, we have the following custom APIs:
|
||||
|
||||
vLLM supports *cross encoders models* at the **/v1/score** endpoint, which is not an OpenAI API standard endpoint. You can find the documentation for these kind of models at [sbert.net](https://www.sbert.net/docs/package_reference/cross_encoder/cross_encoder.html).
|
||||
|
||||
A ***Cross Encoder*** takes exactly two sentences / texts as input and either predicts a score or label for this sentence pair. It can for example predict the similarity of the sentence pair on a scale of 0 … 1.
|
||||
|
||||
### Example of usage for a pair of a string and a list of texts
|
||||
|
||||
In this case, the model will compare the first given text to each of the texts containing the list.
|
||||
|
||||
```bash
|
||||
curl -X 'POST' \
|
||||
'http://127.0.0.1:8000/v1/score' \
|
||||
-H 'accept: application/json' \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"text_1": "What is the capital of France?",
|
||||
"text_2": [
|
||||
"The capital of Brazil is Brasilia.",
|
||||
"The capital of France is Paris."
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```bash
|
||||
{
|
||||
"id": "score-request-id",
|
||||
"object": "list",
|
||||
"created": 693570,
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"data": [
|
||||
{
|
||||
"index": 0,
|
||||
"object": "score",
|
||||
"score": [
|
||||
0.001094818115234375
|
||||
]
|
||||
},
|
||||
{
|
||||
"index": 1,
|
||||
"object": "score",
|
||||
"score": [
|
||||
1
|
||||
]
|
||||
}
|
||||
],
|
||||
"usage": {}
|
||||
}
|
||||
```
|
||||
|
||||
### Example of usage for a pair of two lists of texts
|
||||
|
||||
In this case, the model will compare the one by one, making pairs by same index correspondent in each list.
|
||||
|
||||
```bash
|
||||
curl -X 'POST' \
|
||||
'http://127.0.0.1:8000/v1/score' \
|
||||
-H 'accept: application/json' \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"encoding_format": "float",
|
||||
"text_1": [
|
||||
"What is the capital of Brazil?",
|
||||
"What is the capital of France?"
|
||||
],
|
||||
"text_2": [
|
||||
"The capital of Brazil is Brasilia.",
|
||||
"The capital of France is Paris."
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```bash
|
||||
{
|
||||
"id": "score-request-id",
|
||||
"object": "list",
|
||||
"created": 693447,
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"data": [
|
||||
{
|
||||
"index": 0,
|
||||
"object": "score",
|
||||
"score": [
|
||||
1
|
||||
]
|
||||
},
|
||||
{
|
||||
"index": 1,
|
||||
"object": "score",
|
||||
"score": [
|
||||
1
|
||||
]
|
||||
}
|
||||
],
|
||||
"usage": {}
|
||||
}
|
||||
```
|
||||
|
||||
### Example of usage for a pair of two strings
|
||||
|
||||
In this case, the model will compare the strings of texts.
|
||||
|
||||
```bash
|
||||
curl -X 'POST' \
|
||||
'http://127.0.0.1:8000/v1/score' \
|
||||
-H 'accept: application/json' \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"encoding_format": "float",
|
||||
"text_1": "What is the capital of France?",
|
||||
"text_2": "The capital of France is Paris."
|
||||
}'
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```bash
|
||||
{
|
||||
"id": "score-request-id",
|
||||
"object": "list",
|
||||
"created": 693447,
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"data": [
|
||||
{
|
||||
"index": 0,
|
||||
"object": "score",
|
||||
"score": [
|
||||
1
|
||||
]
|
||||
}
|
||||
],
|
||||
"usage": {}
|
||||
}
|
||||
```
|
||||
|
||||
## Extra Parameters
|
||||
|
||||
vLLM supports a set of parameters that are not part of the OpenAI API.
|
||||
In order to use them, you can pass them as extra parameters in the OpenAI client.
|
||||
Or directly merge them into the JSON payload if you are using HTTP call directly.
|
||||
|
||||
```python
|
||||
completion = client.chat.completions.create(
|
||||
model="NousResearch/Meta-Llama-3-8B-Instruct",
|
||||
messages=[
|
||||
{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
|
||||
],
|
||||
extra_body={
|
||||
"guided_choice": ["positive", "negative"]
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### Extra HTTP Headers
|
||||
|
||||
Only `X-Request-Id` HTTP request header is supported for now.
|
||||
|
||||
```python
|
||||
completion = client.chat.completions.create(
|
||||
model="NousResearch/Meta-Llama-3-8B-Instruct",
|
||||
messages=[
|
||||
{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
|
||||
],
|
||||
extra_headers={
|
||||
"x-request-id": "sentiment-classification-00001",
|
||||
}
|
||||
)
|
||||
print(completion._request_id)
|
||||
|
||||
completion = client.completions.create(
|
||||
model="NousResearch/Meta-Llama-3-8B-Instruct",
|
||||
prompt="A robot may not injure a human being",
|
||||
extra_headers={
|
||||
"x-request-id": "completion-test",
|
||||
}
|
||||
)
|
||||
print(completion._request_id)
|
||||
```
|
||||
|
||||
### Extra Parameters for Completions API
|
||||
|
||||
The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-completion-sampling-params
|
||||
:end-before: end-completion-sampling-params
|
||||
```
|
||||
|
||||
The following extra parameters are supported:
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-completion-extra-params
|
||||
:end-before: end-completion-extra-params
|
||||
```
|
||||
|
||||
### Extra Parameters for Chat Completions API
|
||||
|
||||
The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-chat-completion-sampling-params
|
||||
:end-before: end-chat-completion-sampling-params
|
||||
```
|
||||
|
||||
The following extra parameters are supported:
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-chat-completion-extra-params
|
||||
:end-before: end-chat-completion-extra-params
|
||||
```
|
||||
|
||||
### Extra Parameters for Embeddings API
|
||||
|
||||
The following [pooling parameters (click through to see documentation)](../dev/pooling_params.rst) are supported.
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-embedding-pooling-params
|
||||
:end-before: end-embedding-pooling-params
|
||||
```
|
||||
|
||||
The following extra parameters are supported:
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-embedding-extra-params
|
||||
:end-before: end-embedding-extra-params
|
||||
```
|
||||
- [Tokenizer API](#tokenizer-api) (`/tokenize`, `/detokenize`)
|
||||
- Applicable to any model with a tokenizer.
|
||||
- [Score API](#score-api) (`/score`)
|
||||
- Only applicable to [cross-encoder models](../models/pooling_models.rst) (`--task score`).
|
||||
|
||||
(chat-template)=
|
||||
## Chat Template
|
||||
|
||||
In order for the language model to support chat protocol, vLLM requires the model to include
|
||||
@@ -329,7 +96,56 @@ the detected format, which can be one of:
|
||||
If the result is not what you expect, you can set the `--chat-template-content-format` CLI argument
|
||||
to override which format to use.
|
||||
|
||||
## Command line arguments for the server
|
||||
## Extra Parameters
|
||||
|
||||
vLLM supports a set of parameters that are not part of the OpenAI API.
|
||||
In order to use them, you can pass them as extra parameters in the OpenAI client.
|
||||
Or directly merge them into the JSON payload if you are using HTTP call directly.
|
||||
|
||||
```python
|
||||
completion = client.chat.completions.create(
|
||||
model="NousResearch/Meta-Llama-3-8B-Instruct",
|
||||
messages=[
|
||||
{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
|
||||
],
|
||||
extra_body={
|
||||
"guided_choice": ["positive", "negative"]
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
## Extra HTTP Headers
|
||||
|
||||
Only `X-Request-Id` HTTP request header is supported for now.
|
||||
|
||||
```python
|
||||
completion = client.chat.completions.create(
|
||||
model="NousResearch/Meta-Llama-3-8B-Instruct",
|
||||
messages=[
|
||||
{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
|
||||
],
|
||||
extra_headers={
|
||||
"x-request-id": "sentiment-classification-00001",
|
||||
}
|
||||
)
|
||||
print(completion._request_id)
|
||||
|
||||
completion = client.completions.create(
|
||||
model="NousResearch/Meta-Llama-3-8B-Instruct",
|
||||
prompt="A robot may not injure a human being",
|
||||
extra_headers={
|
||||
"x-request-id": "completion-test",
|
||||
}
|
||||
)
|
||||
print(completion._request_id)
|
||||
```
|
||||
|
||||
## CLI Reference
|
||||
|
||||
(vllm-serve)=
|
||||
### `vllm serve`
|
||||
|
||||
The `vllm serve` command is used to launch the OpenAI-compatible server.
|
||||
|
||||
```{argparse}
|
||||
:module: vllm.entrypoints.openai.cli_args
|
||||
@@ -337,12 +153,10 @@ to override which format to use.
|
||||
:prog: vllm serve
|
||||
```
|
||||
|
||||
#### Configuration file
|
||||
|
||||
### Config file
|
||||
|
||||
The `serve` module can also accept arguments from a config file in
|
||||
`yaml` format. The arguments in the yaml must be specified using the
|
||||
long form of the argument outlined [here](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#command-line-arguments-for-the-server):
|
||||
You can load CLI arguments via a [YAML](https://yaml.org/) config file.
|
||||
The argument names must be the long form of those outlined [above](#vllm-serve).
|
||||
|
||||
For example:
|
||||
|
||||
@@ -354,10 +168,268 @@ port: 6379
|
||||
uvicorn-log-level: "info"
|
||||
```
|
||||
|
||||
To use the above config file:
|
||||
|
||||
```bash
|
||||
$ vllm serve SOME_MODEL --config config.yaml
|
||||
```
|
||||
---
|
||||
**NOTE**
|
||||
In case an argument is supplied simultaneously using command line and the config file, the value from the commandline will take precedence.
|
||||
|
||||
```{note}
|
||||
In case an argument is supplied simultaneously using command line and the config file, the value from the command line will take precedence.
|
||||
The order of priorities is `command line > config file values > defaults`.
|
||||
```
|
||||
|
||||
## API Reference
|
||||
|
||||
(completions-api)=
|
||||
### Completions API
|
||||
|
||||
Refer to [OpenAI's API reference](https://platform.openai.com/docs/api-reference/completions) for more details.
|
||||
|
||||
#### Extra parameters
|
||||
|
||||
The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-completion-sampling-params
|
||||
:end-before: end-completion-sampling-params
|
||||
```
|
||||
|
||||
The following extra parameters are supported:
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-completion-extra-params
|
||||
:end-before: end-completion-extra-params
|
||||
```
|
||||
|
||||
(chat-api)=
|
||||
### Chat Completions API
|
||||
|
||||
Refer to [OpenAI's API reference](https://platform.openai.com/docs/api-reference/chat) for more details.
|
||||
|
||||
#### Extra parameters
|
||||
|
||||
The following [sampling parameters (click through to see documentation)](../dev/sampling_params.rst) are supported.
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-chat-completion-sampling-params
|
||||
:end-before: end-chat-completion-sampling-params
|
||||
```
|
||||
|
||||
The following extra parameters are supported:
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-chat-completion-extra-params
|
||||
:end-before: end-chat-completion-extra-params
|
||||
```
|
||||
|
||||
(embeddings-api)=
|
||||
### Embeddings API
|
||||
|
||||
Refer to [OpenAI's API reference](https://platform.openai.com/docs/api-reference/embeddings) for more details.
|
||||
|
||||
If the model has a [chat template](#chat-template), you can replace `inputs` with a list of `messages` (same schema as [Chat Completions API](#chat-api))
|
||||
which will be treated as a single prompt to the model.
|
||||
|
||||
```{tip}
|
||||
This enables multi-modal inputs to be passed to embedding models, see [this page](../usage/multimodal_inputs.rst) for details.
|
||||
```
|
||||
|
||||
#### Extra parameters
|
||||
|
||||
The following [pooling parameters (click through to see documentation)](../dev/pooling_params.rst) are supported.
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-embedding-pooling-params
|
||||
:end-before: end-embedding-pooling-params
|
||||
```
|
||||
|
||||
The following extra parameters are supported by default:
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-embedding-extra-params
|
||||
:end-before: end-embedding-extra-params
|
||||
```
|
||||
|
||||
For chat-like input (i.e. if `messages` is passed), these extra parameters are supported instead:
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-chat-embedding-extra-params
|
||||
:end-before: end-chat-embedding-extra-params
|
||||
```
|
||||
|
||||
(tokenizer-api)=
|
||||
### Tokenizer API
|
||||
|
||||
The Tokenizer API is a simple wrapper over [HuggingFace-style tokenizers](https://huggingface.co/docs/transformers/en/main_classes/tokenizer).
|
||||
It consists of two endpoints:
|
||||
|
||||
- `/tokenize` corresponds to calling `tokenizer.encode()`.
|
||||
- `/detokenize` corresponds to calling `tokenizer.decode()`.
|
||||
|
||||
(score-api)=
|
||||
### Score API
|
||||
|
||||
The Score API applies a cross-encoder model to predict scores for sentence pairs.
|
||||
Usually, the score for a sentence pair refers to the similarity between two sentences, on a scale of 0 to 1.
|
||||
|
||||
You can find the documentation for these kind of models at [sbert.net](https://www.sbert.net/docs/package_reference/cross_encoder/cross_encoder.html).
|
||||
|
||||
#### Single inference
|
||||
|
||||
You can pass a string to both `text_1` and `text_2`, forming a single sentence pair.
|
||||
|
||||
Request:
|
||||
|
||||
```bash
|
||||
curl -X 'POST' \
|
||||
'http://127.0.0.1:8000/score' \
|
||||
-H 'accept: application/json' \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"encoding_format": "float",
|
||||
"text_1": "What is the capital of France?",
|
||||
"text_2": "The capital of France is Paris."
|
||||
}'
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```bash
|
||||
{
|
||||
"id": "score-request-id",
|
||||
"object": "list",
|
||||
"created": 693447,
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"data": [
|
||||
{
|
||||
"index": 0,
|
||||
"object": "score",
|
||||
"score": 1
|
||||
}
|
||||
],
|
||||
"usage": {}
|
||||
}
|
||||
```
|
||||
|
||||
#### Batch inference
|
||||
|
||||
You can pass a string to `text_1` and a list to `text_2`, forming multiple sentence pairs
|
||||
where each pair is built from `text_1` and a string in `text_2`.
|
||||
The total number of pairs is `len(text_2)`.
|
||||
|
||||
Request:
|
||||
|
||||
```bash
|
||||
curl -X 'POST' \
|
||||
'http://127.0.0.1:8000/score' \
|
||||
-H 'accept: application/json' \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"text_1": "What is the capital of France?",
|
||||
"text_2": [
|
||||
"The capital of Brazil is Brasilia.",
|
||||
"The capital of France is Paris."
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```bash
|
||||
{
|
||||
"id": "score-request-id",
|
||||
"object": "list",
|
||||
"created": 693570,
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"data": [
|
||||
{
|
||||
"index": 0,
|
||||
"object": "score",
|
||||
"score": 0.001094818115234375
|
||||
},
|
||||
{
|
||||
"index": 1,
|
||||
"object": "score",
|
||||
"score": 1
|
||||
}
|
||||
],
|
||||
"usage": {}
|
||||
}
|
||||
```
|
||||
|
||||
You can pass a list to both `text_1` and `text_2`, forming multiple sentence pairs
|
||||
where each pair is built from a string in `text_1` and the corresponding string in `text_2` (similar to `zip()`).
|
||||
The total number of pairs is `len(text_2)`.
|
||||
|
||||
Request:
|
||||
|
||||
```bash
|
||||
curl -X 'POST' \
|
||||
'http://127.0.0.1:8000/score' \
|
||||
-H 'accept: application/json' \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"encoding_format": "float",
|
||||
"text_1": [
|
||||
"What is the capital of Brazil?",
|
||||
"What is the capital of France?"
|
||||
],
|
||||
"text_2": [
|
||||
"The capital of Brazil is Brasilia.",
|
||||
"The capital of France is Paris."
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```bash
|
||||
{
|
||||
"id": "score-request-id",
|
||||
"object": "list",
|
||||
"created": 693447,
|
||||
"model": "BAAI/bge-reranker-v2-m3",
|
||||
"data": [
|
||||
{
|
||||
"index": 0,
|
||||
"object": "score",
|
||||
"score": 1
|
||||
},
|
||||
{
|
||||
"index": 1,
|
||||
"object": "score",
|
||||
"score": 1
|
||||
}
|
||||
],
|
||||
"usage": {}
|
||||
}
|
||||
```
|
||||
|
||||
#### Extra parameters
|
||||
|
||||
The following [pooling parameters (click through to see documentation)](../dev/pooling_params.rst) are supported.
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-score-pooling-params
|
||||
:end-before: end-score-pooling-params
|
||||
```
|
||||
|
||||
The following extra parameters are supported:
|
||||
|
||||
```{literalinclude} ../../../vllm/entrypoints/openai/protocol.py
|
||||
:language: python
|
||||
:start-after: begin-score-extra-params
|
||||
:end-before: end-score-extra-params
|
||||
```
|
||||
|
||||
Reference in New Issue
Block a user