[Benchmark] Enable MM Embedding benchmarks (#26310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@@ -67,13 +67,13 @@ Legend:
<details class="admonition abstract" markdown="1">
<summary>Show more</summary>

First start serving your model:

```bash
vllm serve NousResearch/Hermes-3-Llama-3.1-8B
```

Then run the benchmarking script:

```bash
# download dataset
@@ -87,7 +87,7 @@ vllm bench serve \
    --num-prompts 10
```

If successful, you will see the following output:

```text
============ Serving Benchmark Result ============
@@ -125,7 +125,7 @@ If the dataset you want to benchmark is not supported yet in vLLM, even then you

```bash
# start server
vllm serve meta-llama/Llama-3.1-8B-Instruct
```

```bash
@@ -167,7 +167,7 @@ vllm bench serve \

##### InstructCoder Benchmark with Speculative Decoding

```bash
vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
    --speculative-config $'{"method": "ngram",
    "num_speculative_tokens": 5, "prompt_lookup_max": 5,
    "prompt_lookup_min": 2}'
@@ -184,7 +184,7 @@ vllm bench serve \

##### Spec Bench Benchmark with Speculative Decoding

```bash
vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
    --speculative-config $'{"method": "ngram",
    "num_speculative_tokens": 5, "prompt_lookup_max": 5,
    "prompt_lookup_min": 2}'
@@ -366,7 +366,6 @@ Total num output tokens: 1280

```bash
VLLM_WORKER_MULTIPROC_METHOD=spawn \
vllm bench throughput \
    --dataset-name=hf \
    --dataset-path=likaixin/InstructCoder \

@@ -781,6 +780,104 @@ This should be seen as an edge case, and if this behavior can be avoided by sett

</details>
#### Embedding Benchmark

Benchmark the performance of embedding requests in vLLM.

<details class="admonition abstract" markdown="1">
<summary>Show more</summary>

##### Text Embeddings

Unlike generative models, which use the Completions API or Chat Completions API,
embedding models are benchmarked through the Embeddings API: set `--backend openai-embeddings` and `--endpoint /v1/embeddings`.

You can use any text dataset to benchmark the model, such as ShareGPT.

Start the server:

```bash
vllm serve jinaai/jina-embeddings-v3 --trust-remote-code
```
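Before launching the benchmark, it can be useful to confirm the server actually answers embedding requests. A minimal Python sanity check against the OpenAI-compatible Embeddings API, assuming the default `http://localhost:8000` address (the helper names here are illustrative, not part of vLLM):

```python
import json
from urllib import request


def build_payload(texts: list[str], model: str) -> bytes:
    """Encode an Embeddings API request body for the given inputs."""
    return json.dumps({"model": model, "input": texts}).encode("utf-8")


def embed(texts: list[str], model: str,
          base_url: str = "http://localhost:8000") -> list[list[float]]:
    """POST to /v1/embeddings and return one vector per input text."""
    req = request.Request(
        f"{base_url}/v1/embeddings",
        data=build_payload(texts, model),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # Responses follow the OpenAI schema: {"data": [{"embedding": [...]}, ...]}
    return [item["embedding"] for item in body["data"]]
```

Once the server above is up, `embed(["hello world"], "jinaai/jina-embeddings-v3")` should return a single embedding vector.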
Run the benchmark:

```bash
# download dataset
# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
vllm bench serve \
    --model jinaai/jina-embeddings-v3 \
    --backend openai-embeddings \
    --endpoint /v1/embeddings \
    --dataset-name sharegpt \
    --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json
```
##### Multi-modal Embeddings

Unlike generative models, which use the Completions API or Chat Completions API,
multi-modal embedding models are also benchmarked through the Embeddings API: set `--endpoint /v1/embeddings`. The backend to use depends on the model:

- CLIP: `--backend openai-embeddings-clip`
- VLM2Vec: `--backend openai-embeddings-vlm2vec`

For other models, please add your own implementation inside <gh-file:vllm/benchmarks/lib/endpoint_request_func.py> to match the expected instruction format.
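The interface in that file is not reproduced here, but the general idea is to wrap each sample in the instruction template the model was trained with before posting it to `/v1/embeddings`. A sketch of such a formatter (the function name, parameters, and instruction text are all hypothetical; the request body your model expects may differ):

```python
import json


def build_instruction_request(
    model: str,
    text: str,
    instruction: str = "Represent the given text for retrieval:",
) -> bytes:
    """Build an Embeddings API request body whose input is wrapped in a
    model-specific instruction template. The default template above is a
    placeholder; substitute the exact format your model was trained with."""
    return json.dumps(
        {"model": model, "input": f"{instruction} {text}"}
    ).encode("utf-8")
```

For example, `build_instruction_request("my-model", "a photo of a cat")` produces a JSON body whose `input` field is the instruction followed by the text.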

You can use any text or multi-modal dataset to benchmark the model, as long as the model supports it.
For example, you can use ShareGPT and VisionArena to benchmark vision-language embeddings.

Serve and benchmark CLIP:

```bash
# Run this in another process
vllm serve openai/clip-vit-base-patch32

# Run these one by one after the server is up
# download dataset
# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
vllm bench serve \
    --model openai/clip-vit-base-patch32 \
    --backend openai-embeddings-clip \
    --endpoint /v1/embeddings \
    --dataset-name sharegpt \
    --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json

vllm bench serve \
    --model openai/clip-vit-base-patch32 \
    --backend openai-embeddings-clip \
    --endpoint /v1/embeddings \
    --dataset-name hf \
    --dataset-path lmarena-ai/VisionArena-Chat
```

Serve and benchmark VLM2Vec:

```bash
# Run this in another process
vllm serve TIGER-Lab/VLM2Vec-Full --runner pooling \
    --trust-remote-code \
    --chat-template examples/template_vlm2vec_phi3v.jinja

# Run these one by one after the server is up
# download dataset
# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
vllm bench serve \
    --model TIGER-Lab/VLM2Vec-Full \
    --backend openai-embeddings-vlm2vec \
    --endpoint /v1/embeddings \
    --dataset-name sharegpt \
    --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json

vllm bench serve \
    --model TIGER-Lab/VLM2Vec-Full \
    --backend openai-embeddings-vlm2vec \
    --endpoint /v1/embeddings \
    --dataset-name hf \
    --dataset-path lmarena-ai/VisionArena-Chat
```

</details>

[](){ #performance-benchmarks }

## Performance Benchmarks