[Doc][2/N] Reorganize Models and Usage sections (#11755)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

docs/source/serving/engine_args.md (new file, 25 lines)

@@ -0,0 +1,25 @@
(engine-args)=

# Engine Arguments

Below, you can find an explanation of every engine argument for vLLM:

```{eval-rst}
.. argparse::
    :module: vllm.engine.arg_utils
    :func: _engine_args_parser
    :prog: vllm serve
    :nodefaultconst:
```
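
For instance, these arguments can be passed as flags to `vllm serve`. The following is a minimal sketch; the model name and values are placeholders, and the listing generated above is the authoritative reference:

```bash
# A minimal sketch: launch the OpenAI-compatible server with a few
# common engine arguments (model and values are placeholders).
vllm serve facebook/opt-125m \
    --max-model-len 2048 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.9
```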

## Async Engine Arguments

Below are the additional arguments related to the asynchronous engine:

```{eval-rst}
.. argparse::
    :module: vllm.engine.arg_utils
    :func: _async_engine_args_parser
    :prog: vllm serve
    :nodefaultconst:
```
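
As a concrete illustration, async engine arguments are passed the same way as the regular ones. The sketch below assumes `--disable-log-requests` appears in the listing above for your version:

```bash
# A sketch combining a regular engine argument with an async engine
# argument; consult the listing above for your version's flags.
vllm serve facebook/opt-125m \
    --max-model-len 2048 \
    --disable-log-requests
```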

docs/source/serving/env_vars.md (new file, 15 lines)

@@ -0,0 +1,15 @@

# Environment Variables

vLLM uses the following environment variables to configure the system:

```{warning}
Please note that `VLLM_PORT` and `VLLM_HOST_IP` set the port and IP address for vLLM's **internal usage**; they are not the port and IP address of the API server. If you use `--host $VLLM_HOST_IP` and `--port $VLLM_PORT` to start the API server, it will not work.

All environment variables used by vLLM are prefixed with `VLLM_`. **Special care should be taken for Kubernetes users**: please do not name the service `vllm`, otherwise environment variables set by Kubernetes might conflict with vLLM's environment variables, because [Kubernetes sets environment variables for each service with the capitalized service name as the prefix](https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables).
```

```{literalinclude} ../../../vllm/envs.py
:language: python
:start-after: begin-env-vars-definition
:end-before: end-env-vars-definition
```
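
For illustration, these variables can be set inline for a single run or exported for the whole shell session. The sketch below assumes `VLLM_LOGGING_LEVEL` is among the variables defined in `vllm/envs.py` for your version:

```bash
# A sketch: set a vLLM environment variable for one invocation...
VLLM_LOGGING_LEVEL=DEBUG vllm serve facebook/opt-125m

# ...or export it so it applies to subsequent commands too.
export VLLM_LOGGING_LEVEL=DEBUG
vllm serve facebook/opt-125m
```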

@@ -217,7 +217,7 @@ you can use the [official OpenAI Python client](https://github.com/openai/openai-python)

We support both [Vision](https://platform.openai.com/docs/guides/vision)- and
[Audio](https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-in)-related parameters;
-see our [Multimodal Inputs](../usage/multimodal_inputs.md) guide for more information.
+see our [Multimodal Inputs](#multimodal-inputs) guide for more information.
- *Note: `image_url.detail` parameter is not supported.*
Code example: <gh-file:examples/openai_chat_completion_client.py>
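
For a quick test without the Python client, the same vision parameters can be sent over plain HTTP. A sketch, assuming a server is already running on `localhost:8000` with a vision-capable model such as `llava-hf/llava-1.5-7b-hf`:

```bash
# A sketch: send an image_url chat message to a running
# OpenAI-compatible server (host, port, and model are assumptions).
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llava-hf/llava-1.5-7b-hf",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
            ]
        }]
    }'
```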

docs/source/serving/usage_stats.md (new file, 57 lines)

@@ -0,0 +1,57 @@

# Usage Stats Collection

vLLM collects anonymous usage data by default to help the engineering team better understand which hardware and model configurations are widely used. This data allows them to prioritize their efforts on the most common workloads. The collected data is transparent, does not contain any sensitive information, and will be publicly released for the community's benefit.

## What data is collected?

The list of data collected by the latest version of vLLM can be found here: <gh-file:vllm/usage/usage_lib.py>

Here is an example as of v0.4.0:

```json
{
  "uuid": "fbe880e9-084d-4cab-a395-8984c50f1109",
  "provider": "GCP",
  "num_cpu": 24,
  "cpu_type": "Intel(R) Xeon(R) CPU @ 2.20GHz",
  "cpu_family_model_stepping": "6,85,7",
  "total_memory": 101261135872,
  "architecture": "x86_64",
  "platform": "Linux-5.10.0-28-cloud-amd64-x86_64-with-glibc2.31",
  "gpu_count": 2,
  "gpu_type": "NVIDIA L4",
  "gpu_memory_per_device": 23580639232,
  "model_architecture": "OPTForCausalLM",
  "vllm_version": "0.3.2+cu123",
  "context": "LLM_CLASS",
  "log_time": 1711663373492490000,
  "source": "production",
  "dtype": "torch.float16",
  "tensor_parallel_size": 1,
  "block_size": 16,
  "gpu_memory_utilization": 0.9,
  "quantization": null,
  "kv_cache_dtype": "auto",
  "enable_lora": false,
  "enable_prefix_caching": false,
  "enforce_eager": false,
  "disable_custom_all_reduce": true
}
```

You can preview the collected data by running the following command:

```bash
tail ~/.config/vllm/usage_stats.json
```

## Opt-out of Usage Stats Collection

You can opt out of usage stats collection by setting the `VLLM_NO_USAGE_STATS` or `DO_NOT_TRACK` environment variable, or by creating a `~/.config/vllm/do_not_track` file:

```bash
# Any of the following methods can disable usage stats collection
export VLLM_NO_USAGE_STATS=1
export DO_NOT_TRACK=1
mkdir -p ~/.config/vllm && touch ~/.config/vllm/do_not_track
```
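
To verify the opt-out is in effect, you can check for the marker file and the environment variables mentioned above:

```bash
# A small sanity check: either the marker file existing or one of
# these variables being set means usage stats collection is disabled.
ls ~/.config/vllm/do_not_track
echo "$VLLM_NO_USAGE_STATS" "$DO_NOT_TRACK"
```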