[Doc] Update docs on handling OOM (#15357)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Cyrus Leung
2025-03-25 05:29:34 +08:00
committed by GitHub
parent 3eb08ed9b1
commit 6dd55af6c9
6 changed files with 24 additions and 9 deletions


@@ -97,6 +97,13 @@ llm = LLM(model="adept/fuyu-8b",
max_num_seqs=2)
```
#### Adjust cache size
If you run out of CPU RAM, try the following options:
- (Multi-modal models only) Set the size of the multi-modal input cache with the `VLLM_MM_INPUT_CACHE_GIB` environment variable (default: 4 GiB).
- (CPU backend only) Set the size of the KV cache with the `VLLM_CPU_KVCACHE_SPACE` environment variable (default: 4 GiB).
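
As a minimal sketch, both variables can be set from Python before the engine is created. The helper `set_cache_limits` is hypothetical (it is not part of vLLM), and the 2 GiB values are illustrative only. It is also an assumption here that vLLM reads these variables when the engine starts, so they are set before `vllm` is imported.

```python
import os


def set_cache_limits(mm_input_cache_gib=None, cpu_kvcache_gib=None):
    """Hypothetical helper (not part of vLLM): export cache sizes in GiB."""
    if mm_input_cache_gib is not None:
        # Multi-modal models only: size of the multi-modal input cache.
        os.environ["VLLM_MM_INPUT_CACHE_GIB"] = str(mm_input_cache_gib)
    if cpu_kvcache_gib is not None:
        # CPU backend only: size of the KV cache.
        os.environ["VLLM_CPU_KVCACHE_SPACE"] = str(cpu_kvcache_gib)


# Illustrative values: halve both defaults (4 GiB) to reduce CPU RAM usage.
set_cache_limits(mm_input_cache_gib=2, cpu_kvcache_gib=2)

# Assumption: import vLLM and build the engine only after the variables are set.
# from vllm import LLM
# llm = LLM(model="adept/fuyu-8b", max_num_seqs=2)
```

The same effect can be achieved by exporting the variables in the shell before launching the process.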
### Performance optimization and tuning
You can potentially improve the performance of vLLM by tuning various options.