From 4dd42db566097cc2cacb2dddff3a8f3b0c007be0 Mon Sep 17 00:00:00 2001
From: Tyler Michael Smith
Date: Mon, 24 Nov 2025 17:16:05 -0500
Subject: [PATCH] Remove VLLM_SKIP_WARMUP tip (#29331)

Signed-off-by: Tyler Michael Smith
---
 docs/features/quantization/inc.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/docs/features/quantization/inc.md b/docs/features/quantization/inc.md
index 5e86e9388..9875bc44c 100644
--- a/docs/features/quantization/inc.md
+++ b/docs/features/quantization/inc.md
@@ -22,9 +22,6 @@ export QUANT_CONFIG=/path/to/quant/config/inc/meta-llama-3.1-405b-instruct/maxab
 vllm serve meta-llama/Llama-3.1-405B-Instruct --quantization inc --kv-cache-dtype fp8_inc --tensor_paralel_size 8
 ```
 
-!!! tip
-    If you are just prototyping or testing your model with FP8, you can use the `VLLM_SKIP_WARMUP=true` environment variable to disable the warmup stage, which can take a long time. However, we do not recommend disabling this feature in production environments as it causes a significant performance drop.
-
 !!! tip
     When using FP8 models, you may experience timeouts caused by the long compilation time of FP8 operations. To mitigate this problem, you can use the below environment variables:
     `VLLM_ENGINE_ITERATION_TIMEOUT_S` - to adjust the vLLM server timeout. You can set the value in seconds, e.g., 600 equals 10 minutes.