diff --git a/docs/configuration/optimization.md b/docs/configuration/optimization.md
index 556d9f8b9..80b12ae33 100644
--- a/docs/configuration/optimization.md
+++ b/docs/configuration/optimization.md
@@ -47,6 +47,10 @@ You can tune the performance by adjusting `max_num_batched_tokens`:
 - For optimal throughput, we recommend setting `max_num_batched_tokens > 8192`, especially for smaller models on large GPUs.
 - If `max_num_batched_tokens` is the same as `max_model_len`, that is almost equivalent to the V0 default scheduling policy (except that it still prioritizes decodes).
 
+!!! warning
+    When chunked prefill is disabled, `max_num_batched_tokens` must be at least `max_model_len`.
+    If `max_num_batched_tokens < max_model_len`, vLLM may crash at server start-up.
+
 ```python
 from vllm import LLM
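
A minimal sketch of the constraint the warning describes, assuming the standard vLLM engine arguments `enable_chunked_prefill`, `max_model_len`, and `max_num_batched_tokens` (the model name is a placeholder):

```python
from vllm import LLM

# With chunked prefill disabled, a prefill cannot be split across
# scheduler steps, so a single batch must fit an entire prompt:
# max_num_batched_tokens must be at least max_model_len.
llm = LLM(
    model="facebook/opt-125m",  # placeholder model
    enable_chunked_prefill=False,
    max_model_len=2048,
    max_num_batched_tokens=2048,  # >= max_model_len, or start-up may fail
)
```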