[Docs] Update CacheConfig block_size docstring to remove inaccurate limit when using CUDA (#35632)

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
This commit is contained in:
Seiji Eicher
2026-03-04 22:24:08 -08:00
committed by GitHub
parent c3598d02fa
commit e2b31243c0


@@ -40,8 +40,7 @@ class CacheConfig:
"""Configuration for the KV cache.""" """Configuration for the KV cache."""
block_size: SkipValidation[BlockSize] = None # type: ignore[assignment] block_size: SkipValidation[BlockSize] = None # type: ignore[assignment]
"""Size of a contiguous cache block in number of tokens. On CUDA devices, """Size of a contiguous cache block in number of tokens.
only block sizes up to 32 are supported.
This config has no static default. If left unspecified by the user, it will This config has no static default. If left unspecified by the user, it will
be set in `Platform.check_and_update_config()` based on the current be set in `Platform.check_and_update_config()` based on the current
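The pattern touched by this diff — a config field with no static default that a platform hook fills in later — can be sketched as follows. This is a hypothetical, self-contained illustration, not the actual vLLM code; the class name `KVCacheConfig`, the `check_and_update_config` helper, and the specific default values are all made up for demonstration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KVCacheConfig:
    # Size of a contiguous cache block in number of tokens.
    # None means "no static default: let the platform decide".
    block_size: Optional[int] = None

def check_and_update_config(config: KVCacheConfig, platform: str) -> KVCacheConfig:
    """Fill in block_size based on the current platform if the user left it unset.

    Mirrors the role of `Platform.check_and_update_config()` described in the
    docstring above. The values below are illustrative only.
    """
    if config.block_size is None:
        config.block_size = 16 if platform == "cuda" else 128
    return config

# A user-supplied value is left untouched; an unset value gets a platform default.
cfg = check_and_update_config(KVCacheConfig(), platform="cuda")
explicit = check_and_update_config(KVCacheConfig(block_size=32), platform="cuda")
print(cfg.block_size, explicit.block_size)
```

The key design point the docstring documents is that validation is skipped (`SkipValidation`) precisely so the field can legally hold `None` until the platform hook runs.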