[Docs] Update CacheConfig block_size docstring to remove inaccurate limit when using CUDA (#35632)

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
This commit is contained in:
Seiji Eicher
2026-03-04 22:24:08 -08:00
committed by GitHub
parent c3598d02fa
commit e2b31243c0


@@ -40,8 +40,7 @@ class CacheConfig:
"""Configuration for the KV cache.""" """Configuration for the KV cache."""
block_size: SkipValidation[BlockSize] = None # type: ignore[assignment] block_size: SkipValidation[BlockSize] = None # type: ignore[assignment]
"""Size of a contiguous cache block in number of tokens. On CUDA devices, """Size of a contiguous cache block in number of tokens.
only block sizes up to 32 are supported.
This config has no static default. If left unspecified by the user, it will This config has no static default. If left unspecified by the user, it will
be set in `Platform.check_and_update_config()` based on the current be set in `Platform.check_and_update_config()` based on the current
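The pattern touched by this diff — a config field with no static default that a platform hook fills in later — can be sketched as follows. This is a hypothetical, self-contained illustration, not the actual vLLM code; the class name `KVCacheConfig`, the `check_and_update_config` helper, and the specific default values are all made up for demonstration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KVCacheConfig:
    # Size of a contiguous cache block in number of tokens.
    # None means "no static default: let the platform decide".
    block_size: Optional[int] = None

def check_and_update_config(config: KVCacheConfig, platform: str) -> KVCacheConfig:
    """Fill in block_size based on the current platform if the user left it unset.

    Mirrors the role of `Platform.check_and_update_config()` described in the
    docstring above. The values below are illustrative only.
    """
    if config.block_size is None:
        config.block_size = 16 if platform == "cuda" else 128
    return config

# A user-supplied value is left untouched; an unset value gets a platform default.
cfg = check_and_update_config(KVCacheConfig(), platform="cuda")
explicit = check_and_update_config(KVCacheConfig(block_size=32), platform="cuda")
print(cfg.block_size, explicit.block_size)
```

The key design point the docstring documents is that validation is skipped (`SkipValidation`) precisely so the field can legally hold `None` until the platform hook runs.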