Allow users to specify kv cache memory size (#21489)

Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Boyuan Feng
2025-09-11 06:41:07 -07:00
committed by GitHub
parent fd1ce98cdd
commit 94e6b2d55f
10 changed files with 236 additions and 47 deletions

@@ -278,7 +278,8 @@ class LLMEngine:
self.cache_config.block_size,
"gpu_memory_utilization":
self.cache_config.gpu_memory_utilization,
"kv_cache_memory_bytes":
self.cache_config.kv_cache_memory_bytes,
# Quantization
"quantization":
self.model_config.quantization,
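
The hunk above only shows the new `kv_cache_memory_bytes` field being reported alongside `gpu_memory_utilization`. As a rough illustration of how the two settings might interact, here is a minimal sketch. It assumes that an explicit byte count takes precedence over the utilization fraction, which this hunk does not itself show. `CacheConfigSketch`, `resolve_kv_cache_budget`, `total_gpu_bytes`, and `non_kv_bytes` are hypothetical names for illustration, not part of the project's API.

```python
from dataclasses import dataclass
from typing import Optional

GiB = 1 << 30


@dataclass
class CacheConfigSketch:
    """Hypothetical stand-in for the cache config fields seen in the diff."""
    block_size: int = 16
    gpu_memory_utilization: float = 0.9
    # None means "not specified by the user".
    kv_cache_memory_bytes: Optional[int] = None


def resolve_kv_cache_budget(cfg: CacheConfigSketch,
                            total_gpu_bytes: int,
                            non_kv_bytes: int) -> int:
    """Return the number of bytes to reserve for the KV cache.

    Assumption (not confirmed by the hunk): an explicit
    kv_cache_memory_bytes wins over gpu_memory_utilization.
    """
    if cfg.kv_cache_memory_bytes is not None:
        return cfg.kv_cache_memory_bytes
    # Otherwise derive the budget from the utilization fraction,
    # subtracting memory already needed for weights and activations.
    budget = int(total_gpu_bytes * cfg.gpu_memory_utilization) - non_kv_bytes
    return max(0, budget)


explicit = CacheConfigSketch(kv_cache_memory_bytes=4 * GiB)
derived = CacheConfigSketch(gpu_memory_utilization=0.5)

explicit_budget = resolve_kv_cache_budget(explicit, 80 * GiB, 10 * GiB)
derived_budget = resolve_kv_cache_budget(derived, 80 * GiB, 10 * GiB)
```

With these inputs, the explicit configuration returns the user's byte count unchanged, while the derived one depends on the device size and the memory already in use.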