Allow users to specify kv cache memory size (#21489)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```diff
@@ -355,7 +355,8 @@ def report_usage_stats(
 vllm_config.cache_config.block_size,
 "gpu_memory_utilization":
 vllm_config.cache_config.gpu_memory_utilization,
-
+"kv_cache_memory_bytes":
+vllm_config.cache_config.kv_cache_memory_bytes,
 # Quantization
 "quantization":
 vllm_config.model_config.quantization,
```
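Here is a minimal, self-contained sketch of the pattern this hunk follows. Each reported usage-stats key is read directly off the config objects, so the new `kv_cache_memory_bytes` entry is simply whatever the user configured. The sketch assumes the value is `None` when the user did not set it. The dataclasses and the `build_usage_extra_kvs` helper are hypothetical stand-ins, not vLLM's real `VllmConfig` or `report_usage_stats`. Only the attribute and key names come from the diff, and the default values are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CacheConfig:
    # Hypothetical stand-in for the cache config referenced in the diff.
    block_size: int = 16                 # illustrative value only
    gpu_memory_utilization: float = 0.9  # illustrative value only
    # Assumption: None means the user did not request an explicit KV cache size.
    kv_cache_memory_bytes: Optional[int] = None


@dataclass
class ModelConfig:
    # Hypothetical stand-in; None means no quantization method was chosen.
    quantization: Optional[str] = None


@dataclass
class VllmConfig:
    cache_config: CacheConfig
    model_config: ModelConfig


def build_usage_extra_kvs(vllm_config: VllmConfig) -> Dict[str, Any]:
    """Hypothetical helper: flatten the fields shown in the hunk into one dict."""
    return {
        "block_size": vllm_config.cache_config.block_size,
        "gpu_memory_utilization": vllm_config.cache_config.gpu_memory_utilization,
        # New key from this commit: reported as-is, including None when unset.
        "kv_cache_memory_bytes": vllm_config.cache_config.kv_cache_memory_bytes,
        # Quantization
        "quantization": vllm_config.model_config.quantization,
    }


if __name__ == "__main__":
    cfg = VllmConfig(
        cache_config=CacheConfig(kv_cache_memory_bytes=8 * 1024**3),  # 8 GiB
        model_config=ModelConfig(),
    )
    print(build_usage_extra_kvs(cfg))
```

The sketch reports the raw value, including `None`, rather than omitting the key. Under that assumption, the payload keeps the same set of keys whether or not the option was set.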