Optimize KV cache distribution for asymmetric pipeline parallelism (#25164)
Signed-off-by: gholmes829 <g.holmes429@gmail.com>
committed by GitHub
parent 7e4cd070b0
commit d100d78eb3
@@ -143,7 +143,7 @@ class LLM:
 size based on gpu_memory_utilization. However, users may want to
 manually specify the kv cache memory size. kv_cache_memory_bytes
 allows more fine-grain control of how much memory gets used when
-compared with using gpu_memory_memory_utilization. Note that
+compared with using gpu_memory_utilization. Note that
 kv_cache_memory_bytes (when not-None) ignores
 gpu_memory_utilization
 swap_space: The size (GiB) of CPU memory per GPU to use as swap space.
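
The docstring in this hunk describes a precedence rule: when kv_cache_memory_bytes is not None, it is used directly and gpu_memory_utilization is ignored. The sketch below illustrates only that rule. The helper name resolve_kv_cache_budget, its total_gpu_bytes and non_kv_bytes parameters, and the fallback arithmetic are assumptions made for illustration; they are not taken from this commit and are not vLLM's actual implementation.

```python
from typing import Optional

GiB = 1024 ** 3


def resolve_kv_cache_budget(
    total_gpu_bytes: int,
    non_kv_bytes: int,
    gpu_memory_utilization: float,
    kv_cache_memory_bytes: Optional[int] = None,
) -> int:
    """Hypothetical helper: return the number of bytes reserved for the KV cache."""
    if kv_cache_memory_bytes is not None:
        # An explicit size takes precedence; gpu_memory_utilization is ignored,
        # as the docstring in the hunk states.
        return kv_cache_memory_bytes
    # Assumed fallback (simplified): take the allowed fraction of GPU memory
    # and subtract what weights and activations already need.
    budget = int(total_gpu_bytes * gpu_memory_utilization) - non_kv_bytes
    return max(budget, 0)


# Derived from the utilization fraction.
derived = resolve_kv_cache_budget(80 * GiB, 20 * GiB, gpu_memory_utilization=0.5)

# Explicit size: the utilization argument no longer affects the result.
explicit = resolve_kv_cache_budget(
    80 * GiB, 20 * GiB, gpu_memory_utilization=0.5, kv_cache_memory_bytes=8 * GiB
)
```

Per the docstring, both settings are arguments of the LLM class, so a caller who wants a fixed KV cache size would pass kv_cache_memory_bytes there instead of tuning gpu_memory_utilization.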