[Doc] Add allocate_slots parameter docs (#29777)
Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
@@ -230,6 +230,9 @@ class KVCacheManager:
             delay_cache_blocks: Whether to skip caching the blocks. This is
                 used by P/D when allocating blocks used in a KV transfer
                 which will complete in a future step.
+            num_encoder_tokens: The number of encoder tokens to allocate for
+                cross-attention in encoder-decoder models(e.g., Whisper).
+                For decoder-only models, this should be 0.
 
         Blocks layout:
         ```
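
To illustrate the `num_encoder_tokens` entry documented above, the sketch below shows one way a token count could map to a number of fixed-size cache blocks. It is an illustration under stated assumptions, not the `KVCacheManager` implementation: the helpers `blocks_needed` and `cross_attention_blocks`, the default block size of 16, and the sample token count are all hypothetical.

```python
def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Return how many fixed-size blocks are required to hold num_tokens.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Ceiling division without floating point.
    return -(-num_tokens // block_size)


def cross_attention_blocks(num_encoder_tokens: int = 0,
                           block_size: int = 16) -> int:
    """Blocks a cross-attention cache would need (assumed model).

    A decoder-only model passes num_encoder_tokens=0, so no blocks
    are reserved for cross-attention.
    """
    return blocks_needed(num_encoder_tokens, block_size)


if __name__ == "__main__":
    # Decoder-only case: the documented value is 0.
    print(cross_attention_blocks(0))
    # Encoder-decoder case with an arbitrary encoder token count.
    print(cross_attention_blocks(1500, block_size=16))
```

Under these assumptions, passing 0 reserves nothing, which matches the documented guidance for decoder-only models, while any positive count rounds up to whole blocks.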