[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695)
Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
This commit is contained in:
@@ -11,7 +11,7 @@ Key benefits:
|
||||
- **Fine-grained control**: Optionally wake up only model weights or KV cache to avoid OOM during weight updates.
|
||||
|
||||
!!! note
|
||||
This feature is only supported on CUDA platform.
|
||||
This feature is now supported on CUDA and ROCm platform.
|
||||
|
||||
!!! note
|
||||
For more information, see this [Blog Post](https://blog.vllm.ai/2025/10/26/sleep-mode.html).
|
||||
@@ -116,3 +116,7 @@ curl -X POST 'http://localhost:8000/wake_up?tags=kv_cache'
|
||||
|
||||
!!! note
|
||||
These endpoints are only available when passing `VLLM_SERVER_DEV_MODE=1`.
|
||||
|
||||
## Limitation
|
||||
|
||||
On ROCm, the virtual memory allocation on ROCm is done through chunked memory allocation. You can control the chunk size through `VLLM_ROCM_SLEEP_MEM_CHUNK_SIZE` (in MB). The default value is set at 256MB. The larger the chunk size the faster the performance. However, setting it too large will cause OOM. So if you encounter OOM when using sleep mode. Try reducing the chunk size. It is recommended to define the chunk size as a power of 2.
|
||||
|
||||
Reference in New Issue
Block a user