grace-gpu-containers/vllm/vllm_managed_mem.py at 7c79fb4ee799db0c617b2196accdf6356da23960

biondizzle 2757bffcb6 Add cudaMallocManaged allocator for GH200 EGM support

- managed_alloc.cu: PyTorch pluggable allocator using cudaMallocManaged
- vllm_managed_mem.py: Launcher that patches vLLM for managed memory
- Dockerfile: Build and install managed memory components

This lets vLLM allocate GPU memory with cudaMallocManaged, giving it
transparent, page-fault-driven access to both HBM (~96 GiB) and LPDDR
(EGM, up to 480 GiB additional) on GH200 systems with Extended GPU Memory
enabled.
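
As a rough illustration of how a launcher could hand PyTorch a
cudaMallocManaged-backed allocator, here is a minimal sketch. It is not the
code from vllm_managed_mem.py. The shared-library path and the two exported
symbol names are hypothetical placeholders, and the memory figures are simply
the approximate ones quoted above. It relies only on PyTorch's documented
pluggable-allocator interface (torch.cuda.memory.CUDAPluggableAllocator and
torch.cuda.memory.change_current_allocator).

```python
import os

GIB = 1024 ** 3


def managed_budget_bytes(hbm_gib: float = 96, egm_gib: float = 480) -> int:
    """Upper bound on managed memory, using the approximate figures above."""
    return int((hbm_gib + egm_gib) * GIB)


def install_managed_allocator(
    lib_path: str = "/usr/local/lib/libmanaged_alloc.so",  # hypothetical path
    alloc_fn: str = "managed_malloc",  # hypothetical exported symbol
    free_fn: str = "managed_free",  # hypothetical exported symbol
) -> bool:
    """Register a pluggable allocator with PyTorch if the library exists.

    Returns False without importing torch when the library is missing, so
    the caller can fall back to the default caching allocator.
    """
    if not os.path.exists(lib_path):
        return False

    # Imported lazily so the helper is usable on machines without PyTorch.
    import torch

    allocator = torch.cuda.memory.CUDAPluggableAllocator(
        lib_path, alloc_fn, free_fn
    )
    # Must run before the first CUDA allocation in the process; PyTorch
    # raises an error if the current allocator has already been used.
    torch.cuda.memory.change_current_allocator(allocator)
    return True


if __name__ == "__main__":
    print("managed budget (GiB):", managed_budget_bytes() // GIB)
    print("allocator installed:", install_managed_allocator())
```

Because the allocator can only be swapped before any CUDA memory is
allocated, a launcher along these lines would need to call
install_managed_allocator() before importing or starting anything that
touches the GPU.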

Experimental branch: v0.19.0-cmm

2026-04-07 21:19:39 +00:00

6.8 KiB
