Commit Graph

2 Commits

SHA1 Message Date
7c79fb4ee7 fix: Update cudaMemAdvise for CUDA 13 API
CUDA 13 changed cudaMemAdvise to take a cudaMemLocation struct instead of an int device ordinal.
Updated the call to pass a cudaMemLocation with type=cudaMemLocationTypeDevice.
2026-04-07 21:32:17 +00:00
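The API migration this commit describes can be sketched as a before/after, assuming a managed pointer and a device ordinal (the function and variable names below are illustrative, not the commit's actual code):

```cuda
#include <cuda_runtime_api.h>

// Sketch of the CUDA 13 cudaMemAdvise change; `ptr`, `size`, and `dev`
// are placeholder names for this example.
void advise_preferred_location(void* ptr, size_t size, int dev) {
    // CUDA 12 and earlier: the final argument was a plain device ordinal.
    //   cudaMemAdvise(ptr, size, cudaMemAdviseSetPreferredLocation, dev);

    // CUDA 13: the target is described by a cudaMemLocation struct instead.
    cudaMemLocation loc{};
    loc.type = cudaMemLocationTypeDevice;
    loc.id = dev;
    cudaMemAdvise(ptr, size, cudaMemAdviseSetPreferredLocation, loc);
}
```

The struct form lets the same advice API address non-device targets (e.g. host NUMA nodes) through the `type` field rather than overloading the int parameter.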
2757bffcb6 Add cudaMallocManaged allocator for GH200 EGM support
- managed_alloc.cu: PyTorch pluggable allocator using cudaMallocManaged
- vllm_managed_mem.py: Launcher that patches vLLM for managed memory
- Dockerfile: Build and install managed memory components

This enables vLLM to use cudaMallocManaged for transparent page-fault
access to both HBM (~96 GiB) and LPDDR (EGM, up to 480 GiB additional)
on GH200 systems with Extended GPU Memory enabled.

Experimental branch: v0.19.0-cmm
2026-04-07 21:19:39 +00:00
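A pluggable allocator of the kind `managed_alloc.cu` describes can be sketched as follows. This is a minimal, hypothetical version, not the commit's actual implementation: the names `managed_malloc`/`managed_free` are assumptions, and the four-argument signatures are the ones PyTorch's `torch.cuda.memory.CUDAPluggableAllocator` expects when loading a shared library:

```cuda
// Minimal sketch of a cudaMallocManaged-backed pluggable allocator.
// Build as a shared library (e.g. nvcc -shared -Xcompiler -fPIC) and load
// with torch.cuda.memory.CUDAPluggableAllocator(path, "managed_malloc",
// "managed_free"); names here are illustrative.
#include <cuda_runtime_api.h>
#include <cstddef>

extern "C" {

void* managed_malloc(std::size_t size, int device, cudaStream_t stream) {
    void* ptr = nullptr;
    // Managed memory is accessible from both CPU and GPU; pages migrate on
    // demand, so allocations can exceed HBM capacity and spill into
    // EGM/LPDDR on a GH200 with Extended GPU Memory enabled.
    if (cudaMallocManaged(&ptr, size, cudaMemAttachGlobal) != cudaSuccess) {
        return nullptr;
    }
    return ptr;
}

void managed_free(void* ptr, std::size_t size, int device,
                  cudaStream_t stream) {
    cudaFree(ptr);
}

} // extern "C"
```

Because the allocator is swapped in below the caching layer, vLLM itself needs no code changes to allocate through it; the launcher script's role is limited to installing the allocator before CUDA tensors are created.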