grace-gpu-containers

biondizzle/grace-gpu-containers

Commit Graph

Author

SHA1

Message

Date

biondizzle

7c79fb4ee7

fix: Update cudaMemAdvise for CUDA 13 API

CUDA 13 changed cudaMemAdvise to take cudaMemLocation struct instead of int.
Updated to use cudaMemLocation with type=cudaMemLocationTypeDevice.

2026-04-07 21:32:17 +00:00
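
The signature change described in this commit can be illustrated from Python with ctypes. This is a sketch under stated assumptions, not the code in managed_alloc.cu: the names CudaMemLocation, device_location and advise_preferred_device are hypothetical, the numeric enum values are copied from the CUDA runtime headers, and the use of cudaMemAdviseSetPreferredLocation is an assumption, since the commit does not say which advice is applied.

```python
import ctypes

# Values mirroring enum cudaMemLocationType and enum cudaMemoryAdvise
# in the CUDA runtime headers (assumed unchanged in the installed toolkit).
CUDA_MEM_LOCATION_TYPE_DEVICE = 1
CUDA_MEM_ADVISE_SET_PREFERRED_LOCATION = 3


class CudaMemLocation(ctypes.Structure):
    """ctypes mirror of the cudaMemLocation struct: a location type plus an id."""
    _fields_ = [("type", ctypes.c_int), ("id", ctypes.c_int)]


def device_location(device: int) -> CudaMemLocation:
    """Build the struct that replaces the bare int device ordinal."""
    return CudaMemLocation(CUDA_MEM_LOCATION_TYPE_DEVICE, device)


def advise_preferred_device(cudart, ptr: int, nbytes: int, device: int) -> int:
    """Call cudaMemAdvise with a by-value cudaMemLocation (CUDA 13 form).

    `cudart` is a handle to the CUDA runtime library, for example one
    returned by ctypes.CDLL. Returns the cudaError_t code (0 on success).
    """
    fn = cudart.cudaMemAdvise
    fn.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int, CudaMemLocation]
    fn.restype = ctypes.c_int
    return fn(ptr, nbytes, CUDA_MEM_ADVISE_SET_PREFERRED_LOCATION,
              device_location(device))
```

Before CUDA 13 the last argument was a plain int device ordinal; the struct form also carries a location type, which is why the commit sets type=cudaMemLocationTypeDevice. On a machine with the CUDA 13 runtime, the handle could come from something like ctypes.CDLL("libcudart.so.13"); that library name is an assumption and may differ per installation.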

biondizzle

2757bffcb6

Add cudaMallocManaged allocator for GH200 EGM support

- managed_alloc.cu: PyTorch pluggable allocator using cudaMallocManaged
- vllm_managed_mem.py: Launcher that patches vLLM for managed memory
- Dockerfile: Build and install managed memory components

This enables vLLM to use cudaMallocManaged for transparent page-fault
access to both HBM (~96 GiB) and LPDDR (EGM, up to 480 GiB additional)
on GH200 systems with Extended GPU Memory enabled.

Experimental branch: v0.19.0-cmm

2026-04-07 21:19:39 +00:00
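
A launcher along the lines described in this commit could register the compiled allocator with PyTorch roughly as follows. This is a minimal sketch, not the contents of vllm_managed_mem.py: the function names, the exported symbol names managed_malloc and managed_free, and the library path are hypothetical. Only torch.cuda.memory.CUDAPluggableAllocator and torch.cuda.memory.change_current_allocator are real PyTorch APIs, and the capacity figures are the approximate ones quoted in the commit message.

```python
from pathlib import Path

# Approximate capacities quoted in the commit message, in GiB.
HBM_GIB = 96
EGM_MAX_GIB = 480


def managed_capacity_gib(egm_gib: int = EGM_MAX_GIB) -> int:
    """Upper bound on memory reachable through managed allocations."""
    if not 0 <= egm_gib <= EGM_MAX_GIB:
        raise ValueError(f"egm_gib must be between 0 and {EGM_MAX_GIB}")
    return HBM_GIB + egm_gib


def install_managed_allocator(lib_path: str,
                              alloc_symbol: str = "managed_malloc",
                              free_symbol: str = "managed_free") -> None:
    """Route PyTorch CUDA allocations through a shared library.

    Must run before any CUDA memory has been allocated, because
    change_current_allocator raises once the default allocator is in use.
    """
    if not Path(lib_path).is_file():
        raise FileNotFoundError(lib_path)

    # Imported lazily so the helper above works without PyTorch installed.
    import torch

    allocator = torch.cuda.memory.CUDAPluggableAllocator(
        lib_path, alloc_symbol, free_symbol)
    torch.cuda.memory.change_current_allocator(allocator)
```

PyTorch expects the two exported functions to have the signatures `void* alloc(ssize_t size, int device, cudaStream_t stream)` and `void free(void* ptr, ssize_t size, int device, cudaStream_t stream)`. With the figures above, managed_capacity_gib() gives 576 GiB as the ceiling when the full EGM range is enabled.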

2 Commits