VLLM images for GH200

Hosted here

Changes:
- sitecustomize.py: no longer swaps in CUDAPluggableAllocator globally; sets VLLM_KV_CACHE_USE_MANAGED_MEMORY=1 instead.
- vllm_managed_mem.py: no global allocator swap, no torch.cuda patches.
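As a minimal sketch of the sitecustomize.py change described above (hypothetical; the real file may do more), the managed-memory opt-in reduces to exporting one environment variable before vLLM is imported, instead of installing a global CUDAPluggableAllocator:

```python
# sitecustomize.py (minimal sketch): opt vLLM's KV cache into CUDA
# managed memory via an environment variable rather than swapping in a
# CUDAPluggableAllocator or patching torch.cuda.
import os

# setdefault so an explicit user setting is not overridden
os.environ.setdefault("VLLM_KV_CACHE_USE_MANAGED_MEMORY", "1")
```

Because sitecustomize.py runs automatically at interpreter startup, any process inside the image sees the variable without the entrypoint having to export it.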
# Log in to Docker Hub (required for push)
docker login

# Alternative: buildx with a higher memory limit (older 0.9.0.1 tag shown)
# docker buildx build --platform linux/arm64 --memory=600g -t rajesh550/gh200-vllm:0.9.0.1 .

# Build for arm64 with a large memory limit, teeing output to build.log
docker build --memory=450g --platform linux/arm64 -t rajesh550/gh200-vllm:0.11.1rc2 . 2>&1 | tee build.log

docker push rajesh550/gh200-vllm:0.11.1rc2
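Once pushed, the image can be pulled and run on a GH200 host. A sketch of an invocation, assuming the image serves models via the standard `vllm serve` CLI (the model name, port, and entrypoint below are illustrative assumptions, not pinned by this repo):

```shell
# Assumed usage: run the image as an OpenAI-compatible vLLM server.
# --gpus all exposes the GH200; model and port are placeholders.
docker run --rm --gpus all -p 8000:8000 \
  rajesh550/gh200-vllm:0.11.1rc2 \
  vllm serve meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 8000
```

With VLLM_KV_CACHE_USE_MANAGED_MEMORY=1 baked in via sitecustomize.py, no extra -e flag is needed to enable the managed-memory KV cache.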