biondizzle/vllm

Commit Graph

Author	SHA1	Message	Date
biondizzle	a15f86ecfa	Remove cudaMemPrefetchAsync from managed allocator. Eager prefetching was filling HBM+EGM, causing subsequent cudaMallocManaged calls to fail after model loading. On GH200 with EGM, pages should migrate on-demand via hardware page faults over C2C NVLink. The cudaMemAdviseSetPreferredLocation(GPU) hint is sufficient to prefer GPU placement with LPDDR fallback.	2026-04-10 05:58:11 +00:00
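
The allocation pattern the commit message describes can be sketched roughly as follows. This is a minimal illustration, not the fork's actual allocator: the helper name `managed_alloc_prefer_gpu` and its parameters are hypothetical. It assumes the CUDA 12.x runtime signature of `cudaMemAdvise` (integer device argument); the `CUDART_VERSION` branch reflects the `cudaMemLocation`-based signature as understood for CUDA 13 and should be checked against the toolkit in use.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper (not taken from the repository): allocate managed
// memory and hint that its pages should preferably live on `device`.
cudaError_t managed_alloc_prefer_gpu(void** out, std::size_t bytes, int device) {
    *out = nullptr;
    cudaError_t err = cudaMallocManaged(out, bytes, cudaMemAttachGlobal);
    if (err != cudaSuccess) {
        *out = nullptr;
        return err;
    }

#if CUDART_VERSION >= 13000
    // Assumption: from CUDA 13 the advice target is a cudaMemLocation.
    cudaMemLocation loc{};
    loc.type = cudaMemLocationTypeDevice;
    loc.id = device;
    err = cudaMemAdvise(*out, bytes, cudaMemAdviseSetPreferredLocation, loc);
#else
    // CUDA 12.x and earlier: the advice target is a device ordinal.
    err = cudaMemAdvise(*out, bytes, cudaMemAdviseSetPreferredLocation, device);
#endif
    if (err != cudaSuccess) {
        cudaFree(*out);
        *out = nullptr;
        return err;
    }

    // Deliberately no cudaMemPrefetchAsync here: pages are left to migrate
    // on first touch instead of being made resident up front.
    return cudaSuccess;
}
```

A caller would pass the returned pointer to kernels as usual and release it with `cudaFree`. The preferred-location hint only influences where pages are placed when they are faulted in, so, unlike an eager prefetch, it does not by itself occupy GPU memory at allocation time.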

1 Commits