Commit Graph

1 Commit

Author SHA1 Message Date
a15f86ecfa Remove cudaMemPrefetchAsync from managed allocator
Eager prefetching was filling HBM+EGM, causing subsequent
cudaMallocManaged calls to fail after model loading. On GH200
with EGM, pages should migrate on demand via hardware page faults
over NVLink-C2C. The cudaMemAdviseSetPreferredLocation(GPU) hint
is sufficient to prefer GPU placement with LPDDR fallback.
2026-04-10 05:58:11 +00:00
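
The allocation pattern this commit describes might look roughly like the sketch below: allocate managed memory, set the preferred-location hint, and skip the prefetch so that pages migrate on first touch. This is a minimal illustration, not the project's actual allocator. The helper name managed_alloc and the touch kernel are hypothetical, and the code assumes the CUDA 12.x signature of cudaMemAdvise, which takes an int device argument (newer toolkits changed that parameter).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not the project's real allocator).
// Assumes the CUDA 12.x form of cudaMemAdvise with an int device argument.
static cudaError_t managed_alloc(void** ptr, size_t bytes, int device) {
    cudaError_t err = cudaMallocManaged(ptr, bytes, cudaMemAttachGlobal);
    if (err != cudaSuccess) return err;

    // Hint only: prefer GPU memory for these pages. The driver is still
    // free to place them elsewhere when GPU memory is full.
    err = cudaMemAdvise(*ptr, bytes, cudaMemAdviseSetPreferredLocation, device);
    if (err != cudaSuccess) {
        cudaFree(*ptr);
        *ptr = nullptr;
        return err;
    }

    // Deliberately no cudaMemPrefetchAsync: pages are left to migrate
    // on demand when the GPU first touches them.
    return cudaSuccess;
}

// Hypothetical kernel that touches every element, triggering page faults
// and on-demand migration for pages not yet resident on the GPU.
__global__ void touch(float* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] = 1.0f;
}

int main() {
    const int device = 0;
    if (cudaSetDevice(device) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }

    const size_t n = 1 << 20;
    float* data = nullptr;
    cudaError_t err = managed_alloc((void**)&data, n * sizeof(float), device);
    if (err != cudaSuccess) {
        fprintf(stderr, "managed_alloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    touch<<<blocks, threads>>>(data, n);

    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(data);
        return 1;
    }

    // Host access after synchronization is valid for managed memory.
    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```

The point of the sketch is that the advice is only a placement preference. Unlike an eager prefetch, it does not force pages into GPU memory at allocation time, which is the behavior the commit message identifies as the cause of later allocation failures.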