[V1][Core] Fix memory issue with logits & sampling (#14508)

Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>
Roger Wang
2025-03-10 21:03:41 -07:00
committed by GitHub
parent c982ac5722
commit 1fc973c0b5
5 changed files with 139 additions and 91 deletions

@@ -3525,6 +3525,11 @@ class VllmConfig:
             not self.model_config.enforce_eager:
             batch_size_capture_list = [1, 2, 4
                                        ] + [i for i in range(8, 513, 8)]
+            max_num_tokens = self.scheduler_config.max_num_batched_tokens
+            batch_size_capture_list = [
+                size for size in batch_size_capture_list
+                if size <= max_num_tokens
+            ]
 
         self.compilation_config.init_with_cudagraph_sizes(
             batch_size_capture_list)
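
The effect of this hunk can be sketched in isolation: the candidate CUDA graph capture sizes (1, 2, 4, then every multiple of 8 up to 512) are filtered so that none exceeds the scheduler's `max_num_batched_tokens`. The helper name `cudagraph_capture_sizes` below is hypothetical and not part of vLLM; it only mirrors the list logic shown in the diff, taking the token budget as a plain integer.

```python
def cudagraph_capture_sizes(max_num_tokens: int) -> list[int]:
    """Hypothetical helper mirroring the hunk above; not a vLLM API."""
    # Candidate sizes as in the diff: 1, 2, 4, then 8, 16, ..., 512.
    candidates = [1, 2, 4] + list(range(8, 513, 8))
    # Drop every candidate that exceeds the scheduler's token budget.
    return [size for size in candidates if size <= max_num_tokens]


if __name__ == "__main__":
    # With a small budget only the small sizes survive.
    print(cudagraph_capture_sizes(16))
    # With a budget of 512 or more, the full candidate list is kept.
    print(len(cudagraph_capture_sizes(8192)))
```

With a budget at or above 512 the filter is a no-op, so the change only affects configurations whose `max_num_batched_tokens` is smaller than the largest candidate size.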