[AsyncScheduling] Don't schedule past request max_tokens (#27922)

Signed-off-by: Nick Hill <nhill@redhat.com>
Nick Hill
2025-11-04 09:06:28 -08:00
committed by GitHub
parent c9f66da8fd
commit 938a81692e
3 changed files with 14 additions and 4 deletions

@@ -155,7 +155,6 @@ def test_suffix_decoding_acceptance(
)
# Run several times and check that the accepted tokens increase.
spec_llm.chat(test_prompts, sampling_config)
num_draft = []
num_accept = []
for i in range(10): # Run multiple times to warm up the cache.