[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730)

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Mark McLoughlin
2026-03-06 06:04:31 +00:00
committed by GitHub
parent 57c84ff129
commit 27066d1b2b
15 changed files with 762 additions and 90 deletions
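
The grace-period behavior this commit describes (wait up to a timeout for in-flight requests, then abort whatever remains) can be sketched with plain `asyncio`. This is a minimal illustration only, not vLLM's implementation: `drain_with_timeout` and `fake_request` are hypothetical names, and treating the timeout as a number of seconds is an assumption.

```python
import asyncio


async def drain_with_timeout(tasks, shutdown_timeout):
    """Hypothetical helper: wait up to `shutdown_timeout` (assumed seconds)
    for in-flight tasks, then cancel whatever is still running.

    Returns a (completed, aborted) pair of counts.
    """
    if not tasks:
        return 0, 0
    # asyncio.wait does not raise on timeout; it returns the two sets.
    done, pending = await asyncio.wait(tasks, timeout=shutdown_timeout)
    for task in pending:
        task.cancel()  # abort the requests that outlived the grace period
    # Wait for the cancellations to take effect; swallow CancelledError.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def fake_request(duration):
    # Stand-in for an in-flight request that takes `duration` seconds.
    await asyncio.sleep(duration)
    return duration


async def demo():
    # Two short requests finish inside the grace period; the long one is aborted.
    tasks = [asyncio.create_task(fake_request(d)) for d in (0.01, 0.02, 5.0)]
    return await drain_with_timeout(tasks, shutdown_timeout=0.2)


if __name__ == "__main__":
    completed, aborted = asyncio.run(demo())
    print(f"completed={completed} aborted={aborted}")
```

Using `asyncio.wait` rather than `asyncio.wait_for` keeps the finished and unfinished tasks separate, so only the stragglers are cancelled.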


@@ -327,6 +327,12 @@ class VllmConfig:
weight_transfer_config: WeightTransferConfig | None = None
"""The configurations for weight transfer during RL training."""
shutdown_timeout: int = Field(default=0, ge=0)
"""Shutdown grace period for in-flight requests. Shutdown will be delayed for
up to this amount of time to allow already-running requests to complete. Any
remaining requests are aborted once the timeout is reached.
"""
def compute_hash(self) -> str:
"""
WARNING: Whenever a new field is added to this config,