[Tests] Shutdown test RemoteVLLMServer cleanly (#36950)
Recent PR #33949 changed the teardown logic of the RemoteVLLMServer test utility class to send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated shutdown logic that assumes only the top-level process will receive a signal (for example when running in a container that's shut down). This caused a bunch of errors and stacktraces in some test logs, even though those tests still pass.

We should still attempt a normal shutdown and only kill other procs if they are still running after a few seconds.

Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming

Signed-off-by: Nick Hill <nickhill123@gmail.com>
@@ -235,13 +235,10 @@ class RemoteVLLMServer:
except (ProcessLookupError, OSError):
pgid = None

# Phase 1: graceful SIGTERM to the entire process group
if pgid is not None:
with contextlib.suppress(ProcessLookupError, OSError):
os.killpg(pgid, signal.SIGTERM)
print(f"[RemoteOpenAIServer] Sent SIGTERM to process group {pgid}")
else:
# Phase 1: graceful SIGTERM to the root process
with contextlib.suppress(ProcessLookupError, OSError):
self.proc.terminate()
print(f"[RemoteOpenAIServer] Sent SIGTERM to process {pid}")

try:
self.proc.wait(timeout=15)
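
The two-phase shutdown described in the commit message can be sketched roughly as follows. This is a minimal standalone illustration, not the actual RemoteVLLMServer code: the function name `shutdown_gracefully` and the `grace_seconds` parameter are hypothetical, and it assumes a POSIX system where the server was launched in its own process group (for example with `start_new_session=True`).

```python
import contextlib
import os
import signal
import subprocess


def shutdown_gracefully(proc: subprocess.Popen, grace_seconds: float = 15.0) -> int:
    """Hypothetical helper: SIGTERM the root process only, then force-kill leftovers."""
    # Record the process group while the root process is still alive.
    try:
        pgid = os.getpgid(proc.pid)
    except (ProcessLookupError, OSError):
        pgid = None

    # Phase 1: graceful SIGTERM to the root process only, so that it can
    # coordinate the shutdown of its own children.
    with contextlib.suppress(ProcessLookupError, OSError):
        proc.terminate()

    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        pass  # Root did not exit in time; fall through to the forced phase.

    # Phase 2: force-kill whatever is still running.
    if proc.poll() is None:
        with contextlib.suppress(ProcessLookupError, OSError):
            proc.kill()
    # Never signal our own process group (assumption: the child has its own).
    if pgid is not None and pgid != os.getpgrp():
        with contextlib.suppress(ProcessLookupError, OSError):
            os.killpg(pgid, signal.SIGKILL)

    return proc.wait()
```

A caller would start the server with `subprocess.Popen(..., start_new_session=True)` and pass the resulting object to the helper. Child processes that exit on their own during the grace period never receive a signal, which is what keeps the logs free of spurious stack traces.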