[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408)

Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie <swang@anyscale.com>
Co-authored-by: Stephanie <swang@anyscale.com>
Stephanie Wang
2024-06-25 20:30:03 -07:00
committed by GitHub
parent 82079729cc
commit dda4811591
29 changed files with 1106 additions and 573 deletions

@@ -190,9 +190,8 @@ class RayGPUExecutor(DistributedGPUExecutor):
         max_parallel_loading_workers)
 
     def _driver_execute_model(
-        self,
-        execute_model_req: Optional[ExecuteModelRequest] = None
-    ) -> List[SamplerOutput]:
+        self, execute_model_req: Optional[ExecuteModelRequest]
+    ) -> Optional[List[SamplerOutput]]:
         """Run execute_model in the driver worker.
 
         Passing None will cause the driver to stop the model execution