biondizzle/vllm - vllm

Author	SHA1	Message	Date
William Lin	47b65a5508	[core] Multi Step Scheduling (#7000 ) Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>	2024-08-19 13:52:13 -07:00
SangBin Cho	ff7ec82c4d	[Core] Optimize SPMD architecture with delta + serialization optimization (#7109 )	2024-08-18 17:57:20 -07:00
Rui Qiao	198d6a2898	[Core] Shut down aDAG workers with clean async llm engine exit (#7224 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-12 17:57:16 -07:00
Rui Qiao	22e718ff1a	[Misc] Revive to use loopback address for driver IP (#7091 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-02 15:50:00 -07:00
Rui Qiao	05308891e2	[Core] Pipeline parallel with Ray ADAG (#6837 ) Support pipeline-parallelism with Ray accelerated DAG. Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-02 13:55:40 -07:00
youkaichao	660dea1235	[cuda][misc] remove error_on_invalid_device_count_status (#7069 )	2024-08-02 00:14:21 -07:00
SangBin Cho	1adddb14bf	[Core] Fix ray forward_dag error mssg (#6792 )	2024-07-25 16:53:25 -07:00
Antoni Baum	7bd82002ae	[Core] Allow specifying custom Executor (#6557 )	2024-07-20 01:25:06 +00:00
Nick Hill	b5672a112c	[Core] Multiprocessing Pipeline Parallel support (#6130 ) Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-18 19:15:52 -07:00
Rui Qiao	61e592747c	[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>	2024-07-17 22:27:09 -07:00
Murali Andoorveedu	5fa6e9876e	[Bugfix] Fix for multinode crash on 4 PP (#6495 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-17 08:25:10 +00:00
youkaichao	41708e5034	[ci] try to add multi-node tests (#6280 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-12 21:51:48 -07:00
youkaichao	70c232f85a	[core][distributed] fix ray worker rank assignment (#6235 )	2024-07-08 21:31:44 -07:00
Murali Andoorveedu	0ed646b7aa	[Distributed][Core] Support Py39 and Py38 for PP (#6120 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-03 17:52:29 -07:00
youkaichao	f666207161	[misc][distributed] error on invalid state (#6092 )	2024-07-02 23:37:29 -07:00
Murali Andoorveedu	c5832d2ae9	[Core] Pipeline Parallel Support (#4412 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-02 10:58:08 -07:00
Stephanie Wang	dda4811591	[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408 ) Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu> Signed-off-by: Stephanie <swang@anyscale.com> Co-authored-by: Stephanie <swang@anyscale.com>	2024-06-25 20:30:03 -07:00
youkaichao	3eea74889f	[misc][distributed] use 127.0.0.1 for single-node (#5619 )	2024-06-19 08:05:00 +00:00
youkaichao	1b44aaf4e3	[bugfix][distributed] fix 16 gpus local rank arrangement (#5604 )	2024-06-17 21:35:04 +00:00
Antoni Baum	18a277b52d	Remove Ray health check (#4693 )	2024-06-07 10:01:56 +00:00
Nick Hill	eb6d3c264d	[Core] Eliminate parallel worker per-step task scheduling overhead (#4894 )	2024-05-23 06:17:27 +09:00
Cody Yu	973617ae02	[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840 ) Co-authored-by: Cade Daniel <edacih@gmail.com> Co-authored-by: Cade Daniel <cade@anyscale.com>	2024-05-16 00:53:51 -07:00
Nick Hill	676a99982f	[Core] Add MultiprocessingGPUExecutor (#4539 ) Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>	2024-05-14 10:38:59 -07:00
Cody Yu	bc8ad68455	[Misc][Refactor] Introduce ExecuteModelData (#4540 )	2024-05-03 17:47:07 -07:00
youkaichao	5b8a7c1cb0	[Misc] centralize all usage of environment variables (#4548 )	2024-05-02 11:13:25 -07:00
Nick Hill	2e240c69a9	[Core] Centralize GPU Worker construction (#4419 )	2024-05-01 01:06:34 +00:00
leiwen83	4bb53e2dde	[BugFix] fix num_lookahead_slots missing in async executor (#4165 ) Co-authored-by: Lei Wen <wenlei03@qiyi.com>	2024-04-30 10:12:59 -07:00
Nick Hill	ba4be44c32	[BugFix] Fix return type of executor execute_model methods (#4402 )	2024-04-27 11:17:45 -07:00
Nick Hill	258a2c58d0	[Core] Introduce `DistributedGPUExecutor` abstract class (#4348 )	2024-04-27 04:14:26 +00:00
SangBin Cho	a88081bf76	[CI] Disable non-lazy string operation on logging (#4326 ) Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>	2024-04-26 00:16:58 -07:00
Nick Hill	479d69fad0	[Core] Move ray_utils.py from `engine` to `executor` package (#4347 )	2024-04-25 06:52:22 +00:00
DefTruth	d87f39e9a9	[Bugfix] Add init_cached_hf_modules to RayWorkerWrapper (#4286 )	2024-04-23 09:28:35 -07:00
Nick Hill	8f2ea22bde	[Core] Some simplification of WorkerWrapper changes (#4183 )	2024-04-23 07:49:08 +00:00
youkaichao	8a7a3e4436	[Core] add an option to log every function call to for debugging hang/crash in distributed inference (#4079 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-18 16:15:12 -07:00
youkaichao	8438e0569e	[Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024 ) [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (#4024)	2024-04-17 08:34:33 +00:00
Cade Daniel	e95cd87959	[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894 )	2024-04-16 13:09:21 -07:00
Antoni Baum	69e1d2fb69	[Core] Refactor model loading code (#4097 )	2024-04-16 11:34:39 -07:00
Ricky Xu	4695397dcf	[Bugfix] Fix ray workers profiling with nsight (#4095 )	2024-04-15 14:24:45 -07:00
Nick Hill	eb46fbfda2	[Core] Simplifications to executor classes (#4071 )	2024-04-15 13:05:09 -07:00
Sanger Steel	711a000255	[Frontend] [Core] feat: Add model loading using `tensorizer` (#3476 )	2024-04-13 17:13:01 -07:00
SangBin Cho	09473ee41c	[mypy] Add mypy type annotation part 1 (#4006 )	2024-04-12 14:35:50 -07:00
Cade Daniel	e7c7067b45	[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837 )	2024-04-09 11:44:15 -07:00
Isotr0py	0ce0539d47	[Bugfix] Fix Llava inference with Tensor Parallelism. (#3883 )	2024-04-07 22:54:13 +08:00
Cade Daniel	5757d90e26	[Speculative decoding] Adding configuration object for speculative decoding (#3706 ) Co-authored-by: Lily Liu <lilyliupku@gmail.com>	2024-04-03 00:40:57 +00:00
Roy	515386ef3c	[Core] Support multi-node inference(eager and cuda graph) (#3686 )	2024-03-28 15:01:55 -07:00
Cade Daniel	14ccd94c89	[Core][Bugfix]Refactor block manager for better testability (#3492 )	2024-03-27 23:59:28 -07:00
youkaichao	8f44facddd	[Core] remove cupy dependency (#3625 )	2024-03-27 00:33:26 -07:00
xwjiang2010	64172a976c	[Feature] Add vision language model support. (#3042 )	2024-03-25 14:16:30 -07:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495 )	2024-03-25 07:59:47 -07:00
Zhuohan Li	e90fc21f2e	[Hardware][Neuron] Refactor neuron support (#3471 )	2024-03-22 01:22:17 +00:00

52 Commits