Richard Liu
|
2148441fd3
|
[TPU] Support single and multi-host TPUs on GKE (#7613)
|
2024-08-30 00:27:40 -07:00 |
|
afeldman-nm
|
428dd1445e
|
[Core] Logprobs support in Multi-step (#7652)
|
2024-08-29 19:19:08 -07:00 |
|
youkaichao
|
f52a43a8b9
|
[ci][test] fix pp test failure (#7945)
|
2024-08-28 01:27:07 -07:00 |
|
Kunshang Ji
|
076169f603
|
[Hardware][Intel GPU] Add intel GPU pipeline parallel support. (#7810)
|
2024-08-27 10:07:02 -07:00 |
|
Megha Agarwal
|
2eedede875
|
[Core] Asynchronous Output Processor (#7049)
Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
|
2024-08-26 20:53:20 -07:00 |
|
omrishiv
|
760e9f71a8
|
[Bugfix] neuron: enable tensor parallelism (#7562)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
|
2024-08-26 15:13:13 -07:00 |
|
Kunshang Ji
|
fc5ebbd1d3
|
[Hardware][Intel GPU] refactor xpu_model_runner for tp (#7712)
|
2024-08-22 20:06:54 -07:00 |
|
SangBin Cho
|
c01a6cb231
|
[Ray backend] Better error when pg topology is bad. (#7584)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-08-22 17:44:25 -07:00 |
|
youkaichao
|
7eebe8ccaa
|
[distributed][misc] error on same VLLM_HOST_IP setting (#7756)
|
2024-08-21 16:25:34 -07:00 |
|
Antoni Baum
|
66a9e713a7
|
[Core] Pipe worker_class_fn argument in Executor (#7707)
|
2024-08-21 00:37:39 +00:00 |
|
Kunshang Ji
|
c42590f97a
|
[Hardware] [Intel GPU] refactor xpu worker/executor (#7686)
|
2024-08-20 09:54:10 -07:00 |
|
Kunshang Ji
|
b6f99a6ffe
|
[Core] Refactor executor classes for easier inheritance (#7673)
[Core] Refactor executor classes to make it easier to inherit GPUExecutor (#7673)
|
2024-08-20 00:56:50 -07:00 |
|
William Lin
|
47b65a5508
|
[core] Multi Step Scheduling (#7000)
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
|
2024-08-19 13:52:13 -07:00 |
|
SangBin Cho
|
ff7ec82c4d
|
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109)
|
2024-08-18 17:57:20 -07:00 |
|
Roger Wang
|
bbf55c4805
|
[VLM] Refactor MultiModalConfig initialization and profiling (#7530)
|
2024-08-17 13:30:55 -07:00 |
|
omrishiv
|
9c1f78d5d6
|
[Bugfix] update neuron for version > 0.5.0 (#7175)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-15 09:44:14 -07:00 |
|
youkaichao
|
4d2dc5072b
|
[hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102)
|
2024-08-13 00:16:42 -07:00 |
|
Rui Qiao
|
198d6a2898
|
[Core] Shut down aDAG workers with clean async llm engine exit (#7224)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-12 17:57:16 -07:00 |
|
Cyrus Leung
|
4ddc4743d7
|
[Core] Consolidate GB constant and enable float GB arguments (#7416)
|
2024-08-12 14:14:14 -07:00 |
|
Mahesh Keralapura
|
933790c209
|
[Core] Add span metrics for model_forward, scheduler and sampler time (#7089)
|
2024-08-09 13:55:13 -07:00 |
|
Rui Qiao
|
22e718ff1a
|
[Misc] Revive to use loopback address for driver IP (#7091)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-02 15:50:00 -07:00 |
|
Rui Qiao
|
05308891e2
|
[Core] Pipeline parallel with Ray ADAG (#6837)
Support pipeline-parallelism with Ray accelerated DAG.
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-02 13:55:40 -07:00 |
|
youkaichao
|
660dea1235
|
[cuda][misc] remove error_on_invalid_device_count_status (#7069)
|
2024-08-02 00:14:21 -07:00 |
|
Travis Johnson
|
593e79e733
|
[Bugfix] torch.set_num_threads() in multiproc_gpu_executor (#6802)
[Bugfix] Use torch.set_num_threads() to configure parallelism in multiproc_gpu_executor (#6802)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-07-26 22:15:20 -07:00 |
|
Woosuk Kwon
|
52f07e3dec
|
[Hardware][TPU] Implement tensor parallelism with Ray (#5871)
|
2024-07-26 20:54:27 -07:00 |
|
Li, Jiang
|
3bbb4936dc
|
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
|
2024-07-26 13:50:10 -07:00 |
|
Woosuk Kwon
|
aa4867791e
|
[Misc][TPU] Support TPU in initialize_ray_cluster (#6812)
|
2024-07-26 19:39:49 +00:00 |
|
Anthony Platanios
|
084a01fd35
|
[Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor. (#6770)
|
2024-07-25 21:25:35 -07:00 |
|
SangBin Cho
|
1adddb14bf
|
[Core] Fix ray forward_dag error mssg (#6792)
|
2024-07-25 16:53:25 -07:00 |
|
Antoni Baum
|
7bd82002ae
|
[Core] Allow specifying custom Executor (#6557)
|
2024-07-20 01:25:06 +00:00 |
|
Nick Hill
|
b5672a112c
|
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-18 19:15:52 -07:00 |
|
Rui Qiao
|
61e592747c
|
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
|
2024-07-17 22:27:09 -07:00 |
|
Murali Andoorveedu
|
5fa6e9876e
|
[Bugfix] Fix for multinode crash on 4 PP (#6495)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-17 08:25:10 +00:00 |
|
youkaichao
|
09c2eb85dd
|
[ci][distributed] add pipeline parallel correctness test (#6410)
|
2024-07-16 15:44:22 -07:00 |
|
Thomas Parnell
|
eaec4b9153
|
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6140)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
|
2024-07-15 10:12:47 -07:00 |
|
youkaichao
|
41708e5034
|
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-12 21:51:48 -07:00 |
|
Woosuk Kwon
|
997df46a32
|
[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor (#6313)
|
2024-07-10 16:39:02 -07:00 |
|
sangjune.park
|
44cc76610d
|
[Bugfix] Fix OpenVINOExecutor abstractmethod error (#6296)
Signed-off-by: sangjune.park <sangjune.park@navercorp.com>
|
2024-07-10 10:03:32 -07:00 |
|
Woosuk Kwon
|
5ed3505d82
|
[Bugfix][TPU] Add prompt adapter methods to TPUExecutor (#6279)
|
2024-07-09 19:30:56 -07:00 |
|
Swapnil Parekh
|
4d6ada947c
|
[CORE] Adding support for insertion of soft-tuned prompts (#4645)
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
|
2024-07-09 13:26:36 -07:00 |
|
youkaichao
|
70c232f85a
|
[core][distributed] fix ray worker rank assignment (#6235)
|
2024-07-08 21:31:44 -07:00 |
|
Murali Andoorveedu
|
0ed646b7aa
|
[Distributed][Core] Support Py39 and Py38 for PP (#6120)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-03 17:52:29 -07:00 |
|
Travis Johnson
|
1dab9bc8a9
|
[Bugfix] set OMP_NUM_THREADS to 1 by default for multiprocessing (#6109)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-07-03 16:56:59 -07:00 |
|
xwjiang2010
|
d9e98f42e4
|
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-03 22:14:16 +00:00 |
|
youkaichao
|
f666207161
|
[misc][distributed] error on invalid state (#6092)
|
2024-07-02 23:37:29 -07:00 |
|
Nick Hill
|
d830656a97
|
[BugFix] Avoid unnecessary Ray import warnings (#6079)
|
2024-07-03 14:09:40 +08:00 |
|
Murali Andoorveedu
|
c5832d2ae9
|
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-02 10:58:08 -07:00 |
|
Ilya Lavrenov
|
57f09a419c
|
[Hardware][Intel] OpenVINO vLLM backend (#5379)
|
2024-06-28 13:50:16 +00:00 |
|
Stephanie Wang
|
dda4811591
|
[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408)
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie <swang@anyscale.com>
Co-authored-by: Stephanie <swang@anyscale.com>
|
2024-06-25 20:30:03 -07:00 |
|
aws-patlange
|
82079729cc
|
[Bugfix] Fix assertion in NeuronExecutor (#5841)
|
2024-06-25 19:52:10 -07:00 |
|