Jeff Fialho
|
825b044863
|
[Frontend] Warn if user max_model_len is greater than derived max_model_len (#7080)
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-03 16:01:38 -07:00 |
|
Robert Shaw
|
ed812a73fa
|
[ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-08-02 18:27:28 -07:00 |
|
youkaichao
|
708989341e
|
[misc] add a flag to enable compile (#7092)
|
2024-08-02 16:18:45 -07:00 |
|
Rui Qiao
|
05308891e2
|
[Core] Pipeline parallel with Ray ADAG (#6837)
Support pipeline-parallelism with Ray accelerated DAG.
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-02 13:55:40 -07:00 |
|
Jee Jee Li
|
7ecee34321
|
[Kernel][RFC] Refactor the punica kernel based on Triton (#5036)
|
2024-07-31 17:12:24 -07:00 |
|
Cody Yu
|
bd70013407
|
[MISC] Introduce pipeline parallelism partition strategies (#6920)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-07-31 12:02:17 -07:00 |
|
Li, Jiang
|
3bbb4936dc
|
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
|
2024-07-26 13:50:10 -07:00 |
|
Rui Qiao
|
61e592747c
|
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
|
2024-07-17 22:27:09 -07:00 |
|
Cyrus Leung
|
d97011512e
|
[CI/Build] vLLM cache directory for images (#6444)
|
2024-07-15 23:12:25 -07:00 |
|
DefTruth
|
44874a0bf9
|
[Doc] add env docs for flashinfer backend (#6437)
|
2024-07-14 21:16:51 -07:00 |
|
Woosuk Kwon
|
eeceadaecc
|
[Misc] Add deprecation warning for beam search (#6402)
|
2024-07-13 11:52:22 -07:00 |
|
Avshalom Manevich
|
12a59959ed
|
[Bugfix] adding chunking mechanism to fused_moe to handle large inputs (#6029)
|
2024-07-01 21:08:29 +00:00 |
|
Ilya Lavrenov
|
57f09a419c
|
[Hardware][Intel] OpenVINO vLLM backend (#5379)
|
2024-06-28 13:50:16 +00:00 |
|
youkaichao
|
d9a252bc8e
|
[Core][Distributed] add shm broadcast (#5399)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-06-21 05:12:35 +00:00 |
|
youkaichao
|
6c5b7af152
|
[distributed][misc] use fork by default for mp (#5669)
|
2024-06-20 17:06:34 -07:00 |
|
Woosuk Kwon
|
1a8bfd92d5
|
[Hardware] Initial TPU integration (#5292)
|
2024-06-12 11:53:03 -07:00 |
|
Woosuk Kwon
|
8bab4959be
|
[Misc] Remove VLLM_BUILD_WITH_NEURON env variable (#5389)
|
2024-06-11 00:37:56 -07:00 |
|
Roger Wang
|
7a9cb294ae
|
[Frontend] Add OpenAI Vision API Support (#5237)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-06-07 11:23:32 -07:00 |
|
youkaichao
|
388596c914
|
[Misc][Utils] allow get_open_port to be called for multiple times (#5333)
|
2024-06-06 22:15:11 -07:00 |
|
youkaichao
|
325c119961
|
[Misc] add logging level env var (#5045)
|
2024-05-24 23:49:49 -07:00 |
|
Wenwei Zhang
|
546a97ef69
|
[Misc]: allow user to specify port in distributed setting (#4914)
|
2024-05-20 17:45:06 +00:00 |
|
Sanger Steel
|
8bc68e198c
|
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208)
|
2024-05-13 14:57:07 -07:00 |
|
youkaichao
|
344bf7cd2d
|
[Misc] add installation time env vars (#4574)
|
2024-05-03 15:55:56 -07:00 |
|
youkaichao
|
2d7bce9cd5
|
[Doc] add env vars to the doc (#4572)
|
2024-05-03 05:13:49 +00:00 |
|
youkaichao
|
5b8a7c1cb0
|
[Misc] centralize all usage of environment variables (#4548)
|
2024-05-02 11:13:25 -07:00 |
|