Author | Commit | Message | Date
Antoni Baum | ce741ba3e4 | Refactor AsyncLLMEngine (#880) | 2023-09-03 21:43:43 -07:00
Woosuk Kwon | 55fe8a81ec | Refactor scheduler (#658) | 2023-08-02 16:42:01 -07:00
Chaofan Lin | aa39e42c5a | fix doc (#622) | 2023-07-31 13:11:57 -07:00
Fang li | 953f28cf9a | fix ModuleNotFoundError (#599) (Co-authored-by: fangli <fangli@tencent.com>) | 2023-07-29 20:52:41 -07:00
Xudong Zhang | c0d00f5be6 | [Fix] fix import error of RayWorker (#604) (#605) | 2023-07-27 23:37:40 -07:00
Zhuohan Li | 58a072be15 | [Fix] Add model sequence length into model config (#575) | 2023-07-25 23:46:30 -07:00
Antoni Baum | c487a221ee | Fix bad assert in initialize_cluster if PG already exists (#526) | 2023-07-19 23:17:12 -07:00
Antoni Baum | 9925c17940 | Ray placement group support (#397) | 2023-07-19 22:49:31 -07:00
Massimiliano Pronesti | 16c3e295a8 | fix(ray_utils): ignore re-init error (#465) | 2023-07-19 17:01:19 -07:00
Lily Liu | b4b195b360 | fix max seq len (#489) | 2023-07-17 23:20:20 -07:00
Zhuohan Li | 2bdea7ac11 | [Fix] Fix the condition of max_seq_len (#477) | 2023-07-17 00:33:48 -04:00
Zhangir Azerbayev | 6d7d95a70a | Offload port selection to OS (#467) | 2023-07-15 23:11:02 -07:00
xcnick | c6dfc3cdbe | Fix handling of special tokens in decoding. (#418) | 2023-07-12 11:14:56 -04:00
codethazine | a945fcc2ae | Add trust-remote-code flag to handle remote tokenizers (#364) | 2023-07-07 11:04:58 -07:00
coolcloudcol | 7717d0838b | Fix an endless loop issue when engine_step throws a RuntimeError (#339) | 2023-07-03 15:22:28 -07:00
Zhuohan Li | 42e0c1df78 | [Quality] Add CI for formatting (#343) | 2023-07-03 14:50:56 -07:00
Zhuohan Li | d6fa1be3a8 | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00
Lily Liu | dafd924c1f | Raise error for long prompt (#273) | 2023-06-30 18:48:49 -07:00
Woosuk Kwon | 998d9d1509 | [Tokenizer] Add tokenizer mode (#298) | 2023-06-28 14:19:22 -07:00
Woosuk Kwon | 4338cc4750 | [Tokenizer] Add an option to specify tokenizer (#284) | 2023-06-28 09:46:58 -07:00
Zhuohan Li | 0b7db411b5 | [Bug] Fix the OOM condition for CPU cache (#260) | 2023-06-26 11:16:13 -07:00
metacryptom | 0603379863 | fix wrong using getattr to get dict value (#232) | 2023-06-24 22:00:24 -07:00
Zhuohan Li | 1d24ccb96c | [Fix] Better error message when there is OOM during cache initialization (#203) | 2023-06-22 15:30:06 +08:00
Woosuk Kwon | 14f0b39cda | [Bugfix] Fix a bug in RequestOutput.finished (#202) | 2023-06-22 00:17:24 -07:00
Zhuohan Li | 2e0d314384 | fix-ray (#193) | 2023-06-22 00:21:41 +08:00
Woosuk Kwon | 67d96c29fb | Use slow tokenizer for open llama models (#168) | 2023-06-20 14:19:47 +08:00
Zhuohan Li | bf5f121c02 | Reduce GPU memory utilization to make sure OOM doesn't happen (#153) | 2023-06-18 17:33:50 +08:00
Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00