Commit Graph

9263 Commits

Author SHA1 Message Date
Zhuohan Li
0ffded812a [Fix] Better error message for batched prompts (#342) 2023-07-03 09:27:31 -07:00
Michele Catalano
0bd2a573a5 Allow send list of str for the Prompt on openai demo endpoint /v1/completions (#323)
* allow str or List[str] for prompt
* Update vllm/entrypoints/openai/api_server.py
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-03 09:17:50 -07:00
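
Commit 0bd2a573a5 above lets the demo `/v1/completions` endpoint accept the `prompt` field as either a single string or a list of strings. A minimal sketch of the two payload shapes a client could POST (the model name and server URL are placeholders, not from the commit):

```python
import json

# After this change, "prompt" may be a single string...
single = {
    "model": "facebook/opt-125m",  # placeholder model name
    "prompt": "Hello, my name is",
    "max_tokens": 16,
}

# ...or a list of strings, batching several prompts in one request.
batched = {
    "model": "facebook/opt-125m",
    "prompt": ["Hello, my name is", "The capital of France is"],
    "max_tokens": 16,
}

# Server-side handling can normalize both shapes to a list of prompts.
def normalize(prompt):
    return [prompt] if isinstance(prompt, str) else list(prompt)

print(len(normalize(single["prompt"])))   # 1
print(len(normalize(batched["prompt"])))  # 2

# A client would send either dict as the JSON body of a POST to a
# running demo server, e.g. http://localhost:8000/v1/completions.
body = json.dumps(batched)
```
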
Ricardo Lu
49b26e2cec feat: add ChatCompletion endpoint in OpenAI demo server. (#330) 2023-07-02 22:54:33 -07:00
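
Commit 49b26e2cec above adds a ChatCompletion endpoint to the OpenAI demo server. A sketch of a request body in the OpenAI chat format the endpoint follows (model name and message contents are illustrative assumptions):

```python
# Request body for the chat endpoint, using the standard OpenAI chat
# schema: a list of {"role", "content"} messages instead of a raw prompt.
chat_request = {
    "model": "facebook/opt-125m",  # placeholder model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is vLLM?"},
    ],
    "max_tokens": 32,
}

# A client would POST this as JSON to /v1/chat/completions on a running
# demo server; the reply mirrors the OpenAI ChatCompletion response shape.
roles = [m["role"] for m in chat_request["messages"]]
print(roles)  # ['system', 'user']
```
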
Lily Liu
dafd924c1f Raise error for long prompt (#273) 2023-06-30 18:48:49 -07:00
Zhuohan Li
598dc4b79a [Fix] Weight loading for GPTBigCode (#313) 2023-06-29 22:14:17 -07:00
Zhuohan Li
85de093472 [Fix] Do not pin memory when in WSL (#312) 2023-06-29 15:00:21 -07:00
Zhanghao Wu
f72297562f Add news for the vllm+skypilot example (#314) 2023-06-29 12:32:37 -07:00
Bayang
9d27b09d12 Update README.md (#306) 2023-06-29 06:52:15 -07:00
Woosuk Kwon
998d9d1509 [Tokenizer] Add tokenizer mode (#298) 2023-06-28 14:19:22 -07:00
Lily Liu
425040d4c1 remove floats == 0 comparison (#285) 2023-06-28 14:11:51 -07:00
Woosuk Kwon
4338cc4750 [Tokenizer] Add an option to specify tokenizer (#284) 2023-06-28 09:46:58 -07:00
Jishnu Ray Chowdhury
bdd6b4c8bc Add LLM.set_tokenizer (#283) 2023-06-28 00:28:29 -07:00
Cody Yu
2b7d3aca2e Update setup.py (#282)
Co-authored-by: neubig <neubig@gmail.com>
2023-06-27 14:34:23 -07:00
twaka
4026a049d3 expand coverage of gpt2 model loading (#271) 2023-06-27 06:27:41 -07:00
Zhuohan Li
43710e8d09 [Fix] Fix default port number in benchmark scripts (#265) 2023-06-26 13:15:35 -07:00
Woosuk Kwon
526df28fb2 [BugFix] Fix a bug in counting running sequences (#266) 2023-06-26 13:09:02 -07:00
Zhuohan Li
2cf1a333b6 [Doc] Documentation for distributed inference (#261) 2023-06-26 11:34:23 -07:00
Zhuohan Li
0b7db411b5 [Bug] Fix the OOM condition for CPU cache (#260) 2023-06-26 11:16:13 -07:00
BasicCoder
471a7a4566 Compatible with Decapoda Research llama hf version (#251) 2023-06-26 09:23:57 -07:00
Lianmin Zheng
6214dd6ce9 Update README.md (#236) 2023-06-25 16:58:06 -07:00
metacryptom
0603379863 fix wrong using getattr to get dict value (#232) 2023-06-24 22:00:24 -07:00
Woosuk Kwon
665c48963b [Docs] Add GPTBigCode to supported models (#213) 2023-06-22 15:05:11 -07:00
Michael Feil
298695b766 GPTBigCode (StarCoder, SantaCoder Support) (#209) 2023-06-23 01:49:27 +08:00
Zhuohan Li
83658c8ace Bump up version to 0.1.1 (#204) v0.1.1 2023-06-22 15:33:32 +08:00
Zhuohan Li
1d24ccb96c [Fix] Better error message when there is OOM during cache initialization (#203) 2023-06-22 15:30:06 +08:00
Woosuk Kwon
14f0b39cda [Bugfix] Fix a bug in RequestOutput.finished (#202) 2023-06-22 00:17:24 -07:00
Zhuohan Li
2e0d314384 fix-ray (#193) 2023-06-22 00:21:41 +08:00
Woosuk Kwon
67d96c29fb Use slow tokenizer for open llama models (#168) v0.1.0 2023-06-20 14:19:47 +08:00
Zhuohan Li
033f5c78f5 Remove e.g. in README (#167) 2023-06-20 14:00:28 +08:00
Woosuk Kwon
794e578de0 [Minor] Fix URLs (#166) 2023-06-19 22:57:14 -07:00
Woosuk Kwon
caddfc14c1 [Minor] Fix icons in doc (#165) 2023-06-19 20:35:38 -07:00
Zhuohan Li
fc72e39de3 Change image urls (#164) 2023-06-20 11:15:15 +08:00
Woosuk Kwon
b7e62d3454 Fix repo & documentation URLs (#163) 2023-06-19 20:03:40 -07:00
Woosuk Kwon
364536acd1 [Docs] Minor fix (#162) 2023-06-19 19:58:23 -07:00
Zhuohan Li
0b32a987dd Add and list supported models in README (#161) 2023-06-20 10:57:46 +08:00
Woosuk Kwon
570fb2e9cc [PyPI] Fix package info in setup.py (#158) 2023-06-19 18:05:01 -07:00
Zhuohan Li
a255885f83 Add logo and polish readme (#156) 2023-06-19 16:31:13 +08:00
Woosuk Kwon
5822ede66e Add performance figures for dark mode (#160) 2023-06-18 23:46:24 -07:00
Zhuohan Li
0370afa2e5 Remove benchmark_async_llm_server.py (#155) 2023-06-19 11:12:37 +08:00
Woosuk Kwon
7e2a913c64 [Minor] Fix CompletionOutput.__repr__ (#157) 2023-06-18 19:58:25 -07:00
Woosuk Kwon
3f92038b99 Add comments on swap space (#154) 2023-06-18 11:39:35 -07:00
Woosuk Kwon
dcda03b4cb Write README and front page of doc (#147) 2023-06-18 03:19:38 -07:00
Zhuohan Li
bf5f121c02 Reduce GPU memory utilization to make sure OOM doesn't happen (#153) 2023-06-18 17:33:50 +08:00
Zhuohan Li
bec7b2dc26 Add quickstart guide (#148) 2023-06-18 01:26:12 +08:00
Woosuk Kwon
0b98ba15c7 Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Zhuohan Li
e5464ee484 Rename servers to engines (#152) 2023-06-17 17:25:21 +08:00
Woosuk Kwon
bab8f3dd0d [Minor] Fix benchmark_throughput.py (#151) 2023-06-16 21:00:52 -07:00
Zhuohan Li
eedb46bf03 Rename servers and change port numbers to reduce confusion (#149) 2023-06-17 00:13:02 +08:00
Woosuk Kwon
311490a720 Add script for benchmarking serving throughput (#145) 2023-06-14 19:55:38 -07:00
Woosuk Kwon
da5ddcd544 Remove redundant code in ColumnParallelLinear (#146) 2023-06-10 21:25:11 -07:00