Commit Graph

7968 Commits

Author | SHA1 | Message | Date
Woosuk Kwon | 4338cc4750 | [Tokenizer] Add an option to specify tokenizer (#284) | 2023-06-28 09:46:58 -07:00
Jishnu Ray Chowdhury | bdd6b4c8bc | Add LLM.set_tokenizer (#283) | 2023-06-28 00:28:29 -07:00
twaka | 4026a049d3 | expand coverage of gpt2 model loading (#271) | 2023-06-27 06:27:41 -07:00
Woosuk Kwon | 526df28fb2 | [BugFix] Fix a bug in counting running sequences (#266) | 2023-06-26 13:09:02 -07:00
Zhuohan Li | 0b7db411b5 | [Bug] Fix the OOM condition for CPU cache (#260) | 2023-06-26 11:16:13 -07:00
BasicCoder | 471a7a4566 | Compatible with Decapoda Research llama hf version (#251) | 2023-06-26 09:23:57 -07:00
metacryptom | 0603379863 | fix wrong using getattr to get dict value (#232) | 2023-06-24 22:00:24 -07:00
Michael Feil | 298695b766 | GPTBigCode (StarCoder, SantaCoder Support) (#209) | 2023-06-23 01:49:27 +08:00
Zhuohan Li | 83658c8ace | Bump up version to 0.1.1 (#204) | 2023-06-22 15:33:32 +08:00
Zhuohan Li | 1d24ccb96c | [Fix] Better error message when there is OOM during cache initialization (#203) | 2023-06-22 15:30:06 +08:00
Woosuk Kwon | 14f0b39cda | [Bugfix] Fix a bug in RequestOutput.finished (#202) | 2023-06-22 00:17:24 -07:00
Zhuohan Li | 2e0d314384 | fix-ray (#193) | 2023-06-22 00:21:41 +08:00
Woosuk Kwon | 67d96c29fb | Use slow tokenizer for open llama models (#168) | 2023-06-20 14:19:47 +08:00
Woosuk Kwon | 7e2a913c64 | [Minor] Fix CompletionOutput.__repr__ (#157) | 2023-06-18 19:58:25 -07:00
Woosuk Kwon | 3f92038b99 | Add comments on swap space (#154) | 2023-06-18 11:39:35 -07:00
Zhuohan Li | bf5f121c02 | Reduce GPU memory utilization to make sure OOM doesn't happen (#153) | 2023-06-18 17:33:50 +08:00
Zhuohan Li | bec7b2dc26 | Add quickstart guide (#148) | 2023-06-18 01:26:12 +08:00
Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00