Commit Graph

  • 598dc4b79a [Fix] Weight loading for GPTBigCode (#313) Zhuohan Li 2023-06-29 22:14:17 -07:00
  • 85de093472 [Fix] Do not pin memory when in WSL (#312) Zhuohan Li 2023-06-29 15:00:21 -07:00
  • f72297562f Add news for the vllm+skypilot example (#314) Zhanghao Wu 2023-06-29 12:32:37 -07:00
  • 9d27b09d12 Update README.md (#306) Bayang 2023-06-29 14:52:15 +01:00
  • 998d9d1509 [Tokenizer] Add tokenizer mode (#298) Woosuk Kwon 2023-06-28 14:19:22 -07:00
  • 425040d4c1 remove floats == 0 comparison (#285) Lily Liu 2023-06-28 14:11:51 -07:00
  • 4338cc4750 [Tokenizer] Add an option to specify tokenizer (#284) Woosuk Kwon 2023-06-28 09:46:58 -07:00
  • bdd6b4c8bc Add LLM.set_tokenizer (#283) Jishnu Ray Chowdhury 2023-06-28 02:28:29 -05:00
  • 2b7d3aca2e Update setup.py (#282) Cody Yu 2023-06-27 14:34:23 -07:00
  • 4026a049d3 expand coverage of gpt2 model loading (#271) twaka 2023-06-27 22:27:41 +09:00
  • 43710e8d09 [Fix] Fix default port number in benchmark scripts (#265) Zhuohan Li 2023-06-26 13:15:35 -07:00
  • 526df28fb2 [BugFix] Fix a bug in counting running sequences (#266) Woosuk Kwon 2023-06-26 13:09:02 -07:00
  • 2cf1a333b6 [Doc] Documentation for distributed inference (#261) Zhuohan Li 2023-06-26 11:34:23 -07:00
  • 0b7db411b5 [Bug] Fix the OOM condition for CPU cache (#260) Zhuohan Li 2023-06-26 11:16:13 -07:00
  • 471a7a4566 Compatible with Decapoda Research llama hf version (#251) BasicCoder 2023-06-27 00:23:57 +08:00
  • 6214dd6ce9 Update README.md (#236) Lianmin Zheng 2023-06-25 16:58:06 -07:00
  • 0603379863 fix wrong using getattr to get dict value (#232) metacryptom 2023-06-25 13:00:24 +08:00
  • 665c48963b [Docs] Add GPTBigCode to supported models (#213) Woosuk Kwon 2023-06-22 15:05:11 -07:00
  • 298695b766 GPTBigCode (StarCoder, SantaCoder Support) (#209) Michael Feil 2023-06-22 19:49:27 +02:00
  • 83658c8ace Bump up version to 0.1.1 (#204) (tag: v0.1.1) Zhuohan Li 2023-06-22 15:33:32 +08:00
  • 1d24ccb96c [Fix] Better error message when there is OOM during cache initialization (#203) Zhuohan Li 2023-06-22 15:30:06 +08:00
  • 14f0b39cda [Bugfix] Fix a bug in RequestOutput.finished (#202) Woosuk Kwon 2023-06-22 00:17:24 -07:00
  • 2e0d314384 fix-ray (#193) Zhuohan Li 2023-06-22 00:21:41 +08:00
  • 67d96c29fb Use slow tokenizer for open llama models (#168) (tag: v0.1.0) Woosuk Kwon 2023-06-19 23:19:47 -07:00
  • 033f5c78f5 Remove e.g. in README (#167) Zhuohan Li 2023-06-20 14:00:28 +08:00
  • 794e578de0 [Minor] Fix URLs (#166) Woosuk Kwon 2023-06-19 22:57:14 -07:00
  • caddfc14c1 [Minor] Fix icons in doc (#165) Woosuk Kwon 2023-06-19 20:35:38 -07:00
  • fc72e39de3 Change image urls (#164) Zhuohan Li 2023-06-20 11:15:15 +08:00
  • b7e62d3454 Fix repo & documentation URLs (#163) Woosuk Kwon 2023-06-19 20:03:40 -07:00
  • 364536acd1 [Docs] Minor fix (#162) Woosuk Kwon 2023-06-19 19:58:23 -07:00
  • 0b32a987dd Add and list supported models in README (#161) Zhuohan Li 2023-06-20 10:57:46 +08:00
  • 570fb2e9cc [PyPI] Fix package info in setup.py (#158) Woosuk Kwon 2023-06-19 18:05:01 -07:00
  • a255885f83 Add logo and polish readme (#156) Zhuohan Li 2023-06-19 16:31:13 +08:00
  • 5822ede66e Add performance figures for dark mode (#160) Woosuk Kwon 2023-06-18 23:46:24 -07:00
  • 0370afa2e5 Remove benchmark_async_llm_server.py (#155) Zhuohan Li 2023-06-19 11:12:37 +08:00
  • 7e2a913c64 [Minor] Fix CompletionOutput.__repr__ (#157) Woosuk Kwon 2023-06-18 19:58:25 -07:00
  • 3f92038b99 Add comments on swap space (#154) Woosuk Kwon 2023-06-18 11:39:35 -07:00
  • dcda03b4cb Write README and front page of doc (#147) Woosuk Kwon 2023-06-18 03:19:38 -07:00
  • bf5f121c02 Reduce GPU memory utilization to make sure OOM doesn't happen (#153) Zhuohan Li 2023-06-18 17:33:50 +08:00
  • bec7b2dc26 Add quickstart guide (#148) Zhuohan Li 2023-06-18 01:26:12 +08:00
  • 0b98ba15c7 Change the name to vLLM (#150) Woosuk Kwon 2023-06-17 03:07:40 -07:00
  • e5464ee484 Rename servers to engines (#152) Zhuohan Li 2023-06-17 17:25:21 +08:00
  • bab8f3dd0d [Minor] Fix benchmark_throughput.py (#151) Woosuk Kwon 2023-06-16 21:00:52 -07:00
  • eedb46bf03 Rename servers and change port numbers to reduce confusion (#149) Zhuohan Li 2023-06-17 00:13:02 +08:00
  • 311490a720 Add script for benchmarking serving throughput (#145) Woosuk Kwon 2023-06-14 19:55:38 -07:00
  • da5ddcd544 Remove redundant code in ColumnParallelLinear (#146) Woosuk Kwon 2023-06-10 21:25:11 -07:00
  • 5020e1e80c Non-streaming simple fastapi server (#144) Zhuohan Li 2023-06-11 01:43:07 +08:00
  • 4298374265 Add docstrings for LLMServer and related classes and examples (#142) Zhuohan Li 2023-06-07 18:25:20 +08:00
  • e38074b1e6 Support FP32 (#141) Woosuk Kwon 2023-06-07 00:40:21 -07:00
  • 376725ce74 [PyPI] Packaging for PyPI distribution (#140) Woosuk Kwon 2023-06-05 20:03:14 -07:00
  • 456941cfe4 [Docs] Write the Adding a New Model section (#138) Woosuk Kwon 2023-06-05 20:01:26 -07:00
  • 1a956e136b Fix various issues of async servers (#135) Zhuohan Li 2023-06-05 23:44:50 +08:00
  • 8274ca23ac Add docstrings for LLM (#137) Woosuk Kwon 2023-06-04 12:52:41 -07:00
  • 62ec38ea41 Document supported models (#127) Woosuk Kwon 2023-06-02 22:35:17 -07:00
  • 0eda2e0953 Add .readthedocs.yaml (#136) Woosuk Kwon 2023-06-02 22:27:44 -07:00
  • 211318d44a Add throughput benchmarking script (#133) Woosuk Kwon 2023-05-28 03:20:05 -07:00
  • 337871c6fd Enable LLaMA fast tokenizer (#132) Woosuk Kwon 2023-05-28 02:51:42 -07:00
  • 56b7f0efa4 Add a doc for installation (#128) Woosuk Kwon 2023-05-27 01:13:06 -07:00
  • d721168449 Improve setup script & Add a guard for bfloat16 kernels (#130) Woosuk Kwon 2023-05-27 00:59:32 -07:00
  • 4a151dd453 Add activation registry (#126) Woosuk Kwon 2023-05-25 00:09:07 -07:00
  • 057daef778 OpenAI Compatible Frontend (#116) Zhuohan Li 2023-05-23 21:39:50 -07:00
  • e86717833d Incrementally decode output tokens (#121) Woosuk Kwon 2023-05-23 20:46:32 -07:00
  • aedba6d5ec Print warnings/errors for large swap space (#123) Woosuk Kwon 2023-05-23 18:22:26 -07:00
  • a283ec2eec Add contributing guideline and mypy config (#122) Woosuk Kwon 2023-05-23 17:58:51 -07:00
  • 3f942acfe1 Fix latency benchmark script (#118) Woosuk Kwon 2023-05-22 17:03:40 -07:00
  • 19d2899439 Add initial sphinx docs (#120) Woosuk Kwon 2023-05-22 17:02:44 -07:00
  • 655a5e48df Introduce LLM class for offline inference (#115) Woosuk Kwon 2023-05-21 17:04:18 -07:00
  • f746ced08d Implement stop strings and best_of (#114) Woosuk Kwon 2023-05-21 11:18:00 -07:00
  • c3442c1f6f Refactor system architecture (#109) Woosuk Kwon 2023-05-20 13:06:59 -07:00
  • 7297fa6f7c Remove unused parts in Megatron-LM code and add copyright notice (#110) Zhuohan Li 2023-05-20 09:11:34 -06:00
  • b7955ef17b Fix timeout error in the FastAPI frontend (#34) Zhuohan Li 2023-05-19 14:00:46 -06:00
  • f756799b84 Use runtime profiling to replace manual memory analyzers (#81) Zhuohan Li 2023-05-19 11:35:44 -06:00
  • 825d8892b5 Use pytest format for unit tests (#107) Woosuk Kwon 2023-05-17 17:11:23 -07:00
  • b322fd1607 Add docstrings to some modules and classes (#100) Woosuk Kwon 2023-05-14 22:32:38 -07:00
  • 667ba3995c Add copyright headers to source files adapted from FT (#104) Woosuk Kwon 2023-05-14 22:19:19 -07:00
  • 707ec647bb Add copyright headers for HF models (#103) Woosuk Kwon 2023-05-14 21:54:32 -07:00
  • 89988ec8c2 Add Apache-2.0 license (#102) Woosuk Kwon 2023-05-14 18:05:19 -07:00
  • 6208d622ca Minor code cleaning for SamplingParams (#99) Woosuk Kwon 2023-05-12 18:07:09 -07:00
  • 42f1042e1c Enhance SamplingParams (#96) Woosuk Kwon 2023-05-11 15:45:30 -07:00
  • 55f8b0a5de Implement presence and frequency penalties (#95) Woosuk Kwon 2023-05-10 23:39:12 -07:00
  • 9f88db35da Support top-k sampling (#94) Woosuk Kwon 2023-05-10 12:51:36 -07:00
  • ae356774ab Avoid sorting waiting queue & Minor code cleaning (#93) Woosuk Kwon 2023-05-10 01:57:07 -07:00
  • e331957784 Log system stats (#90) Woosuk Kwon 2023-05-10 01:06:53 -07:00
  • 8d66a7b6d7 Rename variables and methods (#91) Woosuk Kwon 2023-05-10 00:58:31 -07:00
  • ce26e57fd3 Update sample prompts in simple_server.py (#89) Woosuk Kwon 2023-05-09 16:47:39 -07:00
  • 85eb631839 Use slow tokenizer for LLaMA (#84) Woosuk Kwon 2023-05-09 16:03:44 -07:00
  • add055e151 Enhance model loader (#83) Woosuk Kwon 2023-05-09 15:46:42 -07:00
  • 7c041ab578 Refactor system architecture (#82) Woosuk Kwon 2023-05-09 15:30:12 -07:00
  • 8917782af6 Add a system logger (#85) Woosuk Kwon 2023-05-08 23:03:35 -07:00
  • 7addca5935 Specify python package dependencies in requirements.txt (#78) Woosuk Kwon 2023-05-07 16:30:43 -07:00
  • c84e924287 [Minor] Fix a dtype bug (#79) Woosuk Kwon 2023-05-06 02:12:12 -07:00
  • c9d5b6d4a8 Replace FlashAttention with xformers (#70) Woosuk Kwon 2023-05-05 02:01:08 -07:00
  • 189ae23133 Use dtype from model config & Add Dolly V2 (#63) Woosuk Kwon 2023-05-04 03:05:37 -07:00
  • e548c1488a Add support for GPT-2 (#60) Woosuk Kwon 2023-05-04 02:59:56 -07:00
  • 130d5fd8c7 Fix a bug in attention kernel (#68) Woosuk Kwon 2023-05-04 02:56:09 -07:00
  • e070829ae8 Support bfloat16 data type (#54) Woosuk Kwon 2023-05-03 14:09:44 -07:00
  • 436e523bf1 Refactor attention kernels (#53) Woosuk Kwon 2023-05-03 13:40:13 -07:00
  • 27f1410d06 New weight loader without np copy (#52) Zhuohan Li 2023-05-03 15:32:04 +08:00
  • 4858f3bb45 Add an option to launch cacheflow without ray (#51) Zhuohan Li 2023-04-30 15:42:17 +08:00
  • a96d63c21d Add support for GPT-NeoX (Pythia) (#50) Woosuk Kwon 2023-04-28 00:32:10 -07:00