Commit Graph

  • 1696725879 Initialize AsyncLLMEngine bg loop correctly (#943) Antoni Baum 2023-09-04 17:41:22 -07:00
  • 002800f081 Align vLLM's beam search implementation with HF generate (#857) Zhuohan Li 2023-09-04 17:29:42 -07:00
  • e15932bb60 Only emit warning about internal tokenizer if it isn't being used (#939) Nelson Liu 2023-09-04 08:50:55 -07:00
  • ce741ba3e4 Refactor AsyncLLMEngine (#880) Antoni Baum 2023-09-03 21:43:43 -07:00
  • bf87484efa [BugFix] Fix NaN errors in paged attention kernel (#936) Woosuk Kwon 2023-09-04 09:20:06 +09:00
  • 8ce9c50d40 Avoid compiling kernels for double data type (#933) Woosuk Kwon 2023-09-02 14:59:47 +09:00
  • 32b6816e55 Add tests for models (#922) Woosuk Kwon 2023-09-01 11:19:43 +09:00
  • c128d69856 Fix README.md Link (#927) Zhuohan Li 2023-08-31 17:18:34 -07:00
  • 55b28b1eee [Docs] Minor fixes in supported models (#920) Woosuk Kwon 2023-09-01 08:28:39 +09:00
  • e11222333f fix: bug fix when penalties are negative (#913) Dong-Yong Lee 2023-09-01 00:37:17 +09:00
  • 28873a2799 Improve _prune_hidden_states micro-benchmark (#707) Aman Gupta Karmani 2023-08-31 00:28:43 -04:00
  • 0080d8329d Add acknowledgement to a16z grant Zhuohan Li 2023-08-30 02:17:27 -07:00
  • 0d93f15694 Accelerate LLaMA model loading (#234) JFDuan 2023-08-30 16:00:13 +08:00
  • becd7a56f1 Enable request body OpenAPI spec for OpenAI endpoints (#865) lplcor 2023-08-29 21:54:08 -07:00
  • 75471386de use flash-attn via xformers (#877) Aman Gupta Karmani 2023-08-30 00:52:13 -04:00
  • d2b2eed67c [Fix] Fix a condition for ignored sequences (#867) Zhuohan Li 2023-08-27 23:00:56 -07:00
  • 4b6f069b6f Add support for CodeLlama (#854) Antoni Baum 2023-08-25 12:44:07 -07:00
  • 791d79de32 Bump up the version to v0.1.4 (#846) v0.1.4 Woosuk Kwon 2023-08-25 12:28:00 +09:00
  • 94d2f59895 Set replacement=True in torch.multinomial (#858) Woosuk Kwon 2023-08-25 12:22:01 +09:00
  • 75c0ca9d43 Clean up code (#844) wenjun93 2023-08-24 07:44:15 +08:00
  • 2a4ec90854 Fix for breaking changes in xformers 0.0.21 (#834) Woosuk Kwon 2023-08-23 17:44:21 +09:00
  • 85ebcda94d Fix typo of Aquila in README.md (#836) ldwang 2023-08-23 11:48:36 +08:00
  • d64bf1646c Implement approximate GELU kernels (#828) Woosuk Kwon 2023-08-23 07:43:21 +09:00
  • a41c20435e Add compute capability 8.9 to default targets (#829) Woosuk Kwon 2023-08-23 07:28:38 +09:00
  • eedac9dba0 fix: revert code to avoid no attribute problem (#827) Wen Sun 2023-08-23 02:55:16 +08:00
  • 14f9c72bfd Update Supported Model List (#825) Zhuohan Li 2023-08-22 11:51:44 -07:00
  • ad5f2fe34c Add support for aquila (#663) shunxing1234 2023-08-22 15:13:36 +08:00
  • 4f8584756d Fix mqa is false case in gpt_bigcode (#806) zhaoyang-star 2023-08-22 13:22:06 +08:00
  • 65fc1c3127 set default compute capability according to cuda version (#773) Xudong Zhang 2023-08-22 07:05:44 +08:00
  • c393af6cd7 [Feature | CI] Added a github action to build wheels (#746) Daniel 2023-08-21 10:59:15 +03:00
  • 0c04ce3234 Fix typo in sampling_params.py (#788) wangcx18 2023-08-18 09:12:46 +08:00
  • 73b3de79ea explicitly del state (#784) Xinyu Yang 2023-08-18 03:56:04 +08:00
  • d1744376ae Align with huggingface Top K sampling (#753) Abraham-Xu 2023-08-16 07:44:33 +08:00
  • 805de738f6 Fix typo in tokenizer.py (#750) Ikko Eltociear Ashimine 2023-08-15 14:26:36 +09:00
  • 1b151ed181 Fix baichuan doc style (#748) Uranus 2023-08-14 11:57:31 +08:00
  • e06f504a76 Supports tokens and arrays of tokens as inputs to the OpenAI completion API (#715) WanMok 2023-08-11 12:14:34 -07:00
  • 462ae5220a [Fix] unwanted bias in InternLM Model (#740) WRH 2023-08-12 02:40:37 +08:00
  • 66c54aa9c3 Check the max prompt length for the OpenAI completions API (#472) Nicolas Basile 2023-08-08 17:43:49 -07:00
  • 735ecfff61 add internlm model (#528) Jia Guoqing 2023-08-09 07:35:06 +08:00
  • a57d13cc96 add QWen-7b (#685) Qing 2023-08-09 04:50:38 +08:00
  • 79af7e96a0 [OPTIMIZATION] Optimizes the single_query_cached_kv_attention kernel (#420) Dean Leitersdorf 2023-08-04 20:57:29 +03:00
  • 621980bdc0 fix: incorrect bigcode attention heads num (#676) Wen Sun 2023-08-05 01:35:22 +08:00
  • aa84c92ef6 Bump up version to 0.1.3 (#657) v0.1.3 Zhuohan Li 2023-08-02 16:46:53 -07:00
  • f7389f4763 [Doc] Add Baichuan 13B to supported models (#656) Zhuohan Li 2023-08-02 16:45:12 -07:00
  • 55fe8a81ec Refactor scheduler (#658) Woosuk Kwon 2023-08-02 16:42:01 -07:00
  • e8ddc08ec8 [BUG FIX] upgrade fschat version to 0.2.23 (#650) YHPeter 2023-08-02 17:05:59 -04:00
  • 1b0bd0fe8a Add Falcon support (new) (#592) Zhuohan Li 2023-08-02 14:04:39 -07:00
  • 20044cab7a Fix log message in scheduler (#652) Lily Liu 2023-08-02 13:35:10 -07:00
  • 64f23c2900 fix baichuan for different position embedding for 7b and 13b models (#643) Song 2023-08-02 13:22:51 +08:00
  • d4c7755ca8 fix baichuan-7b tp (#598) Qing 2023-08-02 06:41:36 +08:00
  • aa39e42c5a fix doc (#622) Chaofan Lin 2023-08-01 04:11:57 +08:00
  • 953f28cf9a fix ModuleNotFoundError (#599) Fang li 2023-07-30 11:52:41 +08:00
  • c0d00f5be6 [Fix] fix import error of RayWorker (#604) (#605) Xudong Zhang 2023-07-28 14:37:40 +08:00
  • 58a072be15 [Fix] Add model sequence length into model config (#575) Zhuohan Li 2023-07-25 23:46:30 -07:00
  • 82ad323dee [Fix] Add chat completion Example and simplify dependencies (#576) Zhuohan Li 2023-07-25 23:45:48 -07:00
  • df5dd3c68e Add Baichuan-7B to README (#494) Zhuohan Li 2023-07-25 15:25:12 -07:00
  • 2d867b55fa fixed tensor parallel is not defined (#564) MoeedDar 2023-07-25 22:16:51 +01:00
  • d7a1c6d614 Fix paged attention testing. (#495) Tao Peng 2023-07-25 12:01:56 +08:00
  • 7d5a155e4a [Fix] Fix GPTBigcoder for distributed execution (#503) Zhuohan Li 2023-07-24 18:36:33 -07:00
  • 1dde34e0f8 GPTJConfig has no attribute rotary. (#532) leegohi04517 2023-07-25 02:29:30 +08:00
  • 6fc2a38b11 Add support for LLaMA-2 (#505) Zhuohan Li 2023-07-20 11:38:27 -07:00
  • c487a221ee Fix bad assert in initialize_cluster if PG already exists (#526) Antoni Baum 2023-07-19 23:17:12 -07:00
  • 9925c17940 Ray placement group support (#397) Antoni Baum 2023-07-19 22:49:31 -07:00
  • 8c4b2592fb fix: enable trust-remote-code in api server & benchmark. (#509) Ricardo Lu 2023-07-20 08:06:15 +08:00
  • cf21a9bd5c support trust_remote_code in benchmark (#518) WRH 2023-07-20 08:02:40 +08:00
  • 16c3e295a8 fix(ray_utils): ignore re-init error (#465) Massimiliano Pronesti 2023-07-20 02:01:19 +02:00
  • bda41c70dd hotfix attn alibi wo head mapping (#496) Song 2023-07-19 02:31:48 +08:00
  • 453bafb96f Merge pull request #498 from MoeedDar/main Lily Liu 2023-07-18 09:22:56 -07:00
  • 328d231c17 Fixed old name reference for max_seq_len MoeedDar 2023-07-18 16:47:59 +01:00
  • b4b195b360 fix max seq len (#489) Lily Liu 2023-07-17 23:20:20 -07:00
  • 20b0d88d16 Add support for baichuan (#365) codethazine 2023-07-17 21:50:55 +01:00
  • 2bdea7ac11 [Fix] Fix the condition of max_seq_len (#477) Zhuohan Li 2023-07-17 00:33:48 -04:00
  • 58df2883cb [Doc] Add doc for running vLLM on the cloud (#426) Zhanghao Wu 2023-07-16 13:37:14 -07:00
  • 6d7d95a70a Offload port selection to OS (#467) Zhangir Azerbayev 2023-07-16 02:11:02 -04:00
  • 96853af5a8 Optimize MQA Kernel (#452) Zhuohan Li 2023-07-14 20:06:40 -04:00
  • dbed69058c Fix the KeyError when loading bloom-based models (#441) Wen Sun 2023-07-14 12:58:09 +08:00
  • 7b6ae94059 add vocab padding for LLama(Support WizardLM) (#411) panda 2023-07-14 11:56:22 +08:00
  • c6dfc3cdbe Fix handling of special tokens in decoding. (#418) xcnick 2023-07-12 23:14:56 +08:00
  • 51be365143 fix: freeze pydantic to v1 (#429) Keming 2023-07-12 23:10:55 +08:00
  • c894836108 [Model] Add support for GPT-J (#226) Andre Slavescu 2023-07-08 20:55:16 -04:00
  • 75beba29b5 Don't try to load training_args.bin (#373) Fazlul Shahriar 2023-07-08 18:26:28 -04:00
  • ddfdf470ae Add trust_remote_code arg to get_config (#405) Woosuk Kwon 2023-07-08 15:24:17 -07:00
  • b6fbb9a565 Sort the outputs before return (#402) Woosuk Kwon 2023-07-08 14:48:18 -07:00
  • 2179e4f4c5 avoid python list copy in sequence initialization (#401) Lily Liu 2023-07-08 12:42:08 -07:00
  • a945fcc2ae Add trust-remote-code flag to handle remote tokenizers (#364) codethazine 2023-07-07 20:04:58 +02:00
  • be54f8e5c4 [Fix] Change /generate response-type to json for non-streaming (#374) Nicolas Frenay 2023-07-06 20:15:17 -05:00
  • b396cb4998 fix: only response [DONE] once when streaming response. (#378) Ricardo Lu 2023-07-07 09:08:40 +08:00
  • 1c395b4eaa Bump up the version (#300) v0.1.2 Woosuk Kwon 2023-07-04 21:41:53 -07:00
  • 3d64cf019e [Server] use fastchat.model.model_adapter.get_conversation_template method to get model template (#357) akxxsb 2023-07-05 12:39:59 +08:00
  • 98fe8cb542 [Server] Add option to specify chat template for chat endpoint (#345) Zhuohan Li 2023-07-03 23:01:56 -07:00
  • ffa6d2f9f9 [Docs] Fix typo (#346) Woosuk Kwon 2023-07-03 16:51:47 -07:00
  • 404422f42e [Model] Add support for MPT (#334) Woosuk Kwon 2023-07-03 16:47:53 -07:00
  • 7717d0838b Fix an endless loop issue when engine_step throws a RuntimeError (#339) coolcloudcol 2023-07-04 06:22:28 +08:00
  • 42e0c1df78 [Quality] Add CI for formatting (#343) Zhuohan Li 2023-07-03 14:50:56 -07:00
  • e41f06702c Add support for BLOOM (#331) Woosuk Kwon 2023-07-03 13:12:35 -07:00
  • d6fa1be3a8 [Quality] Add code formatter and linter (#326) Zhuohan Li 2023-07-03 11:31:55 -07:00
  • 0ffded812a [Fix] Better error message for batched prompts (#342) Zhuohan Li 2023-07-03 09:27:31 -07:00
  • 0bd2a573a5 Allow send list of str for the Prompt on openai demo endpoint /v1/completions (#323) Michele Catalano 2023-07-03 18:17:50 +02:00
  • 49b26e2cec feat: add ChatCompletion endpoint in OpenAI demo server. (#330) Ricardo Lu 2023-07-03 13:54:33 +08:00
  • dafd924c1f Raise error for long prompt (#273) Lily Liu 2023-06-30 18:48:49 -07:00