Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

1696725879 Initialize AsyncLLMEngine bg loop correctly (#943) Antoni Baum 2023-09-04 17:41:22 -07:00
002800f081 Align vLLM's beam search implementation with HF generate (#857) Zhuohan Li 2023-09-04 17:29:42 -07:00
e15932bb60 Only emit warning about internal tokenizer if it isn't being used (#939) Nelson Liu 2023-09-04 08:50:55 -07:00
ce741ba3e4 Refactor AsyncLLMEngine (#880) Antoni Baum 2023-09-03 21:43:43 -07:00
bf87484efa [BugFix] Fix NaN errors in paged attention kernel (#936) Woosuk Kwon 2023-09-04 09:20:06 +09:00
8ce9c50d40 Avoid compiling kernels for double data type (#933) Woosuk Kwon 2023-09-02 14:59:47 +09:00
32b6816e55 Add tests for models (#922) Woosuk Kwon 2023-09-01 11:19:43 +09:00
c128d69856 Fix README.md Link (#927) Zhuohan Li 2023-08-31 17:18:34 -07:00
55b28b1eee [Docs] Minor fixes in supported models (#920) Woosuk Kwon 2023-09-01 08:28:39 +09:00
e11222333f fix: bug fix when penalties are negative (#913) Dong-Yong Lee 2023-09-01 00:37:17 +09:00
28873a2799 Improve _prune_hidden_states micro-benchmark (#707) Aman Gupta Karmani 2023-08-31 00:28:43 -04:00
0080d8329d Add acknowledgement to a16z grant Zhuohan Li 2023-08-30 02:17:27 -07:00
0d93f15694 Accelerate LLaMA model loading (#234) JFDuan 2023-08-30 16:00:13 +08:00
becd7a56f1 Enable request body OpenAPI spec for OpenAI endpoints (#865) lplcor 2023-08-29 21:54:08 -07:00
75471386de use flash-attn via xformers (#877) Aman Gupta Karmani 2023-08-30 00:52:13 -04:00
d2b2eed67c [Fix] Fix a condition for ignored sequences (#867) Zhuohan Li 2023-08-27 23:00:56 -07:00
4b6f069b6f Add support for CodeLlama (#854) Antoni Baum 2023-08-25 12:44:07 -07:00
791d79de32 Bump up the version to v0.1.4 (#846) v0.1.4 Woosuk Kwon 2023-08-25 12:28:00 +09:00
94d2f59895 Set replacement=True in torch.multinomial (#858) Woosuk Kwon 2023-08-25 12:22:01 +09:00
75c0ca9d43 Clean up code (#844) wenjun93 2023-08-24 07:44:15 +08:00
2a4ec90854 Fix for breaking changes in xformers 0.0.21 (#834) Woosuk Kwon 2023-08-23 17:44:21 +09:00
85ebcda94d Fix typo of Aquila in README.md (#836) ldwang 2023-08-23 11:48:36 +08:00
d64bf1646c Implement approximate GELU kernels (#828) Woosuk Kwon 2023-08-23 07:43:21 +09:00
a41c20435e Add compute capability 8.9 to default targets (#829) Woosuk Kwon 2023-08-23 07:28:38 +09:00
eedac9dba0 fix: revert code to avoid no attribute problem (#827) Wen Sun 2023-08-23 02:55:16 +08:00
14f9c72bfd Update Supported Model List (#825) Zhuohan Li 2023-08-22 11:51:44 -07:00
ad5f2fe34c Add support for aquila (#663) shunxing1234 2023-08-22 15:13:36 +08:00
4f8584756d Fix mqa is false case in gpt_bigcode (#806) zhaoyang-star 2023-08-22 13:22:06 +08:00
65fc1c3127 set default coompute capability according to cuda version (#773) Xudong Zhang 2023-08-22 07:05:44 +08:00
c393af6cd7 [Feature | CI] Added a github action to build wheels (#746) Daniel 2023-08-21 10:59:15 +03:00
0c04ce3234 Fix typo in sampling_params.py (#788) wangcx18 2023-08-18 09:12:46 +08:00
73b3de79ea explicitly del state (#784) Xinyu Yang 2023-08-18 03:56:04 +08:00
d1744376ae Align with huggingface Top K sampling (#753) Abraham-Xu 2023-08-16 07:44:33 +08:00
805de738f6 Fix typo in tokenizer.py (#750) Ikko Eltociear Ashimine 2023-08-15 14:26:36 +09:00
1b151ed181 Fix baichuan doc style (#748) Uranus 2023-08-14 11:57:31 +08:00
e06f504a76 Supports tokens and arrays of tokens as inputs to the OpenAI completion API (#715) WanMok 2023-08-11 12:14:34 -07:00
462ae5220a [Fix] unwantted bias in InternLM Model (#740) WRH 2023-08-12 02:40:37 +08:00
66c54aa9c3 Check the max prompt length for the OpenAI completions API (#472) Nicolas Basile 2023-08-08 17:43:49 -07:00
735ecfff61 add internlm model (#528) Jia Guoqing 2023-08-09 07:35:06 +08:00
a57d13cc96 add QWen-7b (#685) Qing 2023-08-09 04:50:38 +08:00
79af7e96a0 [OPTIMIZATION] Optimizes the single_query_cached_kv_attention kernel (#420) Dean Leitersdorf 2023-08-04 20:57:29 +03:00
621980bdc0 fix: incorrect bigcode attention heads num (#676) Wen Sun 2023-08-05 01:35:22 +08:00
aa84c92ef6 Bump up version to 0.1.3 (#657) v0.1.3 Zhuohan Li 2023-08-02 16:46:53 -07:00
f7389f4763 [Doc] Add Baichuan 13B to supported models (#656) Zhuohan Li 2023-08-02 16:45:12 -07:00
55fe8a81ec Refactor scheduler (#658) Woosuk Kwon 2023-08-02 16:42:01 -07:00
e8ddc08ec8 [BUG FIX] upgrade fschat version to 0.2.23 (#650) YHPeter 2023-08-02 17:05:59 -04:00
1b0bd0fe8a Add Falcon support (new) (#592) Zhuohan Li 2023-08-02 14:04:39 -07:00
20044cab7a Fix log message in scheduler (#652) Lily Liu 2023-08-02 13:35:10 -07:00
64f23c2900 fix baichuan for different position embedding for 7b and 13b models (#643) Song 2023-08-02 13:22:51 +08:00
d4c7755ca8 fix biachuan-7b tp (#598) Qing 2023-08-02 06:41:36 +08:00
aa39e42c5a fix doc (#622) Chaofan Lin 2023-08-01 04:11:57 +08:00
953f28cf9a fix ModuleNotFoundError (#599) Fang li 2023-07-30 11:52:41 +08:00
c0d00f5be6 [Fix] fix import error of RayWorker (#604) (#605) Xudong Zhang 2023-07-28 14:37:40 +08:00
58a072be15 [Fix] Add model sequence length into model config (#575) Zhuohan Li 2023-07-25 23:46:30 -07:00
82ad323dee [Fix] Add chat completion Example and simplify dependencies (#576) Zhuohan Li 2023-07-25 23:45:48 -07:00
df5dd3c68e Add Baichuan-7B to README (#494) Zhuohan Li 2023-07-25 15:25:12 -07:00
2d867b55fa fixed tensor parallel is not defined (#564) MoeedDar 2023-07-25 22:16:51 +01:00
d7a1c6d614 Fix paged attention testing. (#495) Tao Peng 2023-07-25 12:01:56 +08:00
7d5a155e4a [Fix] Fix GPTBigcoder for distributed execution (#503) Zhuohan Li 2023-07-24 18:36:33 -07:00
1dde34e0f8 GPTJConfig has no attribute rotary. (#532) leegohi04517 2023-07-25 02:29:30 +08:00
6fc2a38b11 Add support for LLaMA-2 (#505) Zhuohan Li 2023-07-20 11:38:27 -07:00
c487a221ee Fix bad assert in initialize_cluster if PG already exists (#526) Antoni Baum 2023-07-19 23:17:12 -07:00
9925c17940 Ray placement group support (#397) Antoni Baum 2023-07-19 22:49:31 -07:00
8c4b2592fb fix: enable trust-remote-code in api server & benchmark. (#509) Ricardo Lu 2023-07-20 08:06:15 +08:00
cf21a9bd5c support trust_remote_code in benchmark (#518) WRH 2023-07-20 08:02:40 +08:00
16c3e295a8 fix(ray_utils): ignore re-init error (#465) Massimiliano Pronesti 2023-07-20 02:01:19 +02:00
bda41c70dd hotfix attn alibi wo head mapping (#496) Song 2023-07-19 02:31:48 +08:00
453bafb96f Merge pull request #498 from MoeedDar/main Lily Liu 2023-07-18 09:22:56 -07:00
328d231c17 Fixed old name reference for max_seq_len MoeedDar 2023-07-18 16:47:59 +01:00
b4b195b360 fix max seq len (#489) Lily Liu 2023-07-17 23:20:20 -07:00
20b0d88d16 Add support for baichuan (#365) codethazine 2023-07-17 21:50:55 +01:00
2bdea7ac11 [Fix] Fix the condition of max_seq_len (#477) Zhuohan Li 2023-07-17 00:33:48 -04:00
58df2883cb [Doc] Add doc for running vLLM on the cloud (#426) Zhanghao Wu 2023-07-16 13:37:14 -07:00
6d7d95a70a Offload port selection to OS (#467) Zhangir Azerbayev 2023-07-16 02:11:02 -04:00
96853af5a8 Optimize MQA Kernel (#452) Zhuohan Li 2023-07-14 20:06:40 -04:00
dbed69058c Fix the KeyError when loading bloom-based models (#441) Wen Sun 2023-07-14 12:58:09 +08:00
7b6ae94059 add vocab padding for LLama(Support WizardLM) (#411) panda 2023-07-14 11:56:22 +08:00
c6dfc3cdbe Fix handling of special tokens in decoding. (#418) xcnick 2023-07-12 23:14:56 +08:00
51be365143 fix: freeze pydantic to v1 (#429) Keming 2023-07-12 23:10:55 +08:00
c894836108 [Model] Add support for GPT-J (#226) Andre Slavescu 2023-07-08 20:55:16 -04:00
75beba29b5 Don't try to load training_args.bin (#373) Fazlul Shahriar 2023-07-08 18:26:28 -04:00
ddfdf470ae Add trust_remote_code arg to get_config (#405) Woosuk Kwon 2023-07-08 15:24:17 -07:00
b6fbb9a565 Sort the outputs before return (#402) Woosuk Kwon 2023-07-08 14:48:18 -07:00
2179e4f4c5 avoid python list copy in sequence initialization (#401) Lily Liu 2023-07-08 12:42:08 -07:00
a945fcc2ae Add trust-remote-code flag to handle remote tokenizers (#364) codethazine 2023-07-07 20:04:58 +02:00
be54f8e5c4 [Fix] Change /generate response-type to json for non-streaming (#374) Nicolas Frenay 2023-07-06 20:15:17 -05:00
b396cb4998 fix: only response [DONE] once when streaming response. (#378) Ricardo Lu 2023-07-07 09:08:40 +08:00
1c395b4eaa Bump up the version (#300) v0.1.2 Woosuk Kwon 2023-07-04 21:41:53 -07:00
3d64cf019e [Server] use fastchat.model.model_adapter.get_conversation_template method to get model template (#357) akxxsb 2023-07-05 12:39:59 +08:00
98fe8cb542 [Server] Add option to specify chat template for chat endpoint (#345) Zhuohan Li 2023-07-03 23:01:56 -07:00
ffa6d2f9f9 [Docs] Fix typo (#346) Woosuk Kwon 2023-07-03 16:51:47 -07:00
404422f42e [Model] Add support for MPT (#334) Woosuk Kwon 2023-07-03 16:47:53 -07:00
7717d0838b Fix an endless loop issue when engine_step throws a RuntimeError (#339) coolcloudcol 2023-07-04 06:22:28 +08:00
42e0c1df78 [Quality] Add CI for formatting (#343) Zhuohan Li 2023-07-03 14:50:56 -07:00
e41f06702c Add support for BLOOM (#331) Woosuk Kwon 2023-07-03 13:12:35 -07:00
d6fa1be3a8 [Quality] Add code formatter and linter (#326) Zhuohan Li 2023-07-03 11:31:55 -07:00
0ffded812a [Fix] Better error message for batched prompts (#342) Zhuohan Li 2023-07-03 09:27:31 -07:00
0bd2a573a5 Allow send list of str for the Prompt on openai demo endpoint /v1/completions (#323) Michele Catalano 2023-07-03 18:17:50 +02:00
49b26e2cec feat: add ChatCompletion endpoint in OpenAI demo server. (#330) Ricardo Lu 2023-07-03 13:54:33 +08:00
dafd924c1f Raise error for long prompt (#273) Lily Liu 2023-06-30 18:48:49 -07:00
