Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

29678cd213 Minor fix on AWQ kernel launch (#1356) Woosuk Kwon 2023-10-15 21:53:56 -07:00
d0740dff1b Fix error message on TORCH_CUDA_ARCH_LIST (#1239) Woosuk Kwon 2023-10-14 14:47:43 -07:00
de89472897 Fix the issue for AquilaChat2-* models (#1339) Lu Wang 2023-10-13 11:51:29 -07:00
e7c8555d06 Bump up transformers version & Remove MistralConfig (#1254) Woosuk Kwon 2023-10-13 10:05:26 -07:00
ec3b5ce9cc Improve detokenization performance (#1338) Antoni Baum 2023-10-13 09:59:07 -07:00
6368e777a8 Add Aquila2 to README (#1331) ldwang 2023-10-13 03:11:16 +08:00
875afe38ab Add blacklist in model checkpoint (#1325) Woosuk Kwon 2023-10-12 01:05:37 -07:00
ee8217e5be Add Mistral to quantization model list (#1278) amaleshvemula 2023-10-11 09:26:24 +02:00
980dd4a2c4 Fix overflow in awq kernel (#1295) CHU Tianxiang 2023-10-11 15:19:53 +08:00
8285736840 workaround of AWQ for Turing GPUs (#1252) twaka 2023-10-11 11:48:16 +09:00
91fce82c6f change the timing of sorting logits (#1309) yhlskt23 2023-10-11 11:37:42 +09:00
ac5cf86aa6 Fix __repr__ of SequenceOutputs (#1311) Wang Ran (汪然) 2023-10-11 00:58:28 +08:00
6a6119554c lock torch version to 2.0.1 (#1290) yanxiyue 2023-10-11 00:21:57 +08:00
b95ee898fe [Minor] Fix comment in mistral.py (#1303) Zhuohan Li 2023-10-09 19:44:37 -07:00
9eed4d1f3e Update README.md (#1292) Zhuohan Li 2023-10-08 23:15:50 -07:00
6b5296aa3a [FIX] Explain why the finished_reason of ignored sequences are length (#1289) Zhuohan Li 2023-10-08 15:22:38 -07:00
ee92b58b3a Move bfloat16 check to worker (#1259) Antoni Baum 2023-10-07 22:10:44 -07:00
09ff7f106a API server support ipv4 / ipv6 dualstack (#1288) Yunfeng Bai 2023-10-07 15:15:54 -07:00
acbed3ef40 Use monotonic time where appropriate (#1249) Antoni Baum 2023-10-02 19:22:05 -07:00
66d18a7fb0 add support for tokenizer revision (#1163) Federico Cassano 2023-10-02 22:19:46 -04:00
ba0bfd40e2 TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) Zhuohan Li 2023-10-02 15:36:09 -07:00
84e4e37d14 [Minor] Fix type annotations (#1238) Woosuk Kwon 2023-10-02 15:28:31 -07:00
a60b353005 support sharding llama2-70b on more than 8 GPUs (#1209) Zhuohan Li 2023-10-02 15:26:33 -07:00
ebe4d1db3a Fix boundary check in paged attention kernel (#1241) Liang 2023-10-02 02:35:06 +08:00
b5a10eb0ef Added dtype arg to benchmarks (#1228) kg6-sleipnir 2023-10-01 00:04:03 -04:00
0967102c6d fixing typo in tiiuae/falcon-rw-7b model name (#1226) Usama Ahmed 2023-09-29 23:40:25 +03:00
e2fb71ec9f Bump up the version to v0.2.0 (#1212) v0.2.0 Woosuk Kwon 2023-09-28 15:30:38 -07:00
f936657eb6 Provide default max model length (#1224) Woosuk Kwon 2023-09-28 14:44:02 -07:00
6f88f762bf Fix OOM in attention kernel test (#1223) Woosuk Kwon 2023-09-28 14:33:24 -07:00
202351d5bf Add Mistral to supported model list (#1221) Woosuk Kwon 2023-09-28 14:33:04 -07:00
2e8e49fce3 [Fix] Remove false assertion (#1222) Woosuk Kwon 2023-09-28 10:52:38 -07:00
a8e98aee0c Fix Mistral model (#1220) Woosuk Kwon 2023-09-28 10:44:05 -07:00
bb1ba58f06 [Mistral] Mistral-7B-v0.1 support (#1196) Chris Bamford 2023-09-28 19:41:03 +02:00
7bedab5748 Add rope_scaling to Qwen (#1210) Qing 2023-09-28 15:49:23 +08:00
20f7cc4cde Add skip_special_tokens sampling params (#1186) Dan Lord 2023-09-27 19:21:42 -07:00
649aa730c5 Use standard extras for uvicorn (#1166) Danilo Peixoto 2023-09-27 21:41:36 -03:00
a19bc5c628 Automatically configure max_num_batched_tokens (#1198) Woosuk Kwon 2023-09-27 16:34:00 -07:00
28e616c4e3 fix qwen-14b model (#1173) Qing 2023-09-28 07:33:16 +08:00
30e775281d fix typo (#1184) Wang Ran (汪然) 2023-09-28 07:22:45 +08:00
21877b0d75 Support Longchat and RoPE scaling (#555) Lily Liu 2023-09-27 03:36:02 -07:00
cf5cb1e33e Allocate more shared memory to attention kernel (#1154) Antoni Baum 2023-09-26 22:27:13 -07:00
03ffd0a022 Add comments on RoPE initialization (#1176) Woosuk Kwon 2023-09-26 10:48:33 -07:00
a425bd9a9a [Setup] Enable TORCH_CUDA_ARCH_LIST for selecting target GPUs (#1074) Woosuk Kwon 2023-09-26 10:21:08 -07:00
bbbf86565f Align max_tokens behavior with openai (#852) Wen Sun 2023-09-24 09:10:13 +08:00
9f6be8692e Fix config for Falcon (#1164) Woosuk Kwon 2023-09-23 17:38:43 -07:00
f187877945 [FIX] Simplify sampler logic (#1156) Zhuohan Li 2023-09-23 17:21:56 -07:00
947b794146 [Sampler] Vectorized sampling (simplified) (#1048) Zhuohan Li 2023-09-22 17:48:04 -07:00
8d926e91f1 Announce the First vLLM Meetup (#1148) Woosuk Kwon 2023-09-22 11:37:14 -07:00
4ee52bb169 Docs: Fix broken link to openai example (#1145) Nick Perez 2023-09-22 14:36:09 -04:00
7d7e3b78a3 Use --ipc=host in docker run for distributed inference (#1125) Woosuk Kwon 2023-09-21 18:26:47 -07:00
f98b745a81 feat: support stop_token_ids parameter. (#1097) Ricardo Lu 2023-09-22 06:34:02 +08:00
2d1e86f1b1 clean api code, remove redundant background task. (#1102) Roy 2023-09-22 04:25:05 +08:00
1ac4ccf73c Add float16 and float32 (#1115) Woosuk Kwon 2023-09-21 00:52:47 -07:00
2ac4d5e2bf Replace DtypeTensor (#1123) Woosuk Kwon 2023-09-21 00:51:47 -07:00
3302f0aef3 rope_theta and max_position_embeddings from config (#1096) Antoni Baum 2023-09-20 13:35:11 -07:00
6f2dd6c37e Add documentation to Triton server tutorial (#983) Tanmay Verma 2023-09-20 10:32:40 -07:00
bc0644574c Add gpu_memory_utilization and swap_space to LLM (#1090) Woosuk Kwon 2023-09-19 22:16:04 -07:00
400b8289f7 Add pyarrow to dependencies & Print warning on Ray import error (#1094) Woosuk Kwon 2023-09-18 22:36:17 -07:00
c1026311b5 [Community] Add vLLM Discord server (#1086) Zhuohan Li 2023-09-18 12:23:35 -07:00
2b1c116b5a Add minimum capability requirement for AWQ (#1064) Woosuk Kwon 2023-09-18 12:02:01 -07:00
cc796b1358 Convert before transpose (#1073) Woosuk Kwon 2023-09-18 11:51:48 -07:00
f029ef94d7 Fix get_max_num_running_seqs for waiting and swapped seq groups (#1068) Zhuohan Li 2023-09-18 11:49:40 -07:00
95592fa00a align llm_engine and async_engine. (#1081) Roy 2023-09-19 02:49:10 +08:00
fbe66e1d0b added support for quantize on LLM module (#1080) orellavie1212 2023-09-18 21:04:21 +03:00
90979c38f8 [FIX] Don't initialize parameter by default (#1067) Zhuohan Li 2023-09-17 17:15:38 -07:00
e21d7687a9 Fix hanging when prompt exceeds limit (#1029) 陈序 2023-09-17 16:48:56 +08:00
ff36139ffc Remove AsyncLLMEngine busy loop, shield background task (#1059) Antoni Baum 2023-09-17 00:29:08 -07:00
e3e79e9e8a Implement AWQ quantization support for LLaMA (#1032) Woosuk Kwon 2023-09-16 00:03:37 -07:00
b9fe4616f9 Abort when coroutine is cancelled (#1020) Jerry Yang 2023-09-15 08:40:18 +08:00
64ca424e75 Fix warning message on LLaMA FastTokenizer (#1037) Woosuk Kwon 2023-09-14 17:33:32 -07:00
b5f93d0631 Only fail if logit_bias has actual values (#1045) Lukas Kreussel 2023-09-15 02:33:01 +02:00
a58936966f Add pandas to requirements.txt (#1047) Woosuk Kwon 2023-09-14 17:31:38 -07:00
dd54a4b026 Fix detokenization leaving special tokens (#1044) Antoni Baum 2023-09-14 16:37:03 -07:00
eda1a7cad3 Announce paper release (#1036) Woosuk Kwon 2023-09-13 17:38:13 -07:00
f04908cae7 [FIX] Minor bug fixes (#1035) Zhuohan Li 2023-09-13 16:38:12 -07:00
ab019eea75 Add Model Revision Support (#1014) Jasmond L 2023-09-14 06:20:02 +08:00
9841d48a10 Use TGI-like incremental detokenization (#984) Antoni Baum 2023-09-13 13:38:01 -07:00
3272d7a0b7 Fix typo in README.md (#1033) Ikko Eltociear Ashimine 2023-09-14 04:55:23 +09:00
0bb1e885a0 Make max_model_len configurable (#972) Antoni Baum 2023-09-12 16:29:19 -07:00
d6545ad22e add option to shorten prompt print in log (#991) leiwen83 2023-09-13 06:10:14 +08:00
90eb3f43ca Bump up the version to v0.1.7 (#1013) v0.1.7 Woosuk Kwon 2023-09-11 00:54:30 -07:00
e67b4f2c2a Use FP32 in RoPE initialization (#1004) Woosuk Kwon 2023-09-11 00:26:35 -07:00
d6770d1f23 Update setup.py (#1006) Woosuk Kwon 2023-09-10 23:42:45 -07:00
b9cecc2635 [Docs] Update installation page (#1005) Woosuk Kwon 2023-09-10 14:23:31 -07:00
898285c9bf fix: CUDA error when inferencing with Falcon-40B base model (#992) Kyujin Cho 2023-09-10 17:39:02 +09:00
a62de9ecfd Fix wrong dtype in PagedAttentionWithALiBi bias (#996) Antoni Baum 2023-09-09 14:58:35 -07:00
4042d192f5 fix "tansformers_module" ModuleNotFoundError when load model with trust_remote_code=True (#871) Jingru 2023-09-09 08:21:30 +08:00
1117aa1411 Bump up the version to v0.1.6 (#989) v0.1.6 Zhuohan Li 2023-09-08 00:07:46 -07:00
080438477f Start background task in AsyncLLMEngine.generate (#988) Antoni Baum 2023-09-08 00:03:39 -07:00
4b5bcf8906 faster startup of vLLM (#982) Robert Irvine 2023-09-08 06:48:54 +01:00
852ef5b4f5 Bump up the version to v0.1.5 (#944) v0.1.5 Woosuk Kwon 2023-09-08 08:15:31 +09:00
db09d4ad83 [FIX] Fix Alibi implementation in PagedAttention kernel (#945) Zhuohan Li 2023-09-07 15:53:14 -07:00
c957c741d9 Enable safetensors loading for all models (#974) Zhuohan Li 2023-09-07 15:49:52 -07:00
c07ece5ca4 Make AsyncLLMEngine more robust & fix batched abort (#969) Antoni Baum 2023-09-07 13:43:45 -07:00
7a9c20c715 Bum up transformers version (#976) Woosuk Kwon 2023-09-08 05:15:53 +09:00
005ba458b5 Set torch default dtype in a context manager (#971) Antoni Baum 2023-09-06 23:39:37 -07:00
320a622ec4 [BugFix] Implement RoPE for GPT-J (#941) Woosuk Kwon 2023-09-06 11:54:33 +09:00
c9927c1a6a Use queue for finished requests (#957) Antoni Baum 2023-09-05 19:27:23 -07:00
fbd80ad409 Clean up kernel unit tests (#938) Woosuk Kwon 2023-09-06 08:57:38 +09:00
22379d5513 fix: typo (#948) Wen Sun 2023-09-05 14:22:30 +08:00
