Author | Commit | Message | Date
Woosuk Kwon | 928de46888 | Implement PagedAttention V2 (#1348) | 2023-10-16 00:59:57 -07:00
Lu Wang | de89472897 | Fix the issue for AquilaChat2-* models (#1339) | 2023-10-13 11:51:29 -07:00
Woosuk Kwon | e7c8555d06 | Bump up transformers version & Remove MistralConfig (#1254) | 2023-10-13 10:05:26 -07:00
Antoni Baum | ec3b5ce9cc | Improve detokenization performance (#1338) | 2023-10-13 09:59:07 -07:00
Woosuk Kwon | 875afe38ab | Add blacklist in model checkpoint (#1325) | 2023-10-12 01:05:37 -07:00
amaleshvemula | ee8217e5be | Add Mistral to quantization model list (#1278) | 2023-10-11 00:26:24 -07:00
twaka | 8285736840 | workaround of AWQ for Turing GPUs (#1252) | 2023-10-10 19:48:16 -07:00
yhlskt23 | 91fce82c6f | change the timing of sorting logits (#1309) | 2023-10-10 19:37:42 -07:00
Wang Ran (汪然) | ac5cf86aa6 | Fix __repr__ of SequenceOutputs (#1311) | 2023-10-10 09:58:28 -07:00
Zhuohan Li | b95ee898fe | [Minor] Fix comment in mistral.py (#1303) | 2023-10-09 19:44:37 -07:00
Zhuohan Li | 6b5296aa3a | [FIX] Explain why the finished_reason of ignored sequences are length (#1289) | 2023-10-08 15:22:38 -07:00
Antoni Baum | ee92b58b3a | Move bfloat16 check to worker (#1259) | 2023-10-07 22:10:44 -07:00
Yunfeng Bai | 09ff7f106a | API server support ipv4 / ipv6 dualstack (#1288) (Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>) | 2023-10-07 15:15:54 -07:00
Antoni Baum | acbed3ef40 | Use monotonic time where appropriate (#1249) | 2023-10-02 19:22:05 -07:00
Federico Cassano | 66d18a7fb0 | add support for tokenizer revision (#1163) (Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>) | 2023-10-02 19:19:46 -07:00
Zhuohan Li | ba0bfd40e2 | TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) | 2023-10-02 15:36:09 -07:00
Woosuk Kwon | 84e4e37d14 | [Minor] Fix type annotations (#1238) | 2023-10-02 15:28:31 -07:00
Zhuohan Li | a60b353005 | support sharding llama2-70b on more than 8 GPUs (#1209) (Co-authored-by: JiCheng <247153481@qq.com>) | 2023-10-02 15:26:33 -07:00
Woosuk Kwon | e2fb71ec9f | Bump up the version to v0.2.0 (#1212) | 2023-09-28 15:30:38 -07:00
Woosuk Kwon | f936657eb6 | Provide default max model length (#1224) | 2023-09-28 14:44:02 -07:00
Woosuk Kwon | 2e8e49fce3 | [Fix] Remove false assertion (#1222) | 2023-09-28 10:52:38 -07:00
Woosuk Kwon | a8e98aee0c | Fix Mistral model (#1220) | 2023-09-28 10:44:05 -07:00
Chris Bamford | bb1ba58f06 | [Mistral] Mistral-7B-v0.1 support (#1196) (Co-authored-by: timlacroix <t@mistral.ai>) | 2023-09-28 10:41:03 -07:00
Qing | 7bedab5748 | Add rope_scaling to Qwen (#1210) | 2023-09-28 00:49:23 -07:00
Dan Lord | 20f7cc4cde | Add skip_special_tokens sampling params (#1186) | 2023-09-27 19:21:42 -07:00
Woosuk Kwon | a19bc5c628 | Automatically configure max_num_batched_tokens (#1198) | 2023-09-27 16:34:00 -07:00
Qing | 28e616c4e3 | fix qwen-14b model (#1173) | 2023-09-27 16:33:16 -07:00
Wang Ran (汪然) | 30e775281d | fix typo (#1184) (Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>) | 2023-09-27 16:22:45 -07:00
Lily Liu | 21877b0d75 | Support Longchat and RoPE scaling (#555) (Co-authored-by: Wing Lian <wing.lian@gmail.com>, Woosuk Kwon <woosuk.kwon@berkeley.edu>) | 2023-09-27 03:36:02 -07:00
Antoni Baum | cf5cb1e33e | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00
Woosuk Kwon | 03ffd0a022 | Add comments on RoPE initialization (#1176) | 2023-09-26 10:48:33 -07:00
Wen Sun | bbbf86565f | Align max_tokens behavior with openai (#852) | 2023-09-23 18:10:13 -07:00
Woosuk Kwon | 9f6be8692e | Fix config for Falcon (#1164) | 2023-09-23 17:38:43 -07:00
Zhuohan Li | f187877945 | [FIX] Simplify sampler logic (#1156) | 2023-09-23 17:21:56 -07:00
Zhuohan Li | 947b794146 | [Sampler] Vectorized sampling (simplified) (#1048) (Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>) | 2023-09-22 17:48:04 -07:00
Ricardo Lu | f98b745a81 | feat: support stop_token_ids parameter. (#1097) | 2023-09-21 15:34:02 -07:00
Roy | 2d1e86f1b1 | clean api code, remove redundant background task. (#1102) | 2023-09-21 13:25:05 -07:00
Woosuk Kwon | 1ac4ccf73c | Add float16 and float32 (#1115) | 2023-09-21 00:52:47 -07:00
Woosuk Kwon | 2ac4d5e2bf | Replace DtypeTensor (#1123) | 2023-09-21 00:51:47 -07:00
Antoni Baum | 3302f0aef3 | rope_theta and max_position_embeddings from config (#1096) (Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>, wnma3mz <wnma3mz@gmail.com>) | 2023-09-20 13:35:11 -07:00
Woosuk Kwon | bc0644574c | Add gpu_memory_utilization and swap_space to LLM (#1090) | 2023-09-19 22:16:04 -07:00
Woosuk Kwon | 400b8289f7 | Add pyarrow to dependencies & Print warning on Ray import error (#1094) | 2023-09-18 22:36:17 -07:00
Woosuk Kwon | 2b1c116b5a | Add minimum capability requirement for AWQ (#1064) | 2023-09-18 12:02:01 -07:00
Woosuk Kwon | cc796b1358 | Convert before transpose (#1073) | 2023-09-18 11:51:48 -07:00
Zhuohan Li | f029ef94d7 | Fix get_max_num_running_seqs for waiting and swapped seq groups (#1068) | 2023-09-18 11:49:40 -07:00
Roy | 95592fa00a | align llm_engine and async_engine. (#1081) | 2023-09-18 11:49:10 -07:00
orellavie1212 | fbe66e1d0b | added support for quantize on LLM module (#1080) | 2023-09-18 11:04:21 -07:00
Zhuohan Li | 90979c38f8 | [FIX] Don't initialize parameter by default (#1067) | 2023-09-17 17:15:38 -07:00
陈序 | e21d7687a9 | Fix hanging when prompt exceeds limit (#1029) | 2023-09-17 01:48:56 -07:00
Antoni Baum | ff36139ffc | Remove AsyncLLMEngine busy loop, shield background task (#1059) | 2023-09-17 00:29:08 -07:00