| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Woosuk Kwon | 054072bee5 | [Minor] Move RoPE selection logic to get_rope (#1633) | 2023-11-12 16:04:50 -08:00 |
| lirui | eb825c1e74 | Fix #1474 - AssertionError: assert param_slice.shape == loaded_weight.shape (#1631) | 2023-11-12 15:53:12 -08:00 |
| Dominik Schwabe | 1b290ace4f | Run default _AsyncLLMEngine._run_workers_async in threadpool (#1628) | 2023-11-11 14:50:44 -08:00 |
| Sin | 0d578228ca | config parser: add ChatGLM2 seq_length to _get_and_verify_max_len (#1617) | 2023-11-09 19:29:51 -08:00 |
| GhaziSyed | aebfcb262a | Dockerfile: Upgrade Cuda to 12.1 (#1609) | 2023-11-09 11:49:02 -08:00 |
| forpanyang | ab9e8488d5 | Add Yi model to quantization support (#1600) | 2023-11-09 11:47:14 -08:00 |
| Woosuk Kwon | fd58b73a40 | Build CUDA11.8 wheels for release (#1596) | 2023-11-09 03:52:29 -08:00 |
| Yanming W | 8efe23f150 | Fix input_metadata.selected_token_indices in worker prepare_inputs (#1546) | 2023-11-08 14:19:12 -08:00 |
| Zhuohan Li | 06458a0b42 | Upgrade to CUDA 12 (#1527); Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2023-11-08 14:17:49 -08:00 |
| GoHomeToMacDonal | 1a2bbc9301 | ChatGLM Support (#1261) | 2023-11-06 16:09:33 -08:00 |
| Roy | e7f579eb97 | Support Yi model (#1567) | 2023-11-06 15:26:03 -08:00 |
| Casper | 8516999495 | Add Quantization and AutoAWQ to docs (#1235) | 2023-11-04 22:43:39 -07:00 |
| Antoni Baum | 9f669a9a7c | Support YaRN models (#1264); Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>; Co-authored-by: Viktor Ferenczi <viktor@ferenczi.eu>; Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2023-11-03 14:12:48 -07:00 |
| Noam Gat | 555bdcc5a3 | Added logits processor API to sampling params (#1469) | 2023-11-03 14:12:15 -07:00 |
| lots-o | 54ca1ba71d | docs: add description (#1553) | 2023-11-03 09:14:52 -07:00 |
| Antoni Baum | 9738b84a08 | Force paged attention v2 for long contexts (#1510) | 2023-11-01 16:24:32 -07:00 |
| Woosuk Kwon | 1fe0990023 | Remove MPTConfig (#1529) | 2023-11-01 15:29:05 -07:00 |
| Fluder-Paradyne | 7e90a2d117 | Add /health Endpoint for both Servers (#1540) | 2023-11-01 10:29:44 -07:00 |
| ljss | 5687d584fe | [BugFix] Set engine_use_ray=True when TP>1 (#1531) | 2023-11-01 02:14:18 -07:00 |
| Wenfei Yan | cf8849f2d6 | Add MptForCausalLM key in model_loader (#1526) | 2023-10-31 15:46:53 -07:00 |
| Cade Daniel | e575df33b1 | [Small] Formatter only checks lints in changed files (#1528) | 2023-10-31 15:39:38 -07:00 |
| Woosuk Kwon | 0ce8647dc5 | Fix integer overflows in attention & cache ops (#1514) | 2023-10-31 15:19:30 -07:00 |
| Stephen Krider | 9cabcb7645 | Add Dockerfile (#1350) | 2023-10-31 12:36:47 -07:00 |
| Zhuohan Li | 7b895c5976 | [Fix] Fix duplicated logging messages (#1524) | 2023-10-31 09:04:47 -07:00 |
| Dan Lord | 7013a80170 | Add support for spaces_between_special_tokens | 2023-10-30 16:52:56 -07:00 |
| Jared Roesch | 79a30912b8 | Add py.typed so consumers of vLLM can get type checking (#1509); Co-authored-by: aarnphm <29749331+aarnphm@users.noreply.github.com>; Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2023-10-30 14:50:47 -07:00 |
| Adam Brusselback | 2f3d36a8a1 | Fix logging so we actually get info level entries in the log. (#1494) | 2023-10-30 10:02:21 -07:00 |
| iongpt | ac8d36f3e5 | Refactor LLMEngine demo script for clarity and modularity (#1413); Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2023-10-30 09:14:37 -07:00 |
| Antoni Baum | 15f5632365 | Delay GPU->CPU sync in sampling (#1337) | 2023-10-30 09:01:34 -07:00 |
| Woosuk Kwon | aa9af07cac | Fix bias in InternLM (#1501) | 2023-10-29 16:24:18 -07:00 |
| ljss | 69be658bba | Support repetition_penalty (#1424) | 2023-10-29 10:02:41 -07:00 |
| Ricardo Lu | beac8dd461 | fix: don't skip first special token. (#1497) | 2023-10-29 04:26:36 -07:00 |
| Qing | 28b47d1e49 | Add rope_scaling to Aquila model (#1457) | 2023-10-29 04:25:21 -07:00 |
| chooper1 | 1f24755bf8 | Support SqueezeLLM (#1326); Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>; Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2023-10-21 23:14:59 -07:00 |
| Thiago Salvatore | bf31d3606a | Pin pydantic dependency versions (#1429) | 2023-10-21 11:18:58 -07:00 |
| Wang Ran (汪然) | d189170b6c | remove useless statements (#1408) | 2023-10-20 08:52:07 -07:00 |
| Light Lin | f61dc8072f | Fix type hints (#1427) | 2023-10-20 08:50:47 -07:00 |
| Woosuk Kwon | f8a1e39fae | [BugFix] Define __eq__ in SequenceGroupOutputs (#1389) | 2023-10-17 01:09:44 -07:00 |
| Wang Ran (汪然) | a132435204 | Fix typo (#1383) | 2023-10-16 21:53:37 -07:00 |
| Woosuk Kwon | 9524867701 | Add Mistral 7B to test_models (#1366) | 2023-10-16 17:49:54 -07:00 |
| Woosuk Kwon | c1376e0f82 | Change scheduler & input tensor shape (#1381) | 2023-10-16 17:48:42 -07:00 |
| Zhuohan Li | 651c614aa4 | Bump up the version to v0.2.1 (#1355) (tag: v0.2.1) | 2023-10-16 12:58:57 -07:00 |
| Woosuk Kwon | d3a5bd9fb7 | Fix sampler test (#1379) | 2023-10-16 12:57:26 -07:00 |
| Woosuk Kwon | e8ef4c0820 | Fix PyTorch index URL in workflow (#1378) | 2023-10-16 12:37:56 -07:00 |
| Woosuk Kwon | 348897af31 | Fix PyTorch version to 2.0.1 in workflow (#1377) | 2023-10-16 11:27:17 -07:00 |
| Zhuohan Li | 9d9072a069 | Implement prompt logprobs & Batched topk for computing logprobs (#1328); Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com> | 2023-10-16 10:56:50 -07:00 |
| Woosuk Kwon | 928de46888 | Implement PagedAttention V2 (#1348) | 2023-10-16 00:59:57 -07:00 |
| Woosuk Kwon | 29678cd213 | Minor fix on AWQ kernel launch (#1356) | 2023-10-15 21:53:56 -07:00 |
| Woosuk Kwon | d0740dff1b | Fix error message on TORCH_CUDA_ARCH_LIST (#1239); Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com> | 2023-10-14 14:47:43 -07:00 |
| Lu Wang | de89472897 | Fix the issue for AquilaChat2-* models (#1339) | 2023-10-13 11:51:29 -07:00 |