Commit Graph - vllm

biondizzle/vllm

Commit Graph

c782195662 Disable Logs Requests should Disable Logging of requests. (#1779) Michael McCulloch 2023-11-29 22:50:02 -07:00
0f621c2c7d [Docs] Add information about using shared memory in docker (#1845) Simon Mo 2023-11-29 18:33:56 -08:00
a9e4574261 Refactor Attention (#1840) Woosuk Kwon 2023-11-29 15:37:31 -08:00
0229c386c5 Better integration with Ray Serve (#1821) FlorianJoncour 2023-11-29 21:25:43 +00:00
a7b3e33078 [Fix] Fix RoPE in ChatGLM-32K (#1841) Woosuk Kwon 2023-11-29 13:01:19 -08:00
e19a64c7ef [FIX] Fix formatting error in main branch (#1822) Zhuohan Li 2023-11-28 16:56:43 -08:00
1cb4ad8de9 [FIX] Fix formatting error Zhuohan Li 2023-11-29 00:40:19 +00:00
6ed068a71a Use the type BlockTable (#1791) explainerauthors 2023-11-28 16:34:05 -08:00
708e6c18b0 [FIX] Fix class naming (#1803) Zhuohan Li 2023-11-28 14:08:01 -08:00
b943890484 Fix OPT param names (#1819) Woosuk Kwon 2023-11-28 11:22:44 -08:00
a1125ad4df Correct comments in parallel_state.py (#1818) explainerauthors 2023-11-28 10:19:35 -08:00
a8b150c595 Init model on GPU to reduce CPU memory footprint (#1796) ljss 2023-11-28 03:18:26 +08:00
665cbcec4b Added echo function to OpenAI API server. (#1504) Yunmo Chen 2023-11-27 13:29:17 +08:00
7c600440f7 Fix model docstrings (#1764) Woosuk Kwon 2023-11-23 23:04:44 -08:00
e0c6f556e8 [Build] Avoid building too many extensions (#1624) Yanming W 2023-11-23 16:31:19 -08:00
de23687d16 Fix repetition penalty aligned with huggingface (#1577) ljss 2023-11-23 06:41:44 +08:00
4cea74c73b Set top_p=0 and top_k=-1 in greedy sampling (#1748) ljss 2023-11-23 04:51:09 +08:00
a921d8be9d [DOCS] Add engine args documentation (#1741) Casper 2023-11-22 21:31:27 +01:00
094f716bf2 Add stop_token_ids in SamplingParams.__repr__ (#1745) 陈序 2023-11-22 12:13:53 +08:00
7d761fe3c1 [FIX] Fix the case when input_is_parallel=False for ScaledActivation (#1737) Zhuohan Li 2023-11-20 23:56:48 -08:00
cf35d8f3d7 [BugFix] Fix TP support for AWQ (#1731) Woosuk Kwon 2023-11-20 21:42:45 -08:00
4bb6b67188 fix RAM OOM when load large models in tensor parallel mode. (#1395) boydfd 2023-11-21 11:02:42 +08:00
819b18e7ba Rewrite torch.repeat_interleave to remove cpu synchronization (#1599) ljss 2023-11-21 09:46:32 +08:00
19849db573 [Fix] Fix bugs in scheduler (#1727) Zhuofan 2023-11-21 08:10:50 +08:00
3d4ceb292c Fix hanging in the scheduler caused by long prompts (#1534) 陈序 2023-11-21 08:06:49 +08:00
f5a37c6c6c [BugFix] Fix a bug in loading safetensors (#1732) Woosuk Kwon 2023-11-20 15:51:18 -08:00
32c927b53f [FIX] Update the doc link in README.md (#1730) Zhuohan Li 2023-11-20 12:46:24 -08:00
5ffc0d13a2 Migrate linter from pylint to ruff (#1665) Simon Mo 2023-11-20 11:58:01 -08:00
112627e8b2 [Docs] Fix the code block's format in deploying_with_docker page (#1722) Wen Sun 2023-11-20 17:22:39 +08:00
37c1e3c218 Documentation about official docker image (#1709) Simon Mo 2023-11-19 20:56:26 -08:00
06e9ebebd5 Add instructions to install vLLM+cu118 (#1717) Woosuk Kwon 2023-11-18 23:48:58 -08:00
c5f7740d89 Bump up to v0.2.2 (#1689) v0.2.2 Woosuk Kwon 2023-11-18 21:57:07 -08:00
be66d9b125 Fix warning msg on quantization (#1715) Woosuk Kwon 2023-11-18 21:49:55 -08:00
e1054247ba [Optimization] Implement fused add rmsnorm (#1667) ljss 2023-11-19 10:18:02 +08:00
8d17774f92 Add AWQ support for all models (#1714) Woosuk Kwon 2023-11-18 17:56:47 -08:00
e946260cf3 use get_tensor in safe_open (#1696) twaka 2023-11-19 09:45:18 +09:00
edb305584b Support download models from www.modelscope.cn (#1588) liuyhwangyh 2023-11-18 12:38:31 +08:00
bb00f66e19 Use quantization_config in hf config (#1695) Woosuk Kwon 2023-11-17 16:23:49 -08:00
e87557b069 Support Min P Sampler (#1642) Roy 2023-11-18 08:20:49 +08:00
dcc543a298 [Minor] Fix comment (#1704) Zhuofan 2023-11-18 01:42:49 +08:00
0fc280b06c Update the adding-model doc according to the new refactor (#1692) Zhuohan Li 2023-11-16 18:46:26 -08:00
20d0699d49 [Fix] Fix comm test (#1691) Zhuohan Li 2023-11-16 16:28:39 -08:00
686f5e3210 Return usage for openai streaming requests (#1663) Iskren Ivov Chernev 2023-11-17 01:28:36 +02:00
415d109527 [Fix] Update Supported Models List (#1690) Zhuohan Li 2023-11-16 14:47:26 -08:00
521b35f799 Support Microsoft Phi 1.5 (#1664) maximzubkov 2023-11-16 23:28:39 +01:00
cb08cd0d75 [Minor] Fix duplication of ignored seq group in engine step (#1666) Simon Mo 2023-11-16 13:11:41 -08:00
2a2c135b41 Fix loading error when safetensors contains empty tensor (#1687) twaka 2023-11-17 03:38:10 +09:00
65ea2ddf17 feat(config): support parsing torch.dtype (#1641) Aaron Pham 2023-11-16 04:31:06 -05:00
b514d3c496 Revert MptConfig to MPTConfig (#1668) Megha Agarwal 2023-11-16 01:19:39 -08:00
7076fa1c9f TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622) Zhuohan Li 2023-11-15 22:50:41 -08:00
660a7fcfa4 Add DeepSpeed MII backend to benchmark script (#1649) Woosuk Kwon 2023-11-14 12:35:30 -08:00
054072bee5 [Minor] Move RoPE selection logic to get_rope (#1633) Woosuk Kwon 2023-11-12 16:04:50 -08:00
eb825c1e74 Fix #1474 - AssertionError:assert param_slice.shape == loaded_weight.shape (#1631) lirui 2023-11-13 07:53:12 +08:00
1b290ace4f Run default _AsyncLLMEngine._run_workers_async in threadpool (#1628) Dominik Schwabe 2023-11-11 23:50:44 +01:00
0d578228ca config parser: add ChatGLM2 seq_length to _get_and_verify_max_len (#1617) Sin 2023-11-10 11:29:51 +08:00
aebfcb262a Dockerfile: Upgrade Cuda to 12.1 (#1609) GhaziSyed 2023-11-09 20:49:02 +01:00
ab9e8488d5 Add Yi model to quantization support (#1600) forpanyang 2023-11-10 03:47:14 +08:00
fd58b73a40 Build CUDA11.8 wheels for release (#1596) Woosuk Kwon 2023-11-09 03:52:29 -08:00
8efe23f150 Fix input_metadata.selected_token_indices in worker prepare_inputs (#1546) Yanming W 2023-11-09 06:19:12 +08:00
06458a0b42 Upgrade to CUDA 12 (#1527) Zhuohan Li 2023-11-08 14:17:49 -08:00
1a2bbc9301 ChatGLM Support (#1261) GoHomeToMacDonal 2023-11-07 08:09:33 +08:00
e7f579eb97 Support Yi model (#1567) Roy 2023-11-07 07:26:03 +08:00
8516999495 Add Quantization and AutoAWQ to docs (#1235) Casper 2023-11-05 06:43:39 +01:00
9f669a9a7c Support YaRN models (#1264) Antoni Baum 2023-11-03 14:12:48 -07:00
555bdcc5a3 Added logits processor API to sampling params (#1469) Noam Gat 2023-11-03 23:12:15 +02:00
54ca1ba71d docs: add description (#1553) lots-o 2023-11-04 01:14:52 +09:00
9738b84a08 Force paged attention v2 for long contexts (#1510) Antoni Baum 2023-11-01 16:24:32 -07:00
1fe0990023 Remove MPTConfig (#1529) Woosuk Kwon 2023-11-01 15:29:05 -07:00
7e90a2d117 Add /health Endpoint for both Servers (#1540) Fluder-Paradyne 2023-11-01 22:59:44 +05:30
5687d584fe [BugFix] Set engine_use_ray=True when TP>1 (#1531) ljss 2023-11-01 17:14:18 +08:00
cf8849f2d6 Add MptForCausalLM key in model_loader (#1526) Wenfei Yan 2023-10-31 15:46:53 -07:00
e575df33b1 [Small] Formatter only checks lints in changed files (#1528) Cade Daniel 2023-10-31 15:39:38 -07:00
0ce8647dc5 Fix integer overflows in attention & cache ops (#1514) Woosuk Kwon 2023-10-31 15:19:30 -07:00
9cabcb7645 Add Dockerfile (#1350) Stephen Krider 2023-10-31 12:36:47 -07:00
7b895c5976 [Fix] Fix duplicated logging messages (#1524) Zhuohan Li 2023-10-31 09:04:47 -07:00
7013a80170 Add support for spaces_between_special_tokens Dan Lord 2023-10-30 16:52:56 -07:00
79a30912b8 Add py.typed so consumers of vLLM can get type checking (#1509) Jared Roesch 2023-10-30 14:50:47 -07:00
2f3d36a8a1 Fix logging so we actually get info level entries in the log. (#1494) Adam Brusselback 2023-10-30 13:02:21 -04:00
ac8d36f3e5 Refactor LLMEngine demo script for clarity and modularity (#1413) iongpt 2023-10-30 18:14:37 +02:00
15f5632365 Delay GPU->CPU sync in sampling (#1337) Antoni Baum 2023-10-30 09:01:34 -07:00
aa9af07cac Fix bias in InternLM (#1501) Woosuk Kwon 2023-10-30 00:24:18 +01:00
69be658bba Support repetition_penalty (#1424) ljss 2023-10-30 01:02:41 +08:00
beac8dd461 fix: don't skip first special token. (#1497) Ricardo Lu 2023-10-29 19:26:36 +08:00
28b47d1e49 Add rope_scaling to Aquila model (#1457) Qing 2023-10-29 19:25:21 +08:00
1f24755bf8 Support SqueezeLLM (#1326) chooper1 2023-10-22 03:14:59 -03:00
bf31d3606a Pin pydantic dependency versions (#1429) Thiago Salvatore 2023-10-21 15:18:58 -03:00
d189170b6c remove useless statements (#1408) Wang Ran (汪然) 2023-10-20 23:52:07 +08:00
f61dc8072f Fix type hints (#1427) Light Lin 2023-10-20 23:50:47 +08:00
3d40c834f0 v0.2.1.post1 v0.2.1.post1 Woosuk Kwon 2023-10-17 16:30:46 +00:00
d0fb047de3 [BugFix] Define __eq__ in SequenceGroupOutputs (#1389) Woosuk Kwon 2023-10-17 01:09:44 -07:00
f8a1e39fae [BugFix] Define __eq__ in SequenceGroupOutputs (#1389) Woosuk Kwon 2023-10-17 01:09:44 -07:00
a132435204 Fix typo (#1383) Wang Ran (汪然) 2023-10-17 12:53:37 +08:00
9524867701 Add Mistral 7B to test_models (#1366) Woosuk Kwon 2023-10-16 17:49:54 -07:00
c1376e0f82 Change scheduler & input tensor shape (#1381) Woosuk Kwon 2023-10-16 17:48:42 -07:00
651c614aa4 Bump up the version to v0.2.1 (#1355) v0.2.1 Zhuohan Li 2023-10-16 12:58:57 -07:00
d3a5bd9fb7 Fix sampler test (#1379) Woosuk Kwon 2023-10-16 12:57:26 -07:00
e8ef4c0820 Fix PyTorch index URL in workflow (#1378) Woosuk Kwon 2023-10-16 12:37:56 -07:00
348897af31 Fix PyTorch version to 2.0.1 in workflow (#1377) Woosuk Kwon 2023-10-16 11:27:17 -07:00
9d9072a069 Implement prompt logprobs & Batched topk for computing logprobs (#1328) Zhuohan Li 2023-10-16 10:56:50 -07:00
928de46888 Implement PagedAttention V2 (#1348) Woosuk Kwon 2023-10-16 00:59:57 -07:00
