Commit Graph

  • c782195662 Disable Logs Requests should Disable Logging of requests. (#1779) Michael McCulloch 2023-11-29 22:50:02 -07:00
  • 0f621c2c7d [Docs] Add information about using shared memory in docker (#1845) Simon Mo 2023-11-29 18:33:56 -08:00
  • a9e4574261 Refactor Attention (#1840) Woosuk Kwon 2023-11-29 15:37:31 -08:00
  • 0229c386c5 Better integration with Ray Serve (#1821) FlorianJoncour 2023-11-29 21:25:43 +00:00
  • a7b3e33078 [Fix] Fix RoPE in ChatGLM-32K (#1841) Woosuk Kwon 2023-11-29 13:01:19 -08:00
  • e19a64c7ef [FIX] Fix formatting error in main branch (#1822) Zhuohan Li 2023-11-28 16:56:43 -08:00
  • 1cb4ad8de9 [FIX] Fix formatting error Zhuohan Li 2023-11-29 00:40:19 +00:00
  • 6ed068a71a Use the type BlockTable (#1791) explainerauthors 2023-11-28 16:34:05 -08:00
  • 708e6c18b0 [FIX] Fix class naming (#1803) Zhuohan Li 2023-11-28 14:08:01 -08:00
  • b943890484 Fix OPT param names (#1819) Woosuk Kwon 2023-11-28 11:22:44 -08:00
  • a1125ad4df Correct comments in parallel_state.py (#1818) explainerauthors 2023-11-28 10:19:35 -08:00
  • a8b150c595 Init model on GPU to reduce CPU memory footprint (#1796) ljss 2023-11-28 03:18:26 +08:00
  • 665cbcec4b Added echo function to OpenAI API server. (#1504) Yunmo Chen 2023-11-27 13:29:17 +08:00
  • 7c600440f7 Fix model docstrings (#1764) Woosuk Kwon 2023-11-23 23:04:44 -08:00
  • e0c6f556e8 [Build] Avoid building too many extensions (#1624) Yanming W 2023-11-23 16:31:19 -08:00
  • de23687d16 Fix repetition penalty aligned with huggingface (#1577) ljss 2023-11-23 06:41:44 +08:00
  • 4cea74c73b Set top_p=0 and top_k=-1 in greedy sampling (#1748) ljss 2023-11-23 04:51:09 +08:00
  • a921d8be9d [DOCS] Add engine args documentation (#1741) Casper 2023-11-22 21:31:27 +01:00
  • 094f716bf2 Add stop_token_ids in SamplingParams.__repr__ (#1745) 陈序 2023-11-22 12:13:53 +08:00
  • 7d761fe3c1 [FIX] Fix the case when input_is_parallel=False for ScaledActivation (#1737) Zhuohan Li 2023-11-20 23:56:48 -08:00
  • cf35d8f3d7 [BugFix] Fix TP support for AWQ (#1731) Woosuk Kwon 2023-11-20 21:42:45 -08:00
  • 4bb6b67188 fix RAM OOM when load large models in tensor parallel mode. (#1395) boydfd 2023-11-21 11:02:42 +08:00
  • 819b18e7ba Rewrite torch.repeat_interleave to remove cpu synchronization (#1599) ljss 2023-11-21 09:46:32 +08:00
  • 19849db573 [Fix] Fix bugs in scheduler (#1727) Zhuofan 2023-11-21 08:10:50 +08:00
  • 3d4ceb292c Fix hanging in the scheduler caused by long prompts (#1534) 陈序 2023-11-21 08:06:49 +08:00
  • f5a37c6c6c [BugFix] Fix a bug in loading safetensors (#1732) Woosuk Kwon 2023-11-20 15:51:18 -08:00
  • 32c927b53f [FIX] Update the doc link in README.md (#1730) Zhuohan Li 2023-11-20 12:46:24 -08:00
  • 5ffc0d13a2 Migrate linter from pylint to ruff (#1665) Simon Mo 2023-11-20 11:58:01 -08:00
  • 112627e8b2 [Docs] Fix the code block's format in deploying_with_docker page (#1722) Wen Sun 2023-11-20 17:22:39 +08:00
  • 37c1e3c218 Documentation about official docker image (#1709) Simon Mo 2023-11-19 20:56:26 -08:00
  • 06e9ebebd5 Add instructions to install vLLM+cu118 (#1717) Woosuk Kwon 2023-11-18 23:48:58 -08:00
  • c5f7740d89 Bump up to v0.2.2 (#1689) v0.2.2 Woosuk Kwon 2023-11-18 21:57:07 -08:00
  • be66d9b125 Fix warning msg on quantization (#1715) Woosuk Kwon 2023-11-18 21:49:55 -08:00
  • e1054247ba [Optimization] Implement fused add rmsnorm (#1667) ljss 2023-11-19 10:18:02 +08:00
  • 8d17774f92 Add AWQ support for all models (#1714) Woosuk Kwon 2023-11-18 17:56:47 -08:00
  • e946260cf3 use get_tensor in safe_open (#1696) twaka 2023-11-19 09:45:18 +09:00
  • edb305584b Support download models from www.modelscope.cn (#1588) liuyhwangyh 2023-11-18 12:38:31 +08:00
  • bb00f66e19 Use quantization_config in hf config (#1695) Woosuk Kwon 2023-11-17 16:23:49 -08:00
  • e87557b069 Support Min P Sampler (#1642) Roy 2023-11-18 08:20:49 +08:00
  • dcc543a298 [Minor] Fix comment (#1704) Zhuofan 2023-11-18 01:42:49 +08:00
  • 0fc280b06c Update the adding-model doc according to the new refactor (#1692) Zhuohan Li 2023-11-16 18:46:26 -08:00
  • 20d0699d49 [Fix] Fix comm test (#1691) Zhuohan Li 2023-11-16 16:28:39 -08:00
  • 686f5e3210 Return usage for openai streaming requests (#1663) Iskren Ivov Chernev 2023-11-17 01:28:36 +02:00
  • 415d109527 [Fix] Update Supported Models List (#1690) Zhuohan Li 2023-11-16 14:47:26 -08:00
  • 521b35f799 Support Microsoft Phi 1.5 (#1664) maximzubkov 2023-11-16 23:28:39 +01:00
  • cb08cd0d75 [Minor] Fix duplication of ignored seq group in engine step (#1666) Simon Mo 2023-11-16 13:11:41 -08:00
  • 2a2c135b41 Fix loading error when safetensors contains empty tensor (#1687) twaka 2023-11-17 03:38:10 +09:00
  • 65ea2ddf17 feat(config): support parsing torch.dtype (#1641) Aaron Pham 2023-11-16 04:31:06 -05:00
  • b514d3c496 Revert MptConfig to MPTConfig (#1668) Megha Agarwal 2023-11-16 01:19:39 -08:00
  • 7076fa1c9f TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622) Zhuohan Li 2023-11-15 22:50:41 -08:00
  • 660a7fcfa4 Add DeepSpeed MII backend to benchmark script (#1649) Woosuk Kwon 2023-11-14 12:35:30 -08:00
  • 054072bee5 [Minor] Move RoPE selection logic to get_rope (#1633) Woosuk Kwon 2023-11-12 16:04:50 -08:00
  • eb825c1e74 Fix #1474 - AssertionError:assert param_slice.shape == loaded_weight.shape (#1631) lirui 2023-11-13 07:53:12 +08:00
  • 1b290ace4f Run default _AsyncLLMEngine._run_workers_async in threadpool (#1628) Dominik Schwabe 2023-11-11 23:50:44 +01:00
  • 0d578228ca config parser: add ChatGLM2 seq_length to _get_and_verify_max_len (#1617) Sin 2023-11-10 11:29:51 +08:00
  • aebfcb262a Dockerfile: Upgrade Cuda to 12.1 (#1609) GhaziSyed 2023-11-09 20:49:02 +01:00
  • ab9e8488d5 Add Yi model to quantization support (#1600) forpanyang 2023-11-10 03:47:14 +08:00
  • fd58b73a40 Build CUDA11.8 wheels for release (#1596) Woosuk Kwon 2023-11-09 03:52:29 -08:00
  • 8efe23f150 Fix input_metadata.selected_token_indices in worker prepare_inputs (#1546) Yanming W 2023-11-09 06:19:12 +08:00
  • 06458a0b42 Upgrade to CUDA 12 (#1527) Zhuohan Li 2023-11-08 14:17:49 -08:00
  • 1a2bbc9301 ChatGLM Support (#1261) GoHomeToMacDonal 2023-11-07 08:09:33 +08:00
  • e7f579eb97 Support Yi model (#1567) Roy 2023-11-07 07:26:03 +08:00
  • 8516999495 Add Quantization and AutoAWQ to docs (#1235) Casper 2023-11-05 06:43:39 +01:00
  • 9f669a9a7c Support YaRN models (#1264) Antoni Baum 2023-11-03 14:12:48 -07:00
  • 555bdcc5a3 Added logits processor API to sampling params (#1469) Noam Gat 2023-11-03 23:12:15 +02:00
  • 54ca1ba71d docs: add description (#1553) lots-o 2023-11-04 01:14:52 +09:00
  • 9738b84a08 Force paged attention v2 for long contexts (#1510) Antoni Baum 2023-11-01 16:24:32 -07:00
  • 1fe0990023 Remove MPTConfig (#1529) Woosuk Kwon 2023-11-01 15:29:05 -07:00
  • 7e90a2d117 Add /health Endpoint for both Servers (#1540) Fluder-Paradyne 2023-11-01 22:59:44 +05:30
  • 5687d584fe [BugFix] Set engine_use_ray=True when TP>1 (#1531) ljss 2023-11-01 17:14:18 +08:00
  • cf8849f2d6 Add MptForCausalLM key in model_loader (#1526) Wenfei Yan 2023-10-31 15:46:53 -07:00
  • e575df33b1 [Small] Formatter only checks lints in changed files (#1528) Cade Daniel 2023-10-31 15:39:38 -07:00
  • 0ce8647dc5 Fix integer overflows in attention & cache ops (#1514) Woosuk Kwon 2023-10-31 15:19:30 -07:00
  • 9cabcb7645 Add Dockerfile (#1350) Stephen Krider 2023-10-31 12:36:47 -07:00
  • 7b895c5976 [Fix] Fix duplicated logging messages (#1524) Zhuohan Li 2023-10-31 09:04:47 -07:00
  • 7013a80170 Add support for spaces_between_special_tokens Dan Lord 2023-10-30 16:52:56 -07:00
  • 79a30912b8 Add py.typed so consumers of vLLM can get type checking (#1509) Jared Roesch 2023-10-30 14:50:47 -07:00
  • 2f3d36a8a1 Fix logging so we actually get info level entries in the log. (#1494) Adam Brusselback 2023-10-30 13:02:21 -04:00
  • ac8d36f3e5 Refactor LLMEngine demo script for clarity and modularity (#1413) iongpt 2023-10-30 18:14:37 +02:00
  • 15f5632365 Delay GPU->CPU sync in sampling (#1337) Antoni Baum 2023-10-30 09:01:34 -07:00
  • aa9af07cac Fix bias in InternLM (#1501) Woosuk Kwon 2023-10-30 00:24:18 +01:00
  • 69be658bba Support repetition_penalty (#1424) ljss 2023-10-30 01:02:41 +08:00
  • beac8dd461 fix: don't skip first special token. (#1497) Ricardo Lu 2023-10-29 19:26:36 +08:00
  • 28b47d1e49 Add rope_scaling to Aquila model (#1457) Qing 2023-10-29 19:25:21 +08:00
  • 1f24755bf8 Support SqueezeLLM (#1326) chooper1 2023-10-22 03:14:59 -03:00
  • bf31d3606a Pin pydantic dependency versions (#1429) Thiago Salvatore 2023-10-21 15:18:58 -03:00
  • d189170b6c remove useless statements (#1408) Wang Ran (汪然) 2023-10-20 23:52:07 +08:00
  • f61dc8072f Fix type hints (#1427) Light Lin 2023-10-20 23:50:47 +08:00
  • 3d40c834f0 v0.2.1.post1 v0.2.1.post1 Woosuk Kwon 2023-10-17 16:30:46 +00:00
  • d0fb047de3 [BugFix] Define __eq__ in SequenceGroupOutputs (#1389) Woosuk Kwon 2023-10-17 01:09:44 -07:00
  • f8a1e39fae [BugFix] Define __eq__ in SequenceGroupOutputs (#1389) Woosuk Kwon 2023-10-17 01:09:44 -07:00
  • a132435204 Fix typo (#1383) Wang Ran (汪然) 2023-10-17 12:53:37 +08:00
  • 9524867701 Add Mistral 7B to test_models (#1366) Woosuk Kwon 2023-10-16 17:49:54 -07:00
  • c1376e0f82 Change scheduler & input tensor shape (#1381) Woosuk Kwon 2023-10-16 17:48:42 -07:00
  • 651c614aa4 Bump up the version to v0.2.1 (#1355) v0.2.1 Zhuohan Li 2023-10-16 12:58:57 -07:00
  • d3a5bd9fb7 Fix sampler test (#1379) Woosuk Kwon 2023-10-16 12:57:26 -07:00
  • e8ef4c0820 Fix PyTorch index URL in workflow (#1378) Woosuk Kwon 2023-10-16 12:37:56 -07:00
  • 348897af31 Fix PyTorch version to 2.0.1 in workflow (#1377) Woosuk Kwon 2023-10-16 11:27:17 -07:00
  • 9d9072a069 Implement prompt logprobs & Batched topk for computing logprobs (#1328) Zhuohan Li 2023-10-16 10:56:50 -07:00
  • 928de46888 Implement PagedAttention V2 (#1348) Woosuk Kwon 2023-10-16 00:59:57 -07:00