Commit Graph - vllm

biondizzle/vllm

0003e9154b [Misc][Minor] Fix CPU block num log in CPUExecutor. (#4088) Li, Jiang 2024-04-15 23:35:55 +08:00
e11e200736 [Bugfix] Fix filelock version requirement (#4075) Zhuohan Li 2024-04-14 21:50:08 -07:00
8db1bf32f8 [Misc] Upgrade triton to 2.2.0 (#4061) Roy 2024-04-15 08:43:54 +08:00
aceb17cf2d [Docs] document that mixtral 8x22b is supported (#4073) Simon Mo 2024-04-14 14:35:55 -07:00
563c54f760 [BugFix] Fix tensorizer extra in setup.py (#4072) Nick Hill 2024-04-14 22:12:42 +01:00
2cd6b4f362 [Core] avoid too many cuda context by caching p2p test (#4021) youkaichao 2024-04-13 23:40:21 -07:00
711a000255 [Frontend] [Core] feat: Add model loading using tensorizer (#3476) Sanger Steel 2024-04-13 20:13:01 -04:00
989ae2538d [Kernel] Add punica dimension for Baichuan-13B (#4053) Jee Li 2024-04-13 22:55:05 +08:00
0a430b4ae2 [Bugfix] fix_small_bug_in_neuron_executor (#4051) zspo 2024-04-13 22:54:03 +08:00
ec8e3c695f [Bugfix] fix_log_time_in_metrics (#4050) zspo 2024-04-13 22:52:36 +08:00
98afde19fc [Core][Distributed] improve logging for init dist (#4042) youkaichao 2024-04-13 07:12:53 -07:00
5c2e66e487 [Bugfix] More type hint fixes for py 3.8 (#4039) Dylan Hawk 2024-04-12 21:07:04 -07:00
546e721168 [CI/Test] expand ruff and yapf for all supported python version (#4037) youkaichao 2024-04-12 18:43:37 -07:00
b8aacac31a [Bugfix] Fix LoRA bug (#4032) Jee Li 2024-04-13 07:56:37 +08:00
d04973ad54 Fix triton compilation issue (#3984) Bellk17 2024-04-12 16:41:26 -07:00
fbb9d9eef4 [Core] fix custom allreduce default value (#4040) youkaichao 2024-04-12 16:40:39 -07:00
09473ee41c [mypy] Add mypy type annotation part 1 (#4006) SangBin Cho 2024-04-13 06:35:50 +09:00
d4ec9ffb95 [Misc] Fix typo in scheduler.py (#4022) Zhuohan Li 2024-04-12 13:56:04 -07:00
96b6a6d790 [Bugfix] fix type hint for py 3.8 (#4036) youkaichao 2024-04-12 12:35:44 -07:00
36729bac13 [Test] Test multiple attn backend for chunked prefill. (#4023) SangBin Cho 2024-04-13 01:56:57 +09:00
7fd3949a0b [Frontend][Core] Move merge_async_iterators to utils (#4026) Cyrus Leung 2024-04-12 13:30:54 +08:00
1096717ae9 [Core] Support LoRA on quantized models (#4012) Jee Li 2024-04-12 12:02:44 +08:00
c2b4a1bce9 [Doc] Add typing hints / mypy types cleanup (#3816) Michael Feil 2024-04-11 17:17:21 -07:00
e46a60aa4c [BugFix] Fix handling of stop strings and stop token ids (#3672) Nick Hill 2024-04-11 23:34:12 +01:00
1e96c3341a Add extra punica sizes to support bigger vocabs (#4015) Antoni Baum 2024-04-11 15:18:57 -07:00
95e7d4a97c Fix echo/logprob OpenAI completion bug (#3441) Dylan Hawk 2024-04-11 15:15:50 -07:00
559eb852f8 [Core] init_distributed_environment align with init_process_group(#4014) youkaichao 2024-04-11 14:00:48 -07:00
a10d3056da [Core] Set linear_weights directly on the layer (#3977) Antoni Baum 2024-04-11 13:35:51 -07:00
8afca50889 [Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance (#3824) bigPYJ1151 2024-04-12 02:56:49 +08:00
08ccee1e83 punica fix-bgmv-kernel-640 (#4007) fuchen.ljl 2024-04-11 23:59:26 +08:00
c1dc547129 [Kernel] Fused MoE Config for Mixtral 8x22 (#4002) Roger Wang 2024-04-11 07:50:00 -07:00
f3d0bf7589 [Doc][Installation] delete python setup.py develop (#3989) youkaichao 2024-04-10 20:33:02 -07:00
e9da5a40c6 [Misc] Add indirection layer for custom ops (#3913) Kunshang Ji 2024-04-11 03:26:07 +00:00
e42df7227d [Test] Add xformer and flash attn tests (#3961) SangBin Cho 2024-04-11 12:09:50 +09:00
caada5e50a [Core][Model] torch.compile for layernorm in commandr (#3985) youkaichao 2024-04-10 18:48:26 -07:00
67b4221a61 [Core][5/N] Fully working chunked prefill e2e (#3884) SangBin Cho 2024-04-11 09:56:48 +09:00
63e7176f26 [Core][Refactor] move parallel_utils into vllm/distributed (#3950) youkaichao 2024-04-10 15:33:30 -07:00
934d3662f7 [Bugfix] handle hf_config with architectures == None (#3982) Travis Johnson 2024-04-10 16:28:25 -06:00
92cd2e2f21 [Doc] Fix getting stared to use publicly available model (#3963) Frαnçois 2024-04-10 20:05:52 +02:00
e4c4072c94 [Bugfix] Remove key sorting for guided_json parameter in OpenAi compatible Server (#3945) Daniel E Marasco 2024-04-10 13:15:51 -04:00
e35397468f [Doc] Add doc to state our model support policy (#3948) youkaichao 2024-04-10 10:03:02 -07:00
8b317c6dd0 [Model][AMD] ROCm support for 256 head dims for Gemma (#3972) James Whedbee 2024-04-10 10:12:00 -05:00
bd3c144e0b [Bugfix][ROCm] Add numba to Dockerfile.rocm (#3962) Woosuk Kwon 2024-04-10 07:37:17 -07:00
0258b7a94b [Bugfix] handle prompt_logprobs in _apply_min_tokens_penalty (#3876) Travis Johnson 2024-04-10 02:39:56 -06:00
b3104b2a10 [Bugfix] Fix logits processor when prompt_logprobs is not None (#3899) 胡译文 2024-04-10 15:09:36 +08:00
c2e00af523 [Bugfix] fix utils.py/merge_dict func TypeError: 'type' object is not subscriptable (#3955) zhaotyer 2024-04-10 12:49:11 +08:00
c013d32c75 [Benchmark] Add cpu options to bench scripts (#3915) Zedong Peng 2024-04-10 12:30:03 +08:00
11dd6ebb89 [Misc] Avoid loading incorrect LoRA config (#3777) Jee Li 2024-04-10 10:47:15 +08:00
6c0b04515f [ROCm][Hardware][AMD] Use Triton Kernel for default FA on ROCm (#3643) Juan Villamizar 2024-04-09 17:10:47 -05:00
e23a43aef8 [Bugfix] Fix KeyError on loading GPT-NeoX (#3925) Junichi Sato 2024-04-10 04:11:31 +09:00
e7c7067b45 [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) Cade Daniel 2024-04-09 11:44:15 -07:00
6d592eb430 [Core] separate distributed_init from worker (#3904) youkaichao 2024-04-09 01:49:02 -07:00
d036198e23 [BugFix][Model] Fix commandr RoPE max_position_embeddings (#3919) Roy 2024-04-09 06:17:21 +08:00
59a6abf3c9 [Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782) Matt Wong 2024-04-08 14:31:02 -07:00
bc0c0192d1 [Bugfix] Enable Proper attention_bias Usage in Llama Model Configuration (#3767) Kiran R 2024-04-09 01:12:35 +05:30
f46864d68d [Bugfix] Added Command-R GPTQ support (#3849) egortolmachev 2024-04-08 17:59:38 +03:00
b4543c8f6b [Model] add minicpm (#3893) ywfang 2024-04-08 18:28:36 +08:00
0ce0539d47 [Bugfix] Fix Llava inference with Tensor Parallelism. (#3883) Isotr0py 2024-04-07 22:54:13 +08:00
2f19283549 [Core] latency optimization (#3890) youkaichao 2024-04-06 19:14:06 -07:00
95baec828f [Core] enable out-of-tree model register (#3871) youkaichao 2024-04-06 17:11:41 -07:00
e4be7d70bb [CI/Benchmark] add more iteration and use median for robust latency benchmark (#3889) youkaichao 2024-04-06 14:32:30 -07:00
54951ac4bf [Bugfix] Fix incorrect output on OLMo models in Tensor Parallelism (#3869) Isotr0py 2024-04-06 03:02:09 +08:00
18de883489 [Chunked Prefill][4/n] Chunked prefill scheduler. (#3853) SangBin Cho 2024-04-06 02:17:58 +09:00
1d7c940d74 Add option to completion API to truncate prompt tokens (#3144) Thomas Parnell 2024-04-05 19:15:42 +02:00
cfaf49a167 [Misc] Define common requirements (#3841) Woosuk Kwon 2024-04-05 00:39:17 -07:00
9edec652e2 [Bugfix] Fixing requirements.txt (#3865) Noam Gat 2024-04-05 09:46:01 +03:00
e0dd4d3589 [Misc] Fix linter issues in examples/fp8/quantizer/quantize.py (#3864) Cade Daniel 2024-04-04 21:57:33 -07:00
e5043a3e75 [Misc] Add pytest marker to opt-out of global test cleanup (#3863) Cade Daniel 2024-04-04 21:54:16 -07:00
d03d64fd2e [CI/Build] refactor dockerfile & fix pip cache youkaichao 2024-04-04 21:53:16 -07:00
78107fa091 [Doc]Add asynchronous engine arguments to documentation. (#3810) Sean Gallen 2024-04-04 23:52:01 -05:00
c391e4b68e [Core] improve robustness of pynccl (#3860) youkaichao 2024-04-04 16:52:12 -07:00
9117f892f0 [Model] Cohere CommandR+ (#3829) Saurabh Dash 2024-04-05 02:01:49 +05:30
db2a6a41e2 [Hardware][CPU] Update cpu torch to match default of 2.2.1 (#3854) Michael Goin 2024-04-04 12:49:49 -07:00
ca81ff5196 [Core] manage nccl via a pypi package & upgrade to pt 2.2.1 (#3805) youkaichao 2024-04-04 10:26:19 -07:00
b7782002e1 [Benchmark] Refactor sample_requests in benchmark_throughput (#3613) TianYu GUO 2024-04-04 17:56:22 +08:00
819a309c0f [Bugfix] Fix args in benchmark_serving (#3836) Chang Su 2024-04-04 00:41:05 -07:00
aabe8f40f2 [Core] [Frontend] Make detokenization optional (#3749) Matthias Gerstgrasser 2024-04-03 21:52:18 -07:00
498eb5cfa3 [Bugfix] Add kv_scale input parameter to CPU backend (#3840) Woosuk Kwon 2024-04-03 21:33:08 -07:00
537ee25f43 [Core] Enable hf_transfer by default if available (#3817) Michael Feil 2024-04-03 21:02:43 -07:00
294f8f6665 [BugFix] Pass tokenizer_config to local_tokenizer_group (#3754) Tao He 2024-04-04 11:31:46 +08:00
b95047f2da [Misc] Publish 3rd meetup slides (#3835) Woosuk Kwon 2024-04-03 15:46:10 -07:00
2ff767b513 Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) Adrian Abeyta 2024-04-03 16:15:55 -05:00
3dcb3e8b98 [3/N] Refactor scheduler for chunked prefill scheduling (#3550) SangBin Cho 2024-04-04 06:13:49 +09:00
c64cf38673 [Doc] Update contribution guidelines for better onboarding (#3819) Michael Feil 2024-04-03 00:31:43 -07:00
76b889bf1d [Doc] Update README.md (#3806) Robert Shaw 2024-04-02 23:11:10 -07:00
c9b506dad4 [BugFix] Use different mechanism to get vllm version in is_cpu() (#3804) Nick Hill 2024-04-02 23:06:25 -07:00
5757d90e26 [Speculative decoding] Adding configuration object for speculative decoding (#3706) Cade Daniel 2024-04-02 17:40:57 -07:00
a3c226e7eb [CI/Build] 0.4.0.post1, fix sm 7.0/7.5 binary (#3803) v0.4.0.post1 youkaichao 2024-04-02 12:57:04 -07:00
b321d4881b [Bugfix] Add __init__.py files for vllm/core/block/ and vllm/spec_decode/ (#3798) Michael Goin 2024-04-02 12:35:31 -07:00
ad6eca408b Fix early CUDA init via get_architecture_class_name import (#3770) leiwen83 2024-04-03 02:56:26 +08:00
205b94942e [CI/Build] fix TORCH_CUDA_ARCH_LIST in wheel build (#3801) youkaichao 2024-04-02 11:54:33 -07:00
3bec41f41a [Doc] Fix vLLMEngine Doc Page (#3791) Roger Wang 2024-04-02 09:49:37 -07:00
0739b1947f [Frontend][Bugfix] allow using the default middleware with a root path (#3788) A-Mahla 2024-04-02 10:20:28 +02:00
77a6572aa5 [HotFix] [CI/Build] Minor fix for CPU backend CI (#3787) bigPYJ1151 2024-04-02 13:50:53 +08:00
0e3f06fe9c [Hardware][Intel] Add CPU inference backend (#3634) bigPYJ1151 2024-04-02 13:07:30 +08:00
eb69d68804 [Misc] [CI/Build] Speed up block manager CPU-only unit tests ~10x by opting-out of GPU cleanup (#3783) Cade Daniel 2024-04-01 17:49:51 -07:00
7d4e1b85e7 [Misc] Add support for new autogptq checkpoint_format (#3689) Qubitium 2024-04-02 07:32:01 +08:00
93deb0b38f [Speculative decoding 4/9] Lookahead scheduling for speculative decoding (#3250) Cade Daniel 2024-04-01 15:55:24 -07:00
ccb58b23e6 [Misc] Fix Benchmark TTFT Calculation for Chat Completions (#3768) Roger Wang 2024-04-01 15:24:30 -07:00
49782fcb76 [Misc] Some minor simplifications to detokenization logic (#3670) Nick Hill 2024-04-01 13:22:06 -07:00