biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Benjamin Chislett	9804145cac	[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-02-27 15:28:08 -08:00
Cyrus Leung	a2dd48c386	[VLM] Deprecate legacy input mapper for OOT multimodal models (#13979) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-02-27 19:14:55 +00:00
Szymon Ożóg	7f0be2aa24	[Model] Deepseek GGUF support (#13167)	2025-02-27 02:08:35 -08:00
Sage Moore	1d35662e6d	[ROCm] Disable chunked prefill/prefix caching when running MLA on non-cuda platforms (#13844) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-02-26 14:56:58 +08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Robert Shaw	f61528d46d	[Misc][Chore] Clean Up `AsyncOutputProcessing` Logs (#13780)	2025-02-24 16:39:07 -08:00
Robert Shaw	1f0ae3ed0a	[Misc] Clean Up `EngineArgs.create_engine_config` (#13734) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-02-24 13:52:21 -05:00
Nicolò Lucchesi	444b0f0f62	[Misc][Docs] Raise error when flashinfer is not installed and `VLLM_ATTENTION_BACKEND` is set (#12513) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-02-24 10:43:21 -05:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583)	2025-02-24 07:33:20 -08:00
youkaichao	eb24dc4a45	[v1] torchrun compatibility (#13642) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-23 22:47:24 +08:00
youkaichao	2382ad29d1	[ci] fix linter (#13701) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 20:28:59 +08:00
youkaichao	3e472d882a	[core] set up data parallel communication (#13591) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 19:28:59 +08:00
Mark McLoughlin	2cb8c1540e	[Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295)	2025-02-22 00:20:45 -08:00
Mark McLoughlin	1cd981da4f	[V1][Metrics] Support `vllm:cache_config_info` (#13299)	2025-02-22 00:20:00 -08:00
Lucas Wilkinson	288cc6c234	[Attention] MLA with chunked prefill (#12639) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Patrick Horn <patrick.horn@gmail.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-21 15:30:12 -08:00
Michael Goin	71face8540	[Bugfix] Fix max_num_batched_tokens for MLA (#13620) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-20 17:45:20 -08:00
Joe Runde	bfbc0b32c6	[Frontend] Add backend-specific options for guided decoding (#13505) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-20 15:07:58 -05:00
Yannick Schnider	423330263b	[Feature] Pluggable platform-specific scheduler (#13161) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>	2025-02-19 17:16:38 +08:00
Lucia Fang	f525c0be8b	[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-02-19 17:06:23 +08:00
Kevin H. Luu	d5d214ac7f	[1/n][CI] Load models in CI from S3 instead of HF (#13205) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-19 07:34:59 +00:00
shangmingc	46cdd59577	[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-02-16 19:32:26 -08:00
Joe Runde	3bcb8c75da	[Core] Reduce TTFT with concurrent partial prefills (#10235) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-14 15:36:07 -08:00
Michael Goin	f0b2da72a8	Expand MLA to support most types of quantization (#13181)	2025-02-13 22:19:22 -08:00
Nicolò Lucchesi	d84cef76eb	[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint (#12909)	2025-02-13 07:23:45 -08:00
Keyun Tong	3ee696a63d	[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM (#12518) Signed-off-by: Keyun Tong <tongkeyun@gmail.com>	2025-02-12 12:25:58 +08:00
wangxiyuan	2e3b969ec0	[Platform] add pre_register_and_update function (#12432) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-11 22:06:46 +08:00
youkaichao	91dd8f7aa6	[bugfix] respect distributed_executor_backend in world_size=1 (#12934) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-08 16:17:08 +08:00
youkaichao	09b95e36ab	[torch.compile] PyTorch 2.6 and nightly compatibility (#12393) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-07 01:09:07 +08:00
Michael Goin	449d1bce02	[Misc] Remove duplicated DeepSeek V2/V3 model definition (#12793)	2025-02-05 23:16:20 -08:00
Kyle Sayers	4896d0c2dd	[Quant] Fix use_mla TypeError and support loading pure-sparsity Compressed Tensors configs (#12711)	2025-02-03 23:27:11 -08:00
Kyle Sayers	6dd5e52823	Squelch MLA warning for Compressed-Tensors Models (#12704) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-02-03 13:29:56 -08:00
Arthur	a1a2aaadb9	[Model]: Add `transformers` backend support (#11330) # Adds support for `transformers` as a backend Following https://github.com/huggingface/transformers/pull/35235, a bunch of models should already be supported, we are ramping up support for more models. Thanks @Isotr0py for the TP support, and @hmellor for his help as well! This includes: - `trust_remote_code=True` support: any model on the hub, if it implements attention the correct way can be natively supported!! - tensor parallel support --------- Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-02-03 21:30:38 +08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Simon Mo	4f4d427ac2	Disable chunked prefill and/or prefix caching when MLA is enabled (#12642) From @mgoin in https://github.com/vllm-project/vllm/pull/12638 I cannot push to that branch, therefore a new PR to unblock release. --------- Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-01-31 23:46:57 -08:00
Lucas Wilkinson	baeded2569	[Attention] Deepseek v3 MLA support with FP8 compute (#12601) This PR implements the Deepseek V3 support by performing matrix absorption the fp8 weights --------- Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>	2025-01-31 21:52:51 -08:00
Lucas Wilkinson	cabaf4eff3	[Attention] MLA decode optimizations (#12528) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com> Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-01-30 23:49:37 -08:00
Yanyi Liu	ff7424f491	[Frontend] Support override generation config in args (#12409) Signed-off-by: liuyanyi <wolfsonliu@163.com>	2025-01-29 01:41:01 -08:00
Harry Mellor	823ab79633	Update `pre-commit` hooks (#12475) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-27 17:23:08 -07:00
Nicolò Lucchesi	6116ca8cd7	[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and `prompt_logprobs` with ChunkedPrefill (#10132) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: wallashss <wallashss@ibm.com> Co-authored-by: wallashss <wallashss@ibm.com>	2025-01-27 13:38:35 -08:00
Bowen Wang	2bc3fbba0c	[FlashInfer] Upgrade to 0.2.0 (#11194) Signed-off-by: Bowen Wang <abmfy@icloud.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-01-27 18:19:24 +00:00
Matthew Hendrey	9ddc35220b	[Frontend] generation_config.json for maximum tokens (#12242) Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com> Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: shangmingc <caishangming@linux.alibaba.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-01-26 19:59:25 +08:00
Cyrus Leung	df5dafaa5b	[Misc] Remove deprecated code (#12383) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-24 14:45:20 -05:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
youkaichao	6e650f56a1	[torch.compile] decouple compile sizes and cudagraph sizes (#12243) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-24 02:01:30 +08:00
Konrad Zawora	96f6a7596f	[Bugfix] Fix HPU multiprocessing executor (#12167) Signed-off-by: Konrad Zawora <kzawora@habana.ai>	2025-01-23 02:07:07 +08:00
youkaichao	68ad4e3a8d	[Core] Support fully transparent sleep mode (#11743) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-22 14:39:32 +08:00
youkaichao	c81081fece	[torch.compile] transparent compilation with more logging (#12246) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-21 19:32:55 +08:00
Jinzhen Lin	750f4cabfa	[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222) Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-01-20 16:42:16 -08:00
youkaichao	e66faf4809	[torch.compile] store inductor compiled Python file (#12182) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-19 16:27:26 +08:00
Chen Zhang	d1adb9b403	[BugFix] add more `is not None` check in VllmConfig.__post_init__ (#12138) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-01-17 05:33:22 +00:00

671 Commits