biondizzle/vllm

Author	SHA1	Message	Date
Rohan Potdar	f38f8c9742	[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-25 04:36:40 +00:00
Benjamin Chislett	f5972a872f	[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726 ) Signed-off-by: Shahar Mor <smor@nvidia.com> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Shahar Mor <smor@nvidia.com> Co-authored-by: Roi Koren <roik@nvidia.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-24 09:49:56 -08:00
Nicolò Lucchesi	f91808ae0d	[MM] Allow audio chunking for offline LLM (#34628 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 21:04:28 -08:00
pschlan-amd	80d93fd6da	gpu_model_runner: Cache is_encoder_decoder from model config (#35099 ) Signed-off-by: Patrick Schlangen <pschlan@amd.com>	2026-02-23 19:08:34 -08:00
Rohan Potdar	2ff4e51152	[ROCm] AITER fused RoPE+KVCache (#33443 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: charlifu <charlifu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>	2026-02-23 19:06:00 -08:00
Vincent Gimenes	aa08a30fc9	[CLEANING] Remove unused disable_by_batch_size from SpeculativeConfig (#35060 ) Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>	2026-02-23 05:05:36 -08:00
Cyrus Leung	987506bca6	[Refactor] Simplify dummy data generation (#35025 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-22 20:55:27 -08:00
qizixi	2bcf71b9c0	[Spec Decode] Reduce TP communication for speculative decoding draft token generation (#34049 ) Signed-off-by: qizixi <qizixi@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-02-22 14:59:16 -08:00
Benjamin Chislett	682566b18e	[Bug] Refactor max_num_batched_tokens to account for drafting (#34898 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-22 11:18:46 -05:00
Nicolò Lucchesi	ab6f3487a6	[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-21 01:34:57 -08:00
Lucas Wilkinson	aaefc58ee0	[CI] Revert PRs 34818 and 33600 (#34979 )	2026-02-20 13:25:50 -08:00
杨朱 · Kiki	07cab212f0	[Misc] Add deprecated environment variable utilities (#33677 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-19 21:33:25 -08:00
Matthew Bonanni	662205d34e	[Bugfix] Fix Basic Models Test (#34818 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-19 14:49:07 -08:00
Kyle Sayers	64ac1395e8	[Docs] Clean up speculators docs (#34065 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-02-18 13:48:11 -08:00
Luka Govedič	02e8f26cea	[torch.compile] Turn on silu+fp4 quant fusion by default for O1+ (#34718 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-02-18 03:29:15 +00:00
Matthew Bonanni	7743152957	[Attention] Refactor `check_and_update_config` (#33600 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-17 17:06:54 -08:00
Lucas Kabela	a3205beffb	[CI] Enable mypy coverage for individual excluded files (#34292 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-16 07:34:29 -08:00
Luka Govedič	23d825aba1	[torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test (#34392 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-15 06:33:57 -08:00
Thomas Parnell	d5fe3f702c	[Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2026-02-14 13:15:56 -08:00
Wei Zhao	b37b679770	[Feature][Perf] Support Selective CPU Weight Offloading (#34535 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-13 20:02:24 -08:00
Richard Zou	87789c8364	[Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only (#34523 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-13 09:52:20 -08:00
Roger Wang	1dae7b7843	[Bugfix] Exclude `language_model_only` key from MM AOT compile hash but include in model one (#34508 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-13 13:59:00 +00:00
Harry Huang	6f019e6e0a	[BugFix] Add block_size validation for mamba cache align mode (#34445 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-02-12 18:18:07 -08:00
Cyrus Leung	fb455ed547	[V0 Deprecation] Remove code related to per-request logits processors (#34400 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 20:44:28 +08:00
Cyrus Leung	b96f7314b4	[Refactor] Pass Renderer to Input Processor (#34329 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-11 19:38:11 -08:00
SorenDreano	48134a2c22	[Docs] Fix typo ("defult") and double spacing (#34348 ) Signed-off-by: SorenDreano <71752785+SorenDreano@users.noreply.github.com> Co-authored-by: Soren Dreano <soren@numind.ai> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-11 09:02:27 -08:00
Harry Mellor	40b8f55358	[Docs] Reduce time spent generating API docs (#34255 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-11 02:56:02 -08:00
Luka Govedič	addac0e653	[torch.compile] Enable AR+rms fusion by default available for `-O2` (#34299 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-02-11 00:30:00 -08:00
Zhengkai Zhang	6f2f59f2b3	[Misc][Spec Decode] support different load config for draft model (#34022 ) Signed-off-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com> Co-authored-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com>	2026-02-10 14:52:43 -08:00
Pavani Majety	578977bb5e	[SM100] Resubmit FMHA FP8 prefill for MLA (#31195 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-02-10 16:18:43 -05:00
Qi Wang	33bcd3dc3b	[Misc] Introduce ec_both role EC (encoder cache) connector (#34182 ) Signed-off-by: Qi Wang <qiwa@nvidia.com>	2026-02-10 18:55:35 +00:00
Phúc H. Lê Khắc	94de871546	[Misc] allow specify is_mm_prefix_lm in hf_config (#34215 )	2026-02-10 11:16:21 +00:00
Cyrus Leung	25e48a3aae	[Doc] Update usage of `--limit-mm-per-prompt` (#34148 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-09 21:12:13 -08:00
Roger Wang	64a9c2528b	[UX] Add `--language-model-only` for hybrid models (#34120 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-09 14:57:33 +00:00
JJJYmmm	9562912cea	[MODEL] Adding Support for Qwen3.5 Models (#34110 ) Signed-off-by: JJJYmmm <1650675829@qq.com> Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: wulipc <wulipc@users.noreply.github.com> Co-authored-by: ywang96 <ywang96@users.noreply.github.com> Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-09 21:12:58 +08:00
Jee Jee Li	978a37c823	[Model] GLM adaptation (#34124 )	2026-02-09 17:32:52 +08:00
danisereb	084aa19f02	Add support for ModelOpt MXFP8 dense models (#33786 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-08 11:16:48 -08:00
Richard Zou	4df841fe75	[torch.compile] Add an option to force-enable the MOE cold start optimization (#33735 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-08 18:42:56 +00:00
Reagan Lee	c4df59ad43	Add embedding input functionality for disabled modalities [remake] (#32493 ) Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Signed-off-by: Reagan Lee <reaganjlee@gmail.com> Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-08 04:57:16 -08:00
Mohammad Miadh Angkad	dd6a6e1190	[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006 ) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-07 05:24:44 -08:00
Wentao Ye	18e8545297	[Revert] Add util `handle_deprecated` back (#33998 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-07 04:14:45 +00:00
rasmith	ec28784fdc	[CI][AMD]Bugfix] Check that model_config is not None in enable_norm_pad_fusion (#34007 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-07 02:43:25 +00:00
Harry Mellor	51a7bda625	Update `WeightTransferConfig` to be more standard like the others (#33989 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-06 13:15:00 +00:00
SorenDreano	6e7b1c4b59	[Docs] Improve documentation (#33799 ) Co-authored-by: Soren Dreano <soren@numind.ai> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-06 12:57:09 +00:00
Luka Govedič	ac32e66cf9	[torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-06 04:19:49 -08:00
chengchengpei	965525667b	Onboard voyage-4-nano (#33720 ) Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com> Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-06 06:23:34 +00:00
emricksini-h	325ab6b0a8	[Feature] OTEL tracing during loading (#31162 )	2026-02-05 16:59:28 -08:00
Benjamin Chislett	af3162d3aa	[Spec Decode] Unified Parallel Drafting (#32887 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-05 12:37:18 -05:00
Aaron Hao	c1858b7ec8	[Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Co-authored-by: SumanthRH <sumanthrh99@gmail.com>	2026-02-05 12:13:23 -05:00
Luka Govedič	4d9513537d	[CI][torch.compile] Reduce e2e fusion test time (#33293 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-04 19:09:03 -05:00

519 Commits