biondizzle/vllm - vllm

Author	SHA1	Message	Date
Chauncey	cbe7d18096	[Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-04-01 09:56:45 -07:00
Nicolò Lucchesi	7337ff7f03	[Docs] PD with Nixl compat matrix (#38628) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-31 15:01:21 +00:00
Flora Feng	3e802e8786	[Mypy] Fix adjust_request typing (#38264) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-31 04:21:18 +00:00
Cyrus Leung	ba2f0acc2d	[Misc] Reorganize inputs (#35182) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-25 10:22:54 -07:00
Sungjae Lee	4731884796	[Feature] limit thinking tokens (hard limit) (#20859) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-24 09:53:07 -07:00
Yan Ma	d3fe857135	update doc for online fp8 quantization (#37851) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-03-23 05:19:03 +00:00
Ifta khairul Alam Adil	104605cbf2	Remove deprecated reasoning_content message field(part-2) (#37480) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Philip Ottesen <phiott256@gmail.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: Andy Lo <andy@mistral.ai> Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com> Signed-off-by: sihao.li <sihao.li@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: JartX <sagformas@epdcenter.es> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Philip Ottesen <phiott256@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com> Co-authored-by: Andy Lo <andy@mistral.ai> Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com> Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 15:20:08 +00:00
wang.yuqi	f9e2a38386	[Docs] Reorganize pooling docs. (#35592) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 11:25:47 +00:00
Bhoomit	3717a4dd47	[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984) Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-17 14:36:41 +00:00
Yuchen Fama	31a458c091	[Doc] Clarify schema enforcement behavior for tool_choice modes (#37064) Signed-off-by: yfama <yuchengu@gmail.com>	2026-03-16 22:27:42 +00:00
Max de Bayser	9f9ecff4cd	Add simple granite4 tool parser (#36827) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2026-03-16 10:49:09 -07:00
sfeiqiang	8cb24d3aed	[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328) Signed-off-by: phaedonsun <phaedonsun@tencent.com> Co-authored-by: phaedonsun <phaedonsun@tencent.com>	2026-03-12 00:46:20 -07:00
Harry Mellor	a0f44bb616	Allow `markdownlint` to run locally (#36398) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:05:24 -07:00
Xiang Shi	e68de8adc0	docs: fix wrong cc in int8.md (#36209) Signed-off-by: Xiang Shi <realkevin@tutanota.com>	2026-03-06 06:01:02 +00:00
zihaoanllm	d106bf39f5	[Doc] Add Parallel Draft Models (#35973) Signed-off-by: <zihaoan2@amd.com> Signed-off-by: zihaoanllm <zihaoan2@amd.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 05:44:07 +00:00
Davina Zaman	138d891d7f	[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441) Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 11:44:39 -08:00
Russell Bryant	2f2c1d73a7	[Docs] Upgrade dynamic LoRA warning to admonition block (#35218) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 10:01:42 -08:00
Xing Liu	7cc6058ac6	[Doc] Add MTP docs and update speculative decoding guidance (#35197) Signed-off-by: liuxing <945764858@qq.com>	2026-03-04 17:23:34 +00:00
Nicolò Lucchesi	f91808ae0d	[MM] Allow audio chunking for offline LLM (#34628) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 21:04:28 -08:00
petrpechman	bebfe55b1c	[Doc] Fix example of eagle3 (#34960) Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz> Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz>	2026-02-21 09:57:53 +00:00
Nicolò Lucchesi	ab6f3487a6	[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer	8dc8a99b56	[ROCm] Enable bitsandbytes quantization support on ROCm (#34688) Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>	2026-02-21 00:34:55 -08:00
Kyle Sayers	64ac1395e8	[Docs] Clean up speculators docs (#34065) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-02-18 13:48:11 -08:00
Harry Mellor	a21cedf4ff	Bump `lm-eval` version for Transformers v5 compatibility (#33994) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-16 05:24:35 -08:00
Parth Bansal	5653021094	[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model (#34584) Signed-off-by: Parth Bansal <parthbansal127@gmail.com>	2026-02-16 12:09:00 +08:00
Nicolò Lucchesi	334c715e0f	[Docs] Spec decoding docs warning removal (#34439) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-12 09:01:51 -08:00
Tianqi Ren	786806dd44	[Doc] Update Marlin support matrix for Turing (#34319) Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>	2026-02-11 09:03:41 +00:00
Cyrus Leung	25e48a3aae	[Doc] Update usage of `--limit-mm-per-prompt` (#34148) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-09 21:12:13 -08:00
wang.yuqi	22b64948f6	[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-09 06:42:38 +00:00
danisereb	084aa19f02	Add support for ModelOpt MXFP8 dense models (#33786) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-08 11:16:48 -08:00
liranschour	8322d4e47f	Enable Cross layers KV cache layout at NIXL Connector V2 (#33339) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-05 02:17:02 -08:00
Frank Wang	45f8fd6f97	[Feature] Enable `TRITON_ATTN` for Batch Invariance (#33688) Signed-off-by: frankwang28 <frank.wbb@hotmail.com>	2026-02-04 13:27:34 +08:00
dtc	0d6ccf68fa	[P/D] rework mooncake connector and introduce its bootstrap server (#31034) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2026-02-03 08:08:25 -08:00
Krish Gupta	2df2b3499d	Document NixlConnector backend selection via kv_connector_extra_config (#33552) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-03 05:49:59 -08:00
Michael Goin	29fba76781	[UX] Use gguf `repo_id:quant_type` syntax for examples and docs (#33371) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-31 12:14:54 +08:00
Aidan Reilly	133765760b	[Docs] Adding links and intro to Speculators and LLM Compressor (#32849) Signed-off-by: Aidan Reilly <aireilly@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-29 14:12:35 -08:00
Or Ozeri	2e8de86777	Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com>	2026-01-28 04:36:00 -08:00
Robert Shaw	247d1a32ea	[Quantization][Deprecation] Remove BitBlas (#32683) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-28 11:06:22 +00:00
Alex Brooks	9ac818a551	[Misc] HF Hub LoRA Resolver (#20320) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-26 13:56:32 +00:00
Cyrus Leung	11b556878b	[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 15:00:28 +08:00
zhanqiuhu	151e5451c2	[Doc] Add Qwen2.5 models to batch invariance tested models (#33016) Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>	2026-01-25 09:20:46 +00:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
Cyrus Leung	d117a4d1a9	[Frontend] Introduce Renderer for processing chat messages (using `ModelConfig`) (#30200) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 12:44:22 +00:00
liranschour	64e3d67ac0	Enable Cross layers KV cache layout at NIXL Connector (#30207) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com>	2026-01-22 10:12:58 +00:00
Jackmin801	12dab78f49	[Feat] allow inplace loading lora (#31326) Signed-off-by: Jackmin801 <ongjackm@gmail.com> Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-20 10:15:20 +08:00
Yuxuan Zhang	71832ba71e	[GLM-4.7] GLM Model support for GLM-Lite (#31386) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Yuxuan Zhang <2448370773@qq.com>	2026-01-19 01:18:38 -08:00
Michael Goin	6388b50058	[Docs] Add docs about OOT Quantization Plugins (#32035) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-14 15:25:45 +08:00
Yi Liu	50632adc58	Consolidate Intel Quantization Toolkit Integration in vLLM (#31716) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2026-01-14 07:11:30 +00:00
Nicolò Lucchesi	8c8653b672	[Docs] Nixl Usage recommend `fail` kv_load_failure_policy (#32198) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-13 12:51:57 +00:00
Andrew Bennett	f243abc92d	Fix various typos found in `docs` (#32212) Signed-off-by: Andrew Bennett <potatosaladx@meta.com>	2026-01-13 03:41:47 +00:00


227 Commits