biondizzle/vllm - vllm

Author	SHA1	Message	Date
Bhoomit	3717a4dd47	[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984 ) Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-17 14:36:41 +00:00
Yuchen Fama	31a458c091	[Doc] Clarify schema enforcement behavior for tool_choice modes (#37064 ) Signed-off-by: yfama <yuchengu@gmail.com>	2026-03-16 22:27:42 +00:00
Max de Bayser	9f9ecff4cd	Add simple granite4 tool parser (#36827 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2026-03-16 10:49:09 -07:00
sfeiqiang	8cb24d3aed	[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328 ) Signed-off-by: phaedonsun <phaedonsun@tencent.com> Co-authored-by: phaedonsun <phaedonsun@tencent.com>	2026-03-12 00:46:20 -07:00
Harry Mellor	a0f44bb616	Allow `markdownlint` to run locally (#36398 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:05:24 -07:00
Xiang Shi	e68de8adc0	docs: fix wrong cc in int8.md (#36209 ) Signed-off-by: Xiang Shi <realkevin@tutanota.com>	2026-03-06 06:01:02 +00:00
zihaoanllm	d106bf39f5	[Doc] Add Parallel Draft Models (#35973 ) Signed-off-by: <zihaoan2@amd.com> Signed-off-by: zihaoanllm <zihaoan2@amd.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 05:44:07 +00:00
Davina Zaman	138d891d7f	[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441 ) Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 11:44:39 -08:00
Russell Bryant	2f2c1d73a7	[Docs] Upgrade dynamic LoRA warning to admonition block (#35218 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 10:01:42 -08:00
Xing Liu	7cc6058ac6	[Doc] Add MTP docs and update speculative decoding guidance (#35197 ) Signed-off-by: liuxing <945764858@qq.com>	2026-03-04 17:23:34 +00:00
Nicolò Lucchesi	f91808ae0d	[MM] Allow audio chunking for offline LLM (#34628 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 21:04:28 -08:00
petrpechman	bebfe55b1c	[Doc] Fix example of eagle3 (#34960 ) Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz> Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz>	2026-02-21 09:57:53 +00:00
Nicolò Lucchesi	ab6f3487a6	[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer	8dc8a99b56	[ROCm] Enable bitsandbytes quantization support on ROCm (#34688 ) Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>	2026-02-21 00:34:55 -08:00
Kyle Sayers	64ac1395e8	[Docs] Clean up speculators docs (#34065 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-02-18 13:48:11 -08:00
Harry Mellor	a21cedf4ff	Bump `lm-eval` version for Transformers v5 compatibility (#33994 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-16 05:24:35 -08:00
Parth Bansal	5653021094	[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model (#34584 ) Signed-off-by: Parth Bansal <parthbansal127@gmail.com>	2026-02-16 12:09:00 +08:00
Nicolò Lucchesi	334c715e0f	[Docs] Spec decoding docs warning removal (#34439 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-12 09:01:51 -08:00
Tianqi Ren	786806dd44	[Doc] Update Marlin support matrix for Turing (#34319 ) Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>	2026-02-11 09:03:41 +00:00
Cyrus Leung	25e48a3aae	[Doc] Update usage of `--limit-mm-per-prompt` (#34148 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-09 21:12:13 -08:00
wang.yuqi	22b64948f6	[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-02-09 06:42:38 +00:00
danisereb	084aa19f02	Add support for ModelOpt MXFP8 dense models (#33786 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-08 11:16:48 -08:00
liranschour	8322d4e47f	Enable Cross layers KV cache layout at NIXL Connector V2 (#33339 ) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-05 02:17:02 -08:00
Frank Wang	45f8fd6f97	[Feature] Enable `TRITON_ATTN` for Batch Invariance (#33688 ) Signed-off-by: frankwang28 <frank.wbb@hotmail.com>	2026-02-04 13:27:34 +08:00
dtc	0d6ccf68fa	[P/D] rework mooncake connector and introduce its bootstrap server (#31034 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2026-02-03 08:08:25 -08:00
Krish Gupta	2df2b3499d	Document NixlConnector backend selection via kv_connector_extra_config (#33552 ) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-03 05:49:59 -08:00
Michael Goin	29fba76781	[UX] Use gguf `repo_id:quant_type` syntax for examples and docs (#33371 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-31 12:14:54 +08:00
Aidan Reilly	133765760b	[Docs] Adding links and intro to Speculators and LLM Compressor (#32849 ) Signed-off-by: Aidan Reilly <aireilly@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-29 14:12:35 -08:00
Or Ozeri	2e8de86777	Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207 )" (#33241 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com>	2026-01-28 04:36:00 -08:00
Robert Shaw	247d1a32ea	[Quantization][Deprecation] Remove BitBlas (#32683 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-28 11:06:22 +00:00
Alex Brooks	9ac818a551	[Misc] HF Hub LoRA Resolver (#20320 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-26 13:56:32 +00:00
Cyrus Leung	11b556878b	[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 15:00:28 +08:00
zhanqiuhu	151e5451c2	[Doc] Add Qwen2.5 models to batch invariance tested models (#33016 ) Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>	2026-01-25 09:20:46 +00:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
Cyrus Leung	d117a4d1a9	[Frontend] Introduce Renderer for processing chat messages (using `ModelConfig`) (#30200 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 12:44:22 +00:00
liranschour	64e3d67ac0	Enable Cross layers KV cache layout at NIXL Connector (#30207 ) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com>	2026-01-22 10:12:58 +00:00
Jackmin801	12dab78f49	[Feat] allow inplace loading lora (#31326 ) Signed-off-by: Jackmin801 <ongjackm@gmail.com> Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-20 10:15:20 +08:00
Yuxuan Zhang	71832ba71e	[GLM-4.7] GLM Model support for GLM-Lite (#31386 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Yuxuan Zhang <2448370773@qq.com>	2026-01-19 01:18:38 -08:00
Michael Goin	6388b50058	[Docs] Add docs about OOT Quantization Plugins (#32035 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-14 15:25:45 +08:00
Yi Liu	50632adc58	Consolidate Intel Quantization Toolkit Integration in vLLM (#31716 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2026-01-14 07:11:30 +00:00
Nicolò Lucchesi	8c8653b672	[Docs] Nixl Usage recommend `fail` kv_load_failure_policy (#32198 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-13 12:51:57 +00:00
Andrew Bennett	f243abc92d	Fix various typos found in `docs` (#32212 ) Signed-off-by: Andrew Bennett <potatosaladx@meta.com>	2026-01-13 03:41:47 +00:00
Andy Zhang	e68b0dad8b	doc: Update model name for Qwen3-Coder in documentation (#32185 ) Signed-off-by: Andy Zhang <xiazhang@microsoft.com>	2026-01-12 07:10:50 -08:00
Or Ozeri	9cddbdba6d	OffloadingConnector: Add cpu_bytes_to_use configuration (#24498 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-12 15:00:43 +00:00
Jee Jee Li	05e8981234	[Doc] Improve LoRA docs (#32159 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-12 02:19:17 -08:00
Jeremy Teboul	657e9c0e18	[Fix] Introduce audio channels spec (#31595 ) Signed-off-by: Jeremy Teboul <jeremyte@meta.com>	2026-01-09 19:34:51 +00:00
vSeamar	6f351548b2	[Frontend] Implement robust video frame recovery for corrupted videos (#29197 ) Signed-off-by: cmartinez <cmartinez@roblox.com> Signed-off-by: vSeamar <cmartinez@roblox.com>	2026-01-07 01:13:24 +00:00
Jee Jee Li	cbd4690a03	[LoRA]Disable linear LoRA kernel PDL (#31777 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-06 23:12:25 +08:00
BlankR	6ebb66ccea	[Doc] Fix format of multimodal_inputs.md (#31800 ) Signed-off-by: BlankR <hjyblanche@gmail.com>	2026-01-06 03:30:24 -08:00
labAxiaoming	a01f2faedf	Add multimodal input method in the documentation (#31601 ) Signed-off-by: xiaoming <1259730330@qq.com>	2026-01-02 12:43:30 +00:00

219 Commits