227 Commits

Author SHA1 Message Date
Chauncey
cbe7d18096 [Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-04-01 09:56:45 -07:00
Nicolò Lucchesi
7337ff7f03 [Docs] PD with Nixl compat matrix (#38628)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-31 15:01:21 +00:00
Flora Feng
3e802e8786 [Mypy] Fix adjust_request typing (#38264)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-31 04:21:18 +00:00
Cyrus Leung
ba2f0acc2d [Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-25 10:22:54 -07:00
Sungjae Lee
4731884796 [Feature] limit thinking tokens (hard limit) (#20859)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-24 09:53:07 -07:00
Yan Ma
d3fe857135 update doc for online fp8 quantization (#37851)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2026-03-23 05:19:03 +00:00
Ifta khairul Alam Adil
104605cbf2 Remove deprecated reasoning_content message field(part-2) (#37480)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Philip Ottesen <phiott256@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Andy Lo <andy@mistral.ai>
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
Signed-off-by: sihao.li <sihao.li@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Philip Ottesen <phiott256@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Co-authored-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com>
Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-19 15:20:08 +00:00
wang.yuqi
f9e2a38386 [Docs] Reorganize pooling docs. (#35592)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-19 11:25:47 +00:00
Bhoomit
3717a4dd47 [Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:36:41 +00:00
Yuchen Fama
31a458c091 [Doc] Clarify schema enforcement behavior for tool_choice modes (#37064)
Signed-off-by: yfama <yuchengu@gmail.com>
2026-03-16 22:27:42 +00:00
Max de Bayser
9f9ecff4cd Add simple granite4 tool parser (#36827)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2026-03-16 10:49:09 -07:00
sfeiqiang
8cb24d3aed [KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328)
Signed-off-by: phaedonsun <phaedonsun@tencent.com>
Co-authored-by: phaedonsun <phaedonsun@tencent.com>
2026-03-12 00:46:20 -07:00
Harry Mellor
a0f44bb616 Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-08 20:05:24 -07:00
Xiang Shi
e68de8adc0 docs: fix wrong cc in int8.md (#36209)
Signed-off-by: Xiang Shi <realkevin@tutanota.com>
2026-03-06 06:01:02 +00:00
zihaoanllm
d106bf39f5 [Doc] Add Parallel Draft Models (#35973)
Signed-off-by: <zihaoan2@amd.com>
Signed-off-by: zihaoanllm <zihaoan2@amd.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-05 05:44:07 +00:00
Davina Zaman
138d891d7f [Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441)
Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-04 11:44:39 -08:00
Russell Bryant
2f2c1d73a7 [Docs] Upgrade dynamic LoRA warning to admonition block (#35218)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2026-03-04 10:01:42 -08:00
Xing Liu
7cc6058ac6 [Doc] Add MTP docs and update speculative decoding guidance (#35197)
Signed-off-by: liuxing <945764858@qq.com>
2026-03-04 17:23:34 +00:00
Nicolò Lucchesi
f91808ae0d [MM] Allow audio chunking for offline LLM (#34628)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-23 21:04:28 -08:00
petrpechman
bebfe55b1c [Doc] Fix example of eagle3 (#34960)
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz>
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz>
2026-02-21 09:57:53 +00:00
Nicolò Lucchesi
ab6f3487a6 [PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer
8dc8a99b56 [ROCm] Enable bitsandbytes quantization support on ROCm (#34688)
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
2026-02-21 00:34:55 -08:00
Kyle Sayers
64ac1395e8 [Docs] Clean up speculators docs (#34065)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-02-18 13:48:11 -08:00
Harry Mellor
a21cedf4ff Bump lm-eval version for Transformers v5 compatibility (#33994)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-16 05:24:35 -08:00
Parth Bansal
5653021094 [Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model (#34584)
Signed-off-by: Parth Bansal <parthbansal127@gmail.com>
2026-02-16 12:09:00 +08:00
Nicolò Lucchesi
334c715e0f [Docs] Spec decoding docs warning removal (#34439)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-12 09:01:51 -08:00
Tianqi Ren
786806dd44 [Doc] Update Marlin support matrix for Turing (#34319)
Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>
2026-02-11 09:03:41 +00:00
Cyrus Leung
25e48a3aae [Doc] Update usage of --limit-mm-per-prompt (#34148)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-09 21:12:13 -08:00
wang.yuqi
22b64948f6 [Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-09 06:42:38 +00:00
danisereb
084aa19f02 Add support for ModelOpt MXFP8 dense models (#33786)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-02-08 11:16:48 -08:00
liranschour
8322d4e47f Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-05 02:17:02 -08:00
Frank Wang
45f8fd6f97 [Feature] Enable TRITON_ATTN for Batch Invariance (#33688)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
2026-02-04 13:27:34 +08:00
dtc
0d6ccf68fa [P/D] rework mooncake connector and introduce its bootstrap server (#31034)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2026-02-03 08:08:25 -08:00
Krish Gupta
2df2b3499d Document NixlConnector backend selection via kv_connector_extra_config (#33552)
Signed-off-by: KrxGu <krishom70@gmail.com>
2026-02-03 05:49:59 -08:00
Michael Goin
29fba76781 [UX] Use gguf repo_id:quant_type syntax for examples and docs (#33371)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-31 12:14:54 +08:00
Aidan Reilly
133765760b [Docs] Adding links and intro to Speculators and LLM Compressor (#32849)
Signed-off-by: Aidan Reilly <aireilly@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-29 14:12:35 -08:00
Or Ozeri
2e8de86777 Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-01-28 04:36:00 -08:00
Robert Shaw
247d1a32ea [Quantization][Deprecation] Remove BitBlas (#32683)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-28 11:06:22 +00:00
Alex Brooks
9ac818a551 [Misc] HF Hub LoRA Resolver (#20320)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2026-01-26 13:56:32 +00:00
Cyrus Leung
11b556878b [Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 15:00:28 +08:00
zhanqiuhu
151e5451c2 [Doc] Add Qwen2.5 models to batch invariance tested models (#33016)
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
2026-01-25 09:20:46 +00:00
Eldar Kurtić
44f08af3a7 Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
2026-01-22 13:29:57 -07:00
Cyrus Leung
d117a4d1a9 [Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-22 12:44:22 +00:00
liranschour
64e3d67ac0 Enable Cross layers KV cache layout at NIXL Connector (#30207)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2026-01-22 10:12:58 +00:00
Jackmin801
12dab78f49 [Feat] allow inplace loading lora (#31326)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-20 10:15:20 +08:00
Yuxuan Zhang
71832ba71e [GLM-4.7] GLM Model support for GLM-Lite (#31386)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Yuxuan Zhang <2448370773@qq.com>
2026-01-19 01:18:38 -08:00
Michael Goin
6388b50058 [Docs] Add docs about OOT Quantization Plugins (#32035)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-14 15:25:45 +08:00
Yi Liu
50632adc58 Consolidate Intel Quantization Toolkit Integration in vLLM (#31716)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
2026-01-14 07:11:30 +00:00
Nicolò Lucchesi
8c8653b672 [Docs] Nixl Usage recommend fail kv_load_failure_policy (#32198)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-13 12:51:57 +00:00
Andrew Bennett
f243abc92d Fix various typos found in docs (#32212)
Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
2026-01-13 03:41:47 +00:00