Bhoomit
|
3717a4dd47
|
[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-17 14:36:41 +00:00 |
|
Yuchen Fama
|
31a458c091
|
[Doc] Clarify schema enforcement behavior for tool_choice modes (#37064)
Signed-off-by: yfama <yuchengu@gmail.com>
|
2026-03-16 22:27:42 +00:00 |
|
Max de Bayser
|
9f9ecff4cd
|
Add simple granite4 tool parser (#36827)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2026-03-16 10:49:09 -07:00 |
|
sfeiqiang
|
8cb24d3aed
|
[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328)
Signed-off-by: phaedonsun <phaedonsun@tencent.com>
Co-authored-by: phaedonsun <phaedonsun@tencent.com>
|
2026-03-12 00:46:20 -07:00 |
|
Harry Mellor
|
a0f44bb616
|
Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-08 20:05:24 -07:00 |
|
Xiang Shi
|
e68de8adc0
|
docs: fix wrong cc in int8.md (#36209)
Signed-off-by: Xiang Shi <realkevin@tutanota.com>
|
2026-03-06 06:01:02 +00:00 |
|
zihaoanllm
|
d106bf39f5
|
[Doc] Add Parallel Draft Models (#35973)
Signed-off-by: <zihaoan2@amd.com>
Signed-off-by: zihaoanllm <zihaoan2@amd.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 05:44:07 +00:00 |
|
Davina Zaman
|
138d891d7f
|
[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441)
Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 11:44:39 -08:00 |
|
Russell Bryant
|
2f2c1d73a7
|
[Docs] Upgrade dynamic LoRA warning to admonition block (#35218)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-04 10:01:42 -08:00 |
|
Xing Liu
|
7cc6058ac6
|
[Doc] Add MTP docs and update speculative decoding guidance (#35197)
Signed-off-by: liuxing <945764858@qq.com>
|
2026-03-04 17:23:34 +00:00 |
|
Nicolò Lucchesi
|
f91808ae0d
|
[MM] Allow audio chunking for offline LLM (#34628)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-23 21:04:28 -08:00 |
|
petrpechman
|
bebfe55b1c
|
[Doc] Fix example of eagle3 (#34960)
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz>
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz>
|
2026-02-21 09:57:53 +00:00 |
|
Nicolò Lucchesi
|
ab6f3487a6
|
[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-21 01:34:57 -08:00 |
|
BADAOUI Abdennacer
|
8dc8a99b56
|
[ROCm] Enable bitsandbytes quantization support on ROCm (#34688)
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
|
2026-02-21 00:34:55 -08:00 |
|
Kyle Sayers
|
64ac1395e8
|
[Docs] Clean up speculators docs (#34065)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-02-18 13:48:11 -08:00 |
|
Harry Mellor
|
a21cedf4ff
|
Bump lm-eval version for Transformers v5 compatibility (#33994)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-16 05:24:35 -08:00 |
|
Parth Bansal
|
5653021094
|
[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model (#34584)
Signed-off-by: Parth Bansal <parthbansal127@gmail.com>
|
2026-02-16 12:09:00 +08:00 |
|
Nicolò Lucchesi
|
334c715e0f
|
[Docs] Spec decoding docs warning removal (#34439)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-12 09:01:51 -08:00 |
|
Tianqi Ren
|
786806dd44
|
[Doc] Update Marlin support matrix for Turing (#34319)
Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>
|
2026-02-11 09:03:41 +00:00 |
|
Cyrus Leung
|
25e48a3aae
|
[Doc] Update usage of --limit-mm-per-prompt (#34148)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-09 21:12:13 -08:00 |
|
wang.yuqi
|
22b64948f6
|
[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-09 06:42:38 +00:00 |
|
danisereb
|
084aa19f02
|
Add support for ModelOpt MXFP8 dense models (#33786)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-02-08 11:16:48 -08:00 |
|
liranschour
|
8322d4e47f
|
Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-05 02:17:02 -08:00 |
|
Frank Wang
|
45f8fd6f97
|
[Feature] Enable TRITON_ATTN for Batch Invariance (#33688)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
|
2026-02-04 13:27:34 +08:00 |
|
dtc
|
0d6ccf68fa
|
[P/D] rework mooncake connector and introduce its bootstrap server (#31034)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2026-02-03 08:08:25 -08:00 |
|
Krish Gupta
|
2df2b3499d
|
Document NixlConnector backend selection via kv_connector_extra_config (#33552)
Signed-off-by: KrxGu <krishom70@gmail.com>
|
2026-02-03 05:49:59 -08:00 |
|
Michael Goin
|
29fba76781
|
[UX] Use gguf repo_id:quant_type syntax for examples and docs (#33371)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-31 12:14:54 +08:00 |
|
Aidan Reilly
|
133765760b
|
[Docs] Adding links and intro to Speculators and LLM Compressor (#32849)
Signed-off-by: Aidan Reilly <aireilly@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-29 14:12:35 -08:00 |
|
Or Ozeri
|
2e8de86777
|
Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-01-28 04:36:00 -08:00 |
|
Robert Shaw
|
247d1a32ea
|
[Quantization][Deprecation] Remove BitBlas (#32683)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-28 11:06:22 +00:00 |
|
Alex Brooks
|
9ac818a551
|
[Misc] HF Hub LoRA Resolver (#20320)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2026-01-26 13:56:32 +00:00 |
|
Cyrus Leung
|
11b556878b
|
[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-26 15:00:28 +08:00 |
|
zhanqiuhu
|
151e5451c2
|
[Doc] Add Qwen2.5 models to batch invariance tested models (#33016)
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
|
2026-01-25 09:20:46 +00:00 |
|
Eldar Kurtić
|
44f08af3a7
|
Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
|
2026-01-22 13:29:57 -07:00 |
|
Cyrus Leung
|
d117a4d1a9
|
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-22 12:44:22 +00:00 |
|
liranschour
|
64e3d67ac0
|
Enable Cross layers KV cache layout at NIXL Connector (#30207)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
|
2026-01-22 10:12:58 +00:00 |
|
Jackmin801
|
12dab78f49
|
[Feat] allow inplace loading lora (#31326)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Jackmin801 <56836461+Jackmin801@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-20 10:15:20 +08:00 |
|
Yuxuan Zhang
|
71832ba71e
|
[GLM-4.7] GLM Model support for GLM-Lite (#31386)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Yuxuan Zhang <2448370773@qq.com>
|
2026-01-19 01:18:38 -08:00 |
|
Michael Goin
|
6388b50058
|
[Docs] Add docs about OOT Quantization Plugins (#32035)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-14 15:25:45 +08:00 |
|
Yi Liu
|
50632adc58
|
Consolidate Intel Quantization Toolkit Integration in vLLM (#31716)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
|
2026-01-14 07:11:30 +00:00 |
|
Nicolò Lucchesi
|
8c8653b672
|
[Docs] Nixl Usage recommend fail kv_load_failure_policy (#32198)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-13 12:51:57 +00:00 |
|
Andrew Bennett
|
f243abc92d
|
Fix various typos found in docs (#32212)
Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
|
2026-01-13 03:41:47 +00:00 |
|
Andy Zhang
|
e68b0dad8b
|
doc: Update model name for Qwen3-Coder in documentation (#32185)
Signed-off-by: Andy Zhang <xiazhang@microsoft.com>
|
2026-01-12 07:10:50 -08:00 |
|
Or Ozeri
|
9cddbdba6d
|
OffloadingConnector: Add cpu_bytes_to_use configuration (#24498)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-12 15:00:43 +00:00 |
|
Jee Jee Li
|
05e8981234
|
[Doc] Improve LoRA docs (#32159)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-12 02:19:17 -08:00 |
|
Jeremy Teboul
|
657e9c0e18
|
[Fix] Introduce audio channels spec (#31595)
Signed-off-by: Jeremy Teboul <jeremyte@meta.com>
|
2026-01-09 19:34:51 +00:00 |
|
vSeamar
|
6f351548b2
|
[Frontend] Implement robust video frame recovery for corrupted videos (#29197)
Signed-off-by: cmartinez <cmartinez@roblox.com>
Signed-off-by: vSeamar <cmartinez@roblox.com>
|
2026-01-07 01:13:24 +00:00 |
|
Jee Jee Li
|
cbd4690a03
|
[LoRA]Disable linear LoRA kernel PDL (#31777)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-06 23:12:25 +08:00 |
|
BlankR
|
6ebb66ccea
|
[Doc] Fix format of multimodal_inputs.md (#31800)
Signed-off-by: BlankR <hjyblanche@gmail.com>
|
2026-01-06 03:30:24 -08:00 |
|
labAxiaoming
|
a01f2faedf
|
Add multimodal input method in the documentation (#31601)
Signed-off-by: xiaoming <1259730330@qq.com>
|
2026-01-02 12:43:30 +00:00 |
|