Seungmin Kim
160424a937
[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs ( #33992 )
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com >
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com >
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 18:15:51 -08:00
Joao Gante
709eadbb0b
Doc link typo ( #35281 )
Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-25 03:00:31 -08:00
Yanwen Lin
80e60a6133
[Doc] Suggest "--managed-python" flag when installing python using uv ( #33069 )
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com >
2026-02-25 08:19:43 +00:00
jonoillar
26e722f906
[DOC][BugFix] Specfiy build dependency installation ( #34513 )
Signed-off-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com >
Co-authored-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com >
2026-02-25 08:04:06 +00:00
lichuang
2c619e5e3f
[Docs]Fix documentation formatting in architecture overview ( #34679 )
Signed-off-by: codedump <lichuang1982@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-25 08:00:15 +00:00
Simon Mo
8a685be8d9
docs: document committer proposal process in governance ( #35225 )
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-25 07:58:48 +00:00
Harry Mellor
f7967577f5
Remove requirement to use --hf-overrides for DeepseekVLV2ForCausalLM ( #35203 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-24 22:00:06 -08:00
Nicolò Lucchesi
f91808ae0d
[MM] Allow audio chunking for offline LLM ( #34628 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-23 21:04:28 -08:00
Mark McLoughlin
5cc7c4452e
[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) ( #30950 )
Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus.
`--enable-mfu-metrics` is required for these to be exposed.
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-23 15:01:07 +00:00
Cyrus Leung
987506bca6
[Refactor] Simplify dummy data generation ( #35025 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-22 20:55:27 -08:00
Athrael Soju
970861ac0c
[New Model] Add ColModernVBERT ( #34558 )
Signed-off-by: Athrael Soju <athrael.soju@gmail.com >
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com >
2026-02-22 12:23:41 +08:00
petrpechman
bebfe55b1c
[Doc] Fix example of eagle3 ( #34960 )
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz >
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz >
2026-02-21 09:57:53 +00:00
Nicolò Lucchesi
ab6f3487a6
[PD] Change kv_load_failure_policy Default from "recompute" to "fail" ( #34896 )
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer
8dc8a99b56
[ROCm] Enable bitsandbytes quantization support on ROCm ( #34688 )
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com >
2026-02-21 00:34:55 -08:00
Kata Coder
5719a4e4e6
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed ( #34574 )
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
2026-02-20 20:01:40 -08:00
Kyle Sayers
64ac1395e8
[Docs] Clean up speculators docs ( #34065 )
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2026-02-18 13:48:11 -08:00
Cyrus Leung
a766b30349
[Renderer] Deprecate code paths for old input processing ( #34775 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-18 00:35:04 -08:00
Matthew Bonanni
dc5fa77a4e
[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs ( #34457 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-17 14:01:27 -05:00
Harry Mellor
a21cedf4ff
Bump lm-eval version for Transformers v5 compatibility ( #33994 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-16 05:24:35 -08:00
Parth Bansal
5653021094
[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model ( #34584 )
Signed-off-by: Parth Bansal <parthbansal127@gmail.com >
2026-02-16 12:09:00 +08:00
Maryam Tahhan
f07a128413
[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… ( #33079 )
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-02-15 06:33:08 -08:00
Isotr0py
19fab44152
[Doc] Update Encoder-Decoder models support doc with Florence-2 ( #34581 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 04:18:57 -08:00
Kata Coder
d1ea65d0a1
[new model] add COLQwen3 code & Inference ( #34398 )
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
Signed-off-by: katacoder <craftsangjae@gmail.com >
2026-02-14 12:15:19 +08:00
Ilya Boytsov
071d863e20
Extend ColBERT support to non-standard BERT backbones ( #34170 )
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com >
2026-02-13 09:53:09 +00:00
myselvess
bcf0731aa0
[New Model] support new model ovis2.6 ( #34426 )
Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com >
2026-02-13 00:12:45 -08:00
Matthew Bonanni
f2c47886fd
[Attention] Add FlashInfer Sparse MLA backend ( #33451 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2026-02-12 17:21:54 +00:00
Nicolò Lucchesi
334c715e0f
[Docs] Spec decoding docs warning removal ( #34439 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-12 09:01:51 -08:00
Louie Tsai
55a1a9563a
Vllm CPU benchmark suite improvement ( #34128 )
Signed-off-by: louie-tsai <louie.tsai@intel.com >
2026-02-12 16:04:44 +08:00
Tianqi Ren
786806dd44
[Doc] Update Marlin support matrix for Turing ( #34319 )
Signed-off-by: Tianqi Ren <tianqi.r@outlook.com >
2026-02-11 09:03:41 +00:00
Kunshang Ji
cb9574eb85
[XPU][9/N] clean up existing ipex code/doc ( #34111 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-11 00:27:15 -08:00
AllenDou
21dfb842d7
[model] support FunASR model ( #33247 )
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-02-11 07:37:09 +00:00
Cyrus Leung
c9a1923bb4
[Plugin] Simplify IO Processor Plugin interface ( #34236 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 19:47:39 -08:00
bnellnm
d1481ba783
[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner ( #32344 )
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-10 19:51:07 -05:00
Cyrus Leung
25e48a3aae
[Doc] Update usage of --limit-mm-per-prompt ( #34148 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-09 21:12:13 -08:00
Michael Goin
5e75a14a66
[Doc] Add DCP support to attention backend doc ( #33936 )
2026-02-09 18:33:43 -05:00
JJJYmmm
9562912cea
[MODEL] Adding Support for Qwen3.5 Models ( #34110 )
Signed-off-by: JJJYmmm <1650675829@qq.com >
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: wulipc <wulipc@users.noreply.github.com >
Co-authored-by: ywang96 <ywang96@users.noreply.github.com >
Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-02-09 21:12:58 +08:00
Ekagra Ranjan
1d5922fade
[ASR] Fix audio benchmark and add RTFx metric ( #32300 )
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2026-02-09 10:02:37 +00:00
wang.yuqi
22b64948f6
[Frontend][last/5] Make pooling entrypoints request schema consensus. ( #31127 )
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-09 06:42:38 +00:00
danisereb
084aa19f02
Add support for ModelOpt MXFP8 dense models ( #33786 )
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
2026-02-08 11:16:48 -08:00
Jee Jee Li
db4ede9743
[Model] Enable Step3p5ForCausalLM testing ( #33755 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-07 05:25:24 -08:00
果冻虾仁
6f7adc533a
fix description in plugin_system.md ( #33999 )
2026-02-06 19:37:02 -08:00
vllmellm
aaa2efbe98
[DOC] [ROCm] Update docker deployment doc ( #33971 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 10:05:35 -08:00
Michael Goin
c39ee9ee2b
[Docs] Add sections on process architecture and minimum CPU resources ( #33940 )
It seems users can be confused about vLLM's performance when running
with very few CPU cores available. We are missing a clear overview of
vLLM's process architecture, so I added one, along with some diagrams,
in arch_overview.md, and included a section on CPU resource
recommendations in optimization.md.
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-02-06 15:26:43 +00:00
Raushan Turganbay
85ee1d962b
[Bugfix] Fix models and tests for transformers v5 ( #33977 )
Signed-off-by: raushan <raushan@huggingface.co >
Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-06 21:47:41 +08:00
SorenDreano
6e7b1c4b59
[Docs] Improve documentation ( #33799 )
Co-authored-by: Soren Dreano <soren@numind.ai >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-06 12:57:09 +00:00
chengchengpei
965525667b
Onboard voyage-4-nano ( #33720 )
Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com >
Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com >
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-06 06:23:34 +00:00
Simon Mo
5819ca8944
[Docs] Add reo analytics ( #33957 )
Signed-off-by: simon-mo <simon.mo@hey.com >
2026-02-05 17:42:22 -08:00
Nicolò Lucchesi
81a90e5277
[Docs] Add bart-plugin to docs ( #33905 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-05 12:20:25 +00:00
liranschour
8322d4e47f
Enable Cross layers KV cache layout at NIXL Connector V2 ( #33339 )
Signed-off-by: Liran Schour <lirans@il.ibm.com >
Signed-off-by: liranschour <liranschour@users.noreply.github.com >
Co-authored-by: Or Ozeri <or@ozery.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-05 02:17:02 -08:00
rinbaro
007b183d74
[docs] fix unintentional misspellings ( #33863 )
Signed-off-by: rinbaro <ilgomishra@gmail.com >
2026-02-04 20:50:59 -08:00