JJJYmmm
9562912cea
[MODEL] Adding Support for Qwen3.5 Models (#34110)
Signed-off-by: JJJYmmm <1650675829@qq.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wulipc <wulipc@users.noreply.github.com>
Co-authored-by: ywang96 <ywang96@users.noreply.github.com>
Co-authored-by: Isotr0py <Isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-02-09 21:12:58 +08:00
Ekagra Ranjan
1d5922fade
[ASR] Fix audio benchmark and add RTFx metric (#32300)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2026-02-09 10:02:37 +00:00
wang.yuqi
22b64948f6
[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-09 06:42:38 +00:00
danisereb
084aa19f02
Add support for ModelOpt MXFP8 dense models (#33786)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-02-08 11:16:48 -08:00
Jee Jee Li
db4ede9743
[Model] Enable Step3p5ForCausalLM testing (#33755)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-07 05:25:24 -08:00
果冻虾仁
6f7adc533a
fix description in plugin_system.md (#33999)
2026-02-06 19:37:02 -08:00
vllmellm
aaa2efbe98
[DOC] [ROCm] Update docker deployment doc (#33971)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-06 10:05:35 -08:00
Michael Goin
c39ee9ee2b
[Docs] Add sections on process architecture and minimum CPU resources (#33940)
It seems users can be confused about vLLM's performance when running
with very few CPU cores available. We are missing a clear overview of
vLLM's process architecture, so I added one, along with some diagrams,
in arch_overview.md, and included a section on CPU resource
recommendations in optimization.md.
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-06 15:26:43 +00:00
Raushan Turganbay
85ee1d962b
[Bugfix] Fix models and tests for transformers v5 (#33977)
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-06 21:47:41 +08:00
SorenDreano
6e7b1c4b59
[Docs] Improve documentation (#33799)
Co-authored-by: Soren Dreano <soren@numind.ai>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-02-06 12:57:09 +00:00
chengchengpei
965525667b
Onboard voyage-4-nano (#33720)
Signed-off-by: Chengcheng Pei <chengchengpei@outlook.com>
Signed-off-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-06 06:23:34 +00:00
Simon Mo
5819ca8944
[Docs] Add reo analytics (#33957)
Signed-off-by: simon-mo <simon.mo@hey.com>
2026-02-05 17:42:22 -08:00
Nicolò Lucchesi
81a90e5277
[Docs] Add bart-plugin to docs (#33905)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-05 12:20:25 +00:00
liranschour
8322d4e47f
Enable Cross layers KV cache layout at NIXL Connector V2 (#33339)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-02-05 02:17:02 -08:00
rinbaro
007b183d74
[docs] fix unintentional misspellings (#33863)
Signed-off-by: rinbaro <ilgomishra@gmail.com>
2026-02-04 20:50:59 -08:00
Ilya Boytsov
439afa4eea
feat: Add ColBERT late interaction model support (#33686)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
Signed-off-by: Ilya Boytsov <boytsovpanamera@mail.ru>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-05 08:05:13 +08:00
Muhammad Hashmi
535de06cb1
[Model] Add transcription support for Qwen3-Omni (#29828)
Signed-off-by: Muhammad Hashmi <mhashmi@berkeley.edu>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
2026-02-04 21:17:47 +00:00
Wentao Ye
d88a1df699
[Deprecation] Deprecate profiling envs (#33722)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-02-04 05:58:21 +00:00
Frank Wang
45f8fd6f97
[Feature] Enable TRITON_ATTN for Batch Invariance (#33688)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
2026-02-04 13:27:34 +08:00
dtc
0d6ccf68fa
[P/D] rework mooncake connector and introduce its bootstrap server (#31034)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2026-02-03 08:08:25 -08:00
Krish Gupta
2df2b3499d
Document NixlConnector backend selection via kv_connector_extra_config (#33552)
Signed-off-by: KrxGu <krishom70@gmail.com>
2026-02-03 05:49:59 -08:00
zxy
a3acfa1071
[Models] Intern-S1-Pro (#33636)
Signed-off-by: zxy <zhou0493@e.ntu.edu.sg>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-03 05:49:45 -08:00
Richard Zou
fd9c83d0e0
[torch.compile] Document the workaround to standalone_compile failing (#33571)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-02-03 07:16:55 +00:00
Komal Kumar Teru
ba871fb788
[Misc] support arbitrary MM datasets in spec dec bench (#33486)
Signed-off-by: kkt-cohere <komal@cohere.com>
Signed-off-by: Komal Kumar Teru <162363718+kkt-cohere@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-02 08:49:48 +00:00
RED
808dd87b30
[Model] Support DeepSeek-OCR-2 (#33165)
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-02 06:24:10 +00:00
Sawyer Bowerman
ce88756b96
[Doc]: update paths for Offline/Online/Others example sections (#33494)
Signed-off-by: Sawyer Bowerman <sbowerma@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-02 03:56:53 +00:00
Paco Xu
a3154a6092
[Doc] add missing model entries in supported_models.md (#33220)
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2026-02-02 03:37:25 +00:00
csy0225
c3b40dc3e7
[Models] Step-3.5-Flash (#33523)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com>
Co-authored-by: xiewuxun <xiewuxun@stepfun.com>
Co-authored-by: zetaohong <i-hongzetao@stepfun.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-02-02 10:21:18 +08:00
Cyrus Leung
79b6ec6aab
[Bugfix] Fix inconsistent handling of cache reset (#33481)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 20:23:41 -08:00
Cyrus Leung
92924b2ddd
[Deprecation] Remove deprecated items related to pooling (#33477)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 08:44:40 -08:00
jma99_2333
22d9a056d5
Support clear mm and encoder cache (#33452)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-31 15:22:25 +00:00
Cyrus Leung
793af538a3
[Doc] Update plugin deprecation notices (#33476)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-31 22:48:28 +08:00
jennyyyyzhen
527bcd14d4
[ROCM] Enable aiter attn backend for qwen3-next model (#32492)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
2026-01-31 17:03:57 +08:00
Patrick von Platen
15e0bb9c42
[Streaming -> Realtime] Rename all voxtral related classes, fn, files (#33415)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-01-31 04:49:00 +00:00
Michael Goin
29fba76781
[UX] Use gguf repo_id:quant_type syntax for examples and docs (#33371)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-31 12:14:54 +08:00
Nathan Weinberg
58cb55e4de
[Doc] Enhance documentation around CPU container images (#32286)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2026-01-30 13:36:20 +00:00
vllmellm
174f16700b
[Doc] [ROCm] Update Documentation to reflect v0.15.0 release (#33388)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-01-30 19:06:08 +08:00
Patrick von Platen
10152d2194
[Realtime API] Adds minimal realtime API based on websockets (#33187)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-30 18:41:29 +08:00
hujiaxin0
ba45bedfd1
[model] Add support for openPangu7B-VL (#32449)
Signed-off-by: hujiaxin <524446785@qq.com>
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
2026-01-30 15:54:27 +08:00
Wang Haoyu
c46b0cd0af
[Model][Multimodal] Add explicit MusicFlamingo adapter (#32696)
Signed-off-by: WangHaoyuuu <mailwhaoyu@gmail.com>
2026-01-30 11:01:29 +08:00
Aidan Reilly
133765760b
[Docs] Adding links and intro to Speculators and LLM Compressor (#32849)
Signed-off-by: Aidan Reilly <aireilly@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-29 14:12:35 -08:00
Roger Wang
8b3f0a99dd
[Models] Qwen3-ASR (#33312)
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-01-29 19:27:15 +08:00
graftim
d697581a7c
[Doc] Update outdated link to Ray documentation (#32660)
Signed-off-by: graftim <38649219+graftim@users.noreply.github.com>
2026-01-29 00:56:06 -08:00
Didier Durand
31b25f6516
[Doc]: fixing multiple typos in diverse files (#33256)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 16:52:03 +08:00
Matthew Bonanni
77c4f45c6c
[7/N][Attention][Docs] Add documentation for attention backends (#32477)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-28 17:20:22 -05:00
Or Ozeri
2e8de86777
Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-01-28 04:36:00 -08:00
Robert Shaw
247d1a32ea
[Quantization][Deprecation] Remove BitBlas (#32683)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-28 11:06:22 +00:00
Maryam Tahhan
2dd359f953
[Docs] Simplify CPU x86 Docker build documentation (#33071)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
2026-01-28 06:37:09 +00:00
Harry Mellor
706f123b23
[Docs] Use definition lists for CLI reference docs (#33186)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
2026-01-28 02:22:48 +00:00
Angela Yi
fb7abfc1d0
[docs] Improve tlparse section (#33211)
Signed-off-by: angelayi <yiangela7@gmail.com>
2026-01-28 02:07:37 +00:00