Jared Wen
6ee7f18f33
[Logging] add --disable-access-log-for-endpoints CLI option (#30011)
...
Add a new CLI option --disable-access-log-for-endpoints to suppress
uvicorn access logs for specified endpoints (e.g., /health, /metrics, /ping).
This addresses the need to reduce log noise in production environments
where health check endpoints are frequently polled by load balancers or
monitoring systems, generating excessive log entries that obscure
meaningful request logs.
Fixes #29982
Signed-off-by: JaredforReal <w13431838023@gmail.com>
2026-01-26 21:49:03 +00:00
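A minimal sketch of how per-endpoint access-log suppression could work, using a standard `logging.Filter` on the `uvicorn.access` logger. This is not the code from #30011: the class name `EndpointAccessLogFilter` and the helper `make_record` are hypothetical, and the sketch assumes uvicorn passes `(client_addr, method, full_path, http_version, status_code)` as the record's `args`.

```python
import logging


class EndpointAccessLogFilter(logging.Filter):
    """Hypothetical filter that drops access-log records for listed paths."""

    def __init__(self, excluded_paths):
        super().__init__()
        self.excluded_paths = frozenset(excluded_paths)

    def filter(self, record):
        args = record.args
        # Assumption: the request path is the third element of record.args.
        # Records that do not match that shape are passed through untouched.
        if not isinstance(args, tuple) or len(args) < 3:
            return True
        path = str(args[2]).split("?", 1)[0]  # ignore any query string
        return path not in self.excluded_paths


def make_record(path):
    """Build a record shaped like the assumed uvicorn access-log record."""
    return logging.LogRecord(
        name="uvicorn.access",
        level=logging.INFO,
        pathname="",
        lineno=0,
        msg='%s - "%s %s HTTP/%s" %d',
        args=("127.0.0.1:1234", "GET", path, "1.1", 200),
        exc_info=None,
    )


# Attach the filter to the access logger; other loggers are unaffected.
logging.getLogger("uvicorn.access").addFilter(
    EndpointAccessLogFilter(["/health", "/metrics", "/ping"])
)
```

Filtering at the logger level leaves the endpoints themselves untouched: the health checks still respond, and only their log lines are dropped.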
Yuxuan Zhang
bb17e8f11c
[GLM-OCR] GLM-OCR with MTP Support (#33005)
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-26 06:24:43 -08:00
Itay Etelis
6ca2c91b96
[Model] Use mm_position to compute mrope positions for Qwen3-Omni (#33010)
...
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
2026-01-26 13:48:07 +00:00
ltd0924
b40db4dfec
[StepVL] add step vl offline example (#33054)
...
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
2026-01-26 01:00:32 -08:00
Cyrus Leung
11b556878b
[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 15:00:28 +08:00
Itay Etelis
a698e8e7ad
[Model] Use mm_position to compute mrope positions for Qwen2.5-Omni (#32772)
...
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
2026-01-25 20:15:53 +08:00
Mark McLoughlin
1cb4341fbc
[ROCm][PD] Remove unused moriio connector proxy code (#32939)
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2026-01-23 15:59:04 +00:00
wang.yuqi
05f3d714db
[Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest (#32905)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:03:44 +00:00
wang.yuqi
328cbb2773
[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-22 10:32:44 +00:00
Robert Shaw
cea3c754c4
[Quantization][Deprecation] Remove DeepSpeedFp8 (#32679)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 09:32:12 -05:00
Kim Hee Su
7727ce35c2
[Model] Add Eagle2.5-8B Vision-Language Model support (#32456)
...
Signed-off-by: kimheesu <wlskaka4@gmail.com>
2026-01-21 09:39:53 +00:00
Cyrus Leung
09194b90a5
[Doc] Update docs for MM model development with context usage (#32691)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-20 10:37:35 -08:00
杨朱 · Kiki
bb9172030e
[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661)
...
This PR completes the removal of the deprecated vllm:time_per_output_token_seconds
metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13,
but delayed until v0.15.
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-20 12:28:41 +00:00
Cyrus Leung
4753f3bf69
[Model] Use context managers for encoder- and LM-only mode (#32605)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-20 11:43:38 +08:00
Tomas Ruiz
4a5299c93f
feat: spec decode with draft models (#24322)
...
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
2026-01-19 16:05:46 -05:00
wang.yuqi
c88860d759
[Frontend] Score entrypoint support data_1 & data_2 and queries & documents as inputs (#32577)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-19 14:07:46 +00:00
Isotr0py
38bf2ffb21
[Bugfix] Fix GLM-ASR audio encoder RoPE dim (#32540)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-18 19:17:59 +08:00
wang.yuqi
4ae77dfd42
[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-16 06:17:04 +00:00
sangho.lee
7e6f123810
Add Molmo2 multimodal model support (#30997)
...
Signed-off-by: sanghol <sanghol@allenai.org>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-14 15:33:09 +08:00
HappyAmazonian
2f4a71daf2
[Misc] Add In-Container restart capability through supervisord for sagemaker entrypoint (#28502)
...
Signed-off-by: Shen Teng <sheteng@amazon.com>
Signed-off-by: HappyAmazonian <91216626+HappyAmazonian@users.noreply.github.com>
2026-01-13 13:06:10 -08:00
Cyrus Leung
232214b2ae
[Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-13 10:45:42 +00:00
Jaehyun An
6bc9c8473e
[MODEL] New model support for kakaocorp/kanana-1.5-v-3b-instruct (#29384)
...
Signed-off-by: Jaehyun An <steve.ai@kakaocorp.com>
2026-01-12 16:39:02 +00:00
Isotr0py
9dbe1fe960
[Bugfix] Fix missing scale passing for encoder Triton Attention implementation (#32149)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-12 11:13:41 +00:00
wang.yuqi
60446cd684
[Model] Improve multimodal pooling examples (#32085)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-12 07:54:09 +00:00
Ning Xie
d74132ca3b
fix offline inference chat response prompt (#32088)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-11 14:01:18 +00:00
Ning Xie
14fc7a68c7
[Bugfix] fix offline chat output prompt (#32076)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-10 07:50:57 +00:00
Matthew Bonanni
2612ba9285
[1/N][Attention] Restructure attention: move files (#31916)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Isotr0py
2d0c5b630e
[Doc] Remove hardcoded Whisper in example openai translation client (#32027)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-09 14:44:52 +00:00
inkcherry
4505849b30
[ROCm][PD] add moriio kv connector. (#29304)
...
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2026-01-09 14:01:57 +00:00
tianshu-Michael-yu
03fd76c570
[Model] Add LFM2-VL model support (#31758)
...
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-08 05:00:27 -08:00
Isotr0py
eac3b96ec0
[Models] Allow converting Qwen3-VL into Reranker model (#31890)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-08 08:10:15 +00:00
wang.yuqi
96860af655
[Model] rename use_pad_token to use_sep_token (#31784)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-06 14:16:04 +00:00
Cyrus Leung
da71d44410
[Doc] Show that use_audio_in_video is supported in docs (#30837)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-05 23:27:19 -08:00
baonudesifeizhai
02dbb933cb
Fix GLM-4.6v flash tool calling in transformers 5.x (#31622)
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2026-01-05 11:32:43 -08:00
wang.yuqi
911d38ed99
[Model] Let more models to support the score template. (#31335)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-05 11:54:26 +00:00
labAxiaoming
a01f2faedf
Add multimodal input method in the documentation (#31601)
...
Signed-off-by: xiaoming <1259730330@qq.com>
2026-01-02 12:43:30 +00:00
Ekagra Ranjan
adcf682fc7
[Audio] Improve Audio Inference Scripts (offline/online) (#29279)
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2025-12-31 23:34:18 +00:00
baonudesifeizhai
d722e9e614
Add GLM-ASR multimodal support (#31436)
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-31 23:12:24 +08:00
Sage
39512aba72
[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577)
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>
2025-12-30 00:17:16 +00:00
Isotr0py
40a8756224
[Chore]: Remove HF format Phi4-MM examples (#31405)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-27 13:42:02 +00:00
Mark Gatere
ba25a65992
[Frontend] add FunctionGemma tool parser support (#31218)
...
Signed-off-by: gateremark <gateremg@gmail.com>
2025-12-25 15:29:25 +08:00
Jakub Zakrzewski
23daef548d
[Frontend] Support using chat template as custom score template for reranking models (#30550)
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-23 11:19:16 +00:00
dengyunyang
8f8f469b1b
[BugFix] skip language model in Encoder (#30242)
...
Signed-off-by: dengyunyang <584797741@qq.com>
2025-12-22 05:25:59 -08:00
Lucas Wilkinson
7e065eba59
[CI] Fix "2 Node Tests (4 GPUs in total)" (#31090)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-22 10:32:40 +08:00
Lucas Wilkinson
ae0770fa6b
[CI] Fix H200 Distributed test (#31054)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-20 16:48:49 -05:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar (#30363)
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
汪志鹏
1adeb3b84c
[New Model] BAGEL support (AR only) (#28439)
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-15 14:58:23 +08:00
Lasha Koroshinadze
3a20450d31
Add AudioFlamingo3 model support (#30539)
...
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
Signed-off-by: Lasha Koroshinadze <26011196+lashahub@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-14 02:14:55 -08:00
Didier Durand
1a55cfafcb
[Doc]: fixing typos in various files (#30540)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 02:14:37 -08:00
Ryan Rock
197473c4e7
[CI/Build] Use spawn subprocess for ROCm (#30272)
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2025-12-12 03:33:17 +00:00