Mark McLoughlin
1cb4341fbc
[ROCm][PD] Remove unused moriio connector proxy code (#32939)
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2026-01-23 15:59:04 +00:00
Cyrus Leung
09194b90a5
[Doc] Update docs for MM model development with context usage (#32691)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-20 10:37:35 -08:00
杨朱 · Kiki
bb9172030e
[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661)
...
This PR completes the removal of the deprecated vllm:time_per_output_token_seconds
metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13,
but delayed until v0.15.
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-20 12:28:41 +00:00
Cyrus Leung
4753f3bf69
[Model] Use context managers for encoder- and LM-only mode (#32605)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-20 11:43:38 +08:00
Tomas Ruiz
4a5299c93f
feat: spec decode with draft models (#24322)
...
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
2026-01-19 16:05:46 -05:00
HappyAmazonian
2f4a71daf2
[Misc] Add In-Container restart capability through supervisord for sagemaker entrypoint (#28502)
...
Signed-off-by: Shen Teng <sheteng@amazon.com>
Signed-off-by: HappyAmazonian <91216626+HappyAmazonian@users.noreply.github.com>
2026-01-13 13:06:10 -08:00
Isotr0py
2d0c5b630e
[Doc] Remove hardcoded Whisper in example openai translation client (#32027)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-09 14:44:52 +00:00
inkcherry
4505849b30
[ROCm][PD] add moriio kv connector. (#29304)
...
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2026-01-09 14:01:57 +00:00
labAxiaoming
a01f2faedf
Add multimodal input method in the documentation (#31601)
...
Signed-off-by: xiaoming <1259730330@qq.com>
2026-01-02 12:43:30 +00:00
Ekagra Ranjan
adcf682fc7
[Audio] Improve Audio Inference Scripts (offline/online) (#29279)
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2025-12-31 23:34:18 +00:00
Sage
39512aba72
[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577)
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>
2025-12-30 00:17:16 +00:00
dengyunyang
8f8f469b1b
[BugFix] skip language model in Encoder (#30242)
...
Signed-off-by: dengyunyang <584797741@qq.com>
2025-12-22 05:25:59 -08:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar (#30363)
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
Didier Durand
1a55cfafcb
[Doc]: fixing typos in various files (#30540)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 02:14:37 -08:00
Ev Lacey
d527cf0b3d
[FIX]Patch run-cluster.sh (fix for #28328) (#30002)
...
Signed-off-by: elacey <elacey@nvidia.com>
Signed-off-by: Ev Lacey <github@everettlacey.com>
2025-12-11 23:36:31 +00:00
Or Ozeri
4c6fd25880
kv_transfer: Rename the shared storage connectors (#30201)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-08 20:46:09 -08:00
Andrew Xia
da7bc54ea8
[responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-05 11:11:50 -05:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing (#29970)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
wang.yuqi
2eb4fe9129
[examples] Resettle pooling examples. (#29365)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 15:54:28 +00:00
daniel-salib
014ece97c7
[Frontend] Add tool filtering support to ToolServer (#29224)
...
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-01 08:03:57 +00:00
Roger Wang
0ff70821c9
[Core] Deprecate xformers (#29262)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-24 04:18:55 +00:00
Josh Moore
c309bb5245
[Bugfix] Update Gradio OpenAI Chatbot Webserver example to new Gradio message history format (#29249)
...
Signed-off-by: joshiemoore <joshiemoore98@gmail.com>
2025-11-24 00:47:54 +00:00
Andrew Xia
742e9ff6b3
[responsesAPI] parse reasoning item input (#28248)
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-22 15:42:11 +08:00
Pan Li
e5bfcb6a88
[BugFix][PD]: make example proxy usable with P2pNcclConnector (#26628)
...
Signed-off-by: PAN <1162953505@qq.com>
2025-11-20 17:38:31 +00:00
Didier Durand
7ed27f3cb5
[Doc]: fix typos in various files (#28945)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-18 22:52:30 -08:00
Kevin H. Luu
c64c0b78de
[chore] Move the rest of wikimedia url to S3 (#28921)
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 09:44:18 -08:00
Nicolò Lucchesi
6f1e7f7226
[DisaggEverything] Tokens in<>out /generate endpoint (#24261)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 09:58:01 -07:00
Chauncey
5c9ad138d5
[Frontend] supports interleaved thinking (#28531)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-13 16:14:13 +08:00
Chenguang Zheng
4ccffe561f
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233)
...
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com>
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com>
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com>
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>
2025-11-11 18:58:33 -08:00
Kevin H. Luu
05f8d69077
[chore] Move some wikimedia images to S3 (#28351)
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-09 01:58:26 +00:00
Ev Lacey
77d702a22b
Enhance run_cluster.sh for multi-NIC support (#28328)
...
Signed-off-by: Ev Lacey <elacey@nvidia.com>
2025-11-08 22:04:16 +00:00
Harry Mellor
d9ab1ad9d1
reasoning_content -> reasoning (#27752)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-08 12:15:08 +00:00
Fang Han
da855b42d2
[Doc]: Make extraInit containers fully configurable in helm chart (#27497)
...
Signed-off-by: Fang Han <fhan0520@gmail.com>
2025-11-06 20:27:16 +00:00
Chauncey
59a50afa08
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony (#26874)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-06 10:40:03 +00:00
wang.yuqi
4464723f22
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-30 12:13:05 +00:00
Yeshwanth N
71b1c8b667
[Chore]:Extract math and argparse utilities to separate modules (#27188)
...
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
2025-10-26 04:03:32 -07:00
Yu Jiaqi
0552cfb195
[Model] Siglip Embedding Support (#27324)
...
Signed-off-by: piood <2477084691@qq.com>
2025-10-23 20:19:48 +00:00
wang.yuqi
3fa2c12185
[Frontend][4/N] Improve all pooling task | Add plugin pooling task (#26973)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
2025-10-23 14:46:18 +00:00
fangpings
7e0941055f
[Bugfix] Fix incorrect kv cache metrics in grafana.json (#27133)
...
Signed-off-by: Fangping Shi <fangping_shi@apple.com>
Co-authored-by: Fangping Shi <fangping_shi@apple.com>
2025-10-22 20:58:36 -07:00
Russell Bryant
58fab50d82
[Frontend] Require flag for loading text and image embeds (#27204)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 15:52:02 +00:00
wang.yuqi
1f633b8632
[Frontend][3/N] Improve all pooling task | Support binary embedding response (#27066)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-22 18:38:57 +08:00
wang.yuqi
f54f85129e
[Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370)
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-15 11:14:41 +00:00
Lucia Fang
8317f72354
[Misc][DP] support customized aggregated logger for dp (#24354)
...
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-10-13 17:45:59 -07:00
wang.yuqi
d2a7938582
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). (#26414)
...
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-13 19:06:43 +00:00
wang.yuqi
767c3ab869
[Model][0/N] Improve all pooling task | clean up (#25817)
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-10-13 16:44:50 +08:00
yihong
3cd36660f7
docs: wrong command in structured_outputs README (#26677)
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-10-12 20:59:01 -07:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Harry Mellor
e09d1753ec
Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-08 10:40:42 -07:00
Cyrus Leung
7e4cd070b0
[V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts (#26336)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-07 16:46:44 +08:00
Snehlata
46b0779996
[BugFix] Update KV block hash type from BlockHash to ExternalBlockHash in kv_events_subscriber - #26264 (#26265)
...
Signed-off-by: atalhens <sneh.lata@nutanix.com>
2025-10-07 08:42:28 +00:00