Cyrus Leung
48312e579a
[Misc] Make PlaceholderRange.get_num_embeds a method ( #34035 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-07 05:30:17 +00:00
Aaron Hao
89a385d79f
[Feat][RL] Pause and Resume with keep requests for single engine ( #32351 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-07 00:08:58 +00:00
Cyrus Leung
cd8b405bd0
[Refactor] Consolidate sequence normalization and enc-dec parsing ( #33928 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-06 15:43:47 +00:00
emricksini-h
325ab6b0a8
[Feature] OTEL tracing during loading ( #31162 )
2026-02-05 16:59:28 -08:00
Aaron Hao
c1858b7ec8
[Feat][RL][1/2] Native Weight Syncing API: NCCL ( #31943 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Co-authored-by: SumanthRH <sumanthrh99@gmail.com >
2026-02-05 12:13:23 -05:00
Cyrus Leung
038914b7c8
[Refactor] Move task outside of PoolingParams.verify ( #33796 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-02-05 09:33:11 +00:00
Chauncey
6abb0454ad
[Perf] Optimize the performance of structured output + reasoning ( #33557 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-02-05 15:45:29 +08:00
Nick Hill
add9f1fbd9
[Minor] Include StreamingInput in inputs package ( #33856 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-02-05 04:38:20 +00:00
zhanqiuhu
4403e3ed4c
[Metrics] Add labeled prompt token metrics for P/D disaggregation ( #33290 )
...
Add labeled Prometheus metrics to distinguish where prompt tokens come
from in P/D disaggregated deployments.
In P/D disaggregation, decode instances receive KV cache from prefill instances.
Currently, decode reports inflated prompt throughput because it counts all
prompt tokens as "computed", even though most were transferred.
This PR adds labeled metrics so users can understand actual compute work vs
transferred work:
vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
2026-02-04 07:46:48 +00:00
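To illustrate how the labeled metrics from #33290 might be consumed, the following is a minimal sketch that reads Prometheus text-format output and reports what share of prompt tokens came from each source. Only the metric name `vllm:prompt_tokens_by_source_total` and the three `source` label values come from the commit message; the sample numbers, the helper names `prompt_tokens_by_source` and `source_shares`, and the parsing approach are assumptions for illustration, not part of vLLM.

```python
import re

# Metric name and source labels are quoted from the commit message for #33290.
# Everything else here (sample values, helper names) is hypothetical.
METRIC = "vllm:prompt_tokens_by_source_total"

# Matches a labeled sample line: name{labels} value [optional timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

# Invented sample scrape; real output would likely carry additional labels.
SAMPLE = """
# TYPE vllm:prompt_tokens_by_source_total counter
vllm:prompt_tokens_by_source_total{source="local_compute"} 200.0
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} 700.0
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} 100.0
"""


def prompt_tokens_by_source(exposition: str) -> dict[str, float]:
    """Sum the counter per `source` label, ignoring comments and other metrics."""
    totals: dict[str, float] = {}
    for raw_line in exposition.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None or match.group("name") != METRIC:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        source = labels.get("source")
        if source is not None:
            # Samples that differ only in other labels are added together.
            totals[source] = totals.get(source, 0.0) + float(match.group("value"))
    return totals


def source_shares(totals: dict[str, float]) -> dict[str, float]:
    """Convert per-source totals into fractions of all prompt tokens."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {source: 0.0 for source in totals}
    return {source: value / grand_total for source, value in totals.items()}


if __name__ == "__main__":
    for source, share in source_shares(prompt_tokens_by_source(SAMPLE)).items():
        print(f"{source}: {share:.1%}")
```

On a decode instance, a high `external_kv_transfer` share alongside a low `local_compute` share would correspond to the situation the commit message describes, where most prompt tokens were transferred rather than computed locally.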
Cyrus Leung
d7e17aaacd
[Refactor] Move profiling methods to MM budget ( #33559 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-02 23:27:00 +08:00
Cyrus Leung
a502831d36
[Chore] Remove redundant input parsing methods ( #33542 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-02 10:50:47 +00:00
Cyrus Leung
21997f45b1
[Redo] #33110 with threading limit ( #33502 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: YunzhuLu <lucia.yunzhu@gmail.com >
2026-02-01 09:18:11 +00:00
Cyrus Leung
b6bb2842cf
[Critical] Revert #33110 ( #33500 )
2026-01-31 21:06:42 -08:00
Cyrus Leung
79b6ec6aab
[Bugfix] Fix inconsistent handling of cache reset ( #33481 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-31 20:23:41 -08:00
Cyrus Leung
a358e4dffe
[Refactor] Make Renderer an abstract class ( #33479 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-01 10:36:30 +08:00
Cyrus Leung
88c3e114d8
[Refactor] Move MM data parsing outside processor ( #33408 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-31 16:46:14 +00:00
YunzhuLu
27cb2f678f
[Bugfix] Early-reject requests with MM data longer than encode cache capacity ( #33110 )
...
Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-31 08:41:13 -08:00
jma99_2333
22d9a056d5
Support clear mm and encoder cache ( #33452 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-31 15:22:25 +00:00
Cyrus Leung
f0a1c8453a
[Frontend] Use new Renderer for Completions and Tokenize API ( #32863 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-31 04:51:15 -08:00
Nick Hill
876a16f4fb
[ModelRunner V2] Fix spec decoding + logprobs ( #33391 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-31 03:33:26 +00:00
Alberto Ferrer
64a40a7ab4
[Bugfix] Fix typo in read_offset variable name ( #33426 )
...
Signed-off-by: Alberto Ferrer <albertof@barrahome.org >
2026-01-31 01:26:15 +00:00
Patrick von Platen
10152d2194
[Realtime API] Adds minimal realtime API based on websockets ( #33187 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-30 18:41:29 +08:00
杨朱 · Kiki
1a7894dbdf
[Misc] Replace Optional[X] with X | None syntax ( #33332 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-01-30 01:56:59 -08:00
Harry Mellor
2eb673a088
Add flake8-implicit-str-concat rules to Ruff ( #33191 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 04:56:10 +00:00
Nick Hill
0cd259b2d8
[BugFix] Fix P/D with non-MoE DP ( #33037 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-27 08:03:47 -08:00
Cyrus Leung
11b556878b
[Refactor] Use data parser for matching data items to multi-modal UUIDs ( #32955 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-26 15:00:28 +08:00
Joshua Deng
91601ff478
[Feature] add session based streaming input support to v1 ( #28973 )
...
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com >
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-24 12:06:28 -08:00
Cyrus Leung
d117a4d1a9
[Frontend] Introduce Renderer for processing chat messages (using ModelConfig) ( #30200 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-22 12:44:22 +00:00
Walter Beller-Morales
8be263c3fb
[Core] Cleanup shm based object store on engine shutdown ( #32429 )
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com >
2026-01-20 08:53:37 +00:00
Cyrus Leung
cbbae38f93
[2/N] Move cache factories to MM registry ( #32382 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-15 01:02:30 -08:00
dtc
1e584823f8
[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode ( #32314 )
...
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com >
2026-01-15 16:31:16 +08:00
Ning Xie
9d7ae3fcdb
[code clean] remove duplicate check ( #32376 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-15 05:29:34 +00:00
Cyrus Leung
9ea07b41da
[1/N] Reorganize multimodal processing code ( #32327 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-14 15:25:31 +00:00
Cyrus Leung
eb28e8068d
[Refactor] Remove get_encoder_dummy_data ( #32241 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-13 09:21:23 +00:00
Wentao Ye
2a719e0865
[Perf] Optimize requests abort ( #32211 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-13 04:11:37 +00:00
Nick Hill
c6bb5b5603
[BugFix] Fix engine crash caused by chat tools + response_format ( #32127 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-13 10:33:14 +08:00
Nicolò Lucchesi
08e8e99ce7
[Misc] Change log level for batch queue log ( #32192 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-12 18:59:31 +00:00
Roger Wang
16abe6b85a
[Misc] Set default torch num threads for input processing ( #31879 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-01-12 10:28:16 -08:00
Hongxin Xu
49e6b86c91
[Feature] Support recording expert indices for rollout router replay ( #28284 )
...
Signed-off-by: xhx1022 <1737006628@qq.com >
Signed-off-by: Hongxin Xu <70438206+xhx1022@users.noreply.github.com >
Signed-off-by: arlenxu <arlenxu@tencent.com >
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: arlenxu <arlenxu@tencent.com >
2026-01-12 06:23:04 -08:00
rongfu.leng
d70249e2e9
[Misc] fix this log format not space ( #32112 )
...
Signed-off-by: lengrongfu <lenronfu@gmail.com >
2026-01-11 05:01:16 -08:00
Wentao Ye
e18464a57d
[Perf] Optimize async scheduling placeholder using empty ( #32056 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-10 00:46:11 +00:00
Wentao Ye
28ae32a5d3
[Refactor] Remove numpy split in async scheduling ( #32034 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-09 19:09:02 +00:00
zhrrr
8ff4a99566
[Async][Feat] support apply penalty or bad_words for async + spec ( #30495 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-09 02:31:50 +00:00
Max Hu
6ebe34d6fa
[Feature] Add iteration level logging and enhance nvtx marker ( #31193 )
...
Signed-off-by: Max Hu <maxhu@nvidia.com >
Signed-off-by: Max Hu <hyoung2991@gmail.com >
Co-authored-by: Max Hu <maxhu@nvidia.com >
2026-01-09 00:13:39 +00:00
Nick Hill
11cec296dd
[BugFix] Add spec-decode-incompatible request param validation ( #31982 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-09 00:08:21 +00:00
prashanth058
d3235cb503
[Fix] Enable mm_processor_cache with vision LoRA ( #31927 )
...
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com >
2026-01-08 15:31:51 +08:00
R3hankhan
1ab055efe6
[OpenAI] Extend VLLMValidationError to additional validation parameters ( #31870 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2026-01-07 14:45:49 +00:00
Benjamin Chislett
f7008ce1c4
[Perf] Async Scheduling + Speculative Decoding + Structured Outputs ( #29821 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-01-06 18:50:37 +00:00
Masataro Asai
142c4d1738
make 500: InternalServerError more informative ( #20610 )
...
Signed-off-by: Masataro Asai <guicho2.71828@gmail.com >
2026-01-06 17:36:24 +00:00
John Calderon
2f4e6548ef
[Bugfix] vLLM produces invalid UTF-8 tokens and “�” ( #28874 )
...
Signed-off-by: John Calderon <jcalderon@nvidia.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2026-01-06 00:23:00 +00:00