biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
SungMinCho	a0b782f9cc	[Metrics] Model FLOPs Utilization estimation (#30738 ) Signed-off-by: SungMinCho <tjdals4565@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-12-18 01:40:51 +00:00
Roger Wang	f5f51e5931	[Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Sun Kim <sunytokki@gmail.com>	2025-12-16 14:18:17 -08:00
Nicolò Lucchesi	75eb302a2e	[Bugfix] Whisper fix number of allocated CrossAttn blocks per-request (#30772 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-16 14:20:19 +00:00
Nick Hill	1cec5b7ea9	[Scheduer] Simplify stop checking for pooling models (#30591 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-13 09:45:26 +00:00
Nicolò Lucchesi	0efd9f867c	[Core] Whisper Enable Encoder Batching (#29421 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-11 21:06:51 +00:00
shivampr	8580919ac3	[Bugfix] fix confusing OOM errors during v1 init (#28051 ) Signed-off-by: Shivam <shivamprasad91@gmail.com> Signed-off-by: shivampr <shivampr.dev@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-12-10 23:17:41 +00:00
Will Eaton	a9e4106f28	[P/D] KV Load Failure Recovery/Abort Configuration (#26813 ) Signed-off-by: Will Eaton <weaton@redhat.com> Signed-off-by: Will Eaton <me@wseaton.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-10 11:00:52 -08:00
Yifan Qiao	1b0482b9d1	[Misc][Core] Remove unused `req_index` increment in scheduler (#30176 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-07 08:39:21 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170 ) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
Yong Hoon Shin	69520bc695	Add logging for cudagraph related info (#29825 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-12-03 01:01:48 -08:00
maang-h	5d91d2b292	[Doc] Add allocate_slots parameter docs (#29777 ) Signed-off-by: maang <maang_h@163.com> Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-12-02 23:23:09 +00:00
Chauncey	0a9caca9f5	[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 22:42:28 +00:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
Nick Hill	44822d7ff2	[BugFix] Preserve spec decoding uniform decode when scheduling (#29759 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-01 17:15:52 -08:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ). Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds (total lifetime from allocation to free); vllm:kv_block_idle_before_evict_seconds (idle duration before eviction); vllm:kv_block_reuse_gap_seconds (time between consecutive reuses of the same block). These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics (enable KV cache residency metrics) and --kv-cache-metrics-sample (control sampling ratio, default: 0.01). Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
Mickaël Seznec	86e178f7c4	[crashfix] Eagle + multimodal can crash on mm cache miss (#29750 ) Signed-off-by: Mickael Seznec <mickael@mistral.ai> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-12-01 17:29:33 +08:00
Nick Hill	8e7a891602	[BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-28 20:52:23 +08:00
maang-h	c7ba1f6bc7	[BugFix] Fix ValueError in NewRequestData repr methods (#29392 ) Signed-off-by: maang <maang_h@163.com>	2025-11-28 13:42:30 +08:00
Cyrus Leung	a24ea5414b	[Deprecation] Advance deprecation status (#29617 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-27 19:04:58 +00:00
Nick Hill	4e57c6587f	[Core] Support logprobs with spec decode + async scheduling (#29223 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-25 12:55:24 -08:00
Yifan Qiao	48ddb02b79	[Hybrid Allocator] Support KV cache groups with different block_size (#29143 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-11-25 10:30:57 -05:00
wang.yuqi	67fc16cd8c	[Bugfix] If chunked_prefill is disabled, end the scheduling early. (#28911 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-11-25 16:06:09 +08:00
Chen Zhang	71df2a57ef	[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle (#29303 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-11-24 14:28:32 -08:00
Cyrus Leung	5a4802588e	[Misc] Further clean up chunked prefill and prefix caching init (#29186 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 19:34:15 +08:00
Mark McLoughlin	c6fa3895e9	[KV Connector] Fix async connector prefix cache metrics (#28585 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-21 17:45:00 -05:00
Wentao Ye	a42ab317ac	[Log] Optimize startup log (#28948 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-21 08:46:20 -08:00
Woosuk Kwon	30b44a1598	GPU Model Runner V2 (#25266 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-21 08:20:55 -08:00
Jialin Ouyang	30b9c67743	Revert "[Redo] #26368 (#28771 )" (#29121 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 21:27:45 -08:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Zhuohan Li	552cac95b5	[Misc] Fix wrong comment in scheduler (#28880 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-11-17 15:32:22 -08:00
Ronald	d8874c61a5	[Core] Async Scheduling X Spec Decoding Compatibility (#24799 ) Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-17 12:16:20 -08:00
Nick Hill	80b6080ddc	[BugFix] Fix async scheduling + chunked prefill + preemption (#28787 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-17 06:46:46 +08:00
wang.yuqi	a55b64635c	[Model] Allow users to control skip reading cache per request. (#28194 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2025-11-16 00:04:50 -08:00
Cyrus Leung	638e4196d1	[Misc] Make `SchedulerConfig.max_model_len` init-only (#28733 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-15 01:59:31 -08:00
Cyrus Leung	98b4d389ed	[Redo] #26368 (#28771 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 22:47:41 -08:00
Nick Hill	ac86bff8cb	Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773 )	2025-11-14 20:24:00 -08:00
Jialin Ouyang	186352b270	[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 16:04:04 -08:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Andy Lo	58ce8d12b7	[BugFix] Priority scheduling and spec tokens preemption (#28558 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-12 20:29:21 +00:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233 ) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Wei Wei	bf6a3d0ff5	[Misc] Add more scoping for improved trace (#28329 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-11-10 21:03:21 +00:00
Andy Lo	47604137a2	[Bugfix] Spec decode + structured output + spec model max len edge case (#28298 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-08 19:44:25 +00:00
Nick Hill	da786e339e	[Core] Rework handling of async scheduling config (#28250 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 20:01:23 +00:00
Kuntai Du	efe73e9b57	[Core][Hybrid allocator + connector 2/n] Unify `remove_skipped_blocks` by `get_last_useful_token` (#25431 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-11-06 00:12:00 +00:00
Snehlata	e15601789b	[Feature]: Add corrupted request metric to V1 metrics system. (#27306 ) Signed-off-by: atalhens <sneh.lata@nutanix.com>	2025-11-05 13:45:29 -08:00
Nick Hill	938a81692e	[AsyncScheduling] Don't schedule past request max_tokens (#27922 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-04 17:06:28 +00:00
Mark McLoughlin	58279c60b5	[KV Connector] Make KVCacheConfig an explicit constructor argument (#27887 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-03 23:00:49 -08:00
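
The message for commit cabc77cc86 above describes three residency measurements taken with monotonic timestamps on a sampled subset of KV blocks. The sketch below is only an illustration of that idea, not vLLM's code: the class `BlockResidencyTracker`, its method names, and the plain lists standing in for Prometheus histograms are all hypothetical, and thread safety is left out for brevity.

```python
import random
import time


class BlockResidencyTracker:
    """Hypothetical sampled tracker for KV block residency (illustration only)."""

    def __init__(self, sample_rate=0.01, clock=time.monotonic, rng=None):
        self.sample_rate = sample_rate      # fraction of blocks to track
        self.clock = clock                  # monotonic clock, injectable for tests
        self.rng = rng or random.Random()
        self._tracked = {}                  # block_id -> [alloc_time, last_access_time]
        # Plain lists stand in for the three histograms named in the commit.
        self.lifetimes = []                 # allocation -> free
        self.idle_before_evict = []         # last access -> eviction
        self.reuse_gaps = []                # gap between consecutive reuses

    def on_alloc(self, block_id):
        # Sample at allocation time so untracked blocks cost nothing later.
        if self.rng.random() < self.sample_rate:
            now = self.clock()
            self._tracked[block_id] = [now, now]

    def on_access(self, block_id):
        entry = self._tracked.get(block_id)
        if entry is not None:
            now = self.clock()
            self.reuse_gaps.append(now - entry[1])
            entry[1] = now

    def on_evict(self, block_id):
        entry = self._tracked.pop(block_id, None)
        if entry is not None:
            now = self.clock()
            self.lifetimes.append(now - entry[0])
            self.idle_before_evict.append(now - entry[1])
```

Deciding at allocation time whether a block is sampled keeps the per-access cost for unsampled blocks to a single dictionary lookup, which is one plausible way to get the low overhead the commit message claims.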

333 Commits