biondizzle/vllm - vllm

Author	SHA1	Message	Date
zhongdaor-nv	a0fe7ea2f0	[feat] Add per-block extra_keys to KV events (#33304 ) Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 20:11:40 -08:00
Ekagra Ranjan	cd81cdb399	[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs (#31058 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-16 11:08:44 +00:00
haosdent	4137c5dfa7	[Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode (#34418 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-13 00:13:22 -08:00
Cyrus Leung	ea5ff3a1f6	[Refactor] Simplify BOS/EOS token handling (#34435 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 18:18:24 -08:00
junuxyz	c5a66d1697	[Core][BugFix] Fix PP KV cache sharding memory validation (#33698 ) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-10 10:46:24 -05:00
Krish Gupta	748625cdaf	[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220 ) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-10 13:05:32 +00:00
Chen Zhang	97fa8f6590	[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2026-02-10 07:41:16 +00:00
Roger Wang	8a5e0e2b2b	[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 13:03:32 +08:00
Cyrus Leung	48312e579a	[Misc] Make `PlaceholderRange.get_num_embeds` a method (#34035 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 05:30:17 +00:00
Mark McLoughlin	2abd97592f	[KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-02-05 09:57:27 +02:00
Nick Hill	fa4e0fb028	[Core] Don't schedule spec tokens with prefill chunks (#33652 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-04 23:40:22 +00:00
Or Ozeri	8e32690869	[KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads (#32255 ) Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-02-04 11:16:34 +00:00
Nick Hill	52ee21021a	[BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-03 23:34:41 +00:00
Yifan Qiao	a01ef3fa51	[Fix] prefix cache hit rate == 0 bug with gpt-oss style models (#33524 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-02-02 01:59:58 +00:00
jma99_2333	22d9a056d5	Support clear mm and encoder cache (#33452 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-31 15:22:25 +00:00
Cyrus Leung	c6e7404cc5	[Multimodal] Simplify MM input definitions (#33331 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-29 13:32:04 +00:00
Wentao Ye	3e440786af	[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-28 20:30:32 +00:00
Joshua Deng	91601ff478	[Feature] add session based streaming input support to v1 (#28973 ) Signed-off-by: Joshua Deng <joshuakdeng@gmail.com> Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-24 12:06:28 -08:00
Harry Huang	5206e5e28c	[V1][Hybrid] Mamba Prefix Caching with align mode (#30877 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2026-01-23 09:56:48 -08:00
knlnguyen1802	378385b90c	[EC Connector] Optimize remote cache check in scheduler (#32585 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>	2026-01-22 03:30:59 +00:00
Wentao Ye	b34474bf2c	[Feature] Support async scheduling + PP (#32359 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-15 12:06:23 -05:00
Lumosis	66652e8082	[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>	2026-01-14 20:10:01 +00:00
Or Ozeri	2be765b68a	[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-12 18:39:38 +00:00
Or Ozeri	028599739d	[BugFix] scheduler: Fix resuming of preempted requests after async load (#31583 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-10 12:39:25 -08:00
Yifan Qiao	cd4a95e3aa	[Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator (#31707 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-01-09 10:53:20 -08:00
Lumosis	b634e619bb	Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com> Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>	2026-01-08 09:00:07 +00:00
Nick Hill	32f4e4db00	[Cleanup] Remove deprecated fields from CachedRequestData class (#31734 ) Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-05 21:07:14 +00:00
Yifan Qiao	52bf066516	[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: KuntaiDu <kuntai@uchicago.edu>	2025-12-26 18:25:46 -08:00
Michael Goin	8ee90c83f8	Add `--max-model-len auto` to auto-fit context to available memory (#29431 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-23 21:37:14 -08:00
Chen Zhang	538e830caa	[KVEvent] User request.block_hash for parent block_hash (#30544 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-23 18:23:43 -08:00
Roger Wang	f5f51e5931	[Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Sun Kim <sunytokki@gmail.com>	2025-12-16 14:18:17 -08:00
Or Ozeri	4c6fd25880	kv_transfer: Rename the shared storage connectors (#30201 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-12-08 20:46:09 -08:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
rasmith	feecba09af	[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 08:42:25 +00:00
Mark McLoughlin	899e2ef558	[Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-04 16:22:03 +08:00
Chauncey	0a9caca9f5	[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 22:42:28 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds (total lifetime from allocation to free); vllm:kv_block_idle_before_evict_seconds (idle duration before eviction); vllm:kv_block_reuse_gap_seconds (time between consecutive reuses of the same block). These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. The implementation uses monotonic timestamps for accuracy and 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics (enable KV cache residency metrics) and --kv-cache-metrics-sample (control sampling ratio; default: 0.01). Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
Marcin Ostrowski	5cfa967efa	[Bugfix] TypeError: 'NoneType' object is not callable (#29414 ) Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>	2025-12-01 13:16:44 +00:00
maang-h	c7ba1f6bc7	[BugFix] Fix ValueError in NewRequestData repr methods (#29392 ) Signed-off-by: maang <maang_h@163.com>	2025-11-28 13:42:30 +08:00
Yifan Qiao	48ddb02b79	[Hybrid Allocator] Support KV cache groups with different block_size (#29143 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-11-25 10:30:57 -05:00
wang.yuqi	67fc16cd8c	[Bugfix] If chunked_prefill is disabled, end the scheduling early. (#28911 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-11-25 16:06:09 +08:00
Chen Zhang	71df2a57ef	[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle (#29303 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-11-24 14:28:32 -08:00
Cyrus Leung	5a4802588e	[Misc] Further clean up chunked prefill and prefix caching init (#29186 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 19:34:15 +08:00
Mark McLoughlin	c6fa3895e9	[KV Connector] Fix async connector prefix cache metrics (#28585 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-21 17:45:00 -05:00
Jialin Ouyang	30b9c67743	Revert "[Redo] #26368 (#28771 )" (#29121 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 21:27:45 -08:00
Cyrus Leung	98b4d389ed	[Redo] #26368 (#28771 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 22:47:41 -08:00
Nick Hill	ac86bff8cb	Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773 )	2025-11-14 20:24:00 -08:00


203 Commits