biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Walter Beller-Morales	e69a265135	[Feat][Core] safely abort requests when FSM fails to advance (#38663) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-04-06 08:00:16 -07:00
HarshRathva	17b72fd1c8	Fix priority preemption regression test in scheduler (#37051) Signed-off-by: HarshRathva <harshrathvaai@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-04-01 06:36:12 +03:00
Wentao Ye	1bf2ddd0ee	[Refactor] Rename `WAITING_FOR_FSM` to `WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR` (#38048) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-25 11:41:44 -04:00
tianshu-Michael-yu	269bf46d99	fix: disambiguate multimodal prefix cache keys (#36708) Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>	2026-03-20 10:33:20 +08:00
Yong Hoon Shin	de35c06c66	Make KV connector metadata build overridable via plugin (#37336) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2026-03-17 21:29:06 +00:00
Harry Huang	45f526d652	[BugFix] Correct max memory usage for multiple KV-cache groups (#36030) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-03-17 00:38:52 +00:00
Wentao Ye	a8ff2cca92	[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 21:25:30 -07:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
cong-or	57c84ff129	perf: add __slots__ to KVCacheBlock (#36164) Signed-off-by: cong-or <conchubhar.gannon@gmail.com>	2026-03-05 22:04:09 -08:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Qi Wang	6aa6ad8992	[BugFix] Fix implicit and incorrect assumption on ECConnector is_producer (#34783) Signed-off-by: Qi Wang <qiwa@nvidia.com>	2026-03-04 15:01:30 +01:00
aykoppol	25e02647c2	[Core] Add optional flags to check for repetitive token patterns in engine output (#35451) Signed-off-by: aykoppol <aykoppol+git@gmail.com>	2026-03-03 12:23:25 +08:00
zhongdaor-nv	a0fe7ea2f0	[feat] Add per-block extra_keys to KV events (#33304) Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 20:11:40 -08:00
Ekagra Ranjan	cd81cdb399	[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs (#31058) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-16 11:08:44 +00:00
haosdent	4137c5dfa7	[Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode (#34418) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-13 00:13:22 -08:00
Cyrus Leung	ea5ff3a1f6	[Refactor] Simplify BOS/EOS token handling (#34435) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 18:18:24 -08:00
junuxyz	c5a66d1697	[Core][BugFix] Fix PP KV cache sharding memory validation (#33698) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-10 10:46:24 -05:00
Krish Gupta	748625cdaf	[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-10 13:05:32 +00:00
Chen Zhang	97fa8f6590	[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2026-02-10 07:41:16 +00:00
Roger Wang	8a5e0e2b2b	[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 13:03:32 +08:00
Cyrus Leung	48312e579a	[Misc] Make `PlaceholderRange.get_num_embeds` a method (#34035) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 05:30:17 +00:00
Mark McLoughlin	2abd97592f	[KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-02-05 09:57:27 +02:00
Nick Hill	fa4e0fb028	[Core] Don't schedule spec tokens with prefill chunks (#33652) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-04 23:40:22 +00:00
Or Ozeri	8e32690869	[KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads (#32255) Fixes a not-yet-reported case where it was possible for blocks to be freed by an abort before an async transfer completed, resulting in corrupted KV data. Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-02-04 11:16:34 +00:00
Nick Hill	52ee21021a	[BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-03 23:34:41 +00:00
Yifan Qiao	a01ef3fa51	[Fix] prefix cache hit rate == 0 bug with gpt-oss style models (#33524) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-02-02 01:59:58 +00:00
jma99_2333	22d9a056d5	Support clear mm and encoder cache (#33452) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-31 15:22:25 +00:00
Cyrus Leung	c6e7404cc5	[Multimodal] Simplify MM input definitions (#33331) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-29 13:32:04 +00:00
Wentao Ye	3e440786af	[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-28 20:30:32 +00:00
Joshua Deng	91601ff478	[Feature] add session based streaming input support to v1 (#28973) Signed-off-by: Joshua Deng <joshuakdeng@gmail.com> Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-24 12:06:28 -08:00
Harry Huang	5206e5e28c	[V1][Hybrid] Mamba Prefix Caching with align mode (#30877) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2026-01-23 09:56:48 -08:00
knlnguyen1802	378385b90c	[EC Connector] Optimize remote cache check in scheduler (#32585) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>	2026-01-22 03:30:59 +00:00
Wentao Ye	b34474bf2c	[Feature] Support async scheduling + PP (#32359) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-15 12:06:23 -05:00
Lumosis	66652e8082	[BugFix] Assign page_size_padded when unifying kv cache spec. (#32283) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com>	2026-01-14 20:10:01 +00:00
Or Ozeri	2be765b68a	[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-12 18:39:38 +00:00
Or Ozeri	028599739d	[BugFix] scheduler: Fix resuming of preempted requests after async load (#31583) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-10 12:39:25 -08:00
Yifan Qiao	cd4a95e3aa	[Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator (#31707) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-01-09 10:53:20 -08:00
Lumosis	b634e619bb	Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com> Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>	2026-01-08 09:00:07 +00:00
Nick Hill	32f4e4db00	[Cleanup] Remove deprecated fields from CachedRequestData class (#31734) Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-05 21:07:14 +00:00
Yifan Qiao	52bf066516	[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: KuntaiDu <kuntai@uchicago.edu>	2025-12-26 18:25:46 -08:00
Michael Goin	8ee90c83f8	Add `--max-model-len auto` to auto-fit context to available memory (#29431) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-23 21:37:14 -08:00
Chen Zhang	538e830caa	[KVEvent] User request.block_hash for parent block_hash (#30544) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-23 18:23:43 -08:00
Roger Wang	f5f51e5931	[Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Sun Kim <sunytokki@gmail.com>	2025-12-16 14:18:17 -08:00
Or Ozeri	4c6fd25880	kv_transfer: Rename the shared storage connectors (#30201) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-12-08 20:46:09 -08:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145)" (#30199)	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
rasmith	feecba09af	[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 08:42:25 +00:00
Mark McLoughlin	899e2ef558	[Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-04 16:22:03 +08:00
Chauncey	0a9caca9f5	[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 22:42:28 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00

215 Commits