biondizzle/vllm - vllm

Author	SHA1	Message	Date
Jialin Ouyang	186352b270	[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 16:04:04 -08:00
Marcin Ostrowski	0de4f217ab	[Bugfix] TypeError: 'NoneType' object is not callable (#27410 ) Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>	2025-11-14 21:13:53 +00:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Cyrus Leung	511a6b611d	[Config] Clean up SchedulerConfig initialization (#28665 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 22:41:02 +08:00
Mark McLoughlin	6e25b1cddf	[KV Connector] Test async mode in scheduler tests (#28550 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-13 18:30:59 -05:00
Andy Lo	58ce8d12b7	[BugFix] Priority scheduling and spec tokens preemption (#28558 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-12 20:29:21 +00:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233 ) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Kuntai Du	86dca07d9b	[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator (#28011 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-11-05 10:36:31 +00:00
Nick Hill	938a81692e	[AsyncScheduling] Don't schedule past request max_tokens (#27922 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-04 17:06:28 +00:00
Biswa Panda	1bf43ae35d	[BugFix][LoRA] use adapter_id instead of id field of lora_request (#27728 ) Signed-off-by: Biswa Panda <biswa.panda@gmail.com>	2025-11-03 10:08:08 +08:00
Nick Hill	0cdbe7b744	[Core] Async scheduling + structured outputs compatibility (#26866 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 00:35:04 +00:00
Kuntai Du	b853540388	[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-10-24 23:34:18 -07:00
Tova Movshovitz	88afa11010	[Metrics] [KVConnector] Add connector prefix cache hit rate stats (#26245 ) Signed-off-by: tovam <tovam@pliops.com>	2025-10-23 12:21:08 +02:00
Andrew Sansom	ff93cc8c84	[CORE] Support Prefix Caching with Prompt Embeds (#27219 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-10-22 22:18:07 -07:00
Sage	1651003c35	[Prefix Cache] Use LoRA name for consistent KV-cache block hashing (#27211 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2025-10-22 18:13:03 +00:00
dongbo910220	8a297115e2	[Chore] Separate out hashing utilities from vllm.utils (#27151 ) Signed-off-by: dongbo910220 <1275604947@qq.com>	2025-10-19 11:09:38 +08:00
iAmir97	1d165d6d85	[Chore] Separate out `vllm.utils.mem_utils` (#27143 ) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-18 10:06:59 +00:00
Tahsin Tunan	43721bc67f	[CI] Replace large models with tiny alternatives in tests (#24057 ) Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-16 15:51:27 +01:00
Nick Hill	4aed506b65	[Core] Streamline some structured output related code (#26737 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-14 23:27:44 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Harry Mellor	7c12763b24	Fix some typing issues found by `mypy==1.18.2` (#26596 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-10 18:21:25 +00:00
Chen Zhang	606b00e80f	[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 03:02:49 -07:00
Cyrus Leung	ad430a67ca	[Metrics] Log multi-modal cache stats and fix reset (#26285 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-10 01:45:55 -07:00
Qier Li	d17f0fbf30	[Core][KVConnector] Propagate all tokens on resumed preemptions (#24926 ) Signed-off-by: Qier Li <kevin44036@gmail.com> Co-authored-by: Qier Li <qier@fb.com>	2025-10-09 14:43:31 +08:00
Elaine Zhao	f08919b7d1	[Bugfix] Respect min_tokens in scheduler stop check (#26317 ) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-10-08 14:08:24 -07:00
Cyrus Leung	1e4ecca1d0	[V0 Deprecation] Remove `VLLM_USE_V1` from tests (#26341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:42:31 +00:00
Grant Holmes (Ren)	d100d78eb3	Optimize KV cache distribution for asymmetric pipeline parallelism (#25164 ) Signed-off-by: gholmes829 <g.holmes429@gmail.com>	2025-10-07 09:20:30 +00:00
Harry Mellor	4e256cadc2	Remove all references to `yapf` as it's no longer used (#26251 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:11 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Huamin Li	7d6b03381e	[CI Failure] fix_test_auto_prefix_cache_support (#26053 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-04 02:44:49 -07:00
Reza Barazesh	bc546f76a1	[CI] Move applicable tests to CPU (#24080 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 14:45:20 +01:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Simon Danielsson	e23cacda35	[Bugfix]: Clean up chunked prefill logging when using whisper (#25075 ) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2025-09-30 08:17:49 +00:00
Cyrus Leung	cd87bfbf37	[CI/Build] Reorganize root-level V1 tests (#25767 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-27 13:51:15 +08:00
Jialin Ouyang	4f8c4b890a	[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-09-23 15:11:14 -07:00
Chen Zhang	9607d5eb44	[Hybrid Allocator] Support full attention with different hidden size (#25101 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-19 23:43:59 -07:00
Jialin Ouyang	2506ce5189	[Core][Prefix Hash] Fix prefix hash metrics sliding window maintainance (#24990 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-09-19 12:22:53 -06:00
Aaron Pham	29283e8976	[Chore] Cleanup guided namespace, move to structured outputs config (#22772 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 09:20:27 +00:00
Mickaël Seznec	45bfa49cb8	[Tests] fix initialization of kv hash in tests (#24273 ) Signed-off-by: Mickael Seznec <mickael@mistral.ai>	2025-09-15 21:48:27 +00:00
Ning Xie	bc0f6059a2	[UT] enhance free kv cache block queue popleft_n (#24220 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-09-15 10:04:37 +00:00
Ning Xie	3f3313981c	[kv cache] update num_free_blocks in the end (#24228 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-09-15 05:15:12 +00:00
Chen Zhang	8e5cdcda4e	[Hybrid Allocator] Support Pipeline Parallel (#23974 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-14 15:55:17 -07:00
Flora Feng	0377802c20	[Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec (#24548 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-09-12 21:42:23 +08:00
Zebing Lin	82dfb12e52	[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673 ) Signed-off-by: linzebing <linzebing1995@gmail.com>	2025-09-08 21:34:37 -07:00
Didier Durand	fad73be1a5	[Doc]: fix typos in Python comments (#24077 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 02:38:55 -07:00
Ning Xie	5490d633ce	[UT] fix unify_kv_cache_configs when kv cache config needs sort (#23843 )	2025-08-30 11:22:14 +00:00
Flora Feng	69f46359dd	[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (#23779 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-08-29 18:36:57 +08:00
Hanchenli	5da4f5d857	[Bugfix] Fix for V1 priority scheduling crashes at preemption (#23713 ) Signed-off-by: Hanchenli <lihanc2002@gmail.com>	2025-08-28 00:44:52 +00:00
Roger Wang	b5d34af328	[Bugfix] Fix scheduling when repeated images in one request (#23544 ) Signed-off-by: Roger Wang <hey@rogerw.me> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.me> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>	2025-08-26 09:46:28 +00:00
Chenguang Zheng	d765cf01fe	[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-08-25 00:41:17 -07:00

203 Commits