biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
wang.yuqi	67fc16cd8c	[Bugfix] If chunked_prefill is disabled, end the scheduling early. (#28911) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-11-25 16:06:09 +08:00
Cyrus Leung	5a4802588e	[Misc] Further clean up chunked prefill and prefix caching init (#29186) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-22 19:34:15 +08:00
Mark McLoughlin	c6fa3895e9	[KV Connector] Fix async connector prefix cache metrics (#28585) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-21 17:45:00 -05:00
Jialin Ouyang	30b9c67743	Revert "[Redo] #26368 (#28771)" (#29121) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 21:27:45 -08:00
Cyrus Leung	98b4d389ed	[Redo] #26368 (#28771) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 22:47:41 -08:00
Nick Hill	ac86bff8cb	Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)	2025-11-14 20:24:00 -08:00
Jialin Ouyang	186352b270	[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 16:04:04 -08:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Cyrus Leung	511a6b611d	[Config] Clean up SchedulerConfig initialization (#28665) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 22:41:02 +08:00
Mark McLoughlin	6e25b1cddf	[KV Connector] Test async mode in scheduler tests (#28550) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-13 18:30:59 -05:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Kuntai Du	86dca07d9b	[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator (#28011) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-11-05 10:36:31 +00:00
Nick Hill	0cdbe7b744	[Core] Async scheduling + structured outputs compatibility (#26866) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-01 00:35:04 +00:00
Kuntai Du	b853540388	[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Kuntai Du <kuntai@uchicago.edu>	2025-10-24 23:34:18 -07:00
Tova Movshovitz	88afa11010	[Metrics] [KVConnector] Add connector prefix cache hit rate stats (#26245) Signed-off-by: tovam <tovam@pliops.com>	2025-10-23 12:21:08 +02:00
Nick Hill	4aed506b65	[Core] Streamline some structured output related code (#26737) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-14 23:27:44 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Harry Mellor	7c12763b24	Fix some typing issues found by `mypy==1.18.2` (#26596) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-10 18:21:25 +00:00
Chen Zhang	606b00e80f	[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 03:02:49 -07:00
Qier Li	d17f0fbf30	[Core][KVConnector] Propagate all tokens on resumed preemptions (#24926) Signed-off-by: Qier Li <kevin44036@gmail.com> Co-authored-by: Qier Li <qier@fb.com>	2025-10-09 14:43:31 +08:00
Elaine Zhao	f08919b7d1	[Bugfix] Respect min_tokens in scheduler stop check (#26317) Signed-off-by: Elaine Zhao <elaineyz@amazon.com>	2025-10-08 14:08:24 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Huamin Li	7d6b03381e	[CI Failure] fix_test_auto_prefix_cache_support (#26053) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-04 02:44:49 -07:00
Reza Barazesh	bc546f76a1	[CI] Move applicable tests to CPU (#24080) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 14:45:20 +01:00
Simon Danielsson	e23cacda35	[Bugfix]: Clean up chunked prefill logging when using whisper (#25075) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2025-09-30 08:17:49 +00:00
Aaron Pham	29283e8976	[Chore] Cleanup guided namespace, move to structured outputs config (#22772) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 09:20:27 +00:00
Flora Feng	69f46359dd	[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (#23779) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-08-29 18:36:57 +08:00
Hanchenli	5da4f5d857	[Bugfix] Fix for V1 priority scheduling crashes at preemption (#23713) Signed-off-by: Hanchenli <lihanc2002@gmail.com>	2025-08-28 00:44:52 +00:00
Chenguang Zheng	d765cf01fe	[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-08-25 00:41:17 -07:00
Woosuk Kwon	c9b38be8aa	[Spec Decode] Make `propose_draft_token_ids` non-blocking for lower TTFT (#23041) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-18 17:20:38 -07:00
Cyrus Leung	4dff91c93d	[Refactor] Allow optional MultiModalKwargsItem in IPC (#23022) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-16 11:30:49 +00:00
Or Ozeri	c280066f9d	[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-08-15 16:52:52 -07:00
Cyrus Leung	19b927e52d	[Core] Use individual MM items in P0/P1 cache and model runner (#22570) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 07:18:07 -07:00
Woosuk Kwon	7175817637	Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." (#22223)	2025-08-04 18:37:06 -07:00
PiteXChen	2dffac464c	[Bugfix] V1 Fix the cursor leakage issue during request scheduling. (#21173) Signed-off-by: CLFutureX <775523362@qq.com>	2025-08-04 18:34:10 -07:00
Cyrus Leung	86ae693f20	[Deprecation][2/N] Replace `--task` with `--runner` and `--convert` (#21470) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-27 19:42:40 -07:00
Woosuk Kwon	d4d309409f	Implement Async Scheduling (#19970) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-14 23:01:46 -07:00
Woosuk Kwon	f45a332886	[Sched] Enhance the logic to remove stopped requests from queues (#20739)	2025-07-12 15:33:13 -07:00
Aaron Pham	4a98edff1f	[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-07-04 15:05:49 +08:00
Woosuk Kwon	2863befce3	[Optimization] Use Shared `CachedRequestData` Instance Across All Requests (#20232) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-30 09:07:50 -07:00
amit	4a0f7888a3	[Core] feat: Implement Priority Scheduling in V1 Engine (#19057) Signed-off-by: amit <amit.man@gmail.com> Co-authored-by: Roger Wang <Rogerw0108@gmail.com>	2025-06-22 20:18:08 -07:00
Maximilien de Bayser	799397ee4f	Support embedding models in V1 (#16188) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-18 21:36:33 -07:00
Chen Zhang	f8a1a2d108	[v1] Hybrid Memory Allocator (#17996) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-05 20:47:09 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Nick Hill	2dbe8c0774	[Perf] API-server scaleout with many-to-many server-engine comms (#17546)	2025-05-30 08:17:00 -07:00
Robert Shaw	d19110204c	[P/D] NIXL Integration (#17751) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Brent Salisbury <bsalisbu@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>	2025-05-12 09:46:16 -07:00
Chen Zhang	200da9a517	[v1] Move block management logic from KVCacheManager to SpecializedManager (#17474) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-09 15:25:34 +00:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00

75 Commits