biondizzle/vllm - vllm

Author	SHA1	Message	Date
Cyrus Leung	11b556878b	[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 15:00:28 +08:00
Cyrus Leung	d117a4d1a9	[Frontend] Introduce Renderer for processing chat messages (using `ModelConfig`) (#30200 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 12:44:22 +00:00
wang.yuqi	4ae77dfd42	[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-16 06:17:04 +00:00
Hongxin Xu	49e6b86c91	[Feature] Support recording expert indices for rollout router replay (#28284 ) Signed-off-by: xhx1022 <1737006628@qq.com> Signed-off-by: Hongxin Xu <70438206+xhx1022@users.noreply.github.com> Signed-off-by: arlenxu <arlenxu@tencent.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: arlenxu <arlenxu@tencent.com>	2026-01-12 06:23:04 -08:00
Cyrus Leung	583a90e005	[Refactor] Separate sequence and token pooling types (#32026 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-10 04:53:24 +00:00
Michael Goin	d5ec6c056f	[UX] Add vLLM model inspection view (#29450 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-09 10:12:35 -07:00
Michael Goin	bc5ef333e0	[Perf] Add skip_clone to SamplingParams for internal request handling (#31041 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-24 14:35:57 -08:00
Mark McLoughlin	f790068600	[Core] Add a random suffix to frontend-provided request IDs (#27987 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-23 13:05:39 -08:00
Jakub Zakrzewski	23daef548d	[Frontend] Support using chat template as custom score template for reranking models (#30550 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-23 11:19:16 +00:00
Matthew Bonanni	a182be4308	[UX][Attention] Add `attention_config` argument to `LLM()` (#30710 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-15 17:29:09 -05:00
Cyrus Leung	64251f48df	[Chore] Adjust tokenizer import to avoid circular imports (#30601 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-13 04:42:39 -08:00
Cyrus Leung	7e24e5d4d6	[Deprecation] Remove deprecated task, seed and MM settings (#30397 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:39 -08:00
Cyrus Leung	e72d65b959	[Deprecation] Remove tokenizer setter (#30400 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:10:58 +00:00
Benjamin Chislett	e858bfe051	[Cleanup] Refactor profiling env vars into a CLI config (#29912 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-09 13:29:33 -05:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Yu Jiaqi	43e7593031	Support tokenization_kwargs override (#29794 ) Signed-off-by: piood <2477084691@qq.com>	2025-12-06 09:12:53 +00:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170 ) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00
Cyrus Leung	9ae2f60374	[Misc] Various cleanups for MM input processing (#29970 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 06:22:20 +00:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
Cyrus Leung	f0a28bf661	[Misc] Unify tokenizer registration (#29767 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-01 11:34:58 +00:00
Cyrus Leung	2afcec4dec	[Misc] Update `TokenizerLike` interface and move `get_cached_tokenizer` (#29730 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 14:59:47 +08:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Cyrus Leung	8d9338fae4	[Chore] Rename `Processor` to `InputProcessor` (#29682 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 09:35:41 -08:00
maang-h	51906c8c55	[Docs] Improve `priority` parameter documentation (#29572 ) Signed-off-by: maang <maang_h@163.com> Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-11-27 02:09:24 -08:00
Harry Mellor	a1f2676879	Scheduled removal of `override_pooler_config` and `disable_log_requests` (#29402 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-25 16:08:57 +00:00
Cyrus Leung	aab0102a26	[V0 deprecation] Remove more V0 references (#29088 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 11:56:59 +00:00
Alex Brooks	b4734b9550	[Bugfix] Fix default MM LoRA alignment for single str prompts (#29140 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-11-21 13:32:30 +08:00
Zhuohan Li	dd6ac1c2bb	[RL] [V1] Remove unused device argument from reset_kv_cache (#28766 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-11-14 23:59:42 -08:00
Yanan Cao	48c879369f	[Frontend] Change CompilationMode to a proper Enum (#28165 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-11-11 19:46:18 -05:00
Vensen	0ce743f4e1	Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 (#27420 ) Signed-off-by: vensenmu <vensenmu@gmail.com>	2025-11-02 16:24:01 +00:00
wenxindongwork	af6e19f50f	[Core][TPU] Support TPU Data Parallalism (#27365 ) Signed-off-by: wenxindongwork <wenxindong@google.com>	2025-11-01 17:14:44 +00:00
Junpu Fan	b186149e8e	[Bugfix][Frontend] validate arg priority in frontend LLM class before add request (#27596 ) Signed-off-by: Junpu Fan <junpufan@gmail.com>	2025-10-28 14:02:43 +00:00
Cyrus Leung	f58d9b6404	[Misc] Separate out `utils.counter` and move `utils.Device` to engine (#27588 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-28 12:20:46 +00:00
22quinn	e0ef8a2920	[BugFix] Fix torchrun DP with LLM class (#27395 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-10-24 08:11:37 +00:00
wang.yuqi	3fa2c12185	[Frontend][4/N] Improve all pooling task | Add plugin pooling task (#26973 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Christian Pinto <christian.pinto@ibm.com>	2025-10-23 14:46:18 +00:00
wang.yuqi	3729ed00ba	[Model] Add num_cached_tokens for PoolingRequestOutput (#27378 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-10-23 14:03:42 +08:00
Wentao Ye	1a0f4defb7	[Log] Add Warning for `LLM(data_parallel_size=k)` single-process DP Usage (#27282 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-22 12:12:21 +00:00
Cyrus Leung	d31f7844f8	[Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-19 05:20:55 -07:00
Harry Mellor	6c9fdbf725	[Docs] Replace `rst` style double-backtick with `md` single-backtick (#27091 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:47:34 -07:00
Cyrus Leung	d2740fafbf	[Chore] Separate out `vllm.utils.collections` (#26990 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 08:35:35 +00:00
wangxiyuan	8f4b313c37	[Misc] rename torch_dtype to dtype (#26695 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-15 12:11:48 +00:00
wang.yuqi	f54f85129e	[Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-10-15 11:14:41 +00:00
Morrison Turnansky	96b9aa5aa0	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): name change compilation level to compilation mode, deprecation compilation level (#26355 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 02:51:16 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Cyrus Leung	ad430a67ca	[Metrics] Log multi-modal cache stats and fix reset (#26285 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-10 01:45:55 -07:00
Cyrus Leung	4bdf7ac593	[Bugfix] Fix SHM cache initialization (#26427 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-09 02:48:04 -07:00
Harry Mellor	2f99f2f506	Tidy `vllm/config/__init__.py` to only add classes and functions (#26405 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-08 07:10:00 -07:00
Grant Holmes (Ren)	d100d78eb3	Optimize KV cache distribution for asymmetric pipeline parallelism (#25164 ) Signed-off-by: gholmes829 <g.holmes429@gmail.com>	2025-10-07 09:20:30 +00:00
Cyrus Leung	d9836d4517	[Deprecation] Deprecate `LLM.set_tokenizer` (#26333 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 06:50:57 +00:00

266 Commits