biondizzle/vllm - vllm

Author	SHA1	Message	Date
wang.yuqi	66c079ae83	[Frontend][4/n] Improve pooling entrypoints | pooling. (#39153 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-04-09 10:09:45 +00:00
triangleXIV	7c94ae16c6	[BugFix] --max-model-len=-1 causes over-limit requests to hang and starve the entire service (#39102 ) Signed-off-by: triangle14 <y1019026570@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2026-04-08 14:03:17 -07:00
yzong-rh	6dad4c5722	[Test] Fix flaky race condition in test_abort_final_step (#38414 ) Signed-off-by: Yifan <yzong@redhat.com>	2026-03-28 09:06:56 +00:00
Itay Alroy	d5af196c18	[2/N] Elastic EP Milestone 2: Integrating NIXL-EP (#35627 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-03-13 09:25:33 -04:00
Sage	a2268617cf	[Frontend] Delegate preprocessing to `OpenAIServingRender` (#36483 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-13 00:39:43 -07:00
Wentao Ye	7279374f91	[Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-09 20:55:58 -07:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Ning Xie	176c799f4c	[openai api] log exception in exception handler (1/N) (#31164 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-03-05 16:00:12 +00:00
Nick Hill	dbf0da817a	[Core] Cleanup engine pause/sleep logic (#34528 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-24 19:33:34 -08:00
Cyrus Leung	574fe75245	[Renderer] Move InputPreprocessor into Renderer (2/2) (#34560 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-17 05:29:01 -08:00
Aaron Hao	dddbff4624	[Core] Move pause and resume functions into engine (#34125 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-13 00:15:10 -08:00
Cyrus Leung	ea5ff3a1f6	[Refactor] Simplify BOS/EOS token handling (#34435 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 18:18:24 -08:00
Cyrus Leung	b5dcb372e4	[Misc] Clean up validation logic in input processor (#34144 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 19:29:29 -08:00
Aaron Hao	89a385d79f	[Feat][RL] Pause and Resume with keep requests for single engine (#32351 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-07 00:08:58 +00:00
Nick Hill	876a16f4fb	[ModelRunner V2] Fix spec decoding + logprobs (#33391 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-31 03:33:26 +00:00
Cyrus Leung	11b556878b	[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 15:00:28 +08:00
Cyrus Leung	d117a4d1a9	[Frontend] Introduce Renderer for processing chat messages (using `ModelConfig`) (#30200 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-22 12:44:22 +00:00
Cyrus Leung	cbbae38f93	[2/N] Move cache factories to MM registry (#32382 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 01:02:30 -08:00
dtc	1e584823f8	[Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>	2026-01-15 16:31:16 +08:00
Chauncey	4c1c501a7e	[Refactor] [10/N] to simplify the vLLM openai completion serving architecture (#32369 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-15 07:41:34 +00:00
Chauncey	fefce49807	[Refactor] [6/N] to simplify the vLLM openai chat_completion serving architecture (#32240 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-13 13:01:39 +00:00
Andreas Karatzas	df7e12715f	[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-13 15:14:30 +08:00
Ryan Rock	8cbdc7eb94	[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-08 09:00:24 +00:00
John Calderon	2f4e6548ef	[Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874 ) Signed-off-by: John Calderon <jcalderon@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-01-06 00:23:00 +00:00
Nick Hill	bd877162eb	[BugFix] Support online dense model DP without overhead (#30739 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-02 23:36:38 +08:00
Sage	39512aba72	[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>	2025-12-30 00:17:16 +00:00
Kunshang Ji	5326c89803	[XPU][CI]skip test_preprocess_error_handling due to fork/spawn issue (#31381 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-12-26 21:40:44 +00:00
Mark McLoughlin	f790068600	[Core] Add a random suffix to frontend-provided request IDs (#27987 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-23 13:05:39 -08:00
Divakar Verma	78e5e62bbf	[AMD][CI] fix v1/engine test_preprocess_error_handling (#31192 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-23 01:28:19 +00:00
Seiji Eicher	1ab5213531	Make engine core client handshake timeout configurable (#27444 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-12-19 20:38:30 +00:00
Nick Hill	45c0526ac9	[BugFix] Handle errors when preprocessing added requests (#30895 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-19 01:29:11 +00:00
inkcherry	500f26e6d3	[Bugfix] fix DP-aware routing in OpenAI API requests (#29002 ) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2025-12-18 09:50:42 -08:00
shivampr	8580919ac3	[Bugfix] fix confusing OOM errors during v1 init (#28051 ) Signed-off-by: Shivam <shivamprasad91@gmail.com> Signed-off-by: shivampr <shivampr.dev@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-12-10 23:17:41 +00:00
Or Ozeri	4c6fd25880	kv_transfer: Rename the shared storage connectors (#30201 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-12-08 20:46:09 -08:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Cyrus Leung	8d9338fae4	[Chore] Rename `Processor` to `InputProcessor` (#29682 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 09:35:41 -08:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
elvischenv	5d6ce2b960	[Perf] Support stream interval for reducing host overhead (#27869 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-13 13:21:25 -05:00
Jialin Ouyang	a1d3866dda	[n-gen] DO NOT repeatedly return finished child requests (#28591 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-13 03:36:07 +00:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233 ) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Jialin Ouyang	4228be7959	[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (#28245 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-11 10:28:47 -08:00
Mark McLoughlin	6f7de33bed	[Metrics] Refactor LoRA state tracking (#26801 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 16:34:36 +08:00
Nick Hill	da786e339e	[Core] Rework handling of async scheduling config (#28250 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 20:01:23 +00:00
Zhewen Li	0b8e871e5e	[CI/Build] Fix `test_defaults_with_usage_context` in AMD CI (#27926 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-05 15:40:24 -08:00
wangxiyuan	428bc7bf1c	[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-04 20:51:16 -08:00

169 Commits
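
The message of commit dc264bcea1 (#29987) describes processing pending aborts immediately before model output is processed each step, while leaving new requests queued. The following is a minimal sketch of that ordering only. `ToyEngine`, its fields, and its method names are hypothetical illustrations and are not taken from the vLLM codebase.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical stand-in for an engine core; not vLLM's actual classes."""

    # Incoming messages as (kind, request_id), where kind is "add" or "abort".
    pending: deque = field(default_factory=deque)
    running: set = field(default_factory=set)
    aborted: list = field(default_factory=list)
    finished: list = field(default_factory=list)

    def process_pending_aborts(self) -> None:
        """Apply queued aborts for running requests; keep everything else queued in order."""
        remaining = deque()
        for kind, req_id in self.pending:
            if kind == "abort" and req_id in self.running:
                self.running.discard(req_id)
                self.aborted.append(req_id)
            else:
                # New requests (and aborts for requests not yet running) wait
                # until after the model output has been handled.
                remaining.append((kind, req_id))
        self.pending = remaining

    def process_model_output(self, stopped_ids) -> None:
        """Handle one step's output, draining aborts first."""
        self.process_pending_aborts()
        for req_id in stopped_ids:
            # A request aborted just above is no longer running, so it is
            # not also recorded as a normal completion.
            if req_id in self.running:
                self.running.discard(req_id)
                self.finished.append(req_id)


engine = ToyEngine(running={"a", "b"})
engine.pending.extend([("add", "c"), ("abort", "a")])
# Both "a" and "b" hit a stop condition on this step, but "a" was cancelled.
engine.process_model_output(stopped_ids=["a", "b"])
```

In this sketch, request `a` ends up in `aborted` rather than `finished` even though it reached a stop condition on the same step, and the queued add for `c` is untouched. That matches the preference stated in the commit message for handling model outputs before new incoming requests.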