biondizzle/vllm - vllm

Author	SHA1	Message	Date
Mark McLoughlin	c6fa3895e9	[KV Connector] Fix async connector prefix cache metrics (#28585) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-21 17:45:00 -05:00
Julien Denize	57430fc95c	Default model load/config/tokenizer to `mistral` format if relevant files exist (#28659) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-21 13:58:59 -08:00
Wentao Ye	1f400c58b8	[CI] Add batch invariant test to ci (#27842) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-21 09:20:33 -07:00
WeiQing Chen	b34129bf8e	[Misc] remove useless v1 env (#29164) Signed-off-by: David Chen <530634352@qq.com>	2025-11-21 01:41:20 -08:00
Jialin Ouyang	30b9c67743	Revert "[Redo] #26368 (#28771)" (#29121) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 21:27:45 -08:00
Cyrus Leung	56e96b37e4	[V0 Deprecation] Remove `best_of` (#29090) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 11:40:40 +08:00
rasmith	c7a29d2c8d	[CI/Build] Remove skip global cleanup in test_struct_output_generate.py (#29022) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 21:44:37 +00:00
rasmith	8237ab8a2b	[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now (#29021) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-20 21:35:14 +00:00
Or Ozeri	647464719b	[KVConnector][Core] Support cross-layer KV blocks (#27743) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-11-20 19:09:59 +01:00
Or Ozeri	c0c2dd1e0b	[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 18:55:10 +08:00
Wentao Ye	2c52c7fd9a	[Bug] Fix torch dynamo warning Dynamo detected a call to a `functools.lru_cache` (#29038) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-20 16:52:23 +08:00
Benjamin Chislett	fcbcba6c70	[Feat] Iteration-level profiling for Torch and CUDA profiler (#28987) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-19 19:17:48 -08:00
Wentao Ye	1607e664f0	[Bug] Fix Batch Invariant MLA test (#28967) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 21:18:32 +00:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Didier Durand	09540cd918	[Doc]: fix typos in various files (#29010) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-19 04:56:21 -08:00
Chendi.Xue	c3e2978620	[NIXL] fix cpu PD after physical <> logical block_size PR (#28904) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-11-18 14:03:23 -05:00
Kevin H. Luu	c64c0b78de	[chore] Move the rest of wikimedia url to S3 (#28921) Signed-off-by: Kevin H. Luu <khluu000@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 09:44:18 -08:00
Nicolò Lucchesi	f226a3f0c1	[CI][NIXL] Change default `block_size` for tests (#28927) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-18 09:22:30 -08:00
Nick Hill	5bdd155277	[CI] Fix async scheduling + spec decoding test flake (#28902) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-18 05:26:32 +00:00
Wentao Ye	a289cc1dde	[Test] Batch Invariant: Rename and organize tests (#27421) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-17 18:09:47 -05:00
Ronald	d8874c61a5	[Core] Async Scheduling X Spec Decoding Compatibility (#24799) Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-17 12:16:20 -08:00
Nick Hill	80b6080ddc	[BugFix] Fix async scheduling + chunked prefill + preemption (#28787) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-17 06:46:46 +08:00
Eldar Kurtić	e439c784fa	Add support for Eagle with separate lm-head and embed_tokens layers (#28549) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-15 06:12:02 -08:00
Cyrus Leung	98b4d389ed	[Redo] #26368 (#28771) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 22:47:41 -08:00
Chendi.Xue	c9e665852a	[NIXL] heterogeneous block_size support (#26759) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-11-14 21:51:32 -08:00
Nick Hill	ac86bff8cb	Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)	2025-11-14 20:24:00 -08:00
Jialin Ouyang	186352b270	[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-14 16:04:04 -08:00
Nick Hill	58e61e56b7	[Test] Rework e2e async scheduling tests (#28744) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-14 16:01:09 -08:00
Laith Sakka	2e0ad629b0	Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-11-14 14:11:10 -08:00
Marcin Ostrowski	0de4f217ab	[Bugfix] TypeError: 'NoneType' object is not callable (#27410) Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>	2025-11-14 21:13:53 +00:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Cyrus Leung	511a6b611d	[Config] Clean up SchedulerConfig initialization (#28665) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 22:41:02 +08:00
Nicolò Lucchesi	96b23b8e3b	[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-14 22:40:05 +08:00
Yong Hoon Shin	9324e10275	Fix KV sharing fast prefill with cudagraph enabled (#28537) Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-14 11:53:42 +00:00
rasmith	93103575ce	[BugFix][CI/Build][ROCM] Fix import error and apply assert in appropriate case in test_struct_output_generate (#28311) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-11-13 22:41:29 -08:00
Mark McLoughlin	6e25b1cddf	[KV Connector] Test async mode in scheduler tests (#28550) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-13 18:30:59 -05:00
elvischenv	5d6ce2b960	[Perf] Support stream interval for reducing host overhead (#27869) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-13 13:21:25 -05:00
Yannick Schnider	119c4927b3	[Bugfix] Fix validate model input for decoder models (#27099) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-13 10:18:47 -08:00
tjandy98	4504e8029b	[Bugfix] Prevent crash on empty grammar string (#28210) Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com>	2025-11-13 06:42:29 +00:00
Andrew Xia	1a0b157a2e	[Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format (#28231) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-13 04:47:22 +00:00
Jialin Ouyang	a1d3866dda	[n-gen] DO NOT repeatedly return finished child requests (#28591) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-13 03:36:07 +00:00
Andy Lo	58ce8d12b7	[BugFix] Priority scheduling and spec tokens preemption (#28558) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-12 20:29:21 +00:00
alberto	bac904565f	Implement ARC KV cache eviction policy for CPU offloader (#27039) Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: alberto <aperdomo@redhat.com> Co-authored-by: Or Ozeri <or@ozery.com>	2025-11-12 09:51:39 -08:00
Chenguang Zheng	4ccffe561f	[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233) Signed-off-by: n00909098 <nguyen.kha.long@huawei.com> Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Signed-off-by: herotai214 <herotai214@gmail.com> Signed-off-by: Khuong Le <khuong.le.manh@huawei.com> Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com> Co-authored-by: n00909098 <nguyen.kha.long@huawei.com> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: herotai214 <herotai214@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Khuong Le <khuong.le.manh@huawei.com> Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>	2025-11-11 18:58:33 -08:00
Jialin Ouyang	4228be7959	[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (#28245) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-11 10:28:47 -08:00
Nicolò Lucchesi	a7ef3eb0cd	[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282)	2025-11-11 16:57:43 +00:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Rémi Delacourt	6d54336ae5	[Bugfix] Fix llguidance backend, rollback when EOS was encountered (#25905) Signed-off-by: Rémi Delacourt <remi@mistral.ai> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-11-10 14:53:32 -05:00
Mark McLoughlin	6f7de33bed	[Metrics] Refactor LoRA state tracking (#26801) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 16:34:36 +08:00
usberkeley	4a8d6bd168	Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-11-09 19:11:46 +00:00

728 Commits