biondizzle/vllm - vllm

Author	SHA1	Message	Date
Andreas Karatzas	df7e12715f	[ROCm][CI] Fix engine core client tests for ROCm spawn multiprocessing (#32061) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-13 15:14:30 +08:00
Nicolò Lucchesi	f8bd8394e3	[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure (#32031) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-12 20:38:49 +00:00
Or Ozeri	2be765b68a	[BugFix] scheduler: Fix ordering preserving of skipped requests (#32173) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-12 18:39:38 +00:00
Matthew Bonanni	20228cb851	[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-12 09:13:56 -08:00
Nicolò Lucchesi	5b68107411	[Misc][PD] Fix `get_attn_backend` usage in transfer connectors (#31988) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-12 18:10:05 +01:00
Asaf Joseph Gardin	8fb2c135be	[Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-01-12 17:02:38 +00:00
Or Ozeri	9cddbdba6d	OffloadingConnector: Add cpu_bytes_to_use configuration (#24498) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-12 15:00:43 +00:00
Or Ozeri	4c16ba617f	[KVConnector] OffloadingConnector: Fix bug in handling of preemptions (#29870) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-11 08:05:36 +00:00
Or Ozeri	2a4dbe24ea	[BugFix] Wait for compute before offloading KV to CPU (#31341) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-10 22:25:08 +00:00
Or Ozeri	028599739d	[BugFix] scheduler: Fix resuming of preempted requests after async load (#31583) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-10 12:39:25 -08:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Yifan Qiao	cd4a95e3aa	[Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator (#31707) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-01-09 10:53:20 -08:00
inkcherry	4505849b30	[ROCm][PD] add moriio kv connector. (#29304) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2026-01-09 14:01:57 +00:00
Andreas Karatzas	e02706d2d2	[ROCm][CI][V1] Fix `nixl_connector` test failure and achieve CUDA parity in `test_async_scheduling` (#32000) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-09 20:48:32 +08:00
Nick Hill	29ce48221c	[Cleanup] Remove obsolete spec decoding compatibility logic (#32003) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 05:44:18 +00:00
zhrrr	8ff4a99566	[Async][Feat] support apply penalty or bad_words for async + spec (#30495) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 02:31:50 +00:00
Nick Hill	11cec296dd	[BugFix] Add spec-decode-incompatible request param validation (#31982) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 00:08:21 +00:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
Ryan Rock	8cbdc7eb94	[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-08 09:00:24 +00:00
Lumosis	b634e619bb	Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com> Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>	2026-01-08 09:00:07 +00:00
Andreas Karatzas	5f2a473ff3	[ROCm][CI] v1 cpu offloading attention backend fix (#31833) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-08 14:37:50 +08:00
Andreas Karatzas	087a138963	[ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-08 04:35:25 +00:00
Richard Zou	a79079feef	[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-08 04:04:58 +00:00
Nick Hill	10ef65eded	[BugFix] Fix bad words with speculative decoding (#31908) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-07 15:46:42 -05:00
Kfir Toledo	b89443b8d9	[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761) Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>	2026-01-07 16:59:43 +00:00
weiyu	e7596371a4	[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808) Signed-off-by: Wei-Yu Lin <weiyulin@google.com> Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>	2026-01-07 16:07:16 +08:00
Benjamin Chislett	f7008ce1c4	[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-06 18:50:37 +00:00
Lucas Wilkinson	e0327c9db2	[Attention][1/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (#31773) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-06 04:05:17 -08:00
John Calderon	2f4e6548ef	[Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874) Signed-off-by: John Calderon <jcalderon@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-01-06 00:23:00 +00:00
Wentao Ye	af9a7ec255	[Bug] Revert torch warning fix (#31585) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-05 22:31:21 +00:00
Nick Hill	32f4e4db00	[Cleanup] Remove deprecated fields from CachedRequestData class (#31734) Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-05 21:07:14 +00:00
Or Ozeri	d8e38d4939	Triton Attention: Support cross-layers blocks (#30687) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-05 19:29:16 +00:00
Isotr0py	6aa5b18e1d	[v1] Add encoder-only/cross attention support to Triton Attention backend (#31406) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-06 00:00:23 +08:00
wangxiyuan	bb4337b34c	[Platform] Deprecate seed_everything (#31659) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-04 18:34:04 -08:00
Xingyu Liu	0eee877f67	[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>	2026-01-02 15:13:15 -08:00
Nick Hill	bd877162eb	[BugFix] Support online dense model DP without overhead (#30739) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-02 23:36:38 +08:00
Nicolò Lucchesi	ab1af6aa3e	[CI][NIXL] Split DPEP tests (#31491) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-30 07:26:12 -05:00
Sage	39512aba72	[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>	2025-12-30 00:17:16 +00:00
Alexei-V-Ivanov-AMD	d63b969675	[CI/ROCm] Fixing "V1 Test attention (H100)" test group. (#31187) Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com> Signed-off-by: <> Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com> Co-authored-by: root <root@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>	2025-12-29 16:53:59 -05:00
Yifan Qiao	52bf066516	[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: KuntaiDu <kuntai@uchicago.edu>	2025-12-26 18:25:46 -08:00
Kunshang Ji	5326c89803	[XPU][CI]skip test_preprocess_error_handling due to fork/spawn issue (#31381) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-12-26 21:40:44 +00:00
Nick Hill	81786c8774	[BugFix] Fix async scheduling + reasoning with struct output (#31332) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2025-12-25 23:01:02 +00:00
Richard Zou	254f6b9867	[Bugfix] Fix eagle dp tests on A100 (#31241) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-12-25 00:05:04 +00:00
Cyrus Leung	aa3868ecfe	[Chore] Remove unused `noqa`s (#31263) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-24 05:38:46 -08:00
Michael Goin	8ee90c83f8	Add `--max-model-len auto` to auto-fit context to available memory (#29431) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-23 21:37:14 -08:00
Chen Zhang	538e830caa	[KVEvent] User request.block_hash for parent block_hash (#30544) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-23 18:23:43 -08:00
Mark McLoughlin	f790068600	[Core] Add a random suffix to frontend-provided request IDs (#27987) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-23 13:05:39 -08:00
Cyrus Leung	bb62dda2c3	[Misc] Introduce `encode_*_url` utility function (#31208) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-23 13:45:21 +00:00
Pavani Majety	3e10262356	Revert "[SM100] Enable fp8 compute for prefill MLA (#30746)" (#31197) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-12-22 18:15:33 -08:00
Divakar Verma	78e5e62bbf	[AMD][CI] fix v1/engine test_preprocess_error_handling (#31192) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-23 01:28:19 +00:00

891 Commits