biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
inkcherry	4505849b30	[ROCm][PD] add moriio kv connector. (#29304 ) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2026-01-09 14:01:57 +00:00
Andreas Karatzas	e02706d2d2	[ROCm][CI][V1] Fix `nixl_connector` test failure and achieve CUDA parity in `test_async_scheduling` (#32000 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-09 20:48:32 +08:00
Nick Hill	29ce48221c	[Cleanup] Remove obsolete spec decoding compatibility logic (#32003 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 05:44:18 +00:00
zhrrr	8ff4a99566	[Async][Feat] support apply penalty or bad_words for async + spec (#30495 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 02:31:50 +00:00
Nick Hill	11cec296dd	[BugFix] Add spec-decode-incompatible request param validation (#31982 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 00:08:21 +00:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
Ryan Rock	8cbdc7eb94	[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-08 09:00:24 +00:00
Lumosis	b634e619bb	Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635 ) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com> Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>	2026-01-08 09:00:07 +00:00
Andreas Karatzas	5f2a473ff3	[ROCm][CI] v1 cpu offloading attention backend fix (#31833 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-08 14:37:50 +08:00
Andreas Karatzas	087a138963	[ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-08 04:35:25 +00:00
Richard Zou	a79079feef	[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-08 04:04:58 +00:00
Nick Hill	10ef65eded	[BugFix] Fix bad words with speculative decoding (#31908 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-07 15:46:42 -05:00
Kfir Toledo	b89443b8d9	[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761 ) Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>	2026-01-07 16:59:43 +00:00
weiyu	e7596371a4	[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808 ) Signed-off-by: Wei-Yu Lin <weiyulin@google.com> Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>	2026-01-07 16:07:16 +08:00
Benjamin Chislett	f7008ce1c4	[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-06 18:50:37 +00:00
Lucas Wilkinson	e0327c9db2	[Attention][1/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (#31773 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-06 04:05:17 -08:00
John Calderon	2f4e6548ef	[Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874 ) Signed-off-by: John Calderon <jcalderon@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-01-06 00:23:00 +00:00
Wentao Ye	af9a7ec255	[Bug] Revert torch warning fix (#31585 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-05 22:31:21 +00:00
Nick Hill	32f4e4db00	[Cleanup] Remove deprecated fields from CachedRequestData class (#31734 ) Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-05 21:07:14 +00:00
Or Ozeri	d8e38d4939	Triton Attention: Support cross-layers blocks (#30687 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-05 19:29:16 +00:00
Isotr0py	6aa5b18e1d	[v1] Add encoder-only/cross attention support to Triton Attention backend (#31406 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-06 00:00:23 +08:00
wangxiyuan	bb4337b34c	[Platform] Deprecate seed_everything (#31659 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-04 18:34:04 -08:00
Xingyu Liu	0eee877f67	[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>	2026-01-02 15:13:15 -08:00
Nick Hill	bd877162eb	[BugFix] Support online dense model DP without overhead (#30739 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-02 23:36:38 +08:00
Nicolò Lucchesi	ab1af6aa3e	[CI][NIXL] Split DPEP tests (#31491 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-30 07:26:12 -05:00
Sage	39512aba72	[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>	2025-12-30 00:17:16 +00:00
Alexei-V-Ivanov-AMD	d63b969675	[CI/ROCm] Fixing "V1 Test attention (H100)" test group. (#31187 ) Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com> Signed-off-by: <> Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com> Co-authored-by: root <root@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>	2025-12-29 16:53:59 -05:00
Yifan Qiao	52bf066516	[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: KuntaiDu <kuntai@uchicago.edu>	2025-12-26 18:25:46 -08:00
Kunshang Ji	5326c89803	[XPU][CI]skip test_preprocess_error_handling due to fork/spawn issue (#31381 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-12-26 21:40:44 +00:00
Nick Hill	81786c8774	[BugFix] Fix async scheduling + reasoning with struct output (#31332 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2025-12-25 23:01:02 +00:00
Richard Zou	254f6b9867	[Bugfix] Fix eagle dp tests on A100 (#31241 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-12-25 00:05:04 +00:00
Cyrus Leung	aa3868ecfe	[Chore] Remove unused `noqa`s (#31263 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-24 05:38:46 -08:00
Michael Goin	8ee90c83f8	Add `--max-model-len auto` to auto-fit context to available memory (#29431 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-23 21:37:14 -08:00
Chen Zhang	538e830caa	[KVEvent] User request.block_hash for parent block_hash (#30544 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-23 18:23:43 -08:00
Mark McLoughlin	f790068600	[Core] Add a random suffix to frontend-provided request IDs (#27987 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-23 13:05:39 -08:00
Cyrus Leung	bb62dda2c3	[Misc] Introduce `encode_*_url` utility function (#31208 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-23 13:45:21 +00:00
Pavani Majety	3e10262356	Revert "[SM100] Enable fp8 compute for prefill MLA (#30746 )" (#31197 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-12-22 18:15:33 -08:00
Divakar Verma	78e5e62bbf	[AMD][CI] fix v1/engine test_preprocess_error_handling (#31192 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-23 01:28:19 +00:00
Lucas Wilkinson	de71747655	[SpecDecode] Simplified alternative padded-speculation acceptance rate fix (#29845 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-22 13:06:10 -08:00
Pavani Majety	b10f41c894	[SM100] Enable fp8 compute for prefill MLA (#30746 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-12-22 19:15:57 +00:00
Seiji Eicher	1ab5213531	Make engine core client handshake timeout configurable (#27444 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-12-19 20:38:30 +00:00
Nick Hill	2ac85a4544	[BugFix] Fix logprobs with spec decode and modified logits (#30846 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-18 19:58:28 -08:00
Nick Hill	45c0526ac9	[BugFix] Handle errors when preprocessing added requests (#30895 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-19 01:29:11 +00:00
Elizabeth Thomas	41b6f9200f	Remove all2all backend envvar (#30363 ) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-18 19:46:28 +00:00
Isotr0py	700a5ad6c6	[MM Encoder]: Migrate legacy ViT `MultiHeadAttention` to new `MMEncoderAttention` interface (#30684 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-19 02:04:19 +08:00
inkcherry	500f26e6d3	[Bugfix] fix DP-aware routing in OpenAI API requests (#29002 ) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2025-12-18 09:50:42 -08:00
Nicolò Lucchesi	bc3700e0cd	[NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-18 11:53:30 +08:00
Micah Williamson	fd8afdf38d	[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-18 10:27:37 +08:00
SungMinCho	a0b782f9cc	[Metrics] Model FLOPs Utilization estimation (#30738 ) Signed-off-by: SungMinCho <tjdals4565@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-12-18 01:40:51 +00:00
Matthew Bonanni	7eb6cb6c18	[Attention] Update tests to remove deprecated env vars (#30563 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-17 09:49:59 -08:00

879 Commits