biondizzle/vllm - vllm

Author	SHA1	Message	Date
Harry Mellor	e2090bf3af	[CI] Fix startup error test (#36230 ) A change in engine startup error messages in #35478 caused this test failure. Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-06 11:50:28 +00:00
Nicolò Lucchesi	5b3ba94ab4	[Core][KVConnector] Support HMA+NixlConnector (#35758 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-06 08:51:21 +01:00
zhanqiuhu	90f3c01fa4	[Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158 ) Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Zhanqiu Hu <zh338@cornell.edu> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-06 08:50:44 +01:00
cong-or	57c84ff129	perf: add __slots__ to KVCacheBlock (#36164 ) Signed-off-by: cong-or <conchubhar.gannon@gmail.com>	2026-03-05 22:04:09 -08:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Ning Xie	176c799f4c	[openai api] log exception in exception handler (1/N) (#31164 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-03-05 16:00:12 +00:00
Or Ozeri	612e7729c2	[KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-05 14:25:15 +00:00
Benjamin Chislett	57c629e9c1	[Bugfix] Fix block_size for hybrid model MTP (#36036 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-05 06:10:54 +00:00
Qi Wang	6aa6ad8992	[BugFix] Fix implicit and incorrect assumption on ECConnector is_producer (#34783 ) Signed-off-by: Qi Wang <qiwa@nvidia.com>	2026-03-04 15:01:30 +01:00
Ronen Schaffer	bb6888b8b1	[Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC `prepare_store()` (#35846 ) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>	2026-03-04 14:25:33 +02:00
haosdent	d6e04f4c43	[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094 ) (#34571 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-04 11:56:22 +01:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
ojhaanshika	e05cb3b93e	TRTLLM gen-full attn Test Coverage (#34986 ) Signed-off-by: Anshika Ojha <anshikao@nvidia.com> Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com>	2026-03-03 11:35:34 -05:00
Lucas Wilkinson	28ef9ba399	[BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA (#34552 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-03 07:21:57 -08:00
aykoppol	25e02647c2	[Core] Add optional flags to check for repetitive token patterns in engine output (#35451 ) Signed-off-by: aykoppol <aykoppol+git@gmail.com>	2026-03-03 12:23:25 +08:00
Fynn Schmitt-Ulms	9433acb8df	[Spec Decode] Add hidden states extraction system (#33736 ) Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>	2026-03-02 14:29:09 -05:00
Andreas Karatzas	ec27b36b4b	[CI] Defining extended V1 e2e + engine tests (#35580 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-02 08:10:54 +00:00
Ryan Rock	87d319c52f	[AMD][CI] Support Triton attention with ExampleConnector (#34931 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-03-01 09:58:07 +02:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861 ) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Lucas Wilkinson	1d532f9d8f	[DP] Only use DP padding when cudagraphs are actually used (#34102 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-27 15:14:31 -05:00
Huamin Li	157722da75	[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2026-02-28 01:50:37 +08:00
Nicolò Lucchesi	cabdaa7619	[Misc] Move `GPUModelRunner.prepare_kernel_block_sizes` to utils (#35400 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-27 11:42:51 +08:00
Yiliu Dong	d940607629	[Core] Support `min_tokens` with speculative decoding (#32642 ) Signed-off-by: qianlihuang <yiliu.dong@qq.com> Co-authored-by: qianlihuang <yiliu.dong@qq.com>	2026-02-26 12:31:28 -05:00
Andreas Karatzas	9571e99945	[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 14:16:18 -08:00
Benjamin Chislett	ee59a7c615	[Tests] Add GSM8k check to SpecDec E2E tests (#34772 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-25 07:51:14 -05:00
Chen Zhang	8fae54faff	[Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache (#35157 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2026-02-24 22:00:19 -08:00
Harry Mellor	f7967577f5	Remove requirement to use `--hf-overrides` for `DeepseekVLV2ForCausalLM` (#35203 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-24 22:00:06 -08:00
Nick Hill	dbf0da817a	[Core] Cleanup engine pause/sleep logic (#34528 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-24 19:33:34 -08:00
Andreas Karatzas	067c5d9ad1	[ROCm][CI] Added MI325 mirrors (#34923 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-24 13:37:15 -08:00
Benjamin Chislett	f5972a872f	[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726 ) Signed-off-by: Shahar Mor <smor@nvidia.com> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Shahar Mor <smor@nvidia.com> Co-authored-by: Roi Koren <roik@nvidia.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-24 09:49:56 -08:00
Rohan Potdar	2ff4e51152	[ROCm] AITER fused RoPE+KVCache (#33443 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: charlifu <charlifu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>	2026-02-23 19:06:00 -08:00
haosdent	a2ba6a5244	[Bugfix] Fix prefix caching for Mamba 'all' mode (Nemotron models) (#34874 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-23 17:31:51 +01:00
Andreas Karatzas	5f68464f92	[ROCm][CI] Fix spec decode profile assertion and logprob test determinism (#35043 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-23 05:05:54 -08:00
Nicolò Lucchesi	ab6f3487a6	[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-21 01:34:57 -08:00
Andreas Karatzas	54254f7a61	[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends (#34599 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-20 20:25:23 -08:00
zhongdaor-nv	a0fe7ea2f0	[feat] Add per-block extra_keys to KV events (#33304 ) Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 20:11:40 -08:00
Xin Yang	7a5adad480	[Kernel] Optimize sample_recovered_tokens_kernel (#34974 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-20 19:59:06 -08:00
Lucas Wilkinson	aaefc58ee0	[CI] Revert PRs 34818 and 33600 (#34979 )	2026-02-20 13:25:50 -08:00
Wei Zhao	f24b2de3d3	[Test] Add FP8 KV Cache Testing for MLA Backends (#34473 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-20 18:51:58 +00:00
rasmith	0c1dc42748	[CI][AMD][BugFix][P/D] Add default_vllm_config to test_moriio_connector.py so tests pass (#33739 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-19 21:32:40 -08:00
Matthew Bonanni	662205d34e	[Bugfix] Fix Basic Models Test (#34818 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-19 14:49:07 -08:00
Kyle Sayers	64ac1395e8	[Docs] Clean up speculators docs (#34065 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-02-18 13:48:11 -08:00
Jongseok Park	c656ba3b4d	[Kernel] Triton-based Top-k and Top-p sampler kernels (#33538 ) Signed-off-by: js_park <cakeng@naver.com> Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com> Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-17 23:14:30 +00:00
Cyrus Leung	574fe75245	[Renderer] Move InputPreprocessor into Renderer (2/2) (#34560 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-17 05:29:01 -08:00
junuxyz	c61a98f529	[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514 ) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-17 12:22:56 +00:00
Ekagra Ranjan	cd81cdb399	[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs (#31058 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-16 11:08:44 +00:00
Thomas Parnell	d5fe3f702c	[Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2026-02-14 13:15:56 -08:00
Cyrus Leung	73391a1baa	[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-14 10:14:21 -08:00
Andreas Karatzas	b3c14229b0	[ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests (#34538 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-14 07:32:09 -08:00
Harry Huang	c027541eaf	[Hybrid] Enable spec decoding in mamba cache align mode (#33705 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-02-13 13:02:28 -08:00

1026 Commits