biondizzle/vllm - vllm

Author	SHA1	Message	Date
haosdent	d6e04f4c43	[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094) (#34571) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-04 11:56:22 +01:00
Kunshang Ji	16d2ad1d38	[Hardware] Replace `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` (#30681) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 09:49:47 +00:00
ojhaanshika	e05cb3b93e	TRTLLM gen-full attn Test Coverage (#34986) Signed-off-by: Anshika Ojha <anshikao@nvidia.com> Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com>	2026-03-03 11:35:34 -05:00
Lucas Wilkinson	28ef9ba399	[BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA (#34552) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-03 07:21:57 -08:00
aykoppol	25e02647c2	[Core] Add optional flags to check for repetitive token patterns in engine output (#35451) Signed-off-by: aykoppol <aykoppol+git@gmail.com>	2026-03-03 12:23:25 +08:00
Fynn Schmitt-Ulms	9433acb8df	[Spec Decode] Add hidden states extraction system (#33736) Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>	2026-03-02 14:29:09 -05:00
Andreas Karatzas	ec27b36b4b	[CI] Defining extended V1 e2e + engine tests (#35580) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-02 08:10:54 +00:00
Ryan Rock	87d319c52f	[AMD][CI] Support Triton attention with ExampleConnector (#34931) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-03-01 09:58:07 +02:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Lucas Wilkinson	1d532f9d8f	[DP] Only use DP padding when cudagraphs are actually used (#34102) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-27 15:14:31 -05:00
Huamin Li	157722da75	[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480) Signed-off-by: Huamin Li <3ericli@gmail.com>	2026-02-28 01:50:37 +08:00
Nicolò Lucchesi	cabdaa7619	[Misc] Move `GPUModelRunner.prepare_kernel_block_sizes` to utils (#35400) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-27 11:42:51 +08:00
Yiliu Dong	d940607629	[Core] Support `min_tokens` with speculative decoding (#32642) Signed-off-by: qianlihuang <yiliu.dong@qq.com> Co-authored-by: qianlihuang <yiliu.dong@qq.com>	2026-02-26 12:31:28 -05:00
Andreas Karatzas	9571e99945	[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 14:16:18 -08:00
Benjamin Chislett	ee59a7c615	[Tests] Add GSM8k check to SpecDec E2E tests (#34772) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-25 07:51:14 -05:00
Chen Zhang	8fae54faff	[Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache (#35157) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2026-02-24 22:00:19 -08:00
Harry Mellor	f7967577f5	Remove requirement to use `--hf-overrides` for `DeepseekVLV2ForCausalLM` (#35203) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-24 22:00:06 -08:00
Nick Hill	dbf0da817a	[Core] Cleanup engine pause/sleep logic (#34528) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-24 19:33:34 -08:00
Andreas Karatzas	067c5d9ad1	[ROCm][CI] Added MI325 mirrors (#34923) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-24 13:37:15 -08:00
Benjamin Chislett	f5972a872f	[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726) Signed-off-by: Shahar Mor <smor@nvidia.com> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Shahar Mor <smor@nvidia.com> Co-authored-by: Roi Koren <roik@nvidia.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-24 09:49:56 -08:00
Rohan Potdar	2ff4e51152	[ROCm] AITER fused RoPE+KVCache (#33443) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: charlifu <charlifu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>	2026-02-23 19:06:00 -08:00
haosdent	a2ba6a5244	[Bugfix] Fix prefix caching for Mamba 'all' mode (Nemotron models) (#34874) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-23 17:31:51 +01:00
Andreas Karatzas	5f68464f92	[ROCm][CI] Fix spec decode profile assertion and logprob test determinism (#35043) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-23 05:05:54 -08:00
Nicolò Lucchesi	ab6f3487a6	[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-21 01:34:57 -08:00
Andreas Karatzas	54254f7a61	[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends (#34599) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-20 20:25:23 -08:00
zhongdaor-nv	a0fe7ea2f0	[feat] Add per-block extra_keys to KV events (#33304) Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 20:11:40 -08:00
Xin Yang	7a5adad480	[Kernel] Optimize sample_recovered_tokens_kernel (#34974) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-20 19:59:06 -08:00
Lucas Wilkinson	aaefc58ee0	[CI] Revert PRs 34818 and 33600 (#34979)	2026-02-20 13:25:50 -08:00
Wei Zhao	f24b2de3d3	[Test] Add FP8 KV Cache Testing for MLA Backends (#34473) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-20 18:51:58 +00:00
rasmith	0c1dc42748	[CI][AMD][BugFix][P/D] Add default_vllm_config to test_moriio_connector.py so tests pass (#33739) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-19 21:32:40 -08:00
Matthew Bonanni	662205d34e	[Bugfix] Fix Basic Models Test (#34818) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-19 14:49:07 -08:00
Kyle Sayers	64ac1395e8	[Docs] Clean up speculators docs (#34065) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-02-18 13:48:11 -08:00
Jongseok Park	c656ba3b4d	[Kernel] Triton-based Top-k and Top-p sampler kernels (#33538) Signed-off-by: js_park <cakeng@naver.com> Signed-off-by: Jongseok Park <37990712+cakeng@users.noreply.github.com> Signed-off-by: Sunga Kim <sunga.kim@berkeley.edu> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Sunga Kim <sunga.kim@berkeley.edu> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-17 23:14:30 +00:00
Cyrus Leung	574fe75245	[Renderer] Move InputPreprocessor into Renderer (2/2) (#34560) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-17 05:29:01 -08:00
junuxyz	c61a98f529	[CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-17 12:22:56 +00:00
Ekagra Ranjan	cd81cdb399	[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs (#31058) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-16 11:08:44 +00:00
Thomas Parnell	d5fe3f702c	[Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2026-02-14 13:15:56 -08:00
Cyrus Leung	73391a1baa	[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-14 10:14:21 -08:00
Andreas Karatzas	b3c14229b0	[ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests (#34538) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-14 07:32:09 -08:00
Harry Huang	c027541eaf	[Hybrid] Enable spec decoding in mamba cache align mode (#33705) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-02-13 13:02:28 -08:00
Ben Browning	fd267bc7b7	[Bugfix]: Fix structured output in multi-turn gpt-oss (#34454) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-13 11:12:48 -08:00
Aaron Hao	dddbff4624	[Core] Move pause and resume functions into engine (#34125) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-13 00:15:10 -08:00
haosdent	4137c5dfa7	[Bug Fix] Fix MambaManager.cache_blocks() crash on null blocks in align mode (#34418) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-13 00:13:22 -08:00
Cyrus Leung	ea5ff3a1f6	[Refactor] Simplify BOS/EOS token handling (#34435) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 18:18:24 -08:00
Matthew Bonanni	f2c47886fd	[Attention] Add FlashInfer Sparse MLA backend (#33451) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2026-02-12 17:21:54 +00:00
Cyrus Leung	fb455ed547	[V0 Deprecation] Remove code related to per-request logits processors (#34400) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 20:44:28 +08:00
Cyrus Leung	b96f7314b4	[Refactor] Pass Renderer to Input Processor (#34329) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-11 19:38:11 -08:00
Lucas Wilkinson	c7914d30f9	Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-11 07:07:56 -08:00
Cyrus Leung	b5dcb372e4	[Misc] Clean up validation logic in input processor (#34144) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-10 19:29:29 -08:00
Pavani Majety	578977bb5e	[SM100] Resubmit FMHA FP8 prefill for MLA (#31195) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-02-10 16:18:43 -05:00

1016 Commits