biondizzle/vllm - vllm

Author	SHA1	Message	Date
Woosuk Kwon	467886a0c4	[Model Runner V2] Fix inputs_embeds=None bug for MM models (#35917 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-03 13:47:45 -08:00
bnellnm	a9b8b13e5c	[Bugfix] Fix misnamed parameter in compressed_tensors_moe.py (#35813 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-03-03 16:29:57 -05:00
Micah Williamson	e7213003cb	[ROCm][CI] Fix TP size issue for `test_gpt_oss` (#35887 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-03 20:57:34 +00:00
Rohan Potdar	3a8eef5869	[ROCm][Bugfix]: Disable AITER Triton ROPE by default (#35601 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-03 13:43:56 -06:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Robert Shaw	881a6b011b	[CI] Temporarily Disable Llama4 MoE Refactor Test (#35870 ) Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-03-03 10:36:15 -08:00
Matthew Bonanni	8e1fd5baf0	[CI] Bump `num_speculative_tokens` to 3 in nightly DeepSeek tests (#35882 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-03 09:26:44 -08:00
JasonCohere	ae88468bcc	fix: Ensure invalid audio files return 400 error (#34715 ) Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-03 08:47:39 -08:00
ojhaanshika	e05cb3b93e	TRTLLM gen-full attn Test Coverage (#34986 ) Signed-off-by: Anshika Ojha <anshikao@nvidia.com> Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com>	2026-03-03 11:35:34 -05:00
Lucas Wilkinson	28ef9ba399	[BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA (#34552 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-03 07:21:57 -08:00
TJian	fb7fdc49c4	[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops (#34307 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-03 06:24:21 -08:00
wang.yuqi	ea463978bb	[Frontend][1/n] Improve pooling entrypoints | classify. (#35604 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-03 06:05:36 -08:00
Li, Jiang	440f0e7dc6	[Bugfix] Avoid src/dst as None in irecv/isend_tensor_dict (#35754 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-03 05:56:08 -08:00
wang.yuqi	fd4a90f337	[CI] And PPL test for Qwen3.5. (#35853 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-03 13:15:51 +00:00
Thomas Parnell	ad9d09e2b8	[Perf] [Hybrid] Copy num_accepted_tokens in non-blocking way when not using prefix caching (#35442 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2026-03-03 04:15:43 -08:00
Szymon Reginis	4beebfd146	[CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 (#31025 ) Signed-off-by: Szymon Reginis <sreginis@habana.ai> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-03 19:48:24 +08:00
hallerite	b8401cde0e	add regression test (#35834 ) Signed-off-by: hallerite <git@hallerite.com>	2026-03-03 07:32:15 +00:00
TJian	5dfc5abe94	[ROCm] [Release] Change the package from `aiter` to `amd-aiter` (#35198 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-03-02 23:13:39 -08:00
lin-shh	8fa68a8ce4	Fix TYPE_CHECKING stub defaults in envs.py to match actual runtime defaults (#35645 )	2026-03-02 21:59:43 -08:00
lin-shh	35a6f0bfe2	[Misc] Fix typos in comments: explict→explicit, paramaters→parameters (#35648 )	2026-03-02 21:59:14 -08:00
Taneem Ibrahim	3a6cbf16e2	[MISC] Removed unused function find_all_indices() from tool_parsers/utils.py (#35683 ) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>	2026-03-03 13:58:42 +08:00
Lucas Wilkinson	f44d1ddc8c	[BugFix] Fix cmake based incremental install (wrong vllm install dir) (#35773 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-03-02 21:58:16 -08:00
Cyrus Leung	48a54c1e0d	[CI/Build] Trigger processor tests on registry update (#35824 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-03 13:55:57 +08:00
Micah Williamson	8b9e8b7454	[ROCm][CI] Fix Assertion Logic For `test_gpt_oss` (#35806 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-03 05:08:04 +00:00
Wentao Ye	c21d0039ec	[Refactor] Fix maxsim cuda platform and add cli to control it (#35427 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-03 12:48:31 +08:00
Isotr0py	7d8bbe6f42	[CI/Build] Automatically patch video metadata for multimodal processor test (#35822 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-03 04:27:45 +00:00
aykoppol	25e02647c2	[Core] Add optional flags to check for repetitive token patterns in engine output (#35451 ) Signed-off-by: aykoppol <aykoppol+git@gmail.com>	2026-03-03 12:23:25 +08:00
Woosuk Kwon	a0a5178ab4	[Model Runner V2] Use ModelState.prepare_attn() for cuda graph capture [5/N] (#35774 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-02 20:06:27 -08:00
Isotr0py	8ea8ba275e	[V0 deprecation] Remove Swin model (#35821 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-02 20:03:41 -08:00
Woosuk Kwon	4f85bae9d6	[Docs][Model Runner V2] Add Design Docs (#35819 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-02 19:58:14 -08:00
Andy Lo	0a7165fd71	[ModelRunnerV2] Rename sampler functions and variables for clarity (#35459 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-02 19:48:56 -08:00
Robert Shaw	6521ccf286	[CI] Temporarily Disable Nightly Failures (#35770 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-03 01:49:13 +00:00
Martin Vit	8ebd872f50	[Tool Parser] Fix Qwen3Coder streaming parameter loss with speculative decode (#35615 ) Signed-off-by: Martin Vit <martin@voipmonitor.org> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 09:40:37 +08:00
zhrrr	168ee03e1c	[Model Runner V2][Perf] align dummy_run tokens to uniform decode for dp cudagraph (#35376 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2026-03-02 17:10:47 -08:00
liuzhenwei	9dd656f0ea	[XPU][NIXL] Add GPUDirect RDMA support for XPU (#35270 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-03 08:42:49 +08:00
Jakub Zakrzewski	c8b678e53e	[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-03-03 08:32:14 +08:00
Andreas Karatzas	18c29c746b	[ROCm][CI] Fix backslash-continuation in pytest marker re-quoting and treat exit code 5 as success (#35798 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-02 16:07:51 -08:00
Hanjie Qiu	96fc09503a	[All Reduce] Change default backend of Flashinfer All Reduce to trtllm (#35793 ) Signed-off-by: hjjq <hanjieq@nvidia.com>	2026-03-02 18:57:38 -05:00
Roger Wang	1b82b433fc	[Bugfix] Fix MM processor test for Qwen3.5 (#35797 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-03-02 23:05:08 +00:00
Robert Shaw	9319044ee9	[MoE][Perf] Wrap DSV3 QKVAProj GEMM in custom op for torch.compile (#35751 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-02 23:03:49 +00:00
Boyuan Feng	c42dc402c1	clean unused cudagraph_batch_sizes (#35552 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2026-03-02 22:00:16 +00:00
Ye (Charlotte) Qi	fa6a6be519	[Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker (#35741 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2026-03-02 21:11:56 +00:00
Aaron Hao	cad21918e3	[BUG] Fix rlhf_async example (#35788 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-03-02 20:36:40 +00:00
Jeffrey Wang	53700bf49b	[ci] Add Ray compatibility check informational CI job (#34672 ) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2026-03-02 12:06:16 -08:00
Yashwant Bezawada	a13d8c03c9	[KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops (#31057 ) Signed-off-by: Yashwant Bezawada <yashwant_b@me.com>	2026-03-02 15:04:47 -05:00
Fynn Schmitt-Ulms	9433acb8df	[Spec Decode] Add hidden states extraction system (#33736 ) Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>	2026-03-02 14:29:09 -05:00
Richard Zou	d1a6e96d9e	[torch.compile] Improve cold and warm start compile tests (#35709 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-02 19:27:06 +00:00
CSWYF3634076	2a9e3347e9	[BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start (#35587 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2026-03-02 18:56:33 +00:00
Isotr0py	cc0d565f40	[CI/Build] Enable Qwen3.5 tests on CI (#35763 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-02 17:43:53 +00:00
Patryk Wolsza	358e4d5ba7	[CI][HPU] Pin vllm commit compatible with vllm-gaudi - HPU tests (#35307 ) Signed-off-by: PatrykWo <patryk.wolsza@intel.com>	2026-03-02 17:02:26 +00:00

14427 Commits