Avery Miao
|
b7a423cb01
|
[BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994)
Signed-off-by: Miao, Avery <avery.miao@intel.com>
(cherry picked from commit e998fa76b9)
|
2026-03-06 13:03:40 -08:00 |
|
Cyrus Leung
|
fa78ec8a72
|
[Bugfix] Fix Qwen-VL tokenizer implementation (#36140)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
(cherry picked from commit 7196348157)
|
2026-03-06 13:03:26 -08:00 |
|
Kunshang Ji
|
9a474ce7a4
|
[XPU] bump vllm-xpu-kernels to v0.1.3 (#35984)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
(cherry picked from commit a8f66cbde8)
|
2026-03-06 13:03:05 -08:00 |
|
lailoo
|
097eb544e9
|
[Bugfix] Improve engine ready timeout error message (#35616)
Signed-off-by: damaozi <1811866786@qq.com>
Tag: v0.17.0rc0
|
2026-03-04 05:54:32 +00:00 |
|
ShiJie Zhong
|
7cdba98edf
|
[BugFix] Support tool_choice=none in the Anthropic API (#35835)
Signed-off-by: ZhongsJie <zhongsjie@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-04 05:24:46 +00:00 |
|
Charlie Fu
|
3c85cd9d74
|
[Rocm][CI] Fix ROCm LM Eval Large Models (8 Card) (#35913)
Signed-off-by: charlifu <charlifu@amd.com>
|
2026-03-04 04:50:13 +00:00 |
|
Andreas Karatzas
|
edba15045a
|
[Bugfix] Guard mm_token_type_ids kwarg in get_mrope_input_positions (#35711)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-04 04:12:51 +00:00 |
|
Cyrus Leung
|
e379396167
|
[Refactor] Clean up processor kwargs extraction (#35872)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-03 19:53:53 -08:00 |
|
Isotr0py
|
6e9f21e8a2
|
[Chore] Remove debug code in model implementation (#35883)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-03 19:50:58 -08:00 |
|
AllenDou
|
c1d963403c
|
[model] support FireRedASR2 (#35727)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-03 19:41:30 -08:00 |
|
Shanshan Shen
|
77e6dcbbfa
|
[PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-03-03 19:41:27 -08:00 |
|
William Zhang
|
70c73df69e
|
[Bugfix] Fix EVS implementation for Qwen3 VL (#33607)
Signed-off-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
|
2026-03-04 02:18:11 +00:00 |
|
xjx
|
9a9d442464
|
Enable bnb for multiple indices weight (#35838)
Signed-off-by: xjx <493337577@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-04 01:46:47 +00:00 |
|
Andreas Karatzas
|
f7da9cdffc
|
[ROCm][CI] Support async weight transfer example with platform-aware determinism (#35710)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-04 09:44:14 +08:00 |
|
Jaewon
|
f22ff2958c
|
[Bugfix] Fix coord_socket assertion in DPEngineCoreProc for offline DP mode (#35916)
Signed-off-by: Jaewon Lee <jaewon@meta.com>
|
2026-03-04 00:10:11 +00:00 |
|
Nick Hill
|
d15c3b90fc
|
[Core] Move save_tensorized_model logic to Worker (#35825)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-03 15:31:59 -08:00 |
|
zhrrr
|
97286a20ed
|
[Model Runner V2] support dp & ep for spec decoding (#35294)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-03 15:19:45 -08:00 |
|
Amr Mahdi
|
12b38c0f45
|
[CI/Build] Allow mounting AWS credentials for sccache S3 auth (#35912)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2026-03-03 14:30:47 -08:00 |
|
Woosuk Kwon
|
467886a0c4
|
[Model Runner V2] Fix inputs_embeds=None bug for MM models (#35917)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-03 13:47:45 -08:00 |
|
bnellnm
|
a9b8b13e5c
|
[Bugfix] Fix misnamed parameter in compressed_tensors_moe.py (#35813)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-03-03 16:29:57 -05:00 |
|
Micah Williamson
|
e7213003cb
|
[ROCm][CI] Fix TP size issue for test_gpt_oss (#35887)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-03 20:57:34 +00:00 |
|
Rohan Potdar
|
3a8eef5869
|
[ROCm][Bugfix]: Disable AITER Triton ROPE by default (#35601)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-03-03 13:43:56 -06:00 |
|
Robert Shaw
|
97995f6376
|
[MoE Refactor] Create MK for TRTLLM Kernels (#32564)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-03-03 10:39:50 -08:00 |
|
Robert Shaw
|
881a6b011b
|
[CI] Temporarily Disable Llama4 MoE Refactor Test (#35870)
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-03-03 10:36:15 -08:00 |
|
Matthew Bonanni
|
8e1fd5baf0
|
[CI] Bump num_speculative_tokens to 3 in nightly DeepSeek tests (#35882)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-03 09:26:44 -08:00 |
|
JasonCohere
|
ae88468bcc
|
fix: Ensure invalid audio files return 400 error (#34715)
Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-03 08:47:39 -08:00 |
|
ojhaanshika
|
e05cb3b93e
|
TRTLLM gen-full attn Test Coverage (#34986)
Signed-off-by: Anshika Ojha <anshikao@nvidia.com>
Co-authored-by: Anshika Ojha <anshikao@gb-nvl-059-compute09.nvidia.com>
|
2026-03-03 11:35:34 -05:00 |
|
Lucas Wilkinson
|
28ef9ba399
|
[BugFix] Add support for MTP num_speculative_tokens > 1 with sparse MLA (#34552)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-03 07:21:57 -08:00 |
|
TJian
|
fb7fdc49c4
|
[ROCm] [CI] Add new fusion test cases that are relevant to vLLM IR Ops (#34307)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-03 06:24:21 -08:00 |
|
wang.yuqi
|
ea463978bb
|
[Frontend][1/n] Improve pooling entrypoints | classify. (#35604)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-03 06:05:36 -08:00 |
|
Li, Jiang
|
440f0e7dc6
|
[Bugfix] Avoid src/dst as None in irecv/isend_tensor_dict (#35754)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-03 05:56:08 -08:00 |
|
wang.yuqi
|
fd4a90f337
|
[CI] And PPL test for Qwen3.5. (#35853)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-03 13:15:51 +00:00 |
|
Thomas Parnell
|
ad9d09e2b8
|
[Perf] [Hybrid] Copy num_accepted_tokens in non-blocking way when not using prefix caching (#35442)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2026-03-03 04:15:43 -08:00 |
|
Szymon Reginis
|
4beebfd146
|
[CI/Build][Intel] Add new performance benchmarks for Intel Gaudi 3 (#31025)
Signed-off-by: Szymon Reginis <sreginis@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-03 19:48:24 +08:00 |
|
hallerite
|
b8401cde0e
|
add regression test (#35834)
Signed-off-by: hallerite <git@hallerite.com>
|
2026-03-03 07:32:15 +00:00 |
|
TJian
|
5dfc5abe94
|
[ROCm] [Release] Change the package from aiter to amd-aiter (#35198)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-02 23:13:39 -08:00 |
|
lin-shh
|
8fa68a8ce4
|
Fix TYPE_CHECKING stub defaults in envs.py to match actual runtime defaults (#35645)
|
2026-03-02 21:59:43 -08:00 |
|
lin-shh
|
35a6f0bfe2
|
[Misc] Fix typos in comments: explict→explicit, paramaters→parameters (#35648)
|
2026-03-02 21:59:14 -08:00 |
|
Taneem Ibrahim
|
3a6cbf16e2
|
[MISC] Removed unused function find_all_indices() from tool_parsers/utils.py (#35683)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-03-03 13:58:42 +08:00 |
|
Lucas Wilkinson
|
f44d1ddc8c
|
[BugFix] Fix cmake based incremental install (wrong vllm install dir) (#35773)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-03-02 21:58:16 -08:00 |
|
Cyrus Leung
|
48a54c1e0d
|
[CI/Build] Trigger processor tests on registry update (#35824)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-03 13:55:57 +08:00 |
|
Micah Williamson
|
8b9e8b7454
|
[ROCm][CI] Fix Assertion Logic For test_gpt_oss (#35806)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-03 05:08:04 +00:00 |
|
Wentao Ye
|
c21d0039ec
|
[Refactor] Fix maxsim cuda platform and add cli to control it (#35427)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-03 12:48:31 +08:00 |
|
Isotr0py
|
7d8bbe6f42
|
[CI/Build] Automatically patch video metadata for multimodal processor test (#35822)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-03 04:27:45 +00:00 |
|
aykoppol
|
25e02647c2
|
[Core] Add optional flags to check for repetitive token patterns in engine output (#35451)
Signed-off-by: aykoppol <aykoppol+git@gmail.com>
|
2026-03-03 12:23:25 +08:00 |
|
Woosuk Kwon
|
a0a5178ab4
|
[Model Runner V2] Use ModelState.prepare_attn() for cuda graph capture [5/N] (#35774)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-02 20:06:27 -08:00 |
|
Isotr0py
|
8ea8ba275e
|
[V0 deprecation] Remove Swin model (#35821)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-02 20:03:41 -08:00 |
|
Woosuk Kwon
|
4f85bae9d6
|
[Docs][Model Runner V2] Add Design Docs (#35819)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-02 19:58:14 -08:00 |
|
Andy Lo
|
0a7165fd71
|
[ModelRunnerV2] Rename sampler functions and variables for clarity (#35459)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-03-02 19:48:56 -08:00 |
|
Robert Shaw
|
6521ccf286
|
[CI] Temporarily Disable Nightly Failures (#35770)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-03 01:49:13 +00:00 |
|