Charlie Fu
cddbc2b4b2
[ROCm][CI] Add rocm support for run-multi-node-test.sh (#31922)
...
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-08 04:36:39 +00:00
Andreas Karatzas
087a138963
[ROCm][CI] Fix attention backend test flakiness from uninitialized KV cache memory (#31928)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 04:35:25 +00:00
Andreas Karatzas
c4041f37a4
[ROCm][LoRA] Fix MoE accuracy regression by preserving float32 router weight scaling (#31931)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 04:17:56 +00:00
Richard Zou
a79079feef
[BugFix] Fix flakiness in test_eagle_dp for PyTorch 2.10 (#31915)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-08 04:04:58 +00:00
Robert Shaw
9f6dcb71ae
[MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
2026-01-08 03:46:27 +00:00
Andreas Karatzas
8dd2419fa9
[CI] Skip Qwen-VL in multimodal processing tests due to flaky external dependency (#31932)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-08 02:58:01 +00:00
Rabi Mishra
39d82005f7
fix(rocm): add early return in get_flash_attn_version for ROCm (#31286)
...
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 10:28:07 +08:00
Rabi Mishra
25eef3dc2e
feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#31645)
...
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-08 10:27:09 +08:00
Matthew Bonanni
0d7667419f
[0/N][Attention] Fix miscellaneous pre-commit issues (#31924)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-08 01:15:17 +00:00
Robert Shaw
5dcd7ef1f2
[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415)
2026-01-07 19:42:33 -05:00
Elvir Crnčević
ffc0a2798b
Add back missing DeepEP LL params (#31911)
...
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
2026-01-07 17:47:54 -05:00
Nick Hill
10ef65eded
[BugFix] Fix bad words with speculative decoding (#31908)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-07 15:46:42 -05:00
Ilya Markov
6170d47d22
[EPLB] Optimize EPLB with numpy (#29499)
...
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-01-07 15:21:35 -05:00
Xin Yang
0ada960a20
[Kernel] Support bias type in grouped_topk kernel (#31781)
...
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-07 12:16:32 -08:00
Ning Xie
c907d22158
[refactor] refactor memory constants usage (#31865)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-07 18:37:31 +00:00
Michael Goin
f347ac6c34
[Perf] Fuse stride preparation for NVFP4 cutlass_moe (#31837)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-07 13:31:26 -05:00
Festus Ayobami Owumi
05f47bd8d2
[Doc] Fix: Correct vLLM announcing blog post link in docs (#31868)
...
Signed-off-by: enfinity <festusowumi@gmail.com>
2026-01-07 10:06:42 -08:00
roikoren755
bf184a6621
Enable quantized attention in NemotronH models (#31898)
...
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-01-07 17:37:19 +00:00
Jee Jee Li
30399cc725
UX: add vLLM env info in '/server_info' (#31899)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-07 17:13:02 +00:00
Kfir Toledo
b89443b8d9
[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761)
...
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
2026-01-07 16:59:43 +00:00
Marko Rosenmueller
1d9e9ae8a4
[Bugfix]: prevent leaking tokens in crash log (#30751)
...
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
2026-01-07 16:15:19 +00:00
Cyrus Leung
b7036c87a1
[Refactor] Clean up pooler modules (#31897)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-08 00:07:43 +08:00
Kate Cheng
cc6dafaef2
[Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) (#29213)
...
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2026-01-07 10:53:54 -05:00
R3hankhan
1ab055efe6
[OpenAI] Extend VLLMValidationError to additional validation parameters (#31870)
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-01-07 14:45:49 +00:00
Cyrus Leung
b665bbc2d4
[Chore] Migrate V0 attention utils (#31891)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 13:44:36 +00:00
Jared Wen
974138751b
[Refactor] GLM-ASR Modeling (#31779)
...
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-07 13:08:29 +00:00
vllmellm
41cfa50632
[ROCm][AITER] fix wrong argument passed to AITER flash_attn_varlen_func (#31880)
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-01-07 11:25:03 +00:00
Andy Liu
d111bc53ad
[Bugfix][MTP] Fix GLM4 MoE fp8 loading with MTP on (#31757)
...
Signed-off-by: Andy Liu <andyliu@roblox.com>
2026-01-07 09:18:52 +00:00
BlankR
0790f07695
[Misc] Improve error messages for unsupported types and parameters (#30593)
...
Signed-off-by: BlankR <hjyblanche@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-07 09:00:16 +00:00
maang
1f33e38e81
[Model] Cleanup: Remove redundant manual definition of make_empty_intermediate_tensors in GLM-4-MoE (#31869)
...
Signed-off-by: maang <maang_h@163.com>
2026-01-07 08:18:28 +00:00
sihao_li
59fe6f298e
[XPU]fallback to TRITON_ATTN on xpu when use float32 dtype (#31762)
...
Signed-off-by: sihao.li <sihao.li@intel.com>
2026-01-07 08:10:29 +00:00
weiyu
e7596371a4
[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808)
...
Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>
2026-01-07 16:07:16 +08:00
xuebwang-amd
0dd5dee9b9
[Bugfix][Kernel] fix bias adding in triton kernel implemented fused moe (#31676)
...
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
2026-01-07 07:36:13 +00:00
Kevin McKay
4614c5a539
[Bugfix][Hardware][AMD] Consolidate FP8 min/max values helper function (#31106)
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Signed-off-by: Kevin McKay <kevin@example.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07 06:55:03 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
482914849c
[BugFix] LoRA: Support loading base_layer of experts (#31104)
...
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2026-01-07 14:49:39 +08:00
tianshu-Michael-yu
efeaac92f2
[Bugfix] Fix race condition in async-scheduling for vlm model (#31841)
...
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
2026-01-07 06:45:10 +00:00
tjp_zju
55caa6051d
refactor: find_loaded_library (#31866)
...
Signed-off-by: tjp_zju <tanjianpingzju1990@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-07 06:42:20 +00:00
Lucas Wilkinson
c7a79d41a0
[Attention][3/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31850)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-07 13:31:34 +08:00
vllmellm
6409004b26
[ROCm][AITER] bugfix accuracy regression in ROCM_AITER_TRITON_MLA backend (#31816)
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-01-07 05:04:53 +00:00
Cyrus Leung
aafd4d2354
[Chore] Try remove init_cached_hf_modules (#31786)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 12:34:04 +08:00
Jack Yang
0a2c2dc3f1
fixed mypy warnings for files vllm/v1/attention with TEMPORARY workaround (#31465)
...
Signed-off-by: Zhuohao Yang <zy242@cornell.edu>
Co-authored-by: Zhuohao Yang <zy242@cornell.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-07 04:08:47 +00:00
Tyler Michael Smith
f09c5feb7c
Change warning in get_current_vllm_config to report caller's line number (#31855)
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-01-07 03:48:13 +00:00
Cyrus Leung
1b8af957f6
[Doc] Update release docs (#31799)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 03:27:40 +00:00
Ce Zhao
a051525e07
[Model] Enable LoRA support for PaliGemma (#31656)
...
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com>
Signed-off-by: Alcor <alcor_zhao@outlook.com>
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>
2026-01-07 10:09:32 +08:00
Yihua Cheng
5b833be49e
[1/2][lmcache connector] clean up lmcache multi-process adapter (#31838)
...
Signed-off-by: ApostaC <yihua98@uchicago.edu>
2026-01-07 02:02:42 +00:00
Lucas Kabela
873480d133
[Misc][BE] Type coverage for vllm/compilation [1/3] (#31554)
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-06 20:37:51 -05:00
vSeamar
6f351548b2
[Frontend] Implement robust video frame recovery for corrupted videos (#29197)
...
Signed-off-by: cmartinez <cmartinez@roblox.com>
Signed-off-by: vSeamar <cmartinez@roblox.com>
2026-01-07 01:13:24 +00:00
Andreas Karatzas
364a8bc6dc
[ROCm][CI] Fix plugin tests (2 GPUs) failures on ROCm and removing VLLM_FLOAT32_MATMUL_PRECISION from all ROCm tests (#31829)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-07 01:12:23 +00:00
Angela Yi
9a1d20a89c
[CI] Add warmup run in test_fusion_attn (#31183)
...
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-07 00:31:52 +00:00
Cyrus Leung
309a8f66ee
[Bugfix] Handle mistral tokenizer in get_hf_processor (#31817)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 07:46:56 +08:00