Rabi Mishra
|
25eef3dc2e
|
feat(moe): Add is_act_and_mul=False support for Triton MoE kernels (#31645)
Signed-off-by: rabi <ramishra@redhat.com>
|
2026-01-08 10:27:09 +08:00 |
|
Matthew Bonanni
|
0d7667419f
|
[0/N][Attention] Fix miscellaneous pre-commit issues (#31924)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-08 01:15:17 +00:00 |
|
Robert Shaw
|
5dcd7ef1f2
|
[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415)
|
2026-01-07 19:42:33 -05:00 |
|
Elvir Crnčević
|
ffc0a2798b
|
Add back missing DeepEP LL params (#31911)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
|
2026-01-07 17:47:54 -05:00 |
|
Nick Hill
|
10ef65eded
|
[BugFix] Fix bad words with speculative decoding (#31908)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-07 15:46:42 -05:00 |
|
Ilya Markov
|
6170d47d22
|
[EPLB] Optimize EPLB with numpy (#29499)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-01-07 15:21:35 -05:00 |
|
Xin Yang
|
0ada960a20
|
[Kernel] Support bias type in grouped_topk kernel (#31781)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-07 12:16:32 -08:00 |
|
Ning Xie
|
c907d22158
|
[refactor] refactor memory constants usage (#31865)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-07 18:37:31 +00:00 |
|
Michael Goin
|
f347ac6c34
|
[Perf] Fuse stride preparation for NVFP4 cutlass_moe (#31837)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-07 13:31:26 -05:00 |
|
Festus Ayobami Owumi
|
05f47bd8d2
|
[Doc] Fix: Correct vLLM announcing blog post link in docs (#31868)
Signed-off-by: enfinity <festusowumi@gmail.com>
|
2026-01-07 10:06:42 -08:00 |
|
roikoren755
|
bf184a6621
|
Enable quantized attention in NemotronH models (#31898)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-01-07 17:37:19 +00:00 |
|
Jee Jee Li
|
30399cc725
|
UX: add vLLM env info in '/server_info' (#31899)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-07 17:13:02 +00:00 |
|
Kfir Toledo
|
b89443b8d9
|
[KVConnector]: Enable Cross-layers KV cache layout for MultiConnector (#30761)
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
|
2026-01-07 16:59:43 +00:00 |
|
Marko Rosenmueller
|
1d9e9ae8a4
|
[Bugfix]: prevent leaking tokens in crash log (#30751)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2026-01-07 16:15:19 +00:00 |
|
Cyrus Leung
|
b7036c87a1
|
[Refactor] Clean up pooler modules (#31897)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-08 00:07:43 +08:00 |
|
Kate Cheng
|
cc6dafaef2
|
[Perf][Kernels] Enable FlashInfer DeepGEMM swapAB on SM90 (for W8A8 Linear Op) (#29213)
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2026-01-07 10:53:54 -05:00 |
|
R3hankhan
|
1ab055efe6
|
[OpenAI] Extend VLLMValidationError to additional validation parameters (#31870)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2026-01-07 14:45:49 +00:00 |
|
Cyrus Leung
|
b665bbc2d4
|
[Chore] Migrate V0 attention utils (#31891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-07 13:44:36 +00:00 |
|
Jared Wen
|
974138751b
|
[Refactor] GLM-ASR Modeling (#31779)
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-07 13:08:29 +00:00 |
|
vllmellm
|
41cfa50632
|
[ROCm][AITER] fix wrong argument passed to AITER flash_attn_varlen_func (#31880)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-01-07 11:25:03 +00:00 |
|
Andy Liu
|
d111bc53ad
|
[Bugfix][MTP] Fix GLM4 MoE fp8 loading with MTP on (#31757)
Signed-off-by: Andy Liu <andyliu@roblox.com>
|
2026-01-07 09:18:52 +00:00 |
|
BlankR
|
0790f07695
|
[Misc] Improve error messages for unsupported types and parameters (#30593)
Signed-off-by: BlankR <hjyblanche@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-07 09:00:16 +00:00 |
|
maang
|
1f33e38e81
|
[Model] Cleanup: Remove redundant manual definition of make_empty_intermediate_tensors in GLM-4-MoE (#31869)
Signed-off-by: maang <maang_h@163.com>
|
2026-01-07 08:18:28 +00:00 |
|
sihao_li
|
59fe6f298e
|
[XPU]fallback to TRITON_ATTN on xpu when use float32 dtype (#31762)
Signed-off-by: sihao.li <sihao.li@intel.com>
|
2026-01-07 08:10:29 +00:00 |
|
weiyu
|
e7596371a4
|
[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808)
Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>
|
2026-01-07 16:07:16 +08:00 |
|
xuebwang-amd
|
0dd5dee9b9
|
[Bugfix][Kernel] fix bias adding in triton kernel implemented fused moe (#31676)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
2026-01-07 07:36:13 +00:00 |
|
Kevin McKay
|
4614c5a539
|
[Bugfix][Hardware][AMD] Consolidate FP8 min/max values helper function (#31106)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Signed-off-by: Kevin McKay <kevin@example.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-01-07 06:55:03 +00:00 |
|
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
|
482914849c
|
[BugFix] LoRA: Support loading base_layer of experts (#31104)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
|
2026-01-07 14:49:39 +08:00 |
|
tianshu-Michael-yu
|
efeaac92f2
|
[Bugfix] Fix race condition in async-scheduling for vlm model (#31841)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
|
2026-01-07 06:45:10 +00:00 |
|
tjp_zju
|
55caa6051d
|
refactor: find_loaded_library (#31866)
Signed-off-by: tjp_zju <tanjianpingzju1990@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-01-07 06:42:20 +00:00 |
|
Lucas Wilkinson
|
c7a79d41a0
|
[Attention][3/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31850)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-07 13:31:34 +08:00 |
|
vllmellm
|
6409004b26
|
[ROCm][AITER] bugfix accuracy regression in ROCM_AITER_TRITON_MLA backend (#31816)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-01-07 05:04:53 +00:00 |
|
Cyrus Leung
|
aafd4d2354
|
[Chore] Try remove init_cached_hf_modules (#31786)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-07 12:34:04 +08:00 |
|
Jack Yang
|
0a2c2dc3f1
|
fixed mypy warnings for files vllm/v1/attention with TEMPORARY workaround (#31465)
Signed-off-by: Zhuohao Yang <zy242@cornell.edu>
Co-authored-by: Zhuohao Yang <zy242@cornell.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-07 04:08:47 +00:00 |
|
Tyler Michael Smith
|
f09c5feb7c
|
Change warning in get_current_vllm_config to report caller's line number (#31855)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-01-07 03:48:13 +00:00 |
|
Cyrus Leung
|
1b8af957f6
|
[Doc] Update release docs (#31799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-07 03:27:40 +00:00 |
|
Ce Zhao
|
a051525e07
|
[Model] Enable LoRA support for PaliGemma (#31656)
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com>
Signed-off-by: Alcor <alcor_zhao@outlook.com>
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>
|
2026-01-07 10:09:32 +08:00 |
|
Yihua Cheng
|
5b833be49e
|
[1/2][lmcache connector] clean up lmcache multi-process adapter (#31838)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
|
2026-01-07 02:02:42 +00:00 |
|
Lucas Kabela
|
873480d133
|
[Misc][BE] Type coverage for vllm/compilation [1/3] (#31554)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-06 20:37:51 -05:00 |
|
vSeamar
|
6f351548b2
|
[Frontend] Implement robust video frame recovery for corrupted videos (#29197)
Signed-off-by: cmartinez <cmartinez@roblox.com>
Signed-off-by: vSeamar <cmartinez@roblox.com>
|
2026-01-07 01:13:24 +00:00 |
|
Andreas Karatzas
|
364a8bc6dc
|
[ROCm][CI] Fix plugin tests (2 GPUs) failures on ROCm and removing VLLM_FLOAT32_MATMUL_PRECISION from all ROCm tests (#31829)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-07 01:12:23 +00:00 |
|
Angela Yi
|
9a1d20a89c
|
[CI] Add warmup run in test_fusion_attn (#31183)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-07 00:31:52 +00:00 |
|
Cyrus Leung
|
309a8f66ee
|
[Bugfix] Handle mistral tokenizer in get_hf_processor (#31817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-07 07:46:56 +08:00 |
|
Andreas Karatzas
|
e5d427e93a
|
[ROCm][CI] Pinning timm lib version to fix ImportError in Multi-Modal Tests (Nemotron) (#31835)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-06 23:23:11 +00:00 |
|
Andreas Karatzas
|
2a42ae790d
|
[ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm (#31820)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-06 23:21:15 +00:00 |
|
Matthew Bonanni
|
d49899732e
|
[Spec Decode][UX] Add acceptance stats to vllm bench serve report (#31739)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-01-06 21:21:42 +00:00 |
|
Elvir Crnčević
|
dba95378a6
|
Report error log after vllm bench serve (#31808)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
|
2026-01-06 20:24:19 +00:00 |
|
Nikhil G
|
ada6f91d56
|
Fix RecursionError in MediaWithBytes unpickling (#31191)
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
|
2026-01-06 20:11:26 +00:00 |
|
Li, Jiang
|
8becf146bd
|
[Quantization][Refactor] Move CPU GPTQ kernel into MP linear (#31801)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-06 19:10:18 +00:00 |
|
Charlie Fu
|
c07163663d
|
[ROCm][CI] Fix tests/compile unit tests (#28895)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-01-06 18:50:43 +00:00 |
|