Cyrus Leung
1b8af957f6
[Doc] Update release docs (#31799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 03:27:40 +00:00

Ce Zhao
a051525e07
[Model] Enable LoRA support for PaliGemma (#31656)
Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com>
Signed-off-by: Alcor <alcor_zhao@outlook.com>
Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>
2026-01-07 10:09:32 +08:00

Yihua Cheng
5b833be49e
[1/2][lmcache connector] clean up lmcache multi-process adapter (#31838)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
2026-01-07 02:02:42 +00:00

Lucas Kabela
873480d133
[Misc][BE] Type coverage for vllm/compilation [1/3] (#31554)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-06 20:37:51 -05:00

vSeamar
6f351548b2
[Frontend] Implement robust video frame recovery for corrupted videos (#29197)
Signed-off-by: cmartinez <cmartinez@roblox.com>
Signed-off-by: vSeamar <cmartinez@roblox.com>
2026-01-07 01:13:24 +00:00

Andreas Karatzas
364a8bc6dc
[ROCm][CI] Fix plugin tests (2 GPUs) failures on ROCm and removing VLLM_FLOAT32_MATMUL_PRECISION from all ROCm tests (#31829)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-07 01:12:23 +00:00

Angela Yi
9a1d20a89c
[CI] Add warmup run in test_fusion_attn (#31183)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-07 00:31:52 +00:00

Cyrus Leung
309a8f66ee
[Bugfix] Handle mistral tokenizer in get_hf_processor (#31817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-07 07:46:56 +08:00

Andreas Karatzas
e5d427e93a
[ROCm][CI] Pinning timm lib version to fix ImportError in Multi-Modal Tests (Nemotron) (#31835)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-06 23:23:11 +00:00

Andreas Karatzas
2a42ae790d
[ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm (#31820)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-06 23:21:15 +00:00

Matthew Bonanni
d49899732e
[Spec Decode][UX] Add acceptance stats to vllm bench serve report (#31739)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2026-01-06 21:21:42 +00:00

Elvir Crnčević
dba95378a6
Report error log after vllm bench serve (#31808)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
2026-01-06 20:24:19 +00:00

Nikhil G
ada6f91d56
Fix RecursionError in MediaWithBytes unpickling (#31191)
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
2026-01-06 20:11:26 +00:00

Li, Jiang
8becf146bd
[Quantization][Refactor] Move CPU GPTQ kernel into MP linear (#31801)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-06 19:10:18 +00:00

Charlie Fu
c07163663d
[ROCm][CI] Fix tests/compile unit tests (#28895)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-01-06 18:50:43 +00:00

Benjamin Chislett
f7008ce1c4
[Perf] Async Scheduling + Speculative Decoding + Structured Outputs (#29821)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-06 18:50:37 +00:00

Yakine Tahtah
4e67a8f616
[Bugfix] Fix GLM-4 MoE router logits dtype for data parallel chunking (#31055)
Signed-off-by: ReinforcedKnowledge <reinforced.knowledge@gmail.com>
2026-01-06 17:57:56 +00:00

Masataro Asai
142c4d1738
make 500: InternalServerError more informative (#20610)
Signed-off-by: Masataro Asai <guicho2.71828@gmail.com>
2026-01-06 17:36:24 +00:00

Ning Xie
6f5e653383
[Log] add log about gpu worker init snapshot and requested memory (#29493)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-06 17:32:55 +00:00

Vadim Gimpelson
22dffca982
[PERF] Speed-up of GDN attention decode part (Qwen3-Next) (#31722)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-01-06 17:32:46 +00:00

Lucas Wilkinson
4c73be14e0
[Attention][2/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31774)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-06 17:32:14 +00:00

Jinzhen Lin
2f4bdee61e
[Quantization][MoE] remove unused ep logic from moe marlin (#31571)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-06 09:07:19 -08:00

roikoren755
28c94770ad
[NemotronH] Use ReplicatedLinear for fc1_latent_proj (#31807)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-01-06 16:00:40 +00:00

Robert Shaw
af8fd73051
[MoE Refactor][14/N] Clean Up FI Quant Config Smuggling (#31593)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-06 15:47:04 +00:00

Robert Shaw
d3e477c013
[MoE Refactor] Add Temporary Integration Tests - H100/B200 (#31759)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-06 10:34:17 -05:00

Isotr0py
02809af1e7
[Bugfix]: Fix cross attention backend selection for Turing GPU (#31806)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-06 23:15:56 +08:00

Jee Jee Li
cbd4690a03
[LoRA]Disable linear LoRA kernel PDL (#31777)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-06 23:12:25 +08:00

wang.yuqi
96860af655
[Model] rename use_pad_token to use_sep_token (#31784)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-06 14:16:04 +00:00

Chauncey
0202971a48
[Frontend] Support GLM-4.5 / GLM-4.7 with enable_thinking: false (#31788)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-06 13:53:21 +00:00

Jzz1943
2c1a4f2488
[Bugfix]: avoid overriding audio/text kwargs (Qwen3-Omni) (#31790)
Signed-off-by: Zhongze Jiang <jiangzhongze.jzz@ant-intl.com>
2026-01-06 12:59:17 +00:00

Cyrus Leung
6444824873
[Misc] Implement TokenizerLike.convert_tokens_to_ids (#31796)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-06 12:08:22 +00:00

kzwrime
bf0f3a4638
[Bugfix] Fix torch.compile error for DP + MoE on CPU Backend (#31650)
Signed-off-by: kunzh <zhikun.wu@outlook.com>
2026-01-06 12:06:20 +00:00

Lucas Wilkinson
e0327c9db2
[Attention][1/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31773)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-06 04:05:17 -08:00

Cyrus Leung
14df02b4e1
[Chore] Cleanup mem_utils.py (#31793)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-06 19:55:59 +08:00

BlankR
6ebb66ccea
[Doc] Fix format of multimodal_inputs.md (#31800)
Signed-off-by: BlankR <hjyblanche@gmail.com>
2026-01-06 03:30:24 -08:00

wang.yuqi
43d384bab4
[CI] Increase the MTEB_EMBED_TOL threshold to 5e-4. (#31797)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-06 19:30:05 +08:00

Cyrus Leung
db318326a5
[Misc] Use deprecated for seed_everything (#31780)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-06 11:29:55 +00:00

Fadi Arafeh
799b5721f6
[cpu][bench] Add CPU paged attention benchmarks (#31720)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-06 10:57:57 +00:00

Cyrus Leung
97ca4c3b60
[Chore] Remove more V0 dead code from sequence.py (#31783)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-06 10:25:14 +00:00

Isotr0py
ee2e69d6cd
[Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff (#31776)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-06 00:44:22 -08:00

Isotr0py
7101e0851f
[Models]: Use MMEncoderAttention for MoonViT (#31738)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: h100 <h100@inferact.ai>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: h100 <h100@inferact.ai>
2026-01-06 08:00:25 +00:00

vllmellm
e9717801bd
[Bugfix][ROCm] Fix Unsupported attention metadata type for speculative decoding in eagle.py (#31714)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2026-01-06 07:53:22 +00:00

Cyrus Leung
da71d44410
[Doc] Show that use_audio_in_video is supported in docs (#30837)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-05 23:27:19 -08:00

Kevin McKay
1fb0209bbc
[Bugfix][Hardware][AMD] Fix exception types in AITER MLA FP8 check (#31177)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 14:10:59 +08:00

Robert Shaw
81323ea221
[CI] Fix CPU MM PRocessor Test (#31764)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-06 04:22:18 +00:00

Michael Goin
e1cd7a5faf
[Bugfix] Add init_workspace_manager to moe kernel benchmarks (#31042)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-05 19:14:33 -08:00

Michael Goin
a68e703c32
[UX] Add -ep shorthand for --enable-expert-parallel (#30890)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-05 19:13:36 -08:00

maang
cd1245a184
[Cleanup] Remove redundant decoder_layer_type assignment in Qwen2 (#31760)
Signed-off-by: maang <maang_h@163.com>
2026-01-05 18:09:18 -08:00

Wentao Ye
ffec815422
[Perf] Optimize additional fill(0) in cutlass moe, 2.9% E2E throughput improvement, 10.8% TTFT improvement (#31754)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-05 18:01:13 -08:00

maang
d386ab1412
[Docs] Improve malformed exception caused by backslash line continuations (#31694)
Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-05 17:51:54 -08:00