Nick Hill
|
876a16f4fb
|
[ModelRunner V2] Fix spec decoding + logprobs (#33391)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-31 03:33:26 +00:00 |
|
Matthew Bonanni
|
aaa901ad55
|
[Attention] Move MLA forward from backend to layer (#33284)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-30 19:30:00 -08:00 |
|
Wentao Ye
|
010ec0c30e
|
[Deprecation] Deprecate seed_everything and scatter_mm_placeholders in v0.15 (#33362)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-31 02:54:16 +00:00 |
|
Alberto Ferrer
|
64a40a7ab4
|
[Bugfix] Fix typo in read_offset variable name (#33426)
Signed-off-by: Alberto Ferrer <albertof@barrahome.org>
|
2026-01-31 01:26:15 +00:00 |
|
Gregory Shtrasberg
|
31aedfe7d6
|
[Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 (#33366)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-01-30 19:05:23 -06:00 |
|
Michael Goin
|
67ebaff528
|
Refactor NVFP4 Linear utils for ModelOpt and CT (#33201)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-30 16:37:42 -08:00 |
|
Chendi.Xue
|
2b465570e6
|
[CI][HPU]accelerate hpu test by skip python re-install and clean container name (#33286)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2026-01-30 21:36:29 +00:00 |
|
Huy Do
|
9ca66ecc10
|
Indicate compile mode in the benchmark results (#32990)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2026-01-30 15:34:36 -05:00 |
|
Pavani Majety
|
c3a9752b0c
|
[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2026-01-30 10:30:46 -08:00 |
|
xuebwang-amd
|
f451b4558b
|
[Quantization][ROCm] Fix MoE weight loading to be robust (Qwen3_MoE/Qwen3_next as example models) (#33173)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
2026-01-30 17:50:23 +00:00 |
|
Vasiliy Kuznetsov
|
3f96fcf646
|
fix QERL attention import path (#33432)
Signed-off-by: vasiliy <vasiliy@fb.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-30 09:29:09 -08:00 |
|
Yanan Cao
|
6c1f9e4c18
|
[Kernel] [Helion] [1/N] Add Helion ConfigManager (#32740)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-01-30 12:19:19 -05:00 |
|
Harry Mellor
|
67239c4c42
|
Fix encoder-decoder model disabling mm processor cache (#33236)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-30 16:30:10 +00:00 |
|
Nicolò Lucchesi
|
8ece60768f
|
[CI] Qwen3-ASR transcription tests (#33414)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-30 16:17:56 +00:00 |
|
Michael Goin
|
fd0e377244
|
Support FP8 block quant for CompressedTensorsW8A16Fp8 (#33280)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-30 11:15:20 -05:00 |
|
Kyle Sayers
|
f857a03f6b
|
[QeRL] Layerwise Reloading (#32133)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-01-30 08:50:05 -07:00 |
|
Danielle Robinson
|
74898a7015
|
[BugFix][LoRA] TritonExperts is ModularMoEPath for FP8 models (#33393)
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
|
2026-01-30 15:27:42 +00:00 |
|
Frank Wang
|
8f5d51203b
|
Disable Cascade Attention for Batch Invariance (#32561)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-30 10:00:46 -05:00 |
|
Julien Denize
|
ae5b7aff2b
|
Improve Mistral format checks. (#33253)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-30 06:23:33 -08:00 |
|
Harry Mellor
|
a11bc12d53
|
Fix test_moe.py for Transformers v5 (#33413)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-30 14:03:25 +00:00 |
|
Nathan Weinberg
|
58cb55e4de
|
[Doc] Enhance documentation around CPU container images (#32286)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
|
2026-01-30 13:36:20 +00:00 |
|
杨朱 · Kiki
|
cf896ae0e3
|
[Misc] Clean up HIDDEN_DEPRECATED_METRICS after metric removal (#33323)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-01-30 13:31:17 +00:00 |
|
Harry Mellor
|
c5113f60f2
|
Remove deprecated reasoning_content message field (#33402)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-30 11:48:15 +00:00 |
|
vllmellm
|
174f16700b
|
[Doc] [ROCm] Update Documentation to reflect v0.15.0 release (#33388)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-01-30 19:06:08 +08:00 |
|
Julien Denize
|
8e2ad97ad0
|
[BUGFIX] Pixtral cannot be loaded with --limit-mm-per-prompt 0 (#33406)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-01-30 02:52:02 -08:00 |
|
Patrick von Platen
|
10152d2194
|
[Realtime API] Adds minimal realtime API based on websockets (#33187)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-30 18:41:29 +08:00 |
|
杨朱 · Kiki
|
1a7894dbdf
|
[Misc] Replace Optional[X] with X | None syntax (#33332)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-01-30 01:56:59 -08:00 |
|
Cyrus Leung
|
c87eac18f7
|
[Refactor] Move MM item count validation outside of processor (#33396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-30 09:27:31 +00:00 |
|
tianshu-Michael-yu
|
f45870b53f
|
fix: allow LFM2 MoE prefix caching (align) (#33376)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
|
2026-01-30 08:23:14 +00:00 |
|
hujiaxin0
|
ba45bedfd1
|
[model] Add support for openPangu7B-VL (#32449)
Signed-off-by: hujiaxin <524446785@qq.com>
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
|
2026-01-30 15:54:27 +08:00 |
|
Harry Mellor
|
9432ed8c7e
|
Explicitly set return_dict for apply_chat_template (#33372)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-30 07:27:04 +00:00 |
|
Lucas Kabela
|
726d89720c
|
[CI] Enable mypy import following for vllm/spec_decode (#33282)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-30 06:43:32 +00:00 |
|
Harry Mellor
|
d334dd26c4
|
Move decode context parallel validation to ParallelConfig (#33239)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-30 06:18:41 +00:00 |
|
Ryan Rock
|
070c811d6f
|
[CI][AMD] Skip 4 GPUs testgroup ray tests (#33305)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-01-29 21:39:53 -08:00 |
|
Isotr0py
|
8bfc8d5600
|
[Models] Refactor Kimi-K2.5 weight loading (#33346)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-30 05:31:20 +00:00 |
|
Harry Huang
|
ec51831a22
|
[BugFix] Disable async scheduling for Mamba prefix caching (#33352)
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
|
2026-01-30 04:40:19 +00:00 |
|
Harry Mellor
|
80b918f2bd
|
Fix tie_word_embeddings for multimodal models in Transformers v5 (#33359)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-30 03:37:39 +00:00 |
|
Wang Haoyu
|
c46b0cd0af
|
[Model][Multimodal] Add explicit MusicFlamingo adapter (#32696)
Signed-off-by: WangHaoyuuu <mailwhaoyu@gmail.com>
|
2026-01-30 11:01:29 +08:00 |
|
Aidan Reilly
|
133765760b
|
[Docs] Adding links and intro to Speculators and LLM Compressor (#32849)
Signed-off-by: Aidan Reilly <aireilly@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
v0.16.0rc0
|
2026-01-29 14:12:35 -08:00 |
|
Michael Goin
|
bfb9bdaf3f
|
[Bugfix] Enable Triton MoE for FP8 per-tensor dynamic (#33300)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-29 12:15:17 -08:00 |
|
Kevin H. Luu
|
2284461d02
|
[release] Minor fixes to release annotation and wheel upload (#33129)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-01-29 12:09:35 -08:00 |
|
danisereb
|
8e2a469b3b
|
Add Triton fused MoE config for B200 (Nemotron Nano) (#32804)
|
2026-01-29 19:21:33 +00:00 |
|
CarstyYou
|
23591e631e
|
[Bugfix][Kernel] Fix negative memory offset in GDN Triton kernel (#33326)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
|
2026-01-29 10:40:11 -08:00 |
|
Linda
|
0493d897c4
|
[NVIDIA] [feat] Integrate flashinfer Trtllmgen bf16 moe (#32954)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2026-01-29 10:00:13 -08:00 |
|
Chendi.Xue
|
8c8ebeb941
|
[BUGFIX][XPU] fix memory check after XPU reuse GPU_worker (#33358)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2026-01-29 09:56:30 -08:00 |
|
Cyrus Leung
|
831453fcef
|
[Chore] Move MediaConnector to vllm.multimodal.media (#33324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-29 16:54:31 +00:00 |
|
Angela Yi
|
5a66c9cc76
|
[ez] Delete torch25_custom_graph_pass (#33287)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-29 16:47:05 +00:00 |
|
Isotr0py
|
5e73e4900c
|
[Bugfix] Fix broken GLM-OCR initialization (#33350)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-29 07:56:05 -08:00 |
|
Cyrus Leung
|
c6e7404cc5
|
[Multimodal] Simplify MM input definitions (#33331)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-29 13:32:04 +00:00 |
|
sthWrong
|
17b17c0684
|
[Backport] [Kimi-K2.5] Replace torch.cuda with current_platform for d… (#33320)
|
2026-01-29 12:29:17 +00:00 |
|