Michael Goin
29fba76781
[UX] Use gguf repo_id:quant_type syntax for examples and docs (#33371)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-31 12:14:54 +08:00
Nick Hill
876a16f4fb
[ModelRunner V2] Fix spec decoding + logprobs (#33391)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-31 03:33:26 +00:00
Matthew Bonanni
aaa901ad55
[Attention] Move MLA forward from backend to layer (#33284)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-30 19:30:00 -08:00
Gregory Shtrasberg
31aedfe7d6
[Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 (#33366)
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2026-01-30 19:05:23 -06:00
Michael Goin
67ebaff528
Refactor NVFP4 Linear utils for ModelOpt and CT (#33201)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-30 16:37:42 -08:00
Pavani Majety
c3a9752b0c
[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437)
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2026-01-30 10:30:46 -08:00
Yanan Cao
6c1f9e4c18
[Kernel] [Helion] [1/N] Add Helion ConfigManager (#32740)
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-01-30 12:19:19 -05:00
Harry Mellor
67239c4c42
Fix encoder-decoder model disabling mm processor cache (#33236)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-30 16:30:10 +00:00
Nicolò Lucchesi
8ece60768f
[CI] Qwen3-ASR transcriptions tests (#33414)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-30 16:17:56 +00:00
Kyle Sayers
f857a03f6b
[QeRL] Layerwise Reloading (#32133)
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-01-30 08:50:05 -07:00
Frank Wang
8f5d51203b
Disable Cascade Attention for Batch Invariance (#32561)
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-30 10:00:46 -05:00
Julien Denize
ae5b7aff2b
Improve Mistral format checks. (#33253)
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-30 06:23:33 -08:00
Harry Mellor
a11bc12d53
Fix test_moe.py for Transformers v5 (#33413)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-30 14:03:25 +00:00
杨朱 · Kiki
cf896ae0e3
[Misc] Clean up HIDDEN_DEPRECATED_METRICS after metric removal (#33323)
...
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 13:31:17 +00:00
Harry Mellor
c5113f60f2
Remove deprecated reasoning_content message field (#33402)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-30 11:48:15 +00:00
Patrick von Platen
10152d2194
[Realtime API] Adds minimal realtime API based on websockets (#33187)
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-30 18:41:29 +08:00
Cyrus Leung
c87eac18f7
[Refactor] Move MM item count validation outside of processor (#33396)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-30 09:27:31 +00:00
hujiaxin0
ba45bedfd1
[model] Add support for openPangu7B-VL (#32449)
...
Signed-off-by: hujiaxin <524446785@qq.com>
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
2026-01-30 15:54:27 +08:00
Harry Mellor
9432ed8c7e
Explicitly set return_dict for apply_chat_template (#33372)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-30 07:27:04 +00:00
Ryan Rock
070c811d6f
[CI][AMD] Skip 4 GPUs testgroup ray tests (#33305)
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-01-29 21:39:53 -08:00
Wang Haoyu
c46b0cd0af
[Model][Multimodal] Add explicit MusicFlamingo adapter (#32696)
...
Signed-off-by: WangHaoyuuu <mailwhaoyu@gmail.com>
2026-01-30 11:01:29 +08:00
Cyrus Leung
831453fcef
[Chore] Move MediaConnector to vllm.multimodal.media (#33324)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-29 16:54:31 +00:00
Cyrus Leung
c6e7404cc5
[Multimodal] Simplify MM input definitions (#33331)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-29 13:32:04 +00:00
Roger Wang
8b3f0a99dd
[Models] Qwen3-ASR (#33312)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-01-29 19:27:15 +08:00
Patrick von Platen
40c35038d2
[Voxtral] Streaming example (#33042)
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-29 03:22:49 -08:00
andrii.pasternak
615e8033e5
[Bug Fix] Handle variable-length tensors in MultiModalFlatField batching (#31751)
...
Signed-off-by: Andrii Pasternak <andriipasternak31@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 10:42:59 +00:00
daniel-salib
8688c3d460
[fix] test mcp_tool_calling_streaming with a more complex math question (#32769)
...
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2026-01-29 10:25:58 +00:00
Isotr0py
3a92c6f3b5
[Misc] Cleanup Kimi-K2.5's vision chunk modality entrypoints (#33157)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-29 09:46:02 +00:00
shanjiaz
5eeba80c74
Adding optional speculator tests for larger models (#32943)
...
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
2026-01-29 16:54:02 +08:00
cmunley1
3bba2edb0f
support returning tokenids in responses api (#33212)
...
Signed-off-by: Christian Munley <cmunley@nvidia.com>
2026-01-29 16:52:39 +08:00
wang.yuqi
abb34ac43a
[Bugfix] Fix Qwen3-VL-Reranker load. (#33298)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 08:42:53 +00:00
Cyrus Leung
51550179fc
[Refactor] Define MM data parser in processing info instead of processor itself (#33260)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-29 13:55:17 +08:00
Michael Goin
ca1969186d
[UX] Enable nested configs in config yaml files (#33193)
2026-01-28 16:54:25 -05:00
Rohan Potdar
59bcc5b6f2
Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976)
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-01-28 20:47:47 +00:00
Wentao Ye
3e440786af
[Feature] Full support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-28 20:30:32 +00:00
Nicolò Lucchesi
8ebf372e9d
[CI] Whisper tests enforce_eager=False (#33098)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-28 09:36:56 -08:00
Robert Shaw
af9b69f977
[Quantization][Deprecation] Remove Marlin 24 (#32688)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-28 15:54:59 +00:00
Chauncey
8e5e40daf4
[Misc] Provide a DeepSeek ReasoningParser with thinking enabled by default (#33221)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-28 21:16:53 +08:00
Or Ozeri
2e8de86777
Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-01-28 04:36:00 -08:00
Robert Shaw
247d1a32ea
[Quantization][Deprecation] Remove BitBlas (#32683)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-28 11:06:22 +00:00
ramos
36d450e3b8
Adds FunAudioChat multimodal audio model support (#2) (#33058)
...
Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Co-authored-by: mayufeng <mayufeng@example.com>
2026-01-28 05:18:09 +00:00
Harry Mellor
2eb673a088
Add flake8-implicit-str-concat rules to Ruff (#33191)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-28 04:56:10 +00:00
Richard Zou
d9aa39a3bb
[torch.compile] Speed up MOE handling in forward_context (#33184)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-27 15:17:54 -08:00
danielafrimi
83fb2d09e8
Support heterogeneous NemotronHPuzzle model (#32549)
...
Signed-off-by: <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
Signed-off-by: root <dafrimi@nvidia.com>
2026-01-27 10:55:54 -05:00
Matthew Bonanni
a608b4c6c2
[5/N][Attention] Finish eliminating vllm/attention folder (#32064)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-27 10:02:51 -05:00
omerpaz95
7227d06156
[Metrics] [KVConnector] Add Offloading Connector metrics (#27942)
...
Added queries and hits metrics for the Offloading Connector.
Also added timing metrics for store and load operations, which report the
average per-token time to load/store.
The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Co-authored-by: Omer Paz <Omer.Paz@ibm.com>
2026-01-27 13:34:49 +00:00
Harry Mellor
14385c80fc
Fix weight mapping test for Transformers v5 (#33162)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-27 12:30:14 +00:00
wang.yuqi
76139d0801
[Frontend] Frontend will only attach supported tasks corresponding entrypoints. (#33139)
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-27 12:15:43 +00:00
Roger Wang
b539f988e1
[Models] Kimi-K2.5 (#33131)
...
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-27 14:50:31 +08:00
Andreas Karatzas
6c00645712
[CI][Pooling] Stabilize ModernBERT test (#32909)
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-27 05:26:48 +00:00