Benjamin Chislett
8b346309a5
[Refactor] Consolidate SupportsEagle ( #36063 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-03-13 23:22:40 +00:00
Kevin H. Luu
f1816fb192
[CI] Split V1 e2e + engine (1 GPU) into separate jobs ( #36945 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-13 14:16:02 -07:00
Harry Mellor
0005d2a3c9
Use Transformers v5 WeightRenaming for Transformers modeling backend ( #31545 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-13 20:49:08 +00:00
Mark McLoughlin
7afe0faab1
[Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish ( #36666 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-13 12:10:06 -07:00
yugong333
b3ce711b93
Fp8 lora dense kernel ( #35242 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
2026-03-13 19:05:08 +00:00
Isotr0py
abf61aaa8e
[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request ( #36800 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-13 18:16:05 +00:00
Itay Alroy
d5af196c18
[2/N] Elastic EP Milestone 2: Integrating NIXL-EP ( #35627 )
...
Signed-off-by: Itay Alroy <ialroy@nvidia.com >
Co-authored-by: Yongji Wu <wuyongji317@gmail.com >
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com >
2026-03-13 09:25:33 -04:00
Or Ozeri
cfaf4668f7
[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec ( #36610 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-03-13 08:04:21 +00:00
Sage
a2268617cf
[Frontend] Delegate preprocessing to OpenAIServingRender ( #36483 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-13 00:39:43 -07:00
Rohan Potdar
a4ad9db541
Enable RoPE+KV cache fusion for ROCm AITER FA (non-shuffle layout) ( #35786 )
...
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com >
2026-03-13 07:33:22 +00:00
Nick Hill
b373b5102a
[Tests] Shutdown test RemoteVLLMServer cleanly ( #36950 )
...
Recent PR #33949 changed the teardown logic of the RemoteVLLMServer test utility class to
send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated
shutdown logic that assumes only the top-level process will receive a signal (for example
when running in a container that's shut down).
This caused a bunch of errors and stacktraces in some test logs, even though those tests
still pass. We should still attempt a normal shutdown and only kill other procs if they are
still running after a few seconds.
Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-03-13 07:32:55 +00:00
Csrayz
bc2c0c86ef
[Frontend] Fix usage incorrectly returned with empty stream_options` ( #36379 )
...
Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com >
2026-03-13 03:33:04 +00:00
whyiug
1ce13cf992
[Model] Add support for BERT-like Chinese ERNIE pooling models ( #36385 )
...
Signed-off-by: whyiug <whyiug@hotmail.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-13 03:23:53 +00:00
Nikita
10f08dedfa
[Model] Add ColPali late interaction model for multi-modal retrieval ( #36818 )
...
Signed-off-by: Nikita Sukharev <kaonael@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-03-13 02:18:57 +00:00
Xinan Miao
2cdf92228c
[Feature]: Remove Chunking From FusedMoE ( #34086 )
...
Signed-off-by: SouthWest7 <am1ao@qq.com >
Signed-off-by: Southwest <1403572259@qq.com >
Signed-off-by: southwest <am1ao@qq.com >
Signed-off-by: Xinan Miao <1403572259@qq.com >
Co-authored-by: SouthWest7 <am1ao@qq.com >
2026-03-12 14:24:38 -04:00
Marc Sun
c973ecdead
[bnb] Skip moe + bnb test ( #36896 )
...
Signed-off-by: Marc Sun <marc@huggingface.co >
2026-03-12 18:03:25 +00:00
Eunkwang Jeon
bdc2343454
[Bugfix] Fix KeyError in parse_response_input for reasoning items with optional content ( #34499 )
...
Signed-off-by: jeonsworld <jeonsworld@gmail.com >
2026-03-13 00:13:36 +08:00
SoluMilken
85199f9681
[Bugfix] fix main branch pre-commit error (1 line change) ( #36897 )
...
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw >
2026-03-12 09:08:37 -07:00
grimulkan
a1257fd1ea
[Kernel] Add FP8 KV cache support to Triton MLA decode attention ( #34597 )
...
Signed-off-by: grimulkan <grimulkan@gmail.com >
2026-03-12 08:32:34 -07:00
Kunshang Ji
53ec16a705
[Hardware] Replace torch.cuda.device_count/current_device/set_device API ( #36145 )
...
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-12 07:57:47 -07:00
Wei Zhao
2e693f48e7
[Perf] Add TRTLLM FP8 MoE Modular Kernel ( #36307 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-03-12 07:32:31 -07:00
Martin Hickey
7f1f36bf91
[CI] Fix mypy for vllm/reasoning ( #35742 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-12 12:21:33 +00:00
caozuoba
9e19f8338b
[Perf] add packed recurrent fast path for decode ( #36596 )
...
Signed-off-by: hdj <1293066020@qq.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-12 04:01:57 -07:00
Chauncey
5a71cdd76e
[Bugfix] Fix crash when tool_choice=required exceeds max_tokens ( #36841 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-12 03:28:45 -07:00
Shanshan Shen
f0d3658c0f
[MM][OOT] Support CPU seq_lens for OOT MMEncoderAttention kernels ( #36605 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-12 03:28:23 -07:00
sfeiqiang
8cb24d3aed
[KV Connector] Support using FlexKV as KV Cache Offloading option. ( #34328 )
...
Signed-off-by: phaedonsun <phaedonsun@tencent.com >
Co-authored-by: phaedonsun <phaedonsun@tencent.com >
2026-03-12 00:46:20 -07:00
István Ketykó
00726c74c9
[Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop ( #36670 )
...
Signed-off-by: István Ketykó <istvan.ketyko@gmail.com >
2026-03-12 15:35:54 +08:00
Chauncey
9fe404ed04
[Frontend] OpenAI Responses API supports Tool/Function calling with streaming ( #29947 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-03-12 15:03:50 +08:00
Sage
802f306cd1
[Tests] Skip model weight download for render-only test server ( #36813 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2026-03-12 06:24:42 +00:00
Yanan Cao
584a3f56de
[Kernel][Helion][13/N] Force static_shapes=False in helion register ( #36677 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-12 05:35:29 +00:00
wang.yuqi
6ecabe4936
[CI Failure] Fix Language Models Test (Extended Pooling) daily CI Failure ( #36761 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-12 12:22:05 +08:00
Flora Feng
8647c6cf51
[Bugfix] Fix minimax_m2 tool parser when stream interval > 1 ( #35895 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2026-03-12 10:25:14 +08:00
Nick Hill
262b76a09f
[Frontend] Exclude anthropic billing header to avoid prefix cache miss ( #36829 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-12 01:20:34 +00:00
Wentao Ye
c34ba6b961
[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement ( #36710 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-12 08:37:01 +08:00
Yanan Cao
cf632499ee
[Kernel] [Helion] [15/N] Split config files into per-platform files ( #36698 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 17:25:29 -04:00
Or Ozeri
7ee5d5093b
[BugFix][kv_offload] Fix offloading decodes with async scheduling ( #33881 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-11 20:43:40 +00:00
Harry Mellor
65986db6ba
Make Gemma and Gemma 2 accept inputs_embeds like Gemma 3 ( #36787 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-11 18:12:43 +00:00
Luka Govedič
9556af87d5
[torch.compile] Add support for non-contiguous fused RMSNorm + group quant ( #36551 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
2026-03-11 10:56:55 -07:00
Or Ozeri
a1a3523a56
[KVConnector] Support worker -> scheduler metadata ( #31964 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-03-11 17:36:37 +00:00
Julien Denize
a5d06dc557
Add 320 dimension size support to MLA ( #36161 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2026-03-11 10:21:22 -07:00
Harry Mellor
5efa206a8c
Fix ExaoneMoeMTP test that never ran in Transformers v4 ( #36792 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-11 17:10:23 +00:00
Cyrus Leung
196802dfa6
[Misc] Clean up renderers ( #36770 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-03-11 16:39:29 +00:00
Isotr0py
c84b519cf3
[Bugfix] Fix negative max_tokens when input prompt is too long ( #36789 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-03-11 16:30:51 +00:00
Richard Zou
822e250ab7
[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation ( #36093 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-11 16:07:09 +00:00
Jhao-Ting Chen
5573894737
Kimi k2.5 MLA based eagle3 ( #36361 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com >
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com >
Co-authored-by: Izzy Putterman <iputterman@nvidia.com >
2026-03-11 11:36:11 -04:00
Harry Mellor
d5816c8c2f
Fix tied weights in weight mapping test for Transformers v5 ( #36788 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-03-11 15:10:26 +00:00
Wuxun Zhang
e584dce52b
Add XPU MLA Sparse backend for DeepSeek v3.2 ( #33230 )
...
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com >
2026-03-11 19:19:15 +08:00
Weiguang Li
724759684c
[Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps ( #36136 )
...
Signed-off-by: OiPunk <codingpunk@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-11 03:13:06 -07:00
Richard Zou
09b6f99852
[compile] aot_compile should respect VLLM_DISABLE_COMPILE_CACHE ( #36358 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-03-11 03:12:03 -07:00
Angela Yi
13e79fc811
[ci] Update rtol for test_classification ( #36556 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com >
2026-03-11 03:08:16 -07:00