Eunkwang Jeon
|
bdc2343454
|
[Bugfix] Fix KeyError in parse_response_input for reasoning items with optional content (#34499)
Signed-off-by: jeonsworld <jeonsworld@gmail.com>
|
2026-03-13 00:13:36 +08:00 |
|
SoluMilken
|
85199f9681
|
[Bugfix] fix main branch pre-commit error (1 line change) (#36897)
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
|
2026-03-12 09:08:37 -07:00 |
|
grimulkan
|
a1257fd1ea
|
[Kernel] Add FP8 KV cache support to Triton MLA decode attention (#34597)
Signed-off-by: grimulkan <grimulkan@gmail.com>
|
2026-03-12 08:32:34 -07:00 |
|
Kunshang Ji
|
53ec16a705
|
[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-12 07:57:47 -07:00 |
|
Wei Zhao
|
2e693f48e7
|
[Perf] Add TRTLLM FP8 MoE Modular Kernel (#36307)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-12 07:32:31 -07:00 |
|
Martin Hickey
|
7f1f36bf91
|
[CI] Fix mypy for vllm/reasoning (#35742)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-12 12:21:33 +00:00 |
|
caozuoba
|
9e19f8338b
|
[Perf] add packed recurrent fast path for decode (#36596)
Signed-off-by: hdj <1293066020@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-12 04:01:57 -07:00 |
|
Chauncey
|
5a71cdd76e
|
[Bugfix] Fix crash when tool_choice=required exceeds max_tokens (#36841)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-12 03:28:45 -07:00 |
|
Shanshan Shen
|
f0d3658c0f
|
[MM][OOT] Support CPU seq_lens for OOT MMEncoderAttention kernels (#36605)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-12 03:28:23 -07:00 |
|
sfeiqiang
|
8cb24d3aed
|
[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328)
Signed-off-by: phaedonsun <phaedonsun@tencent.com>
Co-authored-by: phaedonsun <phaedonsun@tencent.com>
|
2026-03-12 00:46:20 -07:00 |
|
István Ketykó
|
00726c74c9
|
[Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop (#36670)
Signed-off-by: István Ketykó <istvan.ketyko@gmail.com>
|
2026-03-12 15:35:54 +08:00 |
|
Chauncey
|
9fe404ed04
|
[Frontend] OpenAI Responses API supports Tool/Function calling with streaming (#29947)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-12 15:03:50 +08:00 |
|
Sage
|
802f306cd1
|
[Tests] Skip model weight download for render-only test server (#36813)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-12 06:24:42 +00:00 |
|
Yanan Cao
|
584a3f56de
|
[Kernel][Helion][13/N] Force static_shapes=False in helion register (#36677)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-12 05:35:29 +00:00 |
|
wang.yuqi
|
6ecabe4936
|
[CI Failure] Fix Language Models Test (Extended Pooling) daily CI Failure (#36761)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-12 12:22:05 +08:00 |
|
Flora Feng
|
8647c6cf51
|
[Bugfix] Fix minimax_m2 tool parser when stream interval > 1 (#35895)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-12 10:25:14 +08:00 |
|
Nick Hill
|
262b76a09f
|
[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-12 01:20:34 +00:00 |
|
Wentao Ye
|
c34ba6b961
|
[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-12 08:37:01 +08:00 |
|
Yanan Cao
|
cf632499ee
|
[Kernel] [Helion] [15/N] Split config files into per-platform files (#36698)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-11 17:25:29 -04:00 |
|
Or Ozeri
|
7ee5d5093b
|
[BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-11 20:43:40 +00:00 |
|
Harry Mellor
|
65986db6ba
|
Make Gemma and Gemma 2 accept inputs_embeds like Gemma 3 (#36787)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 18:12:43 +00:00 |
|
Luka Govedič
|
9556af87d5
|
[torch.compile] Add support for non-contiguous fused RMSNorm + group quant (#36551)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
|
2026-03-11 10:56:55 -07:00 |
|
Or Ozeri
|
a1a3523a56
|
[KVConnector] Support worker -> scheduler metadata (#31964)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-11 17:36:37 +00:00 |
|
Julien Denize
|
a5d06dc557
|
Add 320 dimension size support to MLA (#36161)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2026-03-11 10:21:22 -07:00 |
|
Harry Mellor
|
5efa206a8c
|
Fix ExaoneMoeMTP test that never ran in Transformers v4 (#36792)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 17:10:23 +00:00 |
|
Cyrus Leung
|
196802dfa6
|
[Misc] Clean up renderers (#36770)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-11 16:39:29 +00:00 |
|
Isotr0py
|
c84b519cf3
|
[Bugfix] Fix negative max_tokens when input prompt is too long (#36789)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-11 16:30:51 +00:00 |
|
Richard Zou
|
822e250ab7
|
[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation (#36093)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-11 16:07:09 +00:00 |
|
Jhao-Ting Chen
|
5573894737
|
Kimi k2.5 MLA based eagle3 (#36361)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Izzy Putterman <iputterman@nvidia.com>
|
2026-03-11 11:36:11 -04:00 |
|
Harry Mellor
|
d5816c8c2f
|
Fix tied weights in weight mapping test for Transformers v5 (#36788)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-11 15:10:26 +00:00 |
|
Wuxun Zhang
|
e584dce52b
|
Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230)
Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>
|
2026-03-11 19:19:15 +08:00 |
|
Weiguang Li
|
724759684c
|
[Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136)
Signed-off-by: OiPunk <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-11 03:13:06 -07:00 |
|
Richard Zou
|
09b6f99852
|
[compile] aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#36358)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-11 03:12:03 -07:00 |
|
Angela Yi
|
13e79fc811
|
[ci] Update rtol for test_classification (#36556)
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
|
2026-03-11 03:08:16 -07:00 |
|
roikoren755
|
e661b9ee83
|
[NemotronH] Small fix reasoning parser (#36635)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-11 02:44:41 -07:00 |
|
Nicolò Lucchesi
|
098d844731
|
[NIXL][1/N] Refactor kernel_block_size detection (#35752)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-11 01:11:23 -07:00 |
|
Sladyn
|
4aaaf8c8ce
|
feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503)
Signed-off-by: sladynnunes <snunes@usc.edu>
|
2026-03-11 04:35:33 +00:00 |
|
Wentao Ye
|
a8ff2cca92
|
[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-10 21:25:30 -07:00 |
|
tunglinwood
|
42fadebecb
|
[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127)
Signed-off-by: tunglinwood <tunglinwood@gmail.com>
Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com>
Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>
|
2026-03-10 21:24:48 -07:00 |
|
Ning Xie
|
fe714dd507
|
[openapi server] log exception in exception handler(2/N) (#36201)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-03-10 20:16:30 -07:00 |
|
Nick Hill
|
65b2f405dc
|
[Core] Simplify core kv-cache blocks initialization logic (#36521)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-10 20:20:02 +00:00 |
|
Nick Hill
|
2a68464c5b
|
[Test] test_async_scheduling.py improvements (#36340)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-10 11:17:26 -07:00 |
|
Harry Mellor
|
f83b933b84
|
[CI] Bump mypy version to 1.19.1 (#36104)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-10 09:18:28 -07:00 |
|
Hashem Hashemi
|
721ae79f50
|
Improvements to wvSplitKrc skinny GEMM solution (#34304)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-03-10 09:14:27 -07:00 |
|
Srinivasoo7
|
106ff69c4e
|
feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342)
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com>
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-10 14:43:40 +00:00 |
|
Jiangyun Zhu
|
ca5fb4bbd8
|
[Bugfix] Avoid merging empty-only partitions into splitting-op subgraphs (#36595)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-10 07:39:01 -07:00 |
|
wang.yuqi
|
a3189a08b0
|
[Model] Consolidate score logic by introduce score_type (#36479)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-10 13:32:25 +00:00 |
|
Mark McLoughlin
|
234860399b
|
[Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270) (#36628)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2026-03-10 06:20:41 -07:00 |
|
Harry Mellor
|
c88510083b
|
Fix Qwen2.5-VL test for Transformers v5 (#36532)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-10 12:05:34 +00:00 |
|
Chang Su
|
507ddbe992
|
feat(grpc): extract gRPC servicer into smg-grpc-servicer package, add --grpc flag to vllm serve (#36169)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2026-03-10 03:29:59 -07:00 |
|