Angela Yi
f67299f66d
[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
(cherry picked from commit f36292dbee)
2025-11-15 22:05:00 -08:00
Fardin Hoque
5f6666fb5a
LLaMA4 LoRA Adapter Enablement (#28602)
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Co-authored-by: Wei Wei <wwei6@meta.com>
(cherry picked from commit 964d65deed)
2025-11-15 21:57:58 -08:00
Nicolò Lucchesi
66a62d73da
[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677)
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 96b23b8e3b)
2025-11-15 21:57:42 -08:00
Lucas Wilkinson
c505dd6b61
[BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) (#28702)
(cherry picked from commit db56a59970)
2025-11-15 21:56:16 -08:00
Nick Hill
f7adf64aac
[BugFix] Fix multi-modal async scheduling race condition (#28706)
Signed-off-by: Nick Hill <nhill@redhat.com>
(cherry picked from commit bc3e43069a)
2025-11-15 21:56:05 -08:00
Jiangyun Zhu
240d6b1758
[Bugfix] fix dots.ocr pp support (#28705)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
(cherry picked from commit c36bcfe6b3)
2025-11-15 21:54:30 -08:00
Roger Wang
b315ba9052
[Misc] Update xformers to 0.33.0.post1 (#28678)
Signed-off-by: Roger Wang <hey@rogerw.io>
(cherry picked from commit 0aecd9138f)
2025-11-15 21:54:26 -08:00
Qiu
9b24cf6f47
[bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context (#28526)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 968060c15a)
2025-11-15 21:54:19 -08:00
Nick Hill
facbc2c21e
[BugFix] Ensure EngineArgs.create_engine_config is idempotent (#28515)
Signed-off-by: Nick Hill <nhill@redhat.com>
(cherry picked from commit 327c0a9a23)
2025-11-15 21:54:15 -08:00
Roger Wang
e2fd9a2edf
[Misc] Turn off encoder torch compile by default (#28634)
Signed-off-by: Roger Wang <hey@rogerw.io>
(cherry picked from commit d3387750f1)
2025-11-15 21:54:05 -08:00
Huy Do
1326f17492
Use official xformers-0.0.33 built for PT 2.9 (#28600)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
(cherry picked from commit c33b87e777)
2025-11-15 21:53:04 -08:00
Harry Mellor
caf412e593
Skip models that cannot currently init on Transformers v5 (#28471)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit 51c599f0ec)
2025-11-15 21:52:58 -08:00
Harry Mellor
a035b5cffb
[CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken in current Transformers (#28559)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit a39dd7bb06)
2025-11-15 21:52:46 -08:00
Harry Mellor
5b4dcecdd7
Remove deprecated fields from CompilationConfig (#27593)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit a742134cc5)
2025-11-15 21:48:13 -08:00
Isotr0py
609bb244bd
[Performance] Cache loaded custom logitsprocs to avoid overheads (#28462)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
(cherry picked from commit 3f770f4427)
2025-11-15 21:44:19 -08:00
Roger Wang
3a9ea77c35
[Bugfix] Fix max image size for PaddleOCR-VL (#28442)
Signed-off-by: Roger Wang <hey@rogerw.io>
(cherry picked from commit 4fd4b743a2)
2025-11-15 21:44:19 -08:00
Robert Shaw
28a82bb5e6
[Bugfix] Fix Stream Sync for Shared Expert Overlap (#28430)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
(cherry picked from commit e605e8e323)
2025-11-15 21:44:19 -08:00
Michael Goin
2a21f3e7c2
Only register rocm_aiter_ops if aiter is found (#28428)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit f2d9ad0620)
2025-11-15 21:36:19 -08:00
Lucas Wilkinson
ab625ba2fc
[CI/Test Fix] Fix CP tests on Blackwell (#28404)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 39029d5192)
2025-11-15 21:36:19 -08:00
Wentao Ye
324c8cbd79
[Feature] Refactor batch invariant fp8 DeepGEMM (#27606)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
(cherry picked from commit 35d801f13f)
2025-11-15 21:35:58 -08:00
Adrian Abeyta
75ecaf48fe
[Bugfix] Ensure calculated KV scales are applied in attention. (#27232)
Signed-off-by: adabeyta <aabeyta@redhat.com>
(cherry picked from commit a5a790eea6)
2025-11-15 21:33:58 -08:00
Robert Shaw
30700b1cd7
[CI] Fix Plugin Tests Tests (#28413)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
2025-11-10 22:36:11 +00:00
Andrew Xia
4b94ed8f92
[Frontend][2/n] remove empty content from _parse_tool_calls_from_content (#28331)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-11-10 14:07:49 -08:00
Lucas Wilkinson
6dec9f6109
[BugFix] Fix DeepGEMM over-allocating workspace (#28254)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-10 17:01:17 -05:00
Wei Wei
bf6a3d0ff5
[Misc] Add more scoping for improved trace (#28329)
Signed-off-by: Wei Wei <wwei6@meta.com>
2025-11-10 21:03:21 +00:00
Sage Moore
40d33264c6
[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#28377)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Sage Moore <sagemoore@utexas.edu>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-10 20:39:19 +00:00
Jonas M. Kübler
9c84ca8293
[FA/Chore] Bump FA version for FP8 two-level accumulation (#27889)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-11-10 12:06:04 -08:00
Rémi Delacourt
6d54336ae5
[Bugfix] Fix llguidance backend, rollback when EOS was encountered (#25905)
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-11-10 14:53:32 -05:00
jiahanc
34553b9d27
[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-11-10 12:34:57 -05:00
Varun Sundar Rabindranath
b039bfda8f
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-10 09:21:52 -08:00
Cyrus Leung
d0e186c16f
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE (#28395)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-11 00:30:06 +08:00
vllmellm
f080a83511
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-10 08:20:53 -08:00
caozuoba
40e2eeeb92
[Kernel] Optimization of the mm_k operator. (#28280)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-10 16:03:46 +00:00
zejunchen-zejun
b06b9470ca
[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model (#27474)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
2025-11-10 10:38:56 -05:00
TJian
4673e465ff
Add @tjtanaa to codeowner for ROCm and multi-modal (#28360)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-10 21:39:17 +08:00
Ferrebo
912744d066
[Fix] optimize visual token mask with caching and multi-token support (#28374)
Signed-off-by: Ferrebo <itachi971009@gmail.com>
Signed-off-by: kebo01 <kebo01@baidu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 13:23:49 +00:00
Yu Jiaqi
15be507c86
[bugfix] fix siglip batch text output error (#28365)
Signed-off-by: piood <2477084691@qq.com>
2025-11-10 21:21:15 +08:00
Mark McLoughlin
6f7de33bed
[Metrics] Refactor LoRA state tracking (#26801)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-10 16:34:36 +08:00
Shinichi Hemmi
a98cc35c34
Restore PlaMo2 unit test as pfnet/plamo-2-1b now supports transformers >=4.56 (#28019)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-11-10 06:50:02 +00:00
Lucas Wilkinson
e8697faf03
[V0 deprecation] Remove no longer used get_metadata_cls (#28370)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-10 14:32:09 +08:00
Xiake Sun
03fa4d3fb3
[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373)
Signed-off-by: Xiake Sun <xiake.sun@amd.com>
Signed-off-by: Xiake Sun <xisun@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 04:53:40 +00:00
Varun Sundar Rabindranath
6b2b9fd934
[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness (#28322)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-10 10:45:29 +08:00
JartX
c5f685b3ae
[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP (#28279)
Signed-off-by: JartX <sagformas@epdcenter.es>
2025-11-09 23:09:36 +00:00
Jiangyun Zhu
c4768dcf47
[Kernel] Fix fused_gdn_gating (#28343)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-09 14:26:35 -07:00
Zhewen Li
a65a934ebe
[CI/Build] Temporary fix to LM Eval Small Models (#28324)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-09 21:08:38 +00:00
usberkeley
4a8d6bd168
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method (#28214)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
2025-11-09 19:11:46 +00:00
Lucas Wilkinson
636efd10a5
[Core] Separate out attention metadata building logic from prepare inputs (#26764)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-09 13:51:43 -05:00
Nick Hill
289eb6c537
[Core] Simplify async KV output aggregation (#28327)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-09 09:44:13 -08:00
Nicolò Lucchesi
19d91ece4b
[CI] Fix flaky test_eagle_correctness test (#28364)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-09 16:04:59 +00:00
Jiangyun Zhu
7ae5a5fb11
[Misc] Add some comments in qwen3-next (#28267)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-08 23:59:24 -08:00
Yong Hoon Shin
de2b78305f
[ROCm] Add env to enable/disable aiter triton gemm (#28321)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-11-08 22:27:00 -08:00
Ning Xie
e5e9067e61
[Misc] fix typo and add detailed log (#28178)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-09 05:33:46 +00:00
yihong
3a7d580343
fix: close issue 28338 by fixed python version (#28339)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-11-09 05:07:26 +00:00
Kevin H. Luu
05f8d69077
[chore] Move some wikimedia images to S3 (#28351)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-09 01:58:26 +00:00
Mohammad Miadh Angkad
404d7a9d14
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345)
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
2025-11-08 15:50:10 -07:00
ElizaWszola
171133f929
[Bugfix] Fix test fused quant layernorm tests (#27865)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-08 14:31:33 -08:00
Cole Murray
32787d0644
Remove setuptools upper bound constraint (<80) (#28337)
Signed-off-by: Cole Murray <colemurray.cs@gmail.com>
2025-11-08 22:30:18 +00:00
Benjamin Chislett
975676d174
[Feat] Drop-in Torch CUDA Profiler (#27841)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-08 14:07:37 -08:00
Ev Lacey
77d702a22b
Enhance run_cluster.sh for multi-NIC support (#28328)
Signed-off-by: Ev Lacey <elacey@nvidia.com>
2025-11-08 22:04:16 +00:00
zhangsicheng5
2108a571d7
[DCP] Support dcp kv_cache interleave size > 1 (#26696)
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-11-09 04:45:27 +09:00
Andy Lo
47604137a2
[Bugfix] Spec decode + structured output + spec model max len edge case (#28298)
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-11-08 19:44:25 +00:00
Robert Shaw
26990d25dc
[Bugfix] Update device name for H200 detection (#28349)
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-08 19:01:11 +00:00
Harry Mellor
d9ab1ad9d1
reasoning_content -> reasoning (#27752)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-08 12:15:08 +00:00
22quinn
608bb14462
[Attention] Remove max cudagraph size limit of 992 (#27840)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-11-07 22:33:27 -08:00
Xiaozhu Meng
4a36681f85
[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins (#27990)
Signed-off-by: Xiaozhu <mxz297@gmail.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2025-11-07 22:25:21 -08:00
Abolfazl Shahbazi
d15afc1fd0
Refactor CPU/GPU extension targets for CMake build (#28026)
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
2025-11-08 14:17:35 +08:00
Isotr0py
934a9c3b79
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 05:01:27 +00:00
gnovack
70af44fd10
[bugfix] support eagle with lora cudagraph specialization (#28318)
Signed-off-by: gnovack <gnovack@amazon.com>
2025-11-08 03:25:45 +00:00
Aurick Qiao
781f5ebf52
Bump arctic-inference requirement (#28174)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:31:18 -08:00
Michael Goin
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-07 18:20:55 -08:00
Hamid Mukhtar
61d25dc44b
Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) (#28308)
Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com>
2025-11-08 02:09:21 +00:00
Xiaohong (Sean) Chen
d0c7792004
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068)
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
2025-11-08 01:58:22 +00:00
Boyuan Feng
b158df2813
remove resolve_op_overloads and use splitting_ops directly (#28081)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-11-08 01:13:13 +00:00
Kunshang Ji
1aaecda078
[XPU] Enable Expert parallel for MoE models (#28263)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-08 00:33:11 +00:00
Harry Mellor
811df41ee9
Update Flashinfer from v0.4.1 to v0.5.2 (#27952)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-07 16:24:42 -08:00
Nick Hill
67a2da890e
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 22:11:03 +00:00
Nick Hill
da786e339e
[Core] Rework handling of async scheduling config (#28250)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 20:01:23 +00:00
Benjamin Chislett
18903216f5
[Bugfix] Fix and add tests for GptOss reasoning parser (#28000)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-07 19:28:04 +00:00
Simon Mo
d0ceb38ae8
[Build] Fix release pipeline failing annotation (#28272)
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-07 10:06:45 -08:00
youkaichao
155ad56d7b
[doc] add guide about the provided PTX was compiled with an unsupported toolchain (#28305)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-11-08 00:26:34 +08:00
Fadi Arafeh
5fb4137c99
[README] Add Arm CPUs to the list of supported targets (#28290)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-07 15:41:47 +00:00
Nicolò Lucchesi
68a72a5cc1
Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012)" (#28289)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-07 15:07:01 +00:00
Boyuan Feng
0f872b7977
[Log] update shm wait time msg (#28255)
2025-11-07 09:43:30 -05:00
Wentao Ye
4b1ff13221
[Feature] Default ignore_eos True for random dataset (#28227)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-07 07:35:33 -05:00
Iceber Gu
e0d6b4a867
[CLI] add --max-tokens to vllm complete (#28109)
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
2025-11-07 12:21:40 +00:00
Pavani Majety
72b1c2ae2c
[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-11-07 04:18:39 -08:00
Lukas Geiger
e0919f331d
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-07 12:14:29 +00:00
Kevin H. Luu
8e19d470af
[fix] Revert "fixing mm placeholder replacement issue with gemma3" (#28285)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-07 12:09:09 +00:00
Mengqing Cao
1958bda9b4
[Misc][Model][Refactor] Pass the prefix into Linear layers (#28259)
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-11-07 19:38:38 +08:00
Zhang Xiangze
7bdb42b2f2
[CPU]Avoid repeated random sample compile (#28260)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-11-07 11:03:57 +00:00
汪志鹏
315068eb4a
[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark (#28265)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
2025-11-07 09:35:22 +00:00
Jialin Ouyang
ccd98b59c1
[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead (#28171)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-07 00:27:12 -08:00
Jee Jee Li
21b82f4ea2
[Kernel] LoRA triton kernels support PDL (#27402)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-07 08:05:48 +00:00
Copilot
a736e5ff77
[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly (#28074)
2025-11-07 15:58:16 +08:00
baonudesifeizhai
9da9208b20
[Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 (#28256)
2025-11-07 07:31:58 +00:00
smit kadvani
11fd69dd54
[amd][gptoss] Perf gain because of block alignment (#28024)
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com>
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>
2025-11-07 05:27:42 +00:00
Harry Mellor
c0a4b95d64
Fix issues from #28242 (#28257)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-07 04:23:17 +00:00
Alexis MacAskill
a47d94f18c
Add runai model streamer e2e test for GCS (#28079)
Signed-off-by: Alexis MacAskill <amacaskill@google.com>
2025-11-07 03:07:54 +00:00
Alex Brooks
e70fbc599b
[CI/Build] Loosen STT LoRA Translate Check (Flaky Test) (#28247)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-07 02:51:27 +00:00
Lucas Kabela
4bf56c79cc
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2025-11-07 00:16:03 +00:00
Junhong Liu
59b453eaa2
Speed up mm processor kwargs per request by spliting dynamic and static kwargs (#26483)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
2025-11-07 07:51:28 +08:00
Eugene Khvedchenya
827e4237bc
Fix failing test for CRadio (#27738)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wang.yuqi <noooop@126.com>
2025-11-06 15:32:25 -08:00
Varun Sundar Rabindranath
ca6f755d24
[BugFix] Fix FusedMoELoRA + ModularKernel Integration (#28237)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-06 22:53:30 +00:00
Matthew Bonanni
ca90f50304
[Test] Add non-MoE DP test coverage (#28235)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-06 20:59:57 +00:00
Fang Han
da855b42d2
[Doc]: Make extraInit containers fully configurable in helm chart (#27497)
Signed-off-by: Fang Han <fhan0520@gmail.com>
2025-11-06 20:27:16 +00:00
Aleksandr Malyshev
449de9001a
[ROCm] triton fp8 kernel (#27058)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
2025-11-06 14:46:44 -05:00
Vico Chu
d4aa65c998
[Chore] eliminate duplicated and unconditional object serialization in anthropic messages api (#27792)
Signed-off-by: Vico Chu <vico24826@gmail.com>
2025-11-06 19:09:19 +00:00
Julien Denize
7a8375f8a0
Add llama 4 scaling support (#28145)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-11-06 18:55:17 +00:00
Andy Lo
5e0c1fe69c
[Structured outputs] Upgrade llguidance to 1.3.0 (#28039)
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-11-06 10:24:47 -08:00
Russell Bryant
4507a6dae4
CODEOWNERS: Add myself as reviewer on security docs (#28216)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-11-06 17:39:42 +00:00
Roy Wang
d1dd5f53e4
[Frontend] Fix logging format when enable response logging (#28049)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
2025-11-06 16:25:39 +00:00
StanHatko
e52e4da971
[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores (#27953)
Signed-off-by: Stan Hatko <stan_hatko@live.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-11-06 23:47:11 +08:00
Milos Puzovic
2176778cd3
[Doc] Add Arm CPUs are on the list of supported targets in vLLM (#26018)
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com>
2025-11-06 15:30:26 +00:00
Eric Yue
0370679ce9
[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200)
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>
2025-11-06 07:29:46 -08:00
Harry Mellor
8816e375d3
[Docs] Switch to directory style URLs (#28058)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-06 07:06:33 -08:00
Michael Goin
f32229293e
Disable nm-testing models with issues in CI (#28206)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-06 06:19:07 -08:00
xiangze-arm
c757a15f0f
[CPU]Improve cpu fused moe perf (#27244)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-11-06 11:04:18 +00:00
Chauncey
59a50afa08
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony (#26874)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-06 10:40:03 +00:00
courage17340
981cadb35c
[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty (#28181)
Signed-off-by: courage17340 <courage17340@163.com>
2025-11-06 17:52:13 +08:00
wangxiyuan
c3ee80a01a
[V0 deprecation]clean up is_v1_supported_oracle (#28116)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-06 16:05:32 +08:00
Aditya Tewari
3755c14532
[CPU] Enable torch profiling (#28130)
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
2025-11-06 07:32:05 +00:00
Seungduk Kim
201dc98acc
Fix hard-coded parameter name in gemma3n.py (#27946)
Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2025-11-05 23:07:36 -08:00
Julien Denize
a404e2c0f1
Patch Mistral Tokenizer (#28146)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-11-06 06:43:16 +00:00
Xiaozhu Meng
e31946f86e
[flashinfer] fix FI all2all with FI cutlass moe (#28166)
Signed-off-by: Xiaozhu <mxz297@gmail.com>
2025-11-06 05:52:16 +00:00
gmagogsfm
bde5039325
[CI] Add compile/test_multimodal_compile.py to CI (#28151)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-06 05:41:47 +00:00
Jacob Zhong
d72299d47b
Make the cv2 dependency optional (#27780)
Signed-off-by: Jacob <cmpute@qq.com>
2025-11-06 05:08:55 +00:00
Lukas Geiger
80679f108f
[Core][MM] Use non-blocking CPU-GPU copy of multimodal data (#28141)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-06 04:05:12 +00:00
Isotr0py
43ecd0a900
[Chore] Clean up deepseek v2/v3 config copy (#28055)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-06 03:46:30 +00:00
Chauncey
07d614511f
[Misc] Remove the duplicate code (#28111)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-05 21:07:47 -05:00
Vadim Gimpelson
f948ab6945
[CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests (#28170)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-06 01:22:13 +00:00
Wentao Ye
d71af5f502
[Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement (#28164)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-05 17:21:08 -08:00
Wentao Ye
90189c71a9
[Bug] Fix env string "0" same to True (#28159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-05 17:04:20 -08:00
Wentao Ye
d79d9f0780
[Bug] Fix cpu disable shared_experts VLLM_DISABLE_SHARED_EXPERTS_STREAM (#28157)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-05 17:03:09 -08:00
Vadim Gimpelson
b6a248bdd7
[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-05 17:01:12 -08:00
Dayeol Lee
1767658559
[Debugging] Add annotation for easier trace analysis (#22496)
2025-11-05 16:52:52 -08:00
Kuntai Du
efe73e9b57
[Core][Hybrid allocator + connector 2/n] Unify remove_skipped_blocks by get_last_useful_token (#25431)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-11-06 00:12:00 +00:00
Zhewen Li
0b8e871e5e
[CI/Build] Fix test_defaults_with_usage_context in AMD CI (#27926)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-05 15:40:24 -08:00
Zhewen Li
5ee93a5956
[CI/Build] Update checking logic in cutlass_group_gemm_supported (#27948)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-05 15:40:10 -08:00
Snehlata
e15601789b
[Feature]: Add corrupted request metric to V1 metrics system. (#27306)
Signed-off-by: atalhens <sneh.lata@nutanix.com>
2025-11-05 13:45:29 -08:00
Richard Zou
65ac8d8dc4
[Docs] Add guide to debugging vLLM-torch.compile integration (#28094)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-11-05 21:31:46 +00:00
Isotr0py
ffb08379d8
[Chore] Remove Nemotron-Nano-VL config copy (#28126)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-05 20:06:45 +00:00
R3hankhan
e04492449e
[Hardware][IBM Z] Optimize s390x Dockerfile (#28023)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2025-11-05 11:25:44 -08:00
Michael Yao
518ec6b722
[Docs] Clean up README_TUNING.md (#28088)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-11-05 19:01:34 +00:00
wang.yuqi
802748bddb
[Bugfix] Fix Qwen3-Reranker-8B load (#28117)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-11-05 18:33:50 +00:00
Paul Zhang
faedbb4d4f
[Feature] Extend batch invariant torch.compile to B200 (#27856)
Signed-off-by: PaulZhang12 <paulzhan@fb.com>
2025-11-05 10:04:49 -08:00
Samuel Shen
40db194446
[CI]: Add LMCacheConnector Unit Tests (#27852)
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
2025-11-05 09:45:57 -08:00
Chen Zhang
c765f0b443
[FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell (#27994)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-11-05 09:25:32 -08:00
gmagogsfm
002b07c4b2
[Bugfix] vLLM should check Inductor config for compile cache enablement status (#27637)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-05 12:22:44 -05:00
Walter Beller-Morales
752ddeacaa
[Core] add support for reasoning parser plugins (#28075)
Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com>
2025-11-06 01:15:06 +08:00
Jiangyun Zhu
c18f88c6ca
[Kernel] Fuse computation of g and beta for Gated Delta Net (#28095)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-05 09:14:55 -08:00
Jiaju Zhang
6fd0df8132
[misc] add vLLM Beijing Meetup (#28127)
Signed-off-by: Jiaju Zhang <jjzhang@redhat.com>
2025-11-05 17:12:59 +00:00
Isotr0py
3f5a4b6473
[Bugfix] Validate custom logits processor xargs for online serving (#27560)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-05 16:53:33 +00:00
Pleaplusone
6cae1e5332
[ROCm][MLA] Support block-size > 1 for AITER MLA backend (#27224)
Signed-off-by: ganyi <ygan@amd.com>
Co-authored-by: wuhuikx <hattie.wu@amd.com>
2025-11-05 10:43:02 -05:00
Alexei-V-Ivanov-AMD
80c9275348
Enabling cooperative multi-gpu tests on multi-gpu nodes (#27986)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-11-05 10:35:49 -05:00
Ilya Markov
e50c454672
[BugFix] Support EP/DP + EPLB with MTP (#25311)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-11-05 15:22:17 +00:00
Chen Zhang
5d16d0fa62
[DCP] check return_lse for all layers in dcp ( #27929 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-11-05 22:27:25 +08:00
bigmoyan
0606bea2b6
add kimi reasoning parser ( #28128 )
Signed-off-by: wangzhengtao <wangzhengtao@msh.team >
Co-authored-by: wangzhengtao <wangzhengtao@msh.team >
2025-11-05 21:48:33 +08:00
Frost Mitchell
6e97eccf5d
[XPU] Enable custom routing functions in IPEX for Llama4 ( #28004 )
Signed-off-by: frost-intel <frost.mitchell@intel.com >
2025-11-05 13:39:57 +00:00
Boyuan Feng
6ab183813c
[Graph Partition][Cache] Use inductor partition ops config ( #27702 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-05 13:04:48 +00:00
amirkl94
6b7a81185d
Bugfix: Cutlass FP8 FusedMoE bad scaling factors ( #27255 )
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-05 06:06:06 -05:00
Eric Yue
b57789b62b
Fix excessive logging noise by reducing the log level of the MinimaxM2ToolParser import success message ( #27635 )
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-11-05 19:03:51 +08:00
Chauncey
377061d481
[Misc] fix import error for DeepSeekR1ReasoningParser ( #28114 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 19:02:32 +08:00
Kuntai Du
86dca07d9b
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator ( #28011 )
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-11-05 10:36:31 +00:00
Qiu
16b37f3119
[bugfix] fix wrong dcp_local_seq_lens calc ( #27518 )
Signed-off-by: Qiu <qiuchunshuo@huawei.com >
2025-11-05 17:58:13 +08:00
Chauncey
0976711f3b
[Refactor] to simplify and extract the shared logic between chat completion and responses ( #27961 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 15:46:39 +08:00
Chauncey
e261d37c9a
[Refactor] Lazy-loaded reasoning_parser ( #28092 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 15:37:02 +08:00
Alex Brooks
b7cbc25416
[Model, Core] Support Granite Speech & LoRA for STT ( #24455 )
2025-11-05 08:33:48 +01:00
Lucas Wilkinson
d43ad5a757
[BugFix] Fix DCP Assert (AssertionError: DCP not support reorder_batch_threshold > 1 now.) ( #28100 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-05 14:54:43 +08:00
Isotr0py
0ff05e3770
[Bugfix] Fix encoder-only model support for transformers backend ( #28021 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-04 22:24:41 -08:00
wangxiyuan
428bc7bf1c
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules ( #27955 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-04 20:51:16 -08:00
Zhewen Li
878fd5a16f
[CI/Build] Enable some fixed tests in AMD CI ( #28078 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 03:15:59 +00:00
Kunshang Ji
18b39828d9
[XPU] Add gpt-oss model support for Intel GPU ( #27786 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-05 02:17:23 +00:00
tou
4ea62b77f5
[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 ( #27740 )
2025-11-05 09:25:09 +08:00
Vadim Gimpelson
d4e547bb7e
Revert "[PERF] Decouple projections from GDN custom op" ( #28080 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-04 15:58:23 -08:00
Aleksandr Malyshev
2d977a7a9e
[ROCm] gemm_a16w16 upstreaming ( #26969 )
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-11-04 16:01:00 -05:00
Chenheli Hua
1fb4217a05
[Multimodal] Make MediaConnector extensible. ( #27759 )
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-11-04 18:28:01 +00:00
nadavkluger
611c86ea3c
Added disable rule to track files under benchmarks/lib ( #28048 )
Signed-off-by: Nadav Kluger <nadav.k@fmr.ai >
2025-11-04 18:18:43 +00:00
Pleaplusone
dc937175d4
[ROCm][Perf] New design on ROCm AITER MHA backend Implementation ( #25763 )
Signed-off-by: ganyi <ygan@amd.com >
2025-11-04 18:05:33 +00:00
Harry Mellor
2f1cc8cef1
Remove deprecated --rope-scaling and --rope-theta ( #28006 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-04 18:01:56 +00:00
Nick Hill
938a81692e
[AsyncScheduling] Don't schedule past request max_tokens ( #27922 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 17:06:28 +00:00
Nick Hill
c9f66da8fd
[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 08:33:55 -08:00
yt0428
05cae69f0f
[model] Add support for openPangu_Ultra_MoE ( #27521 )
Signed-off-by: yuantao <2422264527@qq.com >
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 08:17:20 -08:00
Vadim Gimpelson
5fd8f02ea9
[PERF] Decouple projections from GDN custom op ( #27512 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-04 08:11:41 -08:00
lyrisz
97e3dda84b
[Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM ( #27284 )
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com >
Co-authored-by: Faqin Zhong <zhofaqin@amazon.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-04 07:49:25 -08:00
Nick Hill
5a0a6dfd55
[BugFix] Fix incorrect preallocated sampled_token_ids tensor size ( #28025 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 07:38:16 -08:00
bnellnm
938772af03
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. ( #27123 )
2025-11-04 21:59:45 +08:00
tomeras91
e4ee658672
[Model] add optimal triton fused moe configs for NemotronH MoE ( #27967 )
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-11-04 12:59:43 +00:00
tomeras91
77f8001f53
[Model][Bugfix] fix pipeline parallelism support for NemotronH ( #27968 )
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-11-04 12:28:36 +00:00
Zhuohan Li
300a265978
[Core] Enable StatLogger in LLMEngine ( #28020 )
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-11-04 04:13:35 -08:00
Jerry Zhang
03c4c4aa9d
Support using Int4PreshuffledTensor after loading ( #26066 )
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-11-04 06:00:57 -05:00
yugong333
2ec401bc39
Load tuned fused_moe_lora shrink and expand kernel configs separately ( #27435 )
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 18:27:35 +08:00
Varun Sundar Rabindranath
4022a9d279
[BugFix][Performance] Restore flashinfer autotuning for all scenarios ( #27904 )
2025-11-04 15:56:21 +08:00
Zhewen Li
53f6e81dfd
[CI/Build] Fix OpenAI API correctness on AMD CI ( #28022 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-04 07:20:50 +00:00
CSWYF3634076
43a6acfb7d
[Model] fix ernie45 reasoning_parser ( #27973 )
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-11-04 07:16:46 +00:00
Mark McLoughlin
58279c60b5
[KV Connector] Make KVCacheConfig an explicit constructor argument ( #27887 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-03 23:00:49 -08:00
Zhewen Li
2f84ae1f27
[CI/Build] Update LM Eval Version in AMD CI ( #27944 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-04 06:36:40 +00:00
xiangze-arm
f32cbc9a0c
[CPU]Improve dynamic 4bit moe performance ( #27240 )
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-04 06:33:23 +00:00
Wentao Ye
7e4be74104
[Bug] Batch invariant: Fix flash attn MLA RuntimeError: scheduler_metadata must have shape (metadata_size) ( #27884 )
2025-11-04 14:05:55 +08:00
Mark McLoughlin
380ba6816d
[Metrics] Enable sleep state metric outside of dev mode ( #27867 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-03 20:35:36 -08:00
liuzhenwei
14a125a06d
[NIXL][XPU] Pin NIXL version to 0.7.0 ( #27849 )
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-11-04 03:28:35 +00:00
Chauncey
c02fccdbd2
[Refactor] Lazy import tool_parser ( #27974 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-04 10:10:10 +08:00
li2haipeng
6ddae74054
[LoRA] Lora shrink swizzle ( #27694 )
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
Signed-off-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 09:30:20 +08:00
vllmellm
b13a447546
[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm ( #27748 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-11-03 17:12:19 -08:00
QiliangCui
7956b0c0bc
Remove the tpu docker image nightly build. ( #27997 )
Signed-off-by: Qiliang Cui <derrhein@gmail.com >
2025-11-04 00:35:54 +00:00
Tyler Michael Smith
3758757377
[Bugfix] Fix MoE Routing Simulation ( #28002 )
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-11-03 22:26:49 +00:00
Hank_
ccd3e55e51
[Bugfix][plugin] fla crash on plugin ( #27322 )
2025-11-04 05:27:03 +08:00
Matthew Bonanni
01baefe674
Add TP parameter to attention tests ( #27683 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-03 13:04:40 -08:00
Ning Xie
786030721e
[Docs] add runai_streamer_sharded to LoadConfig ( #27937 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-03 20:35:16 +00:00
Matthew Bonanni
145c00a4d3
[Bugfix] change FlashMLA reorder_batch_threshold ( #27777 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-03 15:17:10 -05:00
Lucas Kabela
55011aef24
[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile ( #27764 )
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2025-11-03 11:12:15 -08:00
Sophie du Couédic
a4398fbb5e
[Feature][Benchmarks] Support inf burstiness ( #26941 )
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com >
2025-11-03 18:33:17 +00:00
Aurick Qiao
2c19d96777
[Spec Decode] Integrate Suffix Decoding from Arctic Inference ( #25784 )
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com >
2025-11-03 09:23:31 -08:00
Lucas Wilkinson
4bc400f47e
[CI/Testing] Add basic single node dual batch overlap test ( #27235 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-03 17:00:46 +00:00
ahao-anyscale
cac4c10ef0
[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile ( #27616 )
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-11-03 11:13:51 -05:00
pwschuurman
f7d2946e99
[Bugfix] Skip gs:// model paths for speculator detection ( #27846 )
Signed-off-by: Peter Schuurman <psch@google.com >
2025-11-03 14:31:03 +00:00
gnovack
294c805f1d
Early exit for MoE LoRA kernels ( #27131 )
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-03 20:22:17 +08:00
zhang-prog
40b69e33e7
[Model] Add PaddleOCR-VL Model Support ( #27758 )
Signed-off-by: zhangyue <zhangyue66@baidu.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-03 19:04:22 +08:00
Jee Jee Li
32257297dd
[CI/Build] Remove the flaky gpt-oss lora test ( #27966 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-03 16:50:06 +08:00
Misha Efimov
ba464e6ae2
Add ORCA endpoint load metrics support ( #24905 )
Signed-off-by: Misha Efimov <mef@google.com >
2025-11-03 08:21:31 +00:00
Kunshang Ji
7f4bdadb92
[XPU]Refine Dockerfile.xpu, avoid oneccl dependency issue ( #27964 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-03 07:36:59 +00:00
Rémi Delacourt
cec7c28833
[Bugfix] Padded Eagle Specdec with Chunked Prefill ( #26263 )
Signed-off-by: Rémi Delacourt <remi@mistral.ai >
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-03 02:22:46 -05:00
Thomas Parnell
18961c5ea6
[Hybrid] Pass kernel block size to builders ( #27753 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-11-03 05:48:03 +00:00
Sungyoon Jeong
470ad118b6
[Frontend] Align finish_reason when tool is called with OpenAI ( #25054 )
Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-03 04:21:18 +00:00
Biswa Panda
1bf43ae35d
[BugFix][LoRA] use adapter_id instead of id field of lora_request ( #27728 )
Signed-off-by: Biswa Panda <biswa.panda@gmail.com >
2025-11-03 10:08:08 +08:00
Vensen
0ce743f4e1
Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 ( #27420 )
Signed-off-by: vensenmu <vensenmu@gmail.com >
2025-11-02 16:24:01 +00:00
Cyrus Leung
6c317a656e
[Misc] Provide Siglip2 chat template ( #27939 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-02 13:42:38 +00:00
Asaf Joseph Gardin
00b31a36a2
[V1] [Hybrid] Mamba1 Automatic Prefix Caching ( #26377 )
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-11-02 04:16:23 -08:00
Julien Denize
73444b7b56
Performance fix MistralTokenizer: cache special ids and tokens ( #27925 )
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2025-11-02 08:48:33 +00:00
Cyrus Leung
853a8eb53b
[Bugfix] Fix Qwen Omni audio inference ( #27920 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-02 05:06:05 +00:00
Ben Browning
758ea2e980
[CI/Build] Fix flaky test_transcription_validation.py::test_basic_audio_gemma ( #27924 )
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-11-02 03:45:02 +00:00
Yue Zhang
685c99ee77
[KV offload] Offloading connector async scheduling support ( #27648 )
Signed-off-by: KevinCheung2259 <2651309292@qq.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-11-01 21:08:56 +00:00
Benjamin Bartels
1e88fb751b
Adds anthropic /v1/messages endpoint to openai api_server ( #27882 )
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
2025-11-01 12:45:42 -07:00
Nick Hill
c2ed069b32
[BugFix] Fix mixed penalties batch with async scheduling ( #27910 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-01 10:51:24 -07:00
wenxindongwork
af6e19f50f
[Core][TPU] Support TPU Data Parallalism ( #27365 )
Signed-off-by: wenxindongwork <wenxindong@google.com >
2025-11-01 17:14:44 +00:00
Cyrus Leung
99d69af9ec
[Bugfix] Python 3.10 compatibility for Self ( #27918 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-01 15:28:54 +00:00
Haco
d811b442d3
[Bugfix] DeepSeek V3.2 MTP metadata & CUDA graph issues ( #26779 )
Signed-off-by: xiaohajiayou <923390377@qq.com >
2025-11-01 10:52:43 -04:00
wangxiyuan
30a14b034f
[V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module ( #27798 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 10:17:45 +00:00
Harry Mellor
799ce45cc1
[Docs] Mock all imports for docs ( #27873 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 10:02:23 +00:00
ai-jz
2c0c7c39bd
feat(benchmarks): support HF model names in multi-turn benchmark ( #27850 )
2025-11-01 08:04:52 +00:00
Yihua Cheng
e675118849
[Add] cmdline argument parsing for KV cache offloading modules ( #27621 )
Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 07:17:07 +00:00
TJian
e2347dbf58
[Bugfix] [Model] Missing MRoPE function definition from KeyeForConditionalGeneration ( #27895 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-01 13:45:23 +08:00
Cyrus Leung
879a06579e
[CI/Build] Bump transformers version ( #27528 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 22:11:07 -07:00
yugong333
29de3cdee4
Adding SplitK in fused_moe_lora kernel ( #27818 )
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 12:55:46 +08:00
Yan Ma
7e2729b57e
[Multimodal][XPU]Enable vision attn backend for xpu platform ( #27525 )
Signed-off-by: Yan Ma <yan.ma@intel.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Yejing Lai <yejing.lai@intel.com >
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-01 04:45:02 +00:00
Jee Jee Li
3a5de7d2d6
[Bugfix] Fix KDA output ( #27905 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 11:54:36 +08:00
Jee Jee Li
bc4486d609
[Kernel] Enable FusedMoEModularKernel support bias ( #27754 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 02:05:12 +00:00
Nick Hill
0cdbe7b744
[Core] Async scheduling + structured outputs compatibility ( #26866 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-01 00:35:04 +00:00
Chen Zhang
df334868ca
[Hybrid] A simpler algorithm to find kernel_block_size ( #26476 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-31 21:30:28 +00:00
Bram Wasti
0e0a638c3b
Batch invariance doc ( #27839 )
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-31 17:22:19 -04:00
Matthew Bonanni
f29aeb5a25
Add FLASHINFER_MLA to test_mla_backends and add B200 CI run ( #27663 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-31 11:12:19 -07:00
Vinay R Damodaran
5e8862e9e0
[Feature] Pydantic validation for scheduler.py and structured_outputs.py ( #26519 )
Signed-off-by: Vinay Damodaran <vrdn@hey.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 18:05:50 +00:00
Nick Hill
9e5bd3076e
[Cleanup] Remove no-longer-used SpeculativeConfig.enable_chunked_prefill ( #27826 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-31 10:57:45 -07:00
Shu Wang
fc16f1c477
Flashinfer_CUTLASS_MOE fuses quantization for TP ( #27223 )
Signed-off-by: Shu Wang. <shuw@nvidia.com >
2025-10-31 17:54:29 +00:00
ZiTian Zhao
bc306fe5e9
fix incorrect type annotation in KimiMLP ( #27885 )
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
2025-10-31 17:38:02 +00:00
Chenguang Zheng
103a468bbf
[bugfix] Missing cached item in beam search ( #27874 )
Signed-off-by: fake0fan <645327136@qq.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-31 17:34:27 +00:00
Rob Mulla
70bfbd7b16
Docs update tpu install instructions ( #27824 )
Signed-off-by: Rob Mulla <rob.mulla@gmail.com >
Signed-off-by: Rob Mulla <RobMulla@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 10:29:55 -07:00
GuanLuo
d6517be3cd
[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node ( #26338 )
Signed-off-by: Guan Luo <gluo@nvidia.com >
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com >
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-10-31 10:16:00 -07:00
Isotr0py
7e06c40e63
[Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V ( #27860 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-31 17:04:51 +00:00
Madeesh Kannan
675704ac01
[Bugfix] Allow 64-bit integer values for LoRA IDs to avoid overflow/truncation ( #27876 )
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com >
2025-10-31 16:58:42 +00:00
Jee Jee Li
0384aa7150
[CI/Build] Add gpt-oss LoRA test ( #27870 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-31 22:17:21 +08:00
Jiangyun Zhu
3857eb8725
[Perf] Decouple torch op from GDA to leverage torch.compile ( #27871 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-31 21:35:52 +08:00
Huamin Li
933cdea440
[BugFix] Don’t compute reorder threshold when there are no attention groups ( #27861 )
2025-10-31 11:36:18 +00:00
Isotr0py
3933f18a5e
[Bugfix] Avoid too small block m/n for FlexAttention kernel option ( #27853 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-31 19:33:12 +08:00
toncao
e5ef4dfc11
[Kimi-Linear] Correct prefixes and add compatibility to AWQ quants ( #27834 )
Signed-off-by: toncao <cpatonn@gmail.com >
Co-authored-by: toncao <cpatonn@gmail.com >
2025-10-31 17:36:37 +08:00
Akash kaothalkar
36960501d3
[Hardware][Powerpc] Fix VLLM_CPU_OMP_THREADS_BIND="auto" low CPU utilization for Power ( #27734 )
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-31 07:45:26 +00:00
Seiji Eicher
b2e65cb4a7
[benchmark] Make request IDs unique across clients by default ( #27723 )
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-10-30 17:40:35 -07:00
Wentao Ye
2bf0bcc1fc
[CI Test] Add Scheduled Integration Test ( #27765 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 17:29:26 -07:00
Jakub Sochacki
697f507a8e
[CI/Build][Intel] Enable performance benchmarks for Intel Gaudi 3 ( #26919 )
Signed-off-by: jakub-sochacki <jakub.sochacki@wp.pl >
2025-10-31 07:57:22 +08:00
Matthew Bonanni
d5d2a0fe74
[Misc] Make all tool scripts executable ( #27831 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-30 23:46:02 +00:00
Nick Hill
c9791f1813
[BugFix] Fix broken import in initialize_ray_cluster() ( #27838 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-30 16:26:13 -07:00
Paul Zhang
e7acb20076
[Feature] Batch invariant torch.compile ( #27660 )
Signed-off-by: PaulZhang12 <paulzhan@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 13:11:29 -07:00
Jialin Ouyang
4b68c4a55b
[Core][Perf] Only invoke save_new_computed_blocks when computed blocks are not empty ( #27799 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 19:47:30 +00:00
Wentao Ye
a8141fa649
[Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK ( #27750 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 15:32:39 -04:00
Sumanth R Hegde
4917002523
[Fix] Skip record_sleep_state logic in PrometheusStatsLogger if not in dev mode ( #27789 )
Signed-off-by: SumanthRH <sumanthrh99@gmail.com >
2025-10-30 19:26:27 +00:00
cong-meta
a2981c4272
[EP/DP][API Server] Enable DP-aware routing in OpenAI API requests ( #24945 )
Co-authored-by: Cong Chen <prowindy@gmail.com >
2025-10-30 12:10:16 -07:00
Jialin Ouyang
4574d48bab
[Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index ( #27629 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 11:52:36 -07:00
Tyler Michael Smith
ab98f6556f
[Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) ( #27811 )
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-30 11:52:18 -07:00
Roger Meier
2918c1b49c
[Model] Use the same fused_moe configs for all H200 devices ( #23642 )
Signed-off-by: Roger Meier <r.meier@siemens.com >
2025-10-30 17:36:56 +00:00
Mengqing Cao
1004205795
[MTP] Refactor mtp predictor to avoid d2h operation ( #27643 )
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-30 17:27:39 +00:00
Huy Do
ba33e8830d
Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27768 )
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-30 10:22:30 -07:00
Kebe
33a0ea5f32
[Docs] add Shanghai Meetup - 2025/10 ( #27545 )
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: esmeetu <jasonailu87@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com >
2025-10-31 00:33:13 +08:00
Ilya Markov
60f76baa66
[Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices ( #27564 )
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-30 11:41:44 -04:00
Varun Sundar Rabindranath
e5e076cad7
[BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP ( #27762 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-30 08:24:31 -07:00
Li, Jiang
eebf00cb0c
[Bugfix][CPU] Fix MRoPE dispatch on the CPU backend ( #27800 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-30 15:12:05 +00:00
Fan Yin
9956aae4ea
[Model][Ouro] Support Ouro Model ( #27794 )
Signed-off-by: yinfan.1024 <yinfan.1024@bytedance.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-30 22:34:41 +08:00
Zhewen Li
0fe0140408
[KV offload] Enable CPU KV offload on CUDA alike Platforms ( #27770 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-30 22:10:29 +08:00
Zhiyuan Li
4e68cc9b6a
[Model] Introduce Kimi Linear to vLLM ( #27809 )
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com >
2025-10-30 21:02:27 +08:00
Huamin Li
1994de99ea
[CI Failure] Fix test_kv_cache_model_load_and_run ( #27717 )
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 12:27:53 +00:00
wang.yuqi
4464723f22
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. ( #25524 )
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-30 12:13:05 +00:00
Sairam Pillai
74374386e2
[Bugfix] Improve GPU validation logging in Ray fallback scenarios ( #25775 )
Signed-off-by: Sairam Pillai <sairam.pillai61@gmail.com >
2025-10-30 11:57:59 +00:00
Wentao Ye
c01f6e525f
[CI] Fix mypy for vllm/v1/core and vllm/v1/engine ( #27108 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 11:32:17 +00:00
Huamin Li
c7d2a554ba
[CI Failure] fix test_default_mm_loras ( #27795 )
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 18:13:03 +08:00
wangxiyuan
af826e0820
[V0 deprecation] Remove VLLM_USE_V1 usage in config module ( #27784 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-30 09:42:49 +00:00
Zhewen Li
e806178d2a
[BugFix][VL] Fix FA selection on Qwen2.5-VL ( #27790 )
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-30 07:54:44 +00:00
Huamin Li
5be1bed790
[CI/Build]Add eval config for Qwen3-235B-A22B-Instruct-2507-FP8 ( #27113 )
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 07:50:56 +00:00
yitingdc
31b55ffc62
use stringData in secret yaml to store huggingface token ( #25685 )
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
2025-10-30 00:47:36 -07:00
Bram Wasti
ded8ada86a
Add more dims for batch invariant shims ( #27489 )
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-30 05:28:45 +00:00
Kuntai Du
8bff831f0a
[Benchmark] Cleanup deprecated nightly benchmark and adjust the docstring for performance benchmark ( #25786 )
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-10-30 04:43:37 +00:00
Lucas Wilkinson
b5d70751d8
[BugFix] Reordering extend logic fix ( #27739 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-29 21:39:34 -07:00
Fardin Hoque
b8c48c5d72
kernels/moe test pruning ( #27053 )
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 12:10:34 +08:00
Benjamin Bartels
17d055f527
[Feat] Adds runai distributed streamer ( #27230 )
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
Co-authored-by: omer-dayan <omdayan@nvidia.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-29 21:09:10 -07:00
Nick Hill
2ce5c5d3d6
[BugFix] Handle unscheduled requests properly when async scheduling ( #27756 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 21:04:25 -07:00
Kunshang Ji
b5bae42f91
[XPU] Update latest IPEX 2.8 release ( #27735 )
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-10-30 11:17:13 +08:00
Chen Zhang
d7fb10c574
[Bugfix] mamba-block-size is set for vision language model ( #27773 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-29 19:39:57 -07:00
Yan Ma
b798e39f93
[XPU][bugfix] fix rope for llama4 and deepseek ( #25145 )
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-10-30 09:43:13 +08:00
Chenheli Hua
48eb8eba58
[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. ( #27760 )
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 23:17:48 +00:00
Wentao Ye
b5d90f7400
[Bug] Fix DBO IMA issue for DeepEPHT ( #27666 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 16:28:27 -04:00
Nick Hill
d4aa144343
[BugFix] Fix handling of resumed reqs in SharedStorageConnector ( #27719 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 20:16:52 +00:00
Wentao Ye
fcb1d570bb
[Bug] Fix DeepEP low latency assert self.batched_router_logits.size(-1) == full_router_logits.size(-1) Bug ( #27682 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 14:50:39 -04:00
Nicolò Lucchesi
accb8fab07
[KVConnector] Add metrics to Prometheus-Grafana dashboard ( #26811 )
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-10-29 18:44:49 +00:00
Wentao Ye
5b0448104f
[Bug] Raise error explicitly if using incompatible backend ( #27424 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 13:29:20 -04:00
22quinn
f7a6682872
[CI/Build] Test torchrun with 8 cards ( #27548 )
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-29 10:26:06 -07:00
Boyuan Feng
a9fe0793f2
use_aot_compile should respect VLLM_DISABLE_COMPILE_CACHE ( #27698 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-29 17:08:54 +00:00
JartX
7568a282b9
[FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA ( #27744 )
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-29 16:55:35 +00:00
Braulio Dumba
1da3309ace
[Core] Exposing engine sleep & wake_up state as prometheus metrics ( #24176 )
Signed-off-by: Braulio Dumba <Braulio.Dumba@ibm.com >
2025-10-29 09:32:01 -07:00
Wentao Ye
5522fb274b
[Chore] Optimize P2PNCCLEngine http_address ( #27488 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 00:05:09 +08:00
Nicolò Lucchesi
0f95a1c3f2
[CI] Fix flaky test_two_responses_with_same_prev_id test ( #27745 )
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-29 15:10:35 +00:00
Xiake Sun
ded24e3e54
[ROCm][Platform] Add MI308X device id in _ROCM_DEVICE_ID_NAME_MAP ( #27623 )
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
2025-10-29 14:44:03 +00:00
Roger Young
d6704dd099
Fix MiniMax-M2 rmsnorm precision and remove useless code ( #27627 )
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-29 21:01:05 +08:00
Cyrus Leung
ecca3fee76
[Frontend] Add vllm bench sweep to CLI ( #27639 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-29 05:59:48 -07:00
Zhewen Li
9a0d2f0d92
[CI/Build] Skip cpu offloading test on AMD ( #27690 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 12:55:51 +00:00
Isotr0py
ad3ec89532
[VLM] Add Qwen3-VL generation test ( #25185 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 12:19:37 +00:00
Kevin H. Luu
3481e40743
[chore] Remove models weight on S3 logic ( #27725 )
Signed-off-by: kevin <kevin@anyscale.com >
2025-10-29 10:29:49 +00:00
Eugene Khvedchenya
5e72216d17
Feature/video support in random mm dataset ( #25963 )
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Eugene Khvedchenya <ekhvedchenia@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 18:24:52 +08:00
Isotr0py
1a33aacf82
[Misc] Raise error for missing video metadata in MultiModalDataParser ( #27664 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-29 10:06:42 +00:00
Yue Zhang
7ba6aa8f56
[Fix] import get_kv_cache_torch_dtype error in LMCacheConnector integration ( #27670 )
Signed-off-by: KevinCheung2259 <2651309292@qq.com >
2025-10-29 10:03:54 +00:00
Alec S
ab2eb27b74
[Frontend] [gpt-oss] Mcp type bug ( #27689 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 10:01:32 +00:00
Alec S
3c7fefdeba
[Frontend] [gpt-oss] Tool json call parsing error retry ( #27675 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 09:42:44 +00:00
bnellnm
1891cf605a
[Bugfix] Fix modular kernel tests ( #27707 )
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-29 16:14:33 +08:00
Jiangyun Zhu
8df98c2161
[perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next ( #27578 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-29 08:12:54 +00:00
Cyrus Leung
4fb8771cc0
[CI/Build] Move pre-commit only scripts to tools/pre_commit ( #27657 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-29 08:04:33 +00:00
Dipika Sikka
413ef7a3b4
[Speculators] Move tests + fix integration ( #27308 )
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Signed-off-by: rahul-tuli <rtuli@redhat.com >
Co-authored-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-10-29 00:54:21 -07:00
Zhewen Li
8b62495076
[Bugfix] Fix non-contiguous tensor error in rocm_unquantized_gemm_impl ( #27605 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 00:00:15 -07:00
Zhewen Li
83fd49b1fc
[CI/Build][Bugfix]Fix Quantized Models Test on AMD ( #27712 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 06:27:30 +00:00
Shaoting
a4a4f0f617
[KV Connector] Update lmcache connector with latest compatibility ( #27681 )
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-29 05:38:37 +00:00
Lukas Geiger
0d8161b075
[Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes ( #27705 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 05:28:20 +00:00
liuzhenwei
d2c33c397a
[NIXL][XPU] update name of nixl wheel ( #27631 )
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-10-29 12:43:29 +08:00
Varun Sundar Rabindranath
f6d5f5888c
[Build] Revert triton_kernels requirements ( #27659 )
2025-10-28 21:07:09 -07:00
Simon Mo
9007bf57e6
Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27714 )
2025-10-28 20:58:01 -07:00
Huy Do
f257544709
Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 ( #27598 )
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 19:39:15 -07:00
Jialin Ouyang
0b51c9bd8b
[Core] Early return in SlidingWindowManager.remove_skipped_blocks ( #27673 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-29 01:32:33 +00:00
Wentao Ye
d3ab240f39
[Bug] Fix deepep low latency use nvlink by default ( #27677 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 23:53:12 +00:00
Lucas Kabela
94666612a9
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model ( #23207 )
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com >
2025-10-28 22:36:43 +00:00
Nick Hill
4fe5895361
[AsyncScheduling] Make async overlap work with logprobs ( #27615 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-28 22:35:54 +00:00
Or Ozeri
111faf1118
[Core] Scheduler: Publish connector events after output ( #25875 )
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-10-28 21:01:33 +00:00
Wentao Ye
6afc28a9ba
[Test] Batch Invariant: Unit test using parameterized backend ( #27478 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 13:51:35 -07:00
Lucas Wilkinson
141e6a0505
[Misc] Make reorder batch also separate extends ( #27367 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-28 10:55:10 -07:00
Matvei Pashkovskii
130aa8cbcf
Add load pattern configuration guide to benchmarks ( #26886 )
Signed-off-by: Matvei Pashkovskii <mpashkov@amd.com >
Signed-off-by: Matvei Pashkovskii <matvei.pashkovskii@amd.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-28 10:49:15 -07:00
Zhengxu Chen
e3d8186666
[compile] Add fallback path to AOT compile when serialization fails. ( #27350 )
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 12:54:26 -04:00
Cyrus Leung
f5710ef02a
[Misc] Make LayerBlockType a Literal instead of Enum ( #27658 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-28 16:23:35 +00:00
Mohammad Miadh Angkad
a8c02fb5bf
[Bugfix][CI] Fix v1 attention backend tests and add CI coverage ( #26597 )
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-28 11:42:05 -04:00
Kero Liang
02af36df36
[Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer ( #27117 )
Signed-off-by: Kero Liang <kerorek@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: donglu <donglu@cohere.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 15:01:24 +00:00
Zhiyuan Li
e88bdd60d9
[FLA] Introduce Kimi Delta Attention(KDA) to VLLM ( #27654 )
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
2025-10-28 22:56:28 +08:00
Samuel Shen
05e034f085
[nit]: Fix import for the lmcache integration ( #27600 )
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-28 14:40:55 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
936643a868
[BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache ( #27294 )
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2025-10-28 10:22:28 -04:00
Junpu Fan
b186149e8e
[Bugfix][Frontend] validate arg priority in frontend LLM class before add request ( #27596 )
Signed-off-by: Junpu Fan <junpufan@gmail.com >
2025-10-28 14:02:43 +00:00
22quinn
2abbd351ef
[Core] Enable async scheduling for external_launcher mode ( #27394 )
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-28 13:52:47 +00:00
wangln19
446912d1cb
fix: allow HuggingFace standard chat template params via **kwargs ( #27622 )
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-28 21:12:34 +08:00
Zhengxu Chen
a00d6254e9
[compile] Disable dynamo guards check for AOT compilation. ( #27288 )
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 12:58:12 +00:00
Asaf Joseph Gardin
05181cc57f
[Hybrid] Add mamba_block_size to Engine Args ( #27289 )
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-10-28 12:54:24 +00:00
Zhengxu Chen
259504e147
[compile] Add enable_prompt_embeds to compile hash. ( #27285 )
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 20:46:03 +08:00
Wentao Ye
0484b64248
[Bug] Fix shape issue for eplb expert weights ( #27589 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 20:44:05 +08:00
Cyrus Leung
f58d9b6404
[Misc] Separate out utils.counter and move utils.Device to engine ( #27588 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-28 12:20:46 +00:00
Matthew Bonanni
44b5ce956d
[Bugfix] In LongRoPE, decide short vs long based on max_model_len ( #27431 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-28 12:00:56 +00:00
Nick Hill
7a865f2325
[V0 Deprecation] Remove vestigial V0 logits_processors.py file ( #27601 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-28 19:17:45 +08:00
wangln19
2fa90bda27
Fix a robust parsing issue in KimiK2ToolParser that causes IndexError ( #27565 )
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
2025-10-28 11:11:50 +00:00
Zhewen Li
0291fbf65c
[CI/Build] Fix amd model executor test ( #27612 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-28 08:58:11 +00:00
Jialin Ouyang
b46e4a06f1
[Core][Bookkeeping Optimization] Update against numpy view of is_token_ids tensor ( #27618 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-28 08:13:10 +00:00
Li, Jiang
d34f5fe939
[Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legacy platforms ( #27526 )
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-27 23:25:44 -07:00
Eric Yue
bdb01a38fe
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X ( #27323 )
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-10-27 22:58:06 -07:00
vllmellm
5b3c35a68e
[ROCm] [Doc] Update ROCm installation docs ( #27327 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-28 13:00:50 +08:00
Chauncey
61fbfe5274
[Bugfix] fixed inconsistent finish_reason handling between V0 and V1 engines ( #27555 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-28 02:18:08 +00:00
Kuntai Du
255e34ca50
[Stability fix] turn off HMA allocator when connector is set ( #27592 )
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-10-27 18:32:23 -07:00
Roger Wang
a8d2e326ec
[Bugfix][CI] Fix config resolving logic with remote models ( #27610 )
2025-10-28 00:48:32 +00:00
Andrew Xia
53a56e658b
[gpt-oss][2/N] Support input_messages in responsesRequest ( #26962 )
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-27 23:15:49 +00:00
usberkeley
69f064062b
Code quality improvements: version update, type annotation enhancement, and enum usage simplification ( #27581 )
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-10-27 17:50:22 +00:00
Micah Williamson
921e78f4bb
[ROCm] Update AITER branch for ROCm base docker ( #27586 )
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-10-27 17:22:33 +00:00
Cyrus Leung
6ebffafbb6
[Misc] Clean up more utils ( #27567 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 15:30:38 +00:00
Ben Browning
3b96f85c36
[Chore]: Stream tokens vs characters in tool call parser tests ( #26513 )
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-10-27 23:06:25 +08:00
tingtinggithub
23ad820553
fixing mm placeholder replacement issue with gemma3 ( #27538 )
Signed-off-by: tingtingtang1992 <streamttt@gmail.com >
2025-10-27 14:34:01 +00:00
Varun Sundar Rabindranath
5d3be3ba4c
[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement ( #27487 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-27 07:32:50 -07:00
Yu Jiaqi
4f882be4a0
[Model] Siglip2 Model Support ( #27566 )
Signed-off-by: piood <2477084691@qq.com >
2025-10-27 06:57:37 -07:00
Asaf Joseph Gardin
9273754222
[Hybrid] Added supports_mamba_prefix_caching Protocol ( #27339 )
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-10-27 13:05:20 +00:00
Jee Jee Li
f4e8154076
[Kernel] Enable moe LoRA kernel support FP16 ( #27468 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-27 19:48:37 +08:00
Fadi Arafeh
a663f6ae64
[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 ( #27415 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-27 11:14:55 +00:00
Chauncey
a4fc21895e
[Bugfix] Fixed when return_token_ids=False, the first event still contains prompt_token_ids. ( #27561 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-27 11:06:43 +00:00
Shanshan Shen
a3e8611da5
[Bugfix] Limit the default value of max_model_len when it is not specified by users ( #27556 )
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-10-27 10:16:20 +00:00
Cyrus Leung
7c2bdb83dc
[Misc] Clean up utils ( #27552 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 09:05:40 +00:00
Danielle Robinson
9932ed6a83
[Kernel] Adding split_K implementation for fused_moe_lora ( #27291 )
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com >
Signed-off-by: Danielle Robinson <dcmaddix@gmail.com >
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-27 02:05:24 -07:00
Jee Jee Li
2d631d28c6
[Doc] Slight improvement to M2 and beyond ( #27554 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-27 09:02:10 +00:00
Cyrus Leung
b368382964
[Model] Deprecate merge_by_field_config=False ( #27551 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 16:43:00 +08:00
gnovack
a806c14cc7
[Performance][LoRA] add context varying params to 'do_not_specialize' in fused moe lora ( #27445 )
Signed-off-by: gnovack <gnovack@amazon.com >
2025-10-27 06:31:55 +00:00
yyzxw
181bf5bbde
[Docs] remove the incorrect enable_reasoning parameter ( #27550 )
Signed-off-by: zxw <1020938856@qq.com >
2025-10-26 23:17:19 -07:00
Cyrus Leung
cbd5e07a51
[Model] Use merge_by_field_config for MM models (Qwen series) ( #27546 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 05:38:05 +00:00
CSWYF3634076
63b22e0dbb
[Model][Bugfix] fix ernie45 moe 300B SharedFusedMoE output tuple ( #27316 )
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-26 20:53:31 -07:00
Roger Young
5980604c44
Fix MiniMax-M2 copyright ( #27537 )
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-27 03:29:51 +00:00
youkaichao
361a7463d3
fix m2 test ( #27536 )
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-27 01:04:36 +08:00
Roger Young
720af6ab79
[Model][MiniMax-M2] Support MiniMax-M2 Model ( #27535 )
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-27 00:59:11 +08:00
Cyrus Leung
55cba4a05c
[CI/Build] Update causal-conv1d installation ( #27529 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 22:14:22 +08:00
Cyrus Leung
c7abff2990
Revert "[CI/Build] Use CPU for mm processing test on CI ( #27522 )" ( #27531 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 04:44:27 -07:00
Yeshwanth N
71b1c8b667
[Chore]:Extract math and argparse utilities to separate modules ( #27188 )
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com >
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com >
Signed-off-by: yeshsurya <yeshsurya@gmail.com >
2025-10-26 04:03:32 -07:00
Cyrus Leung
8fb7b2fab9
[Doc] Fix links to GH projects ( #27530 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 17:55:51 +08:00
Cyrus Leung
be7b55a83d
[Doc] Remove Molmo warning ( #27527 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 16:22:52 +08:00
Lucia Fang
315b860abe
[bugfix]fix empty prompts for async-engine mode in benchmark throughput ( #27494 )
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-10-26 08:16:35 +00:00
rongfu.leng
87c41c26ad
[Bugfix] Fix processor initialization for model from modelscope instead of HF ( #27461 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-26 07:44:31 +00:00
JartX
65d2cf9511
[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA ( #27190 )
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-10-26 15:08:52 +08:00
Isotr0py
d63cd9ff10
[CI/Build] Use CPU for mm processing test on CI ( #27522 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-26 13:09:18 +08:00
Cyrus Leung
66a168a197
[CI/Build] Refactor processing tests ( #27470 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-25 16:14:30 +00:00
Matthew Bonanni
a99564ac5b
[Attention] Add missing kv cache scale setup ( #27490 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-25 00:12:49 -07:00
Cyrus Leung
4c5f632165
[Misc] Simplify max tokens in multimodal registry ( #27500 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-24 23:56:01 -07:00
Kuntai Du
b853540388
[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector ( #25712 )
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-10-24 23:34:18 -07:00
Zhuohan Li
56ed7609a9
Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… ( #27502 )
2025-10-25 05:31:43 +00:00
Jiangyun Zhu
29c9cb8007
[CI] Add tests for cudagraph ( #27391 )
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-25 02:37:33 +00:00
Yihua Cheng
83f478bb19
[KVConnector] Migrate the LMCache integration code to be vLLM native ( #25542 )
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2025-10-25 00:23:53 +00:00
Varun Sundar Rabindranath
269c4db0a4
[Misc][DP] Guard mxfp4 implementation selection ( #27484 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-24 23:29:24 +00:00
Wentao Ye
52efc34ebf
[Log] Optimize Startup Log ( #26740 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-24 19:27:04 -04:00
Pengchao Wang
d95d0f4b98
[Distributed] Basic set of configuration for large EP deployment on GB200 ( #27328 )
Signed-off-by: Pengchao Wang <wpc@fb.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2025-10-24 14:16:44 -07:00
Lehua Ding
0402428200
[Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run ( #27455 )
Signed-off-by: Lehua Ding <lehuading@tencent.com >
2025-10-24 20:45:36 +00:00
jinghanhu
17af6aa0da
[Document] Add ms-swift library to rlhf.md ( #27469 )
2025-10-24 20:31:50 +00:00
Zhewen Li
fc168c33f3
[CI/Build] Fix test_torch_utils in AMD CI ( #27317 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-24 12:26:00 -07:00
Isotr0py
acc78aeb88
[Bugfix] Fix interns1-vit qk norm code path ( #27480 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-24 17:43:45 +00:00
Ming Yang
0f67d4d962
[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek ( #26397 )
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-10-24 10:24:08 -07:00
kourosh hakhamaneshi
7e1d697b56
[Bugfix] Fix MultiConnector stats reconstruction across process boundaries ( #27366 )
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-10-24 17:08:05 +00:00
Chendi.Xue
699d62e6cf
[NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished ( #27297 )
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-24 17:01:41 +00:00
Richard Zou
cd390b609d
[compile] Turn standalone_compile back on ( #27460 )
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-10-24 16:30:27 +00:00
Fadi Arafeh
2080b05099
[cpu][fix] Fix onednn_mm crash on consecutive matmuls with same M,K,N and different dtype ( #27472 )
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-24 15:57:48 +00:00
Lifans
6454afec90
[Doc] Fix minor issues in docs/design/metrics.md ( #27436 )
Signed-off-by: Lifan Shen <lifans@meta.com >
2025-10-24 05:40:54 -07:00
Chauncey
41a62564a7
Fix test named tool use ( #27458 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-24 20:27:45 +08:00
fhl2000
284cc92275
[MISC] cudagraph_capture_sizes related improvements ( #26016 )
Signed-off-by: fhl <2410591650@qq.com >
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-24 05:11:05 -07:00
ioana ghiban
435be10db9
Fix AArch64 CPU Docker pipeline ( #27331 )
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com >
2025-10-24 05:11:01 -07:00
Cyrus Leung
b7030d962b
[Benchmark] Enable benchmark to run with encoding_format="bytes" ( #27467 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-24 11:16:50 +00:00
Chauncey
3567816932
[Refactor] move tool parsing logic from protocol.py to the tool parser ( #27383 )
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-10-24 09:53:23 +00:00
22quinn
e0ef8a2920
[BugFix] Fix torchrun DP with LLM class ( #27395 )
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-24 08:11:37 +00:00
Isotr0py
42efe609ba
[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer ( #27418 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-24 07:32:47 +00:00
Yu Jiaqi
88d3141ec6
[Docs] remove v1 column for embedding models ( #27446 )
Signed-off-by: piood <2477084691@qq.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-23 23:55:03 -07:00
Rui Qiao
09a6a49eaf
[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator ( #27443 )
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-24 14:53:09 +08:00
strinczer
074475541a
[Bugfix] Fix Pydantic union resolution for ResponseFunctionToolCall in Responses API ( #26706 )
Signed-off-by: Shai Trinczer <strinczer@icloud.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-23 22:53:42 -07:00
Aaron Pham
d4c574c39f
[Chore] remove structural tags logging lines ( #27451 )
2025-10-24 05:35:45 +00:00
usberkeley
c528b9006a
Fix EventPublisherFactory logic for disabled KV cache events ( #27419 )
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-10-24 05:00:01 +00:00
fhl2000
85fee74b33
[Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder ( #27427 )
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
2025-10-23 20:31:14 -07:00
hfan
8dbe0c527f
[Misc] Add TPU usage report when using tpu_inference. ( #27423 )
Signed-off-by: Hongmin Fan <fanhongmin@google.com >
2025-10-23 20:29:37 -07:00
Xiangyu Li
5cc6bddb6e
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm ( #26092 )
2025-10-23 23:26:13 -04:00
Harry Mellor
1f9460c4c1
Fix pooling adapters for Transformers backend ( #27338 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 20:23:55 -07:00
xiao-llm
70022ffc00
Granite 4.0 quark quantization support ( #26944 )
Signed-off-by: Xiao YU <Xiao.YU@xilinx.com >
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com >
Co-authored-by: Xiao YU <Xiao.YU@xilinx.com >
2025-10-24 02:14:03 +00:00
Akash kaothalkar
f417746ad7
[Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc ( #27422 )
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-23 21:21:36 +00:00
Yu Jiaqi
0552cfb195
[Model] Siglip Embedding Support ( #27324 )
Signed-off-by: piood <2477084691@qq.com >
2025-10-23 20:19:48 +00:00
Kebe
51dd14ac2b
[Bugfix][DP] Fix creating too many DP Placement Groups ( #26880 )
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-23 20:16:51 +00:00
Matthew Bonanni
dbfbf9f324
[Attention] Fix FlashMLA metadata builder arguments for q_len > 1 ( #27368 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-23 15:58:15 -04:00
Jonathan Chen
ca76486a16
[Chore] Separate out vllm.utils.platform_utils.py ( #27374 )
Signed-off-by: Jonathan <chenleejonathan@gmail.com >
2025-10-23 19:08:06 +00:00
Varun Sundar Rabindranath
a9f55dc588
[Misc] Add triton_kernels dependency ( #27370 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-23 12:04:14 -07:00
Isotr0py
81d5bb765a
[Bugfix] Fix AWQ marlin layer skipping ( #27416 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-23 18:30:28 +00:00
Gregory Shtrasberg
0825197bee
[Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek ( #27373 )
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-10-23 17:43:53 +00:00
Alexander Matveev
9ef3d5b875
[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer ( #27220 )
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-10-24 00:03:14 +08:00
Alexei-V-Ivanov-AMD
295c7f0267
Mirroring the test definitions (2025-10-22) ( #27362 )
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-24 00:02:26 +08:00
wang.yuqi
3fa2c12185
[Frontend][4/N] Improve all pooling task | Add plugin pooling task ( #26973 )
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com >
2025-10-23 14:46:18 +00:00
Cyrus Leung
fe2016de2d
[CI/Build] Remove unnecessary flags from test registry ( #27353 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-23 14:42:40 +00:00
Ilya Markov
237cf6d32a
[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) ( #26709 )
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-23 20:58:39 +08:00
Navya Srivastava
faee3ccdc2
[Feature] Pydantic validation for speculative.py ( #27156 )
Signed-off-by: Navya Srivastava <navya.srivastava1707@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 12:19:33 +00:00
Bradley D
570c3e1cd4
[Bugfix] Honor --mm_encoder_attn_backend when used ( #27124 )
Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-23 20:09:52 +08:00
Harry Mellor
3a4255c7c4
Run mypy on the lowest supported Python version instead of system Python ( #27048 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 05:07:44 -07:00
tomeras91
61089465a6
[Model] Add MoE support for NemotronH ( #25863 )
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-10-23 10:27:23 +00:00
Tova Movshovitz
88afa11010
[Metrics] [KVConnector] Add connector prefix cache hit rate stats ( #26245 )
Signed-off-by: tovam <tovam@pliops.com >
2025-10-23 12:21:08 +02:00
Chauncey
d00ce29d89
[CI] Reorganize entrypoints tests ( #27403 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-23 10:10:06 +00:00
Louie Tsai
3b7bdf983b
add SLA information into comparison graph for vLLM Benchmark Suite ( #25525 )
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Signed-off-by: louie-tsai <louie.tsai@intel.com >
Signed-off-by: Louie Tsai <louie.tsai@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-23 08:04:59 +00:00
Zhewen Li
50b788a17a
[CI/Build] Fix AMD CI: test_cpu_gpu.py ( #27388 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-23 07:55:00 +00:00
Lucia Fang
fc059c7061
[Bugfix] Fix args settings for guided decoding args ( #27375 )
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-10-23 07:34:06 +00:00
Cyrus Leung
bfb240cc49
[CI/Build] Fix Prithvi plugin test ( #27393 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-23 07:30:44 +00:00
Jonathan Chen
e255d92990
[Chore] Remove duplicate has_ functions in vllm.utils ( #27372 )
Signed-off-by: Jonathan <chenleejonathan@gmail.com >
2025-10-23 06:11:59 +00:00
wang.yuqi
3729ed00ba
[Model] Add num_cached_tokens for PoolingRequestOutput ( #27378 )
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-23 14:03:42 +08:00
Giancarlo Delfin
6644796bf4
[V1][spec decode] return logprobs for spec decoding ( #26060 )
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-22 22:59:59 -07:00
Andrew Sansom
ff93cc8c84
[CORE] Support Prefix Caching with Prompt Embeds ( #27219 )
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
2025-10-22 22:18:07 -07:00
PiteXChen
243ed7d32e
[Bugfix][Core] running queue index leakage exception (#26754)
Signed-off-by: CLFutureX <chenyongqyl@163.com>
2025-10-22 21:40:12 -07:00
fangpings
7e0941055f
[Bugfix] Fix incorrect kv cache metrics in grafana.json (#27133)
Signed-off-by: Fangping Shi <fangping_shi@apple.com>
Co-authored-by: Fangping Shi <fangping_shi@apple.com>
2025-10-22 20:58:36 -07:00
Cyrus Leung
6738e4a093
[Bugfix] Fix SLA tuner initialization (#27355)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 20:43:04 -07:00
Isotr0py
2566dca2a9
[Bugfix] Fix deepseek-ocr multi-image inference and add merge_by_field_config=True with tensor schema support (#27361)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-22 17:15:38 -07:00
Matthew Bonanni
b4fda58a2d
[MLA] Bump FlashMLA (#27354)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-22 15:48:37 -07:00
dongbo910220
a0003b56b0
[Chore] Separate out system utilities from vllm.utils (#27201)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 20:25:25 +00:00
Daisy-Ma-coder
5beacce2ea
[BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (#27128)
Signed-off-by: qqma <qqma@amazon.com>
Co-authored-by: qqma <qqma@amazon.com>
2025-10-22 19:36:39 +00:00
rongfu.leng
8669c69afa
[Feature] publisher default set zmq in kv_event config (#26915)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 19:19:33 +00:00
Sage
1651003c35
[Prefix Cache] Use LoRA name for consistent KV-cache block hashing (#27211)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
2025-10-22 18:13:03 +00:00
William Song
1cb8c6c5fe
[Doc] Fix numbering sequence in prefix caching (#27357)
Signed-off-by: William Song <jinwook@umich.edu>
2025-10-22 17:35:47 +00:00
Luciano Martins
e05a6754a8
[Model] Revert PR #26715: Restore custom PaliGemma and Gemma3-MM impl… (#27309)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
2025-10-22 10:05:34 -07:00
Isotr0py
084a9dae80
[Bugfix] Disable FlexAttention direct block mask building for encoder-only models (#27344)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-22 16:39:08 +00:00
RED
c9461e05a4
Support Anthropic API /v1/messages Endpoint (#22627)
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-22 09:13:18 -07:00
Nicolò Lucchesi
4dfdb821c8
[P/D] Dynamic kv_output_aggregator collect size (#26734)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-22 18:07:58 +02:00
Russell Bryant
58fab50d82
[Frontend] Require flag for loading text and image embeds (#27204)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 15:52:02 +00:00
Isotr0py
db6f28d898
[Bugfix] Fix HF format InternVL large variants video processing (#27330)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-22 08:39:23 -07:00
Cyrus Leung
14e2f1231e
[Bugfix] Make get_mrope_input_positions instance methods (#27342)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-22 08:38:34 -07:00
Chendi.Xue
7c4767f1eb
[NIXL] use Host buffer to support TP_ratio > 1 for XPU (#27140)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-10-22 15:28:13 +00:00
Jee Jee Li
9771e0b432
[Bugfix] Add missing 'is_internal_router' attribute to FusedMoEWithLoRA (#27351)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-22 08:19:12 -07:00
Reinforce-II
980de31ca0
[bugfix] remove unused parameters to reduce unnecessary vram usage (#26789)
Signed-off-by: Reinforce-II <fate@eastal.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-22 08:16:09 -07:00
Wentao Ye
1c160841ea
[Bug] Fix DeepSeek-V2.5-1210-FP8 issue (#27267)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-22 11:00:10 -04:00
Mark McLoughlin
4ca13a8667
[NIXL] Terminate handshake listener thread in shutdown (#26404)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-10-22 16:59:53 +02:00
Isotr0py
675aa2ec64
[Model] Upstream Deepseek-OCR model (#27247)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-22 07:59:15 -07:00
dongbo910220
3ae082c373
[Chore] Separate out optional dependency checks from vllm.utils (#27207)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-22 10:44:21 -04:00
Alexei-V-Ivanov-AMD
49c00fe304
Mirroring changes in test-pipeline.yaml into test-amd.yaml (#27242)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-10-22 09:59:45 -04:00
Mark McLoughlin
141d3b9fc5
[docs] Update v1 metrics design doc (#27332)
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: atalhens <sneh.lata@nutanix.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: atalhens <sneh.lata@nutanix.com>
2025-10-22 06:29:15 -07:00
Jee Jee Li
abf3db40ef
[Core] Handle MoE LoRA edge cases (#27335)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-22 13:14:33 +00:00
gnovack
8e4ca4d14e
Bugfix - pass 'max_num_tokens_padded' into 'moe_lora_align_block_size' (#27311)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-22 12:23:57 +00:00
Wentao Ye
1a0f4defb7
[Log] Add Warning for LLM(data_parallel_size=k) single-process DP Usage (#27282)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-22 12:12:21 +00:00
Li, Jiang
843af7f7fc
[Bugfix][CPU] Disable dual stream execution for experts on CPU (#27320)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-10-22 11:02:27 +00:00
wang.yuqi
1f633b8632
[Frontend][3/N] Improve all pooling task | Support binary embedding response (#27066)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-22 18:38:57 +08:00
ExtReMLapin
a4c29e6e82
fixed reasoning streaming with tool_choice="required" (#24108)
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-10-22 09:42:55 +00:00
Harry Mellor
8f18feb191
Remove last level references not removed in #26355 (#27260)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-22 09:18:17 +00:00
Huy Do
ed540d6d4c
Update release pipeline for PyTorch 2.9.0 (#27303)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-10-22 09:18:01 +00:00
wangxiyuan
f6027b2855
[1/N][Platform] Cleanup useless function (#26982)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-22 09:04:57 +00:00
Jiangyun Zhu
ab3e80042e
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled (#27146)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-22 00:22:39 -04:00
Cyrus Leung
ceacedc1f9
[Benchmark] Add plot utility for parameter sweep (#27168)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-21 20:30:03 -07:00
Nicolò Lucchesi
bfa59be8f1
[CI] Nixl integration tests DP-EP (#27199)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-22 11:17:48 +08:00
vllmellm
265ecb05fb
[DOC] [ROCm] Add ROCm quickstart guide (#26505)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-22 03:10:48 +00:00
Lain
09a7e6f617
[Deepseek v3.2] Remove extra logics in indexer (#26465)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Lain <siyuanf@nvidia.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-10-21 23:34:03 +00:00
Tyler Michael Smith
6c2eef5a5d
[P/D] KVConnector for decode benchmarking (#25986)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-10-21 16:30:47 -07:00
Benjamin Chislett
19748806f0
[Bugfix] skip cuda graph for drafter when running with eager (#26821)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-10-21 15:39:09 -07:00
ExtReMLapin
4a8a567e16
Updated xgrammar backend to not deny supported string formats (#27253)
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-21 22:25:23 +00:00
Alexander Matveev
344a0017c0
[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE (#26440)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-10-21 21:38:29 +00:00
Huy Do
becb7de40b
Update PyTorch to 2.9.0+cu129 (#24994)
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-21 17:20:18 -04:00
Tao He
250fb1b8ea
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-21 18:27:03 +00:00
Nick Hill
647214f3d5
[V0 Deprecation] Remove V0 executors (#27142)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-21 11:09:37 -07:00
David Whyte-Gray
ddeec11ba9
[Bugfix][P/D] Reduce num_threads used by nixl ucx backend (#27196)
Signed-off-by: David Whyte-Gray <40244437+dagrayvid@users.noreply.github.com>
2025-10-21 13:41:52 -04:00
Wentao Ye
86ed77022d
[Feature] Batch Invariant for R1 TP 8 on Blackwell (#27229)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-21 10:25:55 -07:00
Micah Williamson
aa1356ec53
[ROCm] Update Triton, Torch, and AITER branches for ROCm base Dockerfile (#27206)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-10-21 12:01:23 -04:00
Pavani Majety
ecc3c0940a
Add @pavanimajety to .github/codeowners for Flashinfer, ModelOpt related code (#27213)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-10-21 22:59:53 +08:00
JartX
ba09652de2
[ROCM] Enable CompressedTensorsWNA16 (#27187)
Signed-off-by: JartX <sagformas@epdcenter.es>
2025-10-21 10:43:23 -04:00
Harry Mellor
bd66b8529b
[CI] Install pre-release version of apache-tvm-ffi for flashinfer (#27262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-21 14:23:56 +00:00
dongbo910220
6c728f7771
[Chore] Separate out NCCL utilities from vllm.utils (#27197)
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-10-21 06:18:23 -07:00
Daniel Cámpora
80e9452984
[Deepseek v3.2] Optimize top_k_per_row (#26763)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-10-21 08:30:07 +00:00
Roger Wang
c3a2c6ac5f
[MM][Core] Decouple ViT backend from LM backend (#27061)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-10-21 00:30:10 -07:00
Nicolò Lucchesi
72f431e709
[Nixl] Minor refactor to handshake related metadata (#26410)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-21 09:07:47 +02:00
Zebing Lin
be4445072c
[Fix][Spec Decode] Fix llama4 draft loading with different quantization (#27136)
Signed-off-by: linzebing <linzebing1995@gmail.com>
2025-10-20 23:19:00 -07:00
Benjamin Chislett
f381cf2302
[Bugfix] Fix broken MTP weight loading for FP8 KV Scales (#27227)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-10-20 22:51:44 -07:00
Varun Sundar Rabindranath
5ff5d94e77
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 (#26729)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-21 01:51:14 -04:00
Shu Wang
f95da13c3d
[ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 (#26135)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-21 01:50:31 -04:00
Po-Han Huang (NVIDIA)
aef368aa08
[BugFix] GPT-OSS Attention DP + MoE TP weight loading issue (#24032)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-10-21 04:03:47 +00:00
Chen Wu
5f6cbf60d6
[Feature][Kernel]FusedMoE LoRA (#21229)
Signed-off-by: wuchen <cntryroa@gmail.com>
Signed-off-by: banjuede <lmklhc@163.com>
Signed-off-by: Chen Wu <cntryroa@gmail.com>
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: wuchen <wuchen@zetyun.com>
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com>
Co-authored-by: banjuede <lmklhc@163.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
2025-10-21 03:01:37 +00:00
Russell Bryant
3ada34f9cb
[Frontend] Enforce tokenize=False when applying chat template (#27205)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-21 02:57:34 +00:00
Lunwen He
0eb8f2b880
create is_in_the_same_node on cpu (#26832)
Co-authored-by: Lunwen He <lunwenh@meta.com>
2025-10-21 02:04:14 +00:00
Fadi Arafeh
163965d183
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 (#27183)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Michael Yang <Michael.Yang@arm.com>
2025-10-21 02:02:58 +00:00
Nick Hill
a03cf9bc70
[V0 Deprecation] Remove V0 metrics code (#27215)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-21 02:02:10 +00:00
Isotr0py
352c0c8a28
[Quantization] Automatically infer AWQ modules_to_not_convert field (#26909)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-21 01:49:28 +00:00
Andrew Xia
bfe0b4bd2a
[ez] add uv lock to gitignore (#27212)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-10-21 00:37:44 +00:00
Concurrensee
58fbbcb2f5
[ROCm] enable some tests in entrypoints test groups on AMD (#26725)
Signed-off-by: Yida <yida.wu@amd.com>
2025-10-21 00:37:16 +00:00
Heng Guo
87778d5f00
[Feature][Quantization] auto_round support for mixed bits quantization (#23812)
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-20 22:23:30 +00:00
Nicolò Lucchesi
f9e7ad5400
[Bugfix][CI] Fix Distributed Tests (4 GPUs) async_sched+ray test (#27195)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-20 16:34:54 +00:00
shivampr
4d0f266113
[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) (#26268)
Signed-off-by: Shivam <shivampr.dev@gmail.com>
2025-10-20 07:48:01 -07:00
Eugene Khvedchenya
e93ff6c8b9
Nemotron Nano V2 VL + EVS Video Support (#27107)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Natan Bagrov <nbagrov@nvidia.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-20 22:19:11 +08:00
ioana ghiban
1c691f4a71
AArch64 CPU Docker pipeline (#26931)
2025-10-20 07:09:40 -04:00
Jiangyun Zhu
9fce7bee74
[Kernel] Accelerate solve_tril with TMA (#26746)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-20 05:39:02 +00:00
Andy Lo
b63f2143f8
[LoRA] LoRA cuda graph specialization (#25914)
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-20 04:21:09 +00:00
Yi Zhang
f32bf7582e
[Model][VLM] Support Bee-8B Model (#27012)
Signed-off-by: uyzhang <yi.zhang.4096@gmail.com>
Signed-off-by: Yi Zhang <zhangyi970819@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-10-20 02:31:26 +00:00
Yongtao Huang
8a81d776ce
Fix typo in ValueError message: use kv_role instead of kv_disagg_role (#27166)
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-10-19 19:47:19 +00:00
Sergei Skvortsov
f6fdacd82c
[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled (#26586)
Signed-off-by: southfreebird <yvorott@gmail.com>
2025-10-19 19:24:46 +00:00
Cyrus Leung
d31f7844f8
[Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-19 05:20:55 -07:00
iAmir97
7a6c8c3fa1
[Chore] Separate out vllm.utils.network_utils (#27164)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
2025-10-19 03:06:32 -07:00
Jianyu Huang
221bf72577
output type conversion fix (#27159)
2025-10-19 08:10:07 +00:00
Cyrus Leung
b3aba04e5a
[Benchmark] Convenience script for multiple parameter combinations (#27085)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-18 23:57:01 -07:00
dongbo910220
8a297115e2
[Chore] Separate out hashing utilities from vllm.utils (#27151)
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-10-19 11:09:38 +08:00
22quinn
191eed0bb9
[BugFix] Fix lazy imports involving outlines_core (#27158)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-10-19 02:35:32 +00:00
Woosuk Kwon
fb860670da
[Minor] Remove unused env variable (#27161)
2025-10-18 18:48:35 -07:00
Tova Movshovitz
83e760c57d
[V1][Metrics][Plugin] Add plugin support for custom StatLoggerBase implementations (#22456)
Signed-off-by: tovam <tovam@pliops.com>
2025-10-18 15:12:46 -07:00
Lucas Wilkinson
c2bba69065
[BugFix] Disable fp8 kv-cache by default for DeepSeek V3.2 (#27121)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 22:05:23 +00:00
Boyuan Feng
e133d6d218
[BugFix] fix graph partition signature (#27139)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-18 17:34:36 -04:00
dongbo910220
a1946c9f61
[Chore] Separate out profiling utilities from vllm.utils (#27150)
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-10-18 19:12:01 +00:00
Lucas Wilkinson
9f020f4f31
[BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] (#27111)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-18 12:44:39 -06:00
Nick Hill
3b45075206
[Minor] Add some clarifying comments to recent changes (#27130)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-18 09:52:45 -07:00
Yongtao Huang
168e578efc
Fix incorrect string formatting in barrier timeout exceptions (#27149)
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-10-18 09:51:57 -07:00
Isotr0py
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
Lukas Geiger
5c2acb270a
[Models][QwenVL] Remove unnecessary .contiguous() calls (#27106)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-10-18 07:05:05 -07:00
Nicolò Lucchesi
b26b70bec4
[Misc] Refactor get_kv_cache_spec into AttentionLayerBase (#26587)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-18 13:51:21 +00:00
Fadi Arafeh
ab4be40fc5
[fix][cpu] fix prefill attention in CPU attention backend (#27035)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-10-18 13:30:21 +00:00
Wentao Ye
245e4f2c01
[Feature] Batch Invariant: Support DeepGEMM and Blackwell (#27127)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-18 09:28:05 -04:00
iAmir97
1d165d6d85
[Chore] Separate out vllm.utils.mem_utils (#27143)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 10:06:59 +00:00
dongbo910220
83004020fd
[Test] Add test for /health endpoint on engine failure (#26074)
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-10-18 09:59:05 +00:00
Chendi.Xue
12e21701e7
[DOC][FEATURES][CPU]update cpu feature for v1 (#27135)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-10-18 01:10:45 -07:00
Varun Sundar Rabindranath
30a33b92ee
[Misc] Rev DeepEP (#27122)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-10-18 14:54:29 +08:00
Hanchenli
7c572544e4
[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot (#25515)
Signed-off-by: Hanchenli <lihanc2002@gmail.com>
Signed-off-by: Hanchenli <61769611+Hanchenli@users.noreply.github.com>
Signed-off-by: Wei Wei <wwei6@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wei Wei <wwei6@meta.com>
Co-authored-by: Wei Wei <weiweinpu@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-17 21:55:54 -07:00
Huamin Li
c312320764
[CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests (#26663)
Signed-off-by: Huamin Li <3ericli@gmail.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-10-17 21:11:26 -07:00
ZiTian Zhao
c981f0ea78
[Perf] Add H100 fused MoE config (#25398)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
2025-10-18 02:21:27 +00:00
Lehua Ding
6367bde739
[BugFix][Core] Fix error when enable async-scheduling in multi-node env (#25887)
Signed-off-by: Lehua Ding <lehuading@tencent.com>
Signed-off-by: Lehua Ding <lehuading@qq.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-10-17 22:16:18 +00:00
Wentao Ye
f50cc221ea
[Test] Make test_failure more stable for batch invariance (#27054)
2025-10-17 16:59:08 -04:00
Pradyun92
acedc74b1a
[V1][Spec Decode] Fix greedy temperature detection after sampler refactor (#27077)
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>
2025-10-17 13:27:47 -07:00
Zhuohan Li
d29483b58a
[Minor] Remove unnecessary error message (#27115)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-10-17 20:02:12 +00:00
Michael Goin
950cf9e58e
[Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 (#27114)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-17 19:47:18 +00:00
Isotr0py
3125d79950
[Chore] Remove unused PolyNorm layer (#27110)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-17 19:03:43 +00:00
vllmellm
e33ee23ee3
[Bugfix] [AITER] [ROCm] Fix Quark MoE Quant Config and AITER Fused MoE quant type logic (#27029)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-17 12:51:10 -06:00
rasmith
b10c64c834
[ROCm][Bugfix][Model] Fix illegal memory access when running qwen3_moe models with rms_norm (Qwen3-235B-A22B, Qwen3-30B-A3B, etc.) (#26192)
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-17 14:17:18 -04:00
Aleksandr Malyshev
0925b28a8e
[ROCM] MoE fp4 CK kernel (#26545)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-10-17 14:06:33 -04:00
Nicolò Lucchesi
99722d5f0e
[CI] Remove forbidden slash (#27112)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-17 09:38:00 -07:00
燃
4c91a28e30
[bugfix] Qwen3-VL fix video incorrect timestamp calculations while do_sample_frames=True (#27104)
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>
2025-10-17 16:26:33 +00:00
Patrick von Platen
b038d9c40c
[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) (#26367)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
2025-10-17 08:24:42 -07:00
Nicolò Lucchesi
2ba60ec7fe
[CI] Nixl integration tests (#27010)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-17 07:13:31 -07:00
Luka Govedič
bd7157a071
[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-17 08:10:23 -06:00
Yongtao Huang
be429d0cfd
Fix incorrect docstring for stop_profile() method (#27101)
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-10-17 06:30:23 -07:00
Reima Karhila (AMD)
c253745eb8
[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586)
Signed-off-by: Reima Karhila <reima.karhila@amd.com>
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com>
Co-authored-by: xaguilar <Xavier.AguilarFruto@amd.com>
2025-10-17 04:56:12 -07:00
Jee Jee Li
daec4d2624
[Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping (#27096)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-17 04:47:00 -07:00
Harry Mellor
6c9fdbf725
[Docs] Replace rst style double-backtick with md single-backtick (#27091)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-17 02:47:34 -07:00
Harry Mellor
483ea64611
[Docs] Replace all explicit anchors with real links (#27087)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-17 02:22:06 -07:00
Mengqing Cao
e20eba753b
[VLM][Refactor] Remove useless func get_input_positions in MRotaryEmbedding (#27088)
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-10-17 02:00:30 -07:00
cong-meta
bbc1b29665
Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage (#27069)
Signed-off-by: cong-meta <prowindy@hotmail.com>
2025-10-17 01:53:06 -07:00
Chauncey
acb1bfa601
[CI] fix docs build failed (#27082)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-10-17 07:53:40 +00:00
zhrrr
75c7ad9918
[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
2025-10-17 07:30:35 +00:00
Li, Jiang
5550ff9c25
[CI/Build] Update compressed tensor test path to fix CPU CI (#27068)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-10-16 22:34:56 -07:00
Said Taghadouini
3aeb19a39e
[Model] Add support for LightOnOCR (#26916)
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-17 05:05:24 +00:00
Cyrus Leung
8c017b3490
[Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-17 05:03:35 +00:00
Zhewen Li
9c2c2287a0
[CI/Build] Update Llama4 eval yaml (#27070)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-17 04:59:47 +00:00
Jee Jee Li
fec2b341ad
[Kernel] Lazy import FlashInfer (#26977)
2025-10-17 04:48:18 +00:00
Jee Jee Li
87bc0c492f
[Bugfix] Fix ReplicatedLinearWithLoRA (#27065)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-17 04:43:16 +00:00
Nick Hill
fe3b9372ad
[Core] Change execute_model_with_error_logging() to be a ctx manager (#27060)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-17 11:45:32 +08:00
Tao He
bde9e2272a
[Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 (#27030)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-10-17 03:37:52 +00:00
Boyuan Feng
08405609cc
disable graph partition in custom op (#26952)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-17 11:08:47 +08:00
Nick Hill
ab81379ea6
[Perf] Exploit out-of-band buffers in shm_broadcast (#26961)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-16 20:08:03 -07:00
Harry Mellor
4ffd6e8942
[Docs] Reduce custom syntax used in docs (#27009)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-16 20:05:34 -07:00
Tomas Ruiz
965c5f4914
vllm bench serve shows num of failed requests (#26478)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
2025-10-16 19:55:09 -07:00
Lukas Geiger
4d055ef465
Remove unused imports (#26972)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-10-16 19:51:17 -07:00
Boyuan Feng
17c540a993
[torch.compile] fix simple inductor graph partition test (#27050)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-10-16 21:09:36 -04:00
Cyrus Leung
4d4d6bad19
[Chore] Separate out vllm.utils.importlib (#27022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-17 00:48:59 +00:00
Lucia Fang
11ae016bd7
[torch.compile] Passing only necessary compilation config to inductor pass config (#27041)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-10-17 00:01:52 +00:00
jiahanc
41d3071918
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-10-16 16:20:25 -07:00
Harry Mellor
fb5e10d3fb
Refactor Transformers backend to use mixins (#26906)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-16 21:50:39 +00:00
Bram Wasti
b2f78cbad4
[small][batch invariance] Rename the env and internal flags to simplify usage (#26855)
Signed-off-by: Bram Wasti <bwasti@meta.com>
2025-10-16 21:40:25 +00:00
Wentao Ye
23583ee28c
[Bug] Add Assertion for random-input-len / random-output-len (#26834)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-16 21:36:39 +00:00
Michael Goin
01c977e96d
[CI] Prune Quantization Tests and skip compilation (#27038)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-16 17:26:35 -04:00
Wentao Ye
b3dda72c23
[Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout (#26935)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 16:46:48 -04:00
Varun Sundar Rabindranath
fb0571b077
[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels ( #25997 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-16 12:53:11 -07:00
Wentao Ye
2ed8b6b3d0
[Bug] Fix batch invariant test has to is ( #27032 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-16 19:45:14 +00:00
kimbochen
013abde6ef
Adding Warmup to Benchmark Serving ( #26943 )
Signed-off-by: Kimbo Chen <chentenghung@gmail.com >
2025-10-16 12:44:32 -07:00
Kyle Sayers
a5464dcf92
[Compressed Tensors] Always clone output for compile robustness ( #26849 )
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-16 19:29:59 +00:00
Mandy Li
ac3ed5a815
Support block size of 256 used by Intel HPU ( #26883 )
Signed-off-by: mandy-li <mandy.j.li@intel.com >
2025-10-16 15:10:57 -04:00
Andrew Xia
e6ba2000ae
[gpt-oss][1/N] EZ: refactor serving_responses for modularity ( #26948 )
Signed-off-by: Andrew Xia <axia@meta.com >
2025-10-16 18:44:06 +00:00
Harry Mellor
aa255ff55a
Support set in the CLI generation ( #27031 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 18:07:18 +00:00
ZiTian Zhao
7bb736d00e
Fix Qwen2.5 VL image grid docstring ( #27033 )
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com >
2025-10-16 09:57:36 -07:00
Jee Jee Li
9f4e30904b
[Model] Fix Qwen3VL mm mapping ( #27027 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-16 09:45:59 -07:00
rongfu.leng
5afd3276df
[Feature] Add process_weights_after_loading to AttentionImpl ( #26870 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-16 08:02:30 -07:00
Tahsin Tunan
43721bc67f
[CI] Replace large models with tiny alternatives in tests ( #24057 )
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 15:51:27 +01:00
Kay Yan
02d709a6f1
[docs] standardize Hugging Face env var to HF_TOKEN (deprecates HUGGING_FACE_HUB_TOKEN) ( #27020 )
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-10-16 15:31:02 +01:00
Mark McLoughlin
4a510ab487
[NIXL] Improve request_finished() debug logs ( #25665 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-16 15:55:17 +02:00
Matthew Bonanni
314fa8abbf
[Attention] Tune CUTLASS MLA num_splits ( #26846 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-16 06:36:09 -07:00
Cyrus Leung
334535b6fb
[Benchmark] Show E2EL by default for pooling models ( #27014 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 12:47:09 +00:00
bogdanm
dcbb3f1871
[Bugfix] Correct LayerNorm epsilon parameter in modernbert.py ( #27008 )
Signed-off-by: bogdanm <152898065+bogdan01m@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 12:27:44 +00:00
Sungjae Lee
00417f4e44
[MISC] fix import violations for re and triton modules ( #26654 )
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com >
Co-authored-by: Mengqing Cao <cmq0113@163.com >
2025-10-16 03:38:27 -07:00
Lukas Geiger
ed344f4116
Cleanup code after Python 3.10 upgrade ( #26520 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-16 03:38:23 -07:00
CSWYF3634076
e51928793e
[Model][Bugfix] fix ernie45 vl run failed from shared experts optimization ( #26885 )
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-16 03:37:35 -07:00
Cyrus Leung
d2740fafbf
[Chore] Separate out vllm.utils.collections ( #26990 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 08:35:35 +00:00
Cyrus Leung
17838e50ef
[Benchmark] Use truncation by default for pooling benchmarks ( #26992 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 16:02:39 +08:00
Zhewen Li
44c8555621
[CI/Build] Fix AMD import failures in CI ( #26841 )
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-16 07:28:20 +00:00
Akash kaothalkar
f7d318de2b
[Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling ( #26987 )
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-15 22:36:59 -07:00
Cyrus Leung
76f0d05bc6
[CI/Build] Update expected beam search output for Phi3V ( #26978 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 05:12:44 +00:00
Bram Wasti
7d8975de84
Deepseek-v3 Batch Invariant on 8xH100 ( #26609 )
Signed-off-by: Bram Wasti <bwasti@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 22:06:02 -07:00
Vadim Gimpelson
785d8b6410
[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) ( #26437 )
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-10-16 12:18:31 +08:00
Cyrus Leung
f6cdc9a02f
[Chore] Rename utils submodules ( #26920 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 03:58:13 +00:00
Chendi.Xue
509cdc0370
[DOC][XPU]update feature parity with Intel GPU ( #26954 )
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 20:07:10 -07:00
Richard Zou
9b6504c307
[BugFix] Work around graph partition x torch.compile cache issue ( #26956 )
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-10-15 20:06:11 -07:00
Angela Yi
e19b16dde6
[bugfix] Fix SP + PP without specifying compile size ( #26955 )
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 20:05:33 -07:00
ahao-anyscale
582f2c6be7
[BUG] Allow runai_streamer_sharded in config check ( #26958 )
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-10-15 20:05:14 -07:00
Michael Goin
f8a0acbdbe
[CI] Enable Blackwell Llama4 MoE tests ( #26731 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 21:02:57 -06:00
kliuae
1317034379
[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops ( #24097 )
Signed-off-by: chenjun <junchen2@amd.com >
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-10-16 10:41:34 +08:00
InChang Jeong
0ecc553ee6
[Bugfix] reasoning_parser parameter handling in run_batch.py ( #26225 )
Signed-off-by: inc-jeong <inc.jeong@navercorp.com >
Signed-off-by: InChang Jeong <inc.jeong@navercorp.com >
Co-authored-by: USER <user@AL02367916.local >
2025-10-16 10:24:05 +08:00
felixzhu555
f96bc3649c
[Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 ( #26887 )
Signed-off-by: Felix Zhu <felixzhu555@gmail.com >
2025-10-15 18:55:05 -07:00
Alexei-V-Ivanov-AMD
938c43ea7f
[ci] Adjusting AMD test composition 2025-10-14 ( #26852 )
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-15 23:52:13 +00:00
Adrian Abeyta
0a9ef0cfce
Move query quantization to attention layer for Flashinfer & Triton. ( #26534 )
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 19:01:38 -04:00
Wentao Ye
e5b438a247
[Bug] Temporarily Disable VLLM_ALLREDUCE_USE_SYMM_MEM by Default ( #26925 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 16:18:50 -04:00
XiaobingZhang
0b99f5d302
support flashinfer_fp4 moe for 5090 gpu ( #26669 )
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 15:06:47 -04:00
Benji Beck
1f491aa0c8
Vectorize RMS norm variance using vectorize_read_with_alignment ( #26234 )
Signed-off-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 11:54:41 -07:00
Kaixi Hou
de92d916fe
[NVIDIA] Add support for cudnn fp4 gemm via flashinfer ( #26107 )
Signed-off-by: kaixih <kaixih@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-15 13:53:00 -04:00
Woosuk Kwon
a1063628a4
[Chore] Clean up CODEOWNERS ( #26923 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-10-15 10:52:54 -07:00
XiaobingZhang
d796375258
[ModelOpt] Remove NVFP4 MoE K%16==0 constraint ( #26891 )
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
2025-10-15 13:06:17 -04:00
Sam/Samuel
14f8456344
[Feature]: Use pydantic validation in observability.py config ( #26637 )
Signed-off-by: Samuel Wu <cernunnos1710@gmail.com >
Signed-off-by: Sam/Samuel <57896620+cern1710@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 16:44:03 +00:00
Pradeep Dasigi
4794c2bd92
Olmo 3 tool parser and tests ( #26143 )
Signed-off-by: Pradeep Dasigi <pradeepd@allenai.org >
2025-10-15 16:36:12 +00:00
Harry Mellor
d3cbaa08dc
Lower severity of log when model info cache misses due to exception ( #26917 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 09:01:09 -07:00
Cyrus Leung
828523ad8e
[Chore] Separate out vllm.utils.async_utils ( #26913 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 15:33:00 +00:00
Cyrus Leung
136a17fe6e
[Chore] Separate out vllm.utils.func ( #26904 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 13:03:58 +00:00
Boyuan Feng
f57438338d
[BugFix] Patch inductor memory plan logic ( #26878 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 12:51:45 +00:00
Max Wittig
5d598680e3
chore: remove unused marker ( #26890 )
Signed-off-by: Max Wittig <max.wittig@siemens.com >
2025-10-15 05:40:33 -07:00
wangxiyuan
8f4b313c37
[Misc] rename torch_dtype to dtype ( #26695 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 12:11:48 +00:00
Cyrus Leung
f93e348010
[Misc] Remove isort and yapf ignores ( #26888 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 12:09:03 +00:00
wang.yuqi
f54f85129e
[Model][2/N] Improve all pooling task | Support multi-vector retrieval ( #25370 )
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-15 11:14:41 +00:00
li2haipeng
d4d1a6024f
[Lora]Load tuned multi-lora kernel configs from json files ( #26319 )
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
Signed-off-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-15 09:45:14 +00:00
wangxiyuan
db1764e4e0
[Platform] allow platform to init dp group ( #22243 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 02:32:17 -07:00
Jialin Ouyang
7f83b4ee8e
[Easy] Get rid of unnecessary parenthesis in kv_cache_manager ( #26842 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 09:17:43 +00:00
ant-yy
5c3bae1a6a
[Fix] Remove divisibility requirement between num_kv_heads and tp_size in bailing_moe ( #26876 )
Signed-off-by: vito.yy <vito.yy@antgroup.com >
2025-10-15 16:44:04 +08:00
Xudong Ma
5210dc3940
[Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. ( #26853 )
Co-authored-by: Xudong Ma <mxd@meta.com >
2025-10-15 08:37:49 +00:00
youkaichao
650b51f9f9
[doc] add Context Parallel Deployment doc ( #26877 )
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-15 16:33:52 +08:00
Cyrus Leung
6256697997
[Doc] ruff format remaining Python examples ( #26795 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 01:25:49 -07:00
Wentao Ye
71557a5f7c
[CI] Fix mypy for vllm/executor ( #26845 )
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 01:23:33 -07:00
Zhewen Li
f3c378ffa7
[CI/Build] Add Qwen2.5-VL-7B-Instruct ChartQA Accuracy Tests in CI ( #21810 )
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com >
2025-10-15 08:09:56 +00:00
Yongye Zhu
f5ed68ef63
[Deepseek-V3.2][Kernel] Integrate cuda indexer k cache gather ( #26456 )
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2025-10-15 16:05:01 +08:00
Angela Yi
efdef57b1f
[bugfix] Lazy import cv2 ( #26869 )
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 07:47:50 +00:00
Cyrus Leung
b8a4572157
[Misc] Use helper function to generate dummy messages in OpenAI MM tests ( #26875 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 07:17:37 +00:00
Mengqing Cao
302ef403a2
[DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends ( #26656 )
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-15 00:16:44 -07:00
sangho.lee
8865da157b
[Bugfix][Multi Modal] Fix incorrect Molmo token processing ( #26873 )
Signed-off-by: sanghol <sanghol@allenai.org >
2025-10-15 07:13:59 +00:00
Boyuan Feng
f0862eae43
[Graph Partition] pass tests for decorator ( #26831 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-15 06:39:48 +00:00
Isotr0py
8c851f6d04
[Bugfix] Fix qwen3-omni audio truncation issue ( #26815 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-15 05:38:36 +00:00
Angela Yi
7cfa420f49
[BugFix] Patch inductor partitioning logic ( #26735 )
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 05:04:32 +00:00
rongfu.leng
a27b288e4a
[Feature] default --extra-body param to disable thinking in vllm bench serve ( #26784 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-15 04:23:44 +00:00
zhrrr
e471d7ca7e
[CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR ( #26773 )
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 04:09:44 +00:00
Michael Yao
c43ca8259e
[Docs] Move build.inc into arm.inc ( #26862 )
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-10-14 20:35:08 -07:00
Tao Hui
85a65e7f51
[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972 ) ( #25589 )
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: Tao Hui <taohui3@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-10-15 11:09:52 +08:00
kourosh hakhamaneshi
a2986b3e33
[Bugfix] Fixes prefix-repetition benchmark script ( #26828 )
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com >
2025-10-15 02:54:43 +00:00
Morrison Turnansky
96b9aa5aa0
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): name change compilation level to compilation mode, deprecation compilation level ( #26355 )
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 02:51:16 +00:00
Michael Goin
e66d787bce
Disable FlashInfer sampler by default ( #26859 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 02:35:18 +00:00
Chendi.Xue
bfad142e25
[BUGFIX][NIXL] quick fix for 'assert self.connector_worker is not None' in get_kv_connector_stats ( #26851 )
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 02:33:25 +00:00
Zhikaiiii
9354660036
[Bugfix]fix Qwen3 xml tool parser ( #26345 )
Signed-off-by: Zhikaiiii <1658973216@qq.com >
2025-10-15 09:50:30 +08:00
Jialin Ouyang
07ca70af8d
[Core][Easy] Use envs.__getattr__ for all Unify to environment variable access ( #26810 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 01:41:18 +00:00
Luka Govedič
2dcd12d357
[torch.compile] Fix tests for torch==2.9 inductor partition ( #26116 )
Signed-off-by: ProExpertProg <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-10-14 19:55:02 -04:00
Tyler Michael Smith
579d2e5458
[WideEP][P/D] Add usage stats for DP+EP and KV Connector ( #26836 )
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-14 23:51:54 +00:00
Ye Hu
0512c04aee
[frontend][gptoss] Add per turn stats into Harmony Context ( #25061 )
Signed-off-by: lacora <hyelacora@gmail.com >
Co-authored-by: Ye Hu <yehu@fb.com >
2025-10-14 16:48:13 -07:00
Michael Goin
7e0ef4084a
[CI Failure] Fix torchao dep failure for Quantization Test ( #26824 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 16:41:43 -07:00
Nick Hill
4aed506b65
[Core] Streamline some structured output related code ( #26737 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 23:27:44 +00:00
Boyuan Feng
a86b4c58e8
remove attn output view kernel ( #26680 )
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 22:53:10 +00:00
Nick Hill
ff4810ba73
[Minor] Group async_scheduling related fields in model runner init ( #26736 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 14:46:37 -07:00
Nan Qin
9d6964926e
fix: response_format for completion ( #23212 )
Signed-off-by: Nan2018 <qinnanjoshua@gmail.com >
2025-10-14 21:23:22 +00:00
Dhruvil Bhatt
0e65818910
Added MoE configs for llama 4, H200 device with tp=4/8 tuning ( #26837 )
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com >
2025-10-14 14:21:03 -07:00
Jialin Ouyang
380f17527c
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation ( #26146 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 17:03:21 -04:00
HDCharles
b92ab3deda
Notice for deprecation of AutoAWQ ( #26820 )
Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 13:39:59 -07:00
Jialin Ouyang
acaa2c0a4a
[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs ( #24964 )
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 12:58:43 -07:00
Matthew Bonanni
82af928c41
[Attention][Spec Decode] FlashMLA spec decode support ( #26541 )
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-14 19:38:20 +00:00
Huamin Li
87efc681db
llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch ( #26790 )
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-14 11:54:12 -07:00