Commit Graph

54 Commits

Author SHA1 Message Date
Harry Mellor
61e632aea1 Turn @config into a dataclass_transform (#31541)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-03 17:40:59 +00:00
杨朱 · Kiki
b95cc5014d [Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 15:01:59 +08:00
Harry Mellor
d334dd26c4 Move decode context parallel validationn to ParallelConfig (#33239)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-30 06:18:41 +00:00
Alex Sun
49a1262267 [AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664)
Signed-off-by: Alex Sun <alex.s@amd.com>
2026-01-22 16:33:18 +08:00
Tomas Ruiz
4a5299c93f feat: spec decode with draft models (#24322)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
2026-01-19 16:05:46 -05:00
dtc
1e584823f8 [Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
2026-01-15 16:31:16 +08:00
Nick Hill
9273a427b5 [Misc] Allow enabling NCCL for DP sync when async scheduling (#32197)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-13 02:03:08 +00:00
Ilya Markov
6170d47d22 [EPLB] Optimize EPLB with numpy (#29499)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-01-07 15:21:35 -05:00
Nick Hill
bd877162eb [BugFix] Support online dense model DP without overhead (#30739)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
2026-01-02 23:36:38 +08:00
Ryan Rock
ddfac7034e [CI/Build] Ignore data_parallel_size_local (#30281)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2025-12-24 07:40:54 +00:00
Elizabeth Thomas
41b6f9200f Remove all2all backend envvar (#30363)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
jiangkuaixue123
b9ff4f2a8d [feature] extend DBO to XBO (#30120)
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
2025-12-16 00:04:01 -05:00
Qiu
a11f4a81e0 [Misc][PCP&DCP] relocate PCP feature check (#30050)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-11 03:36:18 -08:00
Harry Mellor
bf4a901af9 Better error when world size is larger than node and distributed_executor_backend is not set (#30140)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-05 20:53:52 -08:00
Mercykid-bash
1119f6e47a Abstract eplb algo (#26471)
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Signed-off-by: Mercykid-bash <ruanche0218@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Che Ruan <cr623@ic.ac.uk>
Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 19:09:09 +00:00
Xieyang Xu
ad32e3e19c enable multi-node in external launcher mode (#29833) 2025-12-03 17:02:02 -08:00
Didier Durand
66d3d5422c [Doc]: fixing typos in diverse files (#29492)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-27 07:15:50 -08:00
Lucia Fang
d8819c88eb fix assertion for single world use case (uni) (#29429)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-11-26 00:14:23 +00:00
Harry Mellor
a4ad43ad5a Scheduled removal of ParallelConfig's direct child EPLB fields (#29324)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 01:58:58 +00:00
WeiQing Chen
2601f18a82 [EPLB] Optimize EPLB for Async Rearrange Experts (#22179)
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: SunChenxiang123 <1291824390@qq.com>
2025-11-24 09:08:29 -05:00
Qiu
2fd893b4ce [Feature] Prefill Context Parallel (PCP) basic support (#28718)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
2025-11-19 15:52:44 -05:00
vnadathur
1ffe934c8a [torch.compile] caching of config fields should be opt-out by default (#26468)
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 06:13:54 -08:00
Lucia Fang
b316ac6589 [V1] Support MP Executor for multi node distributed inference (#23691)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-11-16 09:01:21 +00:00
PerryZhang01
a1e7fa362a [EPLB][ROCm]: support EPBL for ROCm backend (#27731)
Signed-off-by: Perry Zhang <perzhang@amd.com>
Co-authored-by: Perry Zhang <perzhang@amd.com>
2025-11-12 18:16:35 +00:00
zhangsicheng5
2108a571d7 [DCP] Support dcp kv_cache interleave size > 1 (#26696)
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-11-09 04:45:27 +09:00
Sairam Pillai
74374386e2 [Bugfix] Improve GPU validation logging in Ray fallback scenarios (#25775)
Signed-off-by: Sairam Pillai <sairam.pillai61@gmail.com>
2025-10-30 11:57:59 +00:00
Nick Hill
647214f3d5 [V0 Deprecation] Remove V0 executors (#27142)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-21 11:09:37 -07:00
iAmir97
7a6c8c3fa1 [Chore] Separate out vllm.utils.network_utils (#27164)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
2025-10-19 03:06:32 -07:00
Isotr0py
6ac5e06f7c [Chore] Clean up pytorch helper functions in vllm.utils (#26908)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-10-18 09:48:22 -07:00
Bram Wasti
b2f78cbad4 [small][batch invariance] Rename the env and internal flags to simplify usage (#26855)
Signed-off-by: Bram Wasti <bwasti@meta.com>
2025-10-16 21:40:25 +00:00
Bram Wasti
7d8975de84 Deepseek-v3 Batch Invariant on 8xH100 (#26609)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-15 22:06:02 -07:00
wangxiyuan
db1764e4e0 [Platform] allow platform to init dp group (#22243)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-15 02:32:17 -07:00
Michael Goin
3e051bda82 [UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-13 18:12:52 -07:00
Harry Mellor
8fcaaf6a16 Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Simon Danielsson
92be3f3517 [Feature] Use pydantic validation in parallel.py config (#26417)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-09 12:41:31 +00:00
Ayush Satyam
5e65d6b2ad fix[DP][v1]: Prevent hangs from mismatched worker configurations (#26218)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
2025-10-07 22:55:08 -07:00
Sage Moore
2111b4643c [Core] Simplify the Dp padding/should ubatch coordination logic (#25768)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-10-07 01:57:49 +00:00
Harry Mellor
d6953beb91 Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Tyler Michael Smith
a5354b3ed2 [Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-27 14:22:28 +00:00
Lucas Wilkinson
cc1dc7ed6d [Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-09-23 16:02:10 +00:00
Lucia Fang
922979bfcc [DP] support torchrun external launcher with Data Parallelism (#24899)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-22 12:06:05 -07:00
Cyrus Leung
6c117cff7d [Frontend] Pass API server count to each process (#23717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-20 01:15:19 +08:00
Sage Moore
567939953b [Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
Chen Bruce
7ea5c73ad7 [Feat][EPLB] A novel static EPLB placement strategy for MoE models. (#23745)
Signed-off-by: bruceszchen <bruceszchen@tencent.com>
Signed-off-by: Chen Bruce <bruceszchen@tencent.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Chen Bruce <cszwwdz@vip.qq.com>
Co-authored-by: lemon412 <lemon412@foxmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 10:55:16 +00:00
Tyler Michael Smith
561f38dc3c [Bugfix] Improve EPLB config validation error message (#24524)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-09-10 00:32:36 +00:00
Woosuk Kwon
4172235ab7 [V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-06 16:15:18 -07:00
yzds
ac201a0eaf [Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-06 13:24:05 +08:00
co63oc
1bd007f234 fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-02 20:44:50 -07:00
rongfu.leng
1b9b16649c [Misc] update dict parse to EPLBConfig from json dumps to dict unpacking (#23305)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-08-24 08:06:34 +00:00
22quinn
480bdf5a7b [Core] Support custom executor qualname (#23314)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-22 09:40:54 +08:00