Shu Wang
|
2ea50e977a
|
Enable Allgather/ReduceScatter backend for NaiveAllToAll (#23964)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-18 15:52:58 +00:00 |
|
Hyogeun Oh (오효근)
|
b419937c78
|
[Docs] Fix warnings in mkdocs build (continued) (#25163)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-09-18 08:23:26 -07:00 |
|
Punitvara
|
05b044e698
|
[Doc] Fix cross-reference warnings (#25058)
Signed-off-by: Punit Vara <punitvara@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 02:05:16 -07:00 |
|
Sage Moore
|
567939953b
|
[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-09-16 12:21:48 -04:00 |
|
dongluw
|
a5b84f1cbf
|
[Core] Shared memory based object store for Multimodal data caching and IPC (#20452)
Signed-off-by: donglu <donglu@cohere.com>
|
2025-09-12 07:54:17 -07:00 |
|
Ilya Markov
|
1fdd5c42d7
|
[Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-11 09:45:31 -07:00 |
|
Woosuk Kwon
|
4172235ab7
|
[V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-06 16:15:18 -07:00 |
|
Didier Durand
|
35bf193864
|
[Doc]: fix typos in Python comments (#24294)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-05 19:41:12 -07:00 |
|
bnellnm
|
e9b92dcd89
|
[Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-03 12:35:18 -04:00 |
|
Didier Durand
|
0235103cbb
|
[Doc]: fix typos in Python comments (#24042)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-01 19:07:45 -07:00 |
|
Didier Durand
|
107284959a
|
[Doc]: fix typos in Python comments (#24026)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-01 09:38:20 +00:00 |
|
Chaojun Zhang
|
235c9db8a7
|
[XPU] support data parallel for MoE models on XPU (#22887)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
|
2025-08-29 09:23:04 +08:00 |
|
Yongye Zhu
|
082cc07ef8
|
DP/EP Support for gpt-oss with deepep-ht comm kernel on SM100 (#23608)
|
2025-08-27 17:33:21 -04:00 |
|
yzds
|
c7c80af084
|
fix pynccl reduce_scatter (#23648)
Co-authored-by: hongchao <hongchao@msh.team>
|
2025-08-26 18:21:11 -07:00 |
|
Ilya Markov
|
0313cf854d
|
[PERF] PyTorch Symmetric Memory All-Reduce (#20759)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-22 15:39:08 -06:00 |
|
Chengji Yao
|
e9d6a3db69
|
[TPU] make ptxla not imported when using tpu_commons (#23081)
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
|
2025-08-19 11:46:42 +08:00 |
|
bnellnm
|
8ad7285ea2
|
[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-15 14:46:00 -04:00 |
|
Ilya Markov
|
1d20c34717
|
[CI] Fix tests/distributed/test_ca_buffer_sharing.py (#22849)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-08-13 20:09:30 -07:00 |
|
Shu Wang
|
b2c8ce57c6
|
Fix Flashinfer CUTLASS MOE Allgather (#21963)
Signed-off-by: Shu Wang <shuw@nvidia.com>
|
2025-08-07 19:18:25 -07:00 |
|
Ning Xie
|
74333ae2f6
|
[Misc] correct static type check for GroupCoordinator (#21946)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-05 03:17:46 -07:00 |
|
Ning Xie
|
7de45db9a5
|
[Misc] update doc comment for send (#22026)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-03 00:55:20 -07:00 |
|
Rui Qiao
|
d331759488
|
Introduce RayPPCommunicator for ray-based PP (#21660)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-08-01 11:50:58 -07:00 |
|
Li, Jiang
|
a15a50fc17
|
[CPU] Enable shared-memory based pipeline parallel for CPU backend (#21289)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-21 09:07:08 -07:00 |
|
Woosuk Kwon
|
4de7146351
|
[V0 deprecation] Remove V0 HPU backend (#21131)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-17 16:37:36 -07:00 |
|
Trevor Morris
|
a8593237c0
|
Add pynccl all-gatherv and reducescatterv (#20154)
Signed-off-by: Trevor Morris <tmorris@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 18:59:23 -07:00 |
|
Varun Sundar Rabindranath
|
53fa457391
|
[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-11 07:51:46 -07:00 |
|
Liangliang Ma
|
a3e4e85ece
|
[XPU][CI] enhance xpu test support (#20652)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
|
2025-07-09 16:53:09 +00:00 |
|
Wentao Ye
|
4d36693687
|
[Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-28 22:06:38 +00:00 |
|
li haoyang
|
0740e29b66
|
[Feature] add quick all reduce (#19744)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-06-26 20:54:24 -07:00 |
|
Zhonghua Deng
|
eccdc8318c
|
[V1][P/D] A native implementation of xPyD based on P2P NCCL (#18242)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-06-18 06:32:36 +00:00 |
|
Varun Sundar Rabindranath
|
5cf2daea9a
|
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
|
2025-06-09 10:50:39 -04:00 |
|
Povilas Kanapickas
|
85e2b7bb13
|
[MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226)
Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
|
2025-06-05 16:53:08 +00:00 |
|
Tyler Michael Smith
|
d459fae0a2
|
[Bugfix][EP+DP] Fix internode check (#19112)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
|
2025-06-04 23:39:23 +08:00 |
|
Varun Sundar Rabindranath
|
fa98d77773
|
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-03 12:30:02 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Tyler Michael Smith
|
8a57872b2a
|
[Bugfix][EP+DP] Use pplx-kernel internode instead of intranode (#19034)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-03 11:36:51 +08:00 |
|
youkaichao
|
6a7988c55b
|
Refactor pplx init logic to make it modular (prepare for deepep) (#18200)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-05-23 23:43:43 +08:00 |
|
Li, Jiang
|
93f71673ce
|
[BugFix][CPU] Fix x86 SHM distributed module initialization (#18536)
Signed-off-by: jiang.li <jiang1.li@intel.com>
|
2025-05-22 07:35:00 -07:00 |
|
Russell Bryant
|
6e0fd34d3c
|
[CI] Fix race condition with StatelessProcessGroup.barrier (#18506)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-05-21 20:19:13 -07:00 |
|
Siyuan Liu
|
48ac2bed5b
|
[Hardware][TPU] Optionally import for TPU backend (#18269)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: Carol Zheng <cazheng@google.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: Hongmin Fan <fanhongmin@google.com>
|
2025-05-17 15:23:12 +08:00 |
|
Lucia Fang
|
3d2779c29a
|
[Feature] Support Pipeline Parallelism in torchrun SPMD offline inference for V1 (#17827)
Signed-off-by: Lucia Fang <fanglu@fb.com>
|
2025-05-15 22:28:27 -07:00 |
|
Harry Mellor
|
dc372b9c8a
|
Update deprecated type hinting in vllm/device_allocator and vllm/distributed (#18126)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-14 04:07:57 -07:00 |
|
youkaichao
|
6266c57bae
|
[core][distributed] add ep group and all2all interface (#18077)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-05-14 10:46:49 +08:00 |
|
Isotr0py
|
6e4a93e3f7
|
[Bugfix][CPU] Fix broken AVX2 CPU TP support (#17252)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-09 08:55:14 +00:00 |
|
Tyler Michael Smith
|
68e1ee0072
|
[Bugfix][Easy] Fix whitespace in shm_broadcast.py logging (#17635)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-05-04 19:20:19 -07:00 |
|
Russell Bryant
|
a0304dc504
|
[Security] Don't bind tcp zmq socket to all interfaces (#17197)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-28 10:08:20 -07:00 |
|
cascade
|
690fe019f0
|
[Feature] support sequence parallelism using compilation pass (#16155)
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-04-27 06:29:35 -07:00 |
|
Robert Shaw
|
2b05b8ce69
|
[V1][Frontend] Improve Shutdown And Logs (#11737)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-16 19:48:34 -07:00 |
|
Chengji Yao
|
01b6113659
|
[TPU] optimize the all-reduce performance (#15903)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-04-03 00:25:14 +00:00 |
|
Li, Jiang
|
550b2801ad
|
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-04-02 07:46:47 -07:00 |
|