346 Commits

Author SHA1 Message Date
Ilya Markov
8bdd8b5c51 Enable symmetric memory all reduce by default only enabling for TP (#25070)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-23 15:53:00 -04:00
Amir Samani
8c1c81a3de [core] add nccl symmetric memory for all reduce (#24532)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-23 14:33:06 -04:00
Lucas Wilkinson
cc1dc7ed6d [Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-09-23 16:02:10 +00:00
Fanli Lin
4c966e440e [XPU] Fix MOE DP accuracy issue on XPU (#25465)
2025-09-23 14:32:57 +00:00
Chauncey
f05a4f0e34 [P/D] Support NIXL connector to disconnect during a clean shutdown (#24423)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-09-23 16:08:02 +02:00
Chendi.Xue
5774b0a1da [NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend (#25121)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
2025-09-23 04:17:42 +00:00
Lucia Fang
922979bfcc [DP] support torchrun external launcher with Data Parallelism (#24899)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-22 12:06:05 -07:00
Csrayz
c10101a3eb [Bugfix] Fix several issues with p2p xPyD in GET type (#23993)
Signed-off-by: Csrayz <jover@cmbchina.com>
Signed-off-by: ivyilike <pww123@cmbchina.com>
Co-authored-by: ivyilike <pww123@cmbchina.com>
2025-09-22 14:53:13 +00:00
Woosuk Kwon
bc6e542d9f Remove V0 attention backends (#25351)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-21 16:03:28 -07:00
Nick Hill
a25ade5d47 [BugFix] Ensure appropriate guards in destructors (#25284)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-20 09:06:34 +08:00
Or Ozeri
c59a0eca42 [KV offload][4/N] Offloading KV connector (#22595)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-19 19:07:17 +00:00
Nicolò Lucchesi
a3d087adec [P/D][Nixl] Introduce KVTransferMetrics and aggregation strategy (#22188)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-19 11:09:14 +00:00
Shu Wang
2ea50e977a Enable Allgather/ReduceScatter backend for NaiveAllToAll (#23964)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Shu Wang <shuw@nvidia.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-18 15:52:58 +00:00
Hyogeun Oh (오효근)
b419937c78 [Docs] Fix warnings in mkdocs build (continued) (#25163)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-18 08:23:26 -07:00
Punitvara
05b044e698 [Doc] Fix cross-reference warnings (#25058)
Signed-off-by: Punit Vara <punitvara@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-18 02:05:16 -07:00
samzong
47f670b03b [Docs] improve code formatting and comments for eliminate griffe build warning. (#25010)
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-09-17 07:31:20 -07:00
Sage Moore
567939953b [Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
cascade
17871983a2 [Bugfix] Fix sequence parallelism bug when enable pipeline parallelism (#24021)
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-09-16 04:32:32 +00:00
Nicolò Lucchesi
2e41f5abca [XPU] Set consistent default KV cache layout (#24745)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-15 18:09:34 +08:00
Chao Lei
8de261b04a [P/D]kv_output_aggregator support P TP > D TP (#23917)
Signed-off-by: LCAIZJ <leichao139636@163.com>
Co-authored-by: leichao.lc <leichao.lc@antgroup.com>
2025-09-15 11:36:06 +02:00
TaoYu Chen
15b8fef453 Remove redundant assignment in xfer_buffers, This is a little fix (#24732)
Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
2025-09-13 08:11:59 +00:00
Hyogeun Oh (오효근)
9a8966bcc2 [Docs] Fix warnings in mkdocs build (continued) (#24791)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-13 00:13:44 -07:00
dongluw
a5b84f1cbf [Core] Shared memory based object store for Multimodal data caching and IPC (#20452)
Signed-off-by: donglu <donglu@cohere.com>
2025-09-12 07:54:17 -07:00
Flora Feng
0377802c20 [Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec (#24548)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-09-12 21:42:23 +08:00
Ilya Markov
1fdd5c42d7 [Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-11 09:45:31 -07:00
Jonathan Berkhahn
cc99baf14d [Misc] Make timeout passable in init_distributed_environment (#24522)
Signed-off-by: jberkhahn <jaberkha@us.ibm.com>
2025-09-10 15:41:12 -07:00
Chauncey
309d7aa401 [P/D] MultiConnector supports shutdown (#24425)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-09 21:24:11 -07:00
Yihua Cheng
b4a01aaf95 [KV Connector] More async support for get_num_new_matched_tokens (#23620)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
2025-09-09 21:23:37 -07:00
Didier Durand
46876dff32 [Doc]: fixing typos to improve docs (#24480)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-08 23:06:04 -07:00
Zebing Lin
82dfb12e52 [Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673)
Signed-off-by: linzebing <linzebing1995@gmail.com>
2025-09-08 21:34:37 -07:00
Harry Mellor
3e0d4a3475 Move KVTransferConfig from config/__init__.py to config/kv_transfer.py (#24434)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-08 20:30:32 -07:00
Harry Mellor
03dd652c16 Move KVEventsConfig from config/__init__.py to config/kv_events.py (#24433)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-08 06:41:27 -07:00
Didier Durand
f4962a6d55 [Doc]: fix typos in Python comments (#24417)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-08 00:22:16 -07:00
Chauncey
61aa4b2901 [P/D] Add a shutdown method to the Connector API (#22699)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-07 23:07:00 -07:00
Woosuk Kwon
4172235ab7 [V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-06 16:15:18 -07:00
yzds
ac201a0eaf [Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-06 13:24:05 +08:00
Didier Durand
35bf193864 [Doc]: fix typos in Python comments (#24294)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-05 19:41:12 -07:00
liuzhenwei
e599e2c65e [XPU][P/D] Add XPU support in NixlConnector (#22436)
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-04 21:03:12 -07:00
WeiQing Chen
6adaed42f4 [Feature][P/D]: Optimize NIXL Connector xfer Launch (#23887)
Signed-off-by: ycyaw66 <497410282@qq.com>
Co-authored-by: ycyaw66 <497410282@qq.com>
2025-09-03 19:14:30 +00:00
bnellnm
e9b92dcd89 [Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-09-03 12:35:18 -04:00
Nicolò Lucchesi
f0c503f66e [Nixl] Heterogeneous TP support FlashInfer (#20189)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-03 15:19:54 +08:00
co63oc
1bd007f234 fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-02 20:44:50 -07:00
Didier Durand
0235103cbb [Doc]: fix typos in Python comments (#24042)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-01 19:07:45 -07:00
Didier Durand
107284959a [Doc]: fix typos in Python comments (#24026)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-01 09:38:20 +00:00
Or Ozeri
14b4326b94 v1: Support KV events from connectors (#19737)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-09-01 01:13:21 +00:00
Ning Xie
68a349114f [Misc] enhance type hint for rearrange return value (#23519)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-30 06:43:33 -07:00
Chaojun Zhang
235c9db8a7 [XPU] support data parallel for MoE models on XPU (#22887)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
2025-08-29 09:23:04 +08:00
Yongye Zhu
082cc07ef8 DP/EP Support for gpt-oss with deepep-ht comm kernel on SM100 (#23608)
2025-08-27 17:33:21 -04:00
yzds
c7c80af084 fix pynccl reduce_scatter (#23648)
Co-authored-by: hongchao <hongchao@msh.team>
2025-08-26 18:21:11 -07:00
Zhonghua Deng
c3b0fd1ee6 [V1][P/D]P2pNcclConnector supports flashinfer (#23536)
Signed-off-by: Abatom <abzhonghua@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-08-26 22:56:16 +00:00