Wentao Ye
3d2a026fd0
[Feature] Pipeline Parallel Async send/recv, 2.9% E2E throughput improvement ( #33368 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-02-13 16:38:16 +08:00
Martin Hickey
47e9b63e1a
[KVConnector] Clean up redundant code in KV connectors ( #34147 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
2026-02-13 00:14:30 -08:00
Jaewon
4453ba8d9e
[Core] Profiler improvements and lazy initialization ( #33198 )
...
Signed-off-by: Jaewon Lee <jaewon@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-12 16:16:38 -08:00
Ilya Markov
bb2fc8b5e7
[BugFix] Fix async EPLB hang with DeepEP LL all2all backend ( #32860 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-02-10 22:34:47 +00:00
Ilya Markov
67132945bb
[Perf] Move eplb rebalance algo to async thread ( #30888 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-02-10 22:19:10 +00:00
Qi Wang
33bcd3dc3b
[Misc] Introduce ec_both role EC (encoder cache) connector ( #34182 )
...
Signed-off-by: Qi Wang <qiwa@nvidia.com >
2026-02-10 18:55:35 +00:00
Zetong Li
5f970120f0
[Bugfix] Fix memory inconsistency in cross-process shared memory ( #32022 )
...
Signed-off-by: Zetong Li <slippersss@126.com >
2026-02-10 08:22:03 +00:00
Yuwei An
e94ec59733
[LMCache] Token Base IPC API ( #34175 )
...
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com >
2026-02-10 01:18:42 +00:00
ZhengHongming888
cb62e86f83
Add NUMA Core binding in nixl_connector for CPU xPyD ( #32365 )
...
Signed-off-by: Hongming Zheng <hongming.zheng@intel.com >
Signed-off-by: ZhengHongming888 <hongming.zheng@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-09 15:39:12 +00:00
Seiji Eicher
aca5967416
[KV Connector] Add missing method overrides to MultiConnector ( #33292 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2026-02-06 12:58:21 -05:00
zackyoray
1ee95841bd
[Bugfix] Fix swapped engine_ids in NIXL Llama 4 local attention path ( #33795 )
...
Signed-off-by: Yoray Zack <yorayz@nvidia.com >
2026-02-05 17:51:58 +00:00
Nicolò Lucchesi
7d8c6804e2
[Misc] Add debug logs ( #33931 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-05 09:42:40 -08:00
Aaron Hao
c1858b7ec8
[Feat][RL][1/2] Native Weight Syncing API: NCCL ( #31943 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Aaron Hao <ahao@anyscale.com >
Co-authored-by: SumanthRH <sumanthrh99@gmail.com >
2026-02-05 12:13:23 -05:00
liranschour
8322d4e47f
Enable Cross layers KV cache layout at NIXL Connector V2 ( #33339 )
...
Signed-off-by: Liran Schour <lirans@il.ibm.com >
Signed-off-by: liranschour <liranschour@users.noreply.github.com >
Co-authored-by: Or Ozeri <or@ozery.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-05 02:17:02 -08:00
Sage Moore
ce498a6d61
Change the type signature of MixtureOfExperts.expert_weights to MutableSequence[Sequence[Tensor]] ( #33573 )
...
Signed-off-by: Sage Moore <sagmoore@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2026-02-04 17:02:46 -05:00
Micah Williamson
1d367a738e
[Bugfix][ROCm] Include float8_e4m3fnuz in NCCL Dtype Dispatching ( #33713 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-04 05:36:29 -08:00
dtc
0d6ccf68fa
[P/D] rework mooncake connector and introduce its bootstrap server ( #31034 )
...
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2026-02-03 08:08:25 -08:00
杨朱 · Kiki
1a7894dbdf
[Misc] Replace Optional[X] with X | None syntax ( #33332 )
...
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
2026-01-30 01:56:59 -08:00
Li, Jiang
8311f083bd
[Bugfix][CPU] Fix thread num for shared memory communication ( #33317 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Li, Jiang <bigpyj64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 03:26:58 -08:00
Ilya Markov
d09135fbd0
[BugFix] Async Eplb fix potential race condition ( #32881 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-01-29 10:31:40 +00:00
Angela Yi
4197168ea5
[ez] Remove checks for torch version <= 2.8 ( #33209 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-01-28 16:03:56 -05:00
Or Ozeri
2e8de86777
Revert "Enable Cross layers KV cache layout at NIXL Connector ( #30207 )" ( #33241 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-01-28 04:36:00 -08:00
Nicolò Lucchesi
492a7983dd
[Bugfix] Fix DeepseekV32 AssertionError: num_kv_heads == 1 ( #33090 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-27 15:03:20 +00:00
Matthew Bonanni
a608b4c6c2
[5/N][Attention] Finish eliminating vllm/attention folder ( #32064 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-27 10:02:51 -05:00
omerpaz95
7227d06156
[Metrics] [KVConnector] Add Offloading Connector metrics ( #27942 )
...
Added queries and hits metrics for the Offloading Connector.
Also added timing metrics for store and load operations, which report the
average per-token time taken to load/store.
The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com >
Co-authored-by: Omer Paz <Omer.Paz@ibm.com >
2026-01-27 13:34:49 +00:00
Robert Shaw
5a93b9162b
[MoE Refactor] Integrate Naive Prepare Finalize into MK ( #32567 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com >
2026-01-27 01:28:02 +00:00
Robert Shaw
43a013c3a2
[Bugfix] Fix Dtypes for Pynccl Wrapper ( #33030 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-26 20:09:32 +00:00
Chauncey
9ef3b718d9
[Bugfix] Fix Can't instantiate abstract class DeepseekV32IndexerBackend ( #33052 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-26 06:44:02 -08:00
Fadi Arafeh
744ef30484
[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm ( #32792 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-01-22 18:55:23 +00:00
liranschour
64e3d67ac0
Enable Cross layers KV cache layout at NIXL Connector ( #30207 )
...
Signed-off-by: Liran Schour <lirans@il.ibm.com >
Signed-off-by: liranschour <liranschour@users.noreply.github.com >
Co-authored-by: Or Ozeri <or@ozery.com >
2026-01-22 10:12:58 +00:00
Alex Sun
49a1262267
[AMD][ROCm] MoRI EP: a high-performance all2all backend ( #28664 )
...
Signed-off-by: Alex Sun <alex.s@amd.com >
2026-01-22 16:33:18 +08:00
knlnguyen1802
378385b90c
[EC Connector] Optimize remote cache check in scheduler ( #32585 )
...
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
2026-01-22 03:30:59 +00:00
Or Ozeri
7013e9ac8f
OffloadingConnector: Prevent redundant loads ( #29087 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2026-01-21 01:15:42 +00:00
YiSheng5
13f6630a9e
[XPU]Support AgRsAll2AllManager on XPU device ( #32654 )
...
Signed-off-by: yisheng <yi.sheng@intel.com >
2026-01-20 14:27:24 +00:00
Walter Beller-Morales
8be263c3fb
[Core] Cleanup shm based object store on engine shutdown ( #32429 )
...
Signed-off-by: walterbm <walter.beller.morales@gmail.com >
2026-01-20 08:53:37 +00:00
qli88
a0490be8f1
[CI][amd] Revert NIXL connector change to avoid crash ( #32570 )
...
Signed-off-by: Qiang Li <qiang.li2@amd.com >
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-01-19 18:39:16 +00:00
Nicolò Lucchesi
758df5afe7
[NIXL][Metrics] Track nixl_num_kv_expired_reqs metric in Prometheus ( #32340 )
...
Add a new metric to track the number of requests that had their KV blocks
expire. The scenario is particularly important to surface and track as it is a
vital indicator of the health of the deployment.
Currently we resort to tracking these failures through unstructured log
parsing (which is, among other things, error-string dependent); current main:
> Releasing expired KV blocks for request cmpl-071d which were retrieved by 0 decode worker(s) within 0 seconds.
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-19 12:28:27 +00:00
Deming
5480c6b1fa
[Doc] Correct comment for _jobs dict in OffloadingConnectorWorker ( #32556 )
2026-01-18 12:46:00 -08:00
bnellnm
327a02d8db
[MoE Refactor] Separate Router into OO Classes ( #30623 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-01-18 11:40:49 -05:00
Ilya Markov
c9a533079c
[EPLB][BugFix]Possible deadlock fix ( #32418 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2026-01-16 09:11:01 -05:00
kzwrime
edadca109c
[Bugfix] Add CpuCommunicator.dispatch and combine to fix DP+MoE inference ( #31867 )
...
Signed-off-by: kunzh <zhikun.wu@outlook.com >
2026-01-15 04:50:48 +00:00
Angela Yi
7933638051
[misc] Remove is_torch_equal_or_newer(2.4) cases ( #32296 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-01-13 23:22:07 -08:00
Sage Moore
6beef12b9b
[EPLB][Cleanup] Remove is_async_enabled from EplbModelState ( #32050 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2026-01-13 18:19:03 +00:00
Matthew Bonanni
2263d44b68
[4/N][Attention] Move MLA common to model_executor ( #32060 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2026-01-13 09:08:45 -08:00
Mathis Felardos
4f3676e726
nixl_connector: export UCX_MEM_MMAP_HOOK_MODE=none to avoid a UCX memory leak ( #32181 )
...
Signed-off-by: Mathis Felardos <mathis@mistral.ai >
2026-01-13 16:21:10 +00:00
Martin Hickey
510265472c
[BugFix] [KVConnector] Fix KV events for LMCache connector ( #32169 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-13 15:50:34 +00:00
Nicolò Lucchesi
f8bd8394e3
[NIXL][Bugfix] Failure logging overhaul + early metadata free on failure ( #32031 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-12 20:38:49 +00:00
Lucas Kabela
ad8818bb5e
[Misc][BE] Type coverage for vllm/compilation [3/3] ( #31748 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-01-12 19:24:38 +00:00
Ilya Markov
1eb61ab34b
[Refactor] EPLB rebalance algo to NumPy ( #30697 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
2026-01-12 18:13:23 +00:00
Nicolò Lucchesi
5b68107411
[Misc][PD] Fix get_attn_backend usage in transfer connectors ( #31988 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-12 18:10:05 +01:00