Commit Graph

28 Commits

Author SHA1 Message Date
Jeffrey Wang
ab79863e6c Remove MQ multi-node tests (#38934)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
2026-04-03 20:00:08 +00:00
Jeffrey Wang
de5e6c44c6 [Feat][Executor] Introduce RayExecutorV2 (#36836)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
2026-04-01 14:34:29 -07:00
Nicolò Lucchesi
1cbbcfe8a3 [CI][PD] Add Hybrid SSM integration tests to CI (#37657)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-23 23:58:19 +08:00
Flora Feng
b4c1aef21c [Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 02:50:34 -07:00
Andreas Karatzas
bd8c4c0752 [CI] Removing deprecated rlhf examples reference (#37585)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 15:20:33 +08:00
TJian
da70c87e81 [CI] Fix wrong path test file, missing rlhf_async_new_apis.py (#37532)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-03-19 02:21:55 -07:00
Aaron Hao
47a1f11bff [docs] Add docs for new RL flows (#36188)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-18 09:04:26 +00:00
Avinash Singh
c5030c439d [CI] Split Distributed Tests (4 GPUs) and Kernel MoE tests (#37100)
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Signed-off-by: Avinash Singh  <107198269+avinashsingh77@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-03-17 11:44:55 -07:00
Kevin H. Luu
74fe80ee95 [CI] Split Distributed Tests (4 GPUs) into 3 parallel jobs (#37015)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 12:21:13 +08:00
Aaron Hao
d6b61e5166 [BUG] Fix async rlhf tests (#35811)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-03-11 18:06:10 -04:00
zhanqiuhu
90f3c01fa4 [Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158)
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-03-06 08:50:44 +01:00
Yongye Zhu
86e1060b17 [Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2026-03-05 22:04:44 -08:00
Robert Shaw
6521ccf286 [CI] Temporarily Disable Nightly Failures (#35770)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-03 01:49:13 +00:00
Aaron Hao
2ce6f3cf67 [Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-02-27 13:45:21 -07:00
Robert Shaw
ea97750414 [CI] Fix Distributed Tests (#35236)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
2026-02-24 22:31:56 +00:00
Aaron Hao
596ed1f02e [RL] Validation for pause_mode='keep' (#34992)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-02-23 16:30:56 -05:00
Andreas Karatzas
d403c1da1c [CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream (#35008)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 04:01:10 +00:00
Robert Shaw
25e2e136ef [CI] temporarily disable multi-node tests (#34825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-18 11:32:44 -05:00
Nicolò Lucchesi
8e962fef5f [CI][Nixl] Add CrossLayer KV layout tests (#34615)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-17 21:35:40 +08:00
Aaron Hao
c1858b7ec8 [Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
2026-02-05 12:13:23 -05:00
Luka Govedič
4d9513537d [CI][torch.compile] Reduce e2e fusion test time (#33293)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-04 19:09:03 -05:00
Kevin H. Luu
5d3d6e44e8 [CI] minor fixes to pipeline generator and tests (#33151)
Signed-off-by: khluu <khluu000@gmail.com>
2026-01-27 17:04:02 -08:00
Kevin H. Luu
ebe0ba91db [ci] Sync test areas with test-pipeline.yaml and enable new pipeline generator (#33080)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Kevin Luu <khluu@Kevins-MacBook-Pro.local>
2026-01-26 12:28:20 -08:00
Nicolò Lucchesi
83e1c76dbe [CI][ROCm] Fix NIXL tests on ROCm (#31728)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-09 01:34:43 +08:00
Lucas Wilkinson
7e065eba59 [CI] Fix "2 Node Tests (4 GPUs in total)" (#31090)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-22 10:32:40 +08:00
Lucas Wilkinson
ae0770fa6b [CI] Fix H200 Distributed test (#31054)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-20 16:48:49 -05:00
Elizabeth Thomas
41b6f9200f Remove all2all backend envvar (#30363)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
Kevin H. Luu
db14f61f2d [ci] Refactor CI file structure (#29343) 2025-12-08 17:25:43 -09:00