Commit Graph

1174 Commits

Author SHA1 Message Date
Itay Alroy
dea268336f [1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
2026-02-28 04:46:42 +00:00
Aaron Hao
2ce6f3cf67 [Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-02-27 13:45:21 -07:00
Michael Goin
4fec53cfcb [CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI (#34274)
2026-02-26 17:58:03 -07:00
Tyler Michael Smith
eb19955c37 [WideEP] Remove pplx all2all backend (#33724)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 14:30:10 -08:00
Yongye Zhu
1976356ee6 [MoE Refactor] MXFP4 Cutlass Experts to MK (#34542)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2026-02-25 17:32:39 -08:00
Ming Yang
6831650c40 [offloader] v2: Hide weight onloading latency via prefetching (#29941)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-25 17:20:59 -08:00
Andreas Karatzas
ed42507f6d [ROCm][CI] Amending deletion of AMD mirror (#35322)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-25 14:17:56 -08:00
Andreas Karatzas
9571e99945 [ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-25 14:16:18 -08:00
Robert Shaw
ea97750414 [CI] Fix Distributed Tests (#35236)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
2026-02-24 22:31:56 +00:00
Andreas Karatzas
067c5d9ad1 [ROCm][CI] Added MI325 mirrors (#34923)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-24 13:37:15 -08:00
Lucas Wilkinson
542ca66357 Revert "[CI/Build] Remove redundant OpenTelemetry pip install from CI configs" (#35211)
2026-02-24 09:26:42 -08:00
Cyrus Leung
fc8456c336 [CI/Build] Fix kernels test location (#35205)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-24 09:20:34 -08:00
Robert Shaw
60da0e1544 [CI] Remove Duplicated Tests (#35199)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-24 23:53:30 +08:00
Vlad Tiberiu Mihailescu
1a6cf39dec [CI/Build] Remove redundant OpenTelemetry pip install from CI configs (#35032)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
2026-02-23 22:24:11 -08:00
Aaron Hao
596ed1f02e [RL] Validation for pause_mode='keep' (#34992)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-02-23 16:30:56 -05:00
Andreas Karatzas
d403c1da1c [CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream (#35008)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 04:01:10 +00:00
Michael Goin
fac1507f03 [CI] Remove failing prime-rl integration test (#34843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-02-20 10:17:42 -08:00
Kevin H. Luu
a1a2d79442 [ci] Use the right tag for CPU arm64 image (#34915)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-02-19 19:59:15 -08:00
Michael Goin
16f24e8797 [CI] Add GPT-OSS Eval job for H100 (#34359)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-02-19 17:14:54 -08:00
Alexei-V-Ivanov-AMD
304319c4ed Change targets for AMD build in the "CI" pipeline (#34918)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2026-02-19 21:26:53 +00:00
Andreas Karatzas
f6220f9877 [ROCm][Test] Fix beam search determinism failures from batch-size-dependent FP divergence and removed wrong marker (#34878)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-19 08:25:26 +00:00
Andreas Karatzas
2df2bb27b0 [ROCm][CI] Removing all blocking labels from MI355 until stable infra (#34879)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-19 07:53:08 +00:00
Kevin H. Luu
b6101d384d Deprecate test-pipeline.yaml (#34864)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-02-19 02:15:27 +00:00
Ilya Markov
6b3166a7c7 [CI][Bugfix] Fix multinode test script (#34820)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2026-02-18 11:45:10 -05:00
Robert Shaw
25e2e136ef [CI] temporarily disable multi-node tests (#34825)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-18 11:32:44 -05:00
Cyrus Leung
30ebe0dc3c [CI/Build] Remove use of skip_v1 (#34699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-18 12:19:11 +08:00
Amr Mahdi
df3f537a66 [CI] Remove unused precompiled wheel args from image build (#34767)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2026-02-17 18:58:18 -08:00
Nicolò Lucchesi
8e962fef5f [CI][Nixl] Add CrossLayer KV layout tests (#34615)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-17 21:35:40 +08:00
junuxyz
c61a98f529 [CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
2026-02-17 12:22:56 +00:00
kourosh hakhamaneshi
c464b57374 [Ray] Propagate third-party env vars to Ray workers via prefix matching (#34383)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-17 01:08:42 -08:00
Amr Mahdi
c5c38e152a [CI] Fix bake config artifact path for AMI rebuild pipeline (#34656)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2026-02-17 06:39:44 +00:00
Alexei-V-Ivanov-AMD
824f9e8f3c Targeting the MI355 agent pool with all existing tests (#34629)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2026-02-16 17:02:27 +00:00
Amr Mahdi
08f8c198ae [CI] Disable precompiled wheel path in CI image builds (#34606)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2026-02-16 15:14:43 +00:00
Harry Mellor
a21cedf4ff Bump lm-eval version for Transformers v5 compatibility (#33994)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-16 05:24:35 -08:00
Amr Mahdi
bb59c90248 [CI] Write bake config to temp directory instead of repo root (#34569)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2026-02-15 22:15:47 -08:00
Louie Tsai
55a1a9563a Vllm CPU benchmark suite improvement (#34128)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
2026-02-12 16:04:44 +08:00
Kevin H. Luu
83b47f67b1 [ci] Integrate AMD tests into CI (#33626)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-02-12 08:54:17 +08:00
Micah Williamson
fb7b30c716 [ROCm][CI] Revert Test Groups From mi325_8 to mi325_1 Agent Pool In AMD CI (#34384)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-02-11 15:52:34 -08:00
Micah Williamson
4a1550d22d [ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline (#34280)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-02-11 01:08:11 +00:00
Matthew Bonanni
4293c00b84 [Benchmarks] Fix attention benchmark smoke test (#34269)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-02-10 16:04:07 -05:00
Harry Mellor
d0bc520569 Bump mamba-ssm version in CI for Transformers v5 compatibility (#34233)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-10 14:46:01 +01:00
wang.yuqi
dab1de9f38 [Frontend][CI] Consolidate instrumentator entrypoints (#34123)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-10 07:30:19 +00:00
Luka Govedič
781ddf7868 [CI][torch.compile] Fix incorrect filtering for E2E fusion tests on B200 (#34031)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2026-02-09 10:05:14 -05:00
Lucas Wilkinson
d0d97e2974 [Misc] Fix up attention benchmarks (#33810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2026-02-09 09:42:03 -05:00
zofia
9bdb06b436 [XPU][6/N] add xpu scaled_mm kernel (#34117)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
2026-02-09 20:17:35 +08:00
wang.yuqi
22b64948f6 [Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-09 06:42:38 +00:00
Andrey Talman
f97ca67176 [Release 2.10] Update to Torch 2.10 - final release (#30525)
2026-02-08 13:51:09 -08:00
TJian
785cf28fff [ROCm] [CI] Reduce Resource of two test groups (#34059)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-02-08 15:17:26 +08:00
kourosh hakhamaneshi
4a2d00eafd [bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2026-02-06 16:17:55 -06:00
zofia
2ce9fe4ad0 [XPU][5/N] add wna16 xpu kernel (#33973)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
2026-02-06 15:59:53 +00:00