Commit Graph

98 Commits

Author SHA1 Message Date
Andreas Karatzas
ec27b36b4b [CI] Defining extended V1 e2e + engine tests (#35580)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-02 08:10:54 +00:00
Charlie Fu
3fd1d4ec2c [Rocm][CI] Fix LM Eval Large Models (H100) test group (#34750)
Signed-off-by: charlifu <charlifu@amd.com>
2026-03-02 07:43:38 +00:00
Augusto Yao
8e75d88554 add io_process_plugin for sparse embedding (#34214)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-28 09:16:37 +00:00
Andreas Karatzas
f5d1281c9d [ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corruption (#35071)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-28 13:57:31 +08:00
Aaron Hao
2ce6f3cf67 [Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-02-27 13:45:21 -07:00
Lucas Wilkinson
542ca66357 Revert "[CI/Build] Remove redundant OpenTelemetry pip install from CI configs" (#35211)
2026-02-24 09:26:42 -08:00
Vlad Tiberiu Mihailescu
1a6cf39dec [CI/Build] Remove redundant OpenTelemetry pip install from CI configs (#35032)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
2026-02-23 22:24:11 -08:00
Andreas Karatzas
d403c1da1c [CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream (#35008)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-22 04:01:10 +00:00
Michael Goin
fac1507f03 [CI] Remove failing prime-rl integration test (#34843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-02-20 10:17:42 -08:00
Andreas Karatzas
f6220f9877 [ROCm][Test] Fix beam search determinism failures from batch-size-dependent FP divergence and removed wrong marker (#34878)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-19 08:25:26 +00:00
Andreas Karatzas
2df2bb27b0 [ROCm][CI] Removing all blocking labels from MI355 until stable infra (#34879)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-19 07:53:08 +00:00
Cyrus Leung
30ebe0dc3c [CI/Build] Remove use of skip_v1 (#34699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-18 12:19:11 +08:00
Alexei-V-Ivanov-AMD
824f9e8f3c Targeting the MI355 agent pool with all existing tests (#34629)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2026-02-16 17:02:27 +00:00
Micah Williamson
fb7b30c716 [ROCm][CI] Revert Test Groups From mi325_8 to mi325_1 Agent Pool In AMD CI (#34384)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-02-11 15:52:34 -08:00
Micah Williamson
4a1550d22d [ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline (#34280)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-02-11 01:08:11 +00:00
wang.yuqi
dab1de9f38 [Frontend][CI] Consolidate instrumentator entrypoints (#34123)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-10 07:30:19 +00:00
wang.yuqi
22b64948f6 [Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-09 06:42:38 +00:00
TJian
785cf28fff [ROCm] [CI] Reduce Resource of two test groups (#34059)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-02-08 15:17:26 +08:00
kourosh hakhamaneshi
4a2d00eafd [bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2026-02-06 16:17:55 -06:00
Luka Govedič
ac32e66cf9 [torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-06 04:19:49 -08:00
Cyrus Leung
116880a5a0 [Bugfix] Make MM batching more robust (#33817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-05 20:40:58 +00:00
Aaron Hao
c1858b7ec8 [Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
2026-02-05 12:13:23 -05:00
Cyrus Leung
038914b7c8 [Refactor] Move task outside of PoolingParams.verify (#33796)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-02-05 09:33:11 +00:00
Luka Govedič
4d9513537d [CI][torch.compile] Reduce e2e fusion test time (#33293)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-04 19:09:03 -05:00
Matt
08e094997e [Hardware][AMD][CI] Refactor AMD tests to properly use BuildKite parallelism (#32745)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-02-04 14:51:33 +08:00
Micah Williamson
911b51b69f [ROCm][CI] Add TORCH_NCCL_BLOCKING_WAIT For Distributed Tests (A100) (#32891)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-28 11:32:31 +08:00
Alexei-V-Ivanov-AMD
3c3c547ce0 Enabling "2 node" distributed tests in the AMD CI pipeline. (#32719)
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com>
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-01-27 19:13:21 +00:00
Matthew Bonanni
a608b4c6c2 [5/N][Attention] Finish eliminating vllm/attention folder (#32064)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-27 10:02:51 -05:00
Robert Shaw
5a93b9162b [MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>
2026-01-27 01:28:02 +00:00
Matthew Bonanni
300622e609 [CI][Attention] Add more CI dependencies for attention tests (#32487)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-22 18:44:56 +00:00
Matt
c517d8c934 [Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-23 00:59:15 +08:00
Cyrus Leung
d117a4d1a9 [Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-22 12:44:22 +00:00
Andreas Karatzas
eb1629da24 [ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-22 13:55:25 +08:00
Wentao Ye
6437ff1fb9 [Deprecation] Remove deprecated environment variables (#32812)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-22 02:25:16 +00:00
Micah Williamson
22375f8d13 [ROCm][CI] Remove DS async eplb accuracy test from AMD CI (#32717)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-20 13:40:48 -08:00
Matthew Bonanni
1a1fc3bbc0 [Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32615)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-19 18:41:34 -05:00
Yanan Cao
9d1e611f0e [CI] Add Helion as an optional dependency (#32482)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-01-19 19:09:56 +00:00
qli88
a0490be8f1 [CI][amd] Revert NIXL connector change to avoid crash (#32570)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-19 18:39:16 +00:00
Divakar Verma
a28d9f4470 [ROCm][CI] Handle pytest status code 5 when a shard isn't allocated any tests (#32040)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-12 17:35:49 -05:00
Cyrus Leung
a374532111 [CI/Build] Separate out flaky responses API tests (#32110)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-11 05:01:12 -08:00
Matt
bde57ab2ed [Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-10 23:19:46 -08:00
Matthew Bonanni
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Nicolò Lucchesi
83e1c76dbe [CI][ROCm] Fix NIXL tests on ROCm (#31728)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-09 01:34:43 +08:00
rasmith
f1b1bea5c3 [CI][BugFix][AMD] Actually skip tests marked @pytest.mark.skip_v1 (#31873)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2026-01-08 13:06:09 +08:00
Andreas Karatzas
364a8bc6dc [ROCm][CI] Fix plugin tests (2 GPUs) failures on ROCm and removing VLLM_FLOAT32_MATMUL_PRECISION from all ROCm tests (#31829)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-07 01:12:23 +00:00
Andreas Karatzas
89f1f25310 [CI] Skip Phi-MoE test due to old API util (#31632)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-05 08:52:07 +08:00
Andreas Karatzas
5cc4876630 [ROCm][CI] Fix failure in Language Models Tests (Extra Standard) by reducing agent pool size (#31553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-01 19:29:42 -08:00
qli88
0f35429a0c [CI]Test Group 'NixlConnector PD accuracy tests' is fixed (#31460)
Signed-off-by: qli88 <qiang.li2@amd.com>
2025-12-29 23:48:56 +00:00
Andreas Karatzas
f70368867e [ROCm][CI] Add TorchCodec source build for transcription tests (#31323)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-28 16:06:05 +08:00
Micah Williamson
6559d96796 [ROCm][CI] Set TORCH_NCCL_BLOCKING_WAIT Distributed Tests On ROCm (#31259)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-24 07:19:07 +00:00