Commit Graph

1184 Commits

Author SHA1 Message Date
Li, Jiang
12449f9492 [Bugfix][CPU] Skip set_num_threads after thread binding (#38535)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
(cherry picked from commit 6557f4937f)
2026-03-30 23:01:42 -07:00
Andreas Karatzas
4f2ed5fddb [ROCm][CI] Enable hybrid chunked prefill test (#38317)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-30 10:30:26 +08:00
Kyle Sayers
d28d86e8a3 [QeRL] Fix online quantized reloading (#38442)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-03-29 14:56:41 -06:00
TJian
58a249bc61 [ROCm] [Release] Update ROCm variant from rocm700 to rocm721 (#38413)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-03-28 06:07:03 +00:00
Sage Moore
497e234d38 [EPLB] Cleanup the transfer logic for the various eplb maps (#34520)
Signed-off-by: Sage Moore <sagmoore@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2026-03-27 10:18:46 +01:00
Shengqi Chen
84e439a9cb [CI/Build] Move nightly wheel index generation to a single post-build step (#38322)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-27 07:44:18 +00:00
wenjun liu
d86060122a [CI/Build] enable Intel XPU test flow with prebuilt image (#37447)
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
2026-03-26 18:16:04 -07:00
Giancarlo Delfin
c32e97602d [Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
2026-03-26 13:38:12 -07:00
TJian
bc9c6fbbe6 [ROCm] [Bugfix] [Release] Fix nightly rocm release pipeline (#38263)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-03-26 18:47:10 +00:00
Andreas Karatzas
bff9a1c266 [ROCm][CI] Override PYTORCH_ROCM_ARCH with detected GPU arch in test containers (#38165)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-26 18:33:45 +00:00
Andreas Karatzas
9c3ae04bfe [ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-26 16:51:18 +00:00
TJian
60af7b967b [Releases] [ROCm] Enable Nightly Docker Image and Wheel Releases for ROCm (#37283)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
2026-03-26 16:32:25 +00:00
Wentao Ye
e054f152fa [CI] Add batch invariant test for b200 (#38014)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-26 11:54:54 -04:00
Fadi Arafeh
71161e8b63 [cpu][ci] remove soft-fail for Arm CI and add quant model tests (#37691)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-03-26 07:03:31 +00:00
Richard Zou
6e37c46b35 [compile] Add some more startup tests for top models (#38046)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-03-25 12:02:22 -04:00
Andreas Karatzas
04417ecd5f [ROCm][CI] Rename filepath test to point to correct file (#38102)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-25 20:05:46 +08:00
Gregory Shtrasberg
189ddefbfd [ROCm] Attention selector reordering (#36702)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2026-03-25 17:42:56 +08:00
Andreas Karatzas
679c6a3ecc [Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-25 08:17:33 +08:00
Andreas Karatzas
8bbb7c7f20 [ROCm][CI][PD] Add Hybrid SSM integration tests to CI (#37924)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-25 07:58:39 +08:00
Kevin H. Luu
af945615b5 [release] Move the rest of release jobs to release queue (#38044)
Signed-off-by: khluu <khluu000@gmail.com>
2026-03-24 16:40:58 -07:00
amey asgaonkar
0c1809c806 Add Ubuntu 24.04 support for Docker builds (#35386)
Signed-off-by: aasgaonkar <aasgaonkar@nvidia.com>
2026-03-24 13:34:44 -07:00
Li, Jiang
352b90c4a4 [Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU (#37987)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2026-03-24 07:00:20 -07:00
Kevin H. Luu
7281199a8c [release] Move agent queue to Release cluster queues (#37783)
Signed-off-by: khluu <khluu000@gmail.com>
2026-03-23 20:36:47 -07:00
Kevin H. Luu
b2dd75eb48 Downsize CPU jobs to use small queue (#37913)
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2026-03-23 20:36:37 -07:00
Andreas Karatzas
de99d91ece [ROCm][CI] Split Entrypoints Integration (API Server 1) into 3 jobs (#37906)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-24 09:48:37 +08:00
Wentao Ye
83c9d525b6 [CI] Add batch invariant test: Block FP8 + small MOE (#37895)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-23 21:16:14 -04:00
roikoren755
56777b5c89 [Test] E2E Nemotron-3-Super tests (#36803)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-03-23 17:49:56 -07:00
Kevin H. Luu
2488a82f89 [CI] Split V1 Others into 3 separate jobs (#37016)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 06:44:38 +08:00
Kyle Sayers
38364a7e32 [Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2026-03-23 16:03:29 -04:00
Kunshang Ji
91fd695b75 [CI] split Entrypoints Integration (API Server 1) into 3 jobs (#37882)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-23 10:37:56 -07:00
Nicolò Lucchesi
1cbbcfe8a3 [CI][PD] Add Hybrid SSM integration tests to CI (#37657)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-23 23:58:19 +08:00
Jee Jee Li
1f0d210641 [CI/Build][LoRA] Update Qwen35 LoRA testing (#37816)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-23 12:55:49 +08:00
Woosuk Kwon
43877a620b [MRV2] Enable PP CUDA graph test (#37830)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-22 16:30:25 -07:00
Andreas Karatzas
6eedec6e36 [ROCm][CI] Make some duplicated tests optional so that they are only evaluated in our nightly (#37780)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:03:18 +08:00
Andreas Karatzas
e78bc74268 [ROCm][CI] close missing quote in kernels/moe block in run-amd-test.sh (#37774)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 09:42:34 +08:00
Robert Shaw
eeee5b262d [Quantization][Deprecation] Remove PTPC FP8 (#32700)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-21 22:10:16 +00:00
Andreas Karatzas
02eec7ecbe [ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (MI355) (#37721)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 15:27:12 +08:00
Andreas Karatzas
0d50fa1db6 [ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 12:57:25 +08:00
Andreas Karatzas
8bc6b5cdb0 [ROCm][CI] Setting some mi325_4 tests back to optional (in parity with upstream) (#37711)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 12:25:08 -07:00
Vadim Gimpelson
4f16ebbbd3 [Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) (#37605)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-03-20 12:19:26 -07:00
Lucas Wilkinson
e1d85e5c24 [Attention] Support distinguishing between short extends and decodes (#37303)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-03-20 10:49:36 -07:00
Flora Feng
b4c1aef21c [Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 02:50:34 -07:00
Flora Feng
6050b93bed [Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ (#37595)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 02:10:47 -07:00
Andreas Karatzas
37cd9fc107 [ROCm][CI] Remove deepep DBO tests on gfx90a (#37614)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 17:07:07 +08:00
Andreas Karatzas
9cfd4ebb5e [ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (#37619)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 17:06:53 +08:00
Andreas Karatzas
bd8c4c0752 [CI] Removing deprecated rlhf examples reference (#37585)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 15:20:33 +08:00
Flora Feng
e2d1c8b5e8 [Refactor] Relocate entrypoint tests to match serving code structure (#37593)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 05:31:23 +00:00
Jee Jee Li
8fbe3f303f [Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-20 11:09:32 +08:00
Andreas Karatzas
040a505ff5 [ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-19 14:30:58 -05:00
Cyrus Leung
c7bc12c20f [CI/Build] Split out MM pooling tests (#37542)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-19 11:36:11 +00:00