867 Commits

Author SHA1 Message Date
Wentao Ye
1f400c58b8 [CI] Add batch invariant test to ci (#27842)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 09:20:33 -07:00
Bhagyashri
2b1b3dfa4b Update Dockerfile to use gcc-toolset-14 and fix test case failures on power (ppc64le) (#28957)
Signed-off-by: Bhagyashri <Bhagyashri.Gaikwad2@ibm.com>
2025-11-21 12:24:09 +00:00
Michael Goin
986ab5db63 [CI Bugfix] Fix Kernels DeepGEMM Test (H100) (#29106)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-20 16:42:33 -08:00
Kevin H. Luu
114b0e2500 [chore] Update annotate release scripts (#29077)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-20 10:22:40 -08:00
Alexei-V-Ivanov-AMD
22924383e1 Updating the mirror of test-amd.yaml as of 2025-11-18 (#29016)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-11-20 12:07:06 -05:00
Fadi Arafeh
3168285fca [cpu][ci] Add initial set of tests for Arm CPUs (#28657)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-11-20 02:37:09 +00:00
Alexander Matveev
3aaa94ac99 [Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier (#28687)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-19 15:47:13 -08:00
Micah Williamson
22e44ad589 [ROCm][CI] Fix Weight Loading With Multiple GPU Tests on ROCm (#28984)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-11-19 21:31:33 +00:00
Shu Wang
613abb50d5 [MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-19 13:29:06 -08:00
Copilot
61728cd1df Re-enable FlashInfer for Llama4 on Blackwell in e2e fusion tests (#28966)
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-19 13:32:19 -05:00
Harry Mellor
a8b70304d6 Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Yanan Cao
2c8b9182b5 [CI] Reorganize compile tests so new tests are automatically included in CI (#28625)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-19 06:13:50 -08:00
Li, Jiang
20852c8f4c [CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Zhewen Li
f8b19c0ffd [Bugfix] Fix GPT-OSS on AMD after #28603 (#28816)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-17 13:15:26 -05:00
Nick Hill
637f292196 [CI] Fix broken pipeline (#28781)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-15 08:44:14 -08:00
Angela Yi
f36292dbee [compile] Enable sequence parallelism matching w/o custom ops enabled (#27126)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-11-15 11:46:12 +00:00
Kunshang Ji
da14ae0fad [XPU][CI]disable lm cache uts (#28696)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-11-14 03:15:50 +00:00
Bradley D
b39a5026eb [ci][amd] fix basic models extra init test (#28676)
Signed-off-by: Bradley Davis <bradleyhd@meta.com>
2025-11-14 02:44:36 +00:00
Alexei-V-Ivanov-AMD
f2b8e1c551 Mirrored test group definitions for AMD (2025-11-11) (#28573)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-11-14 00:16:34 +00:00
Yanan Cao
262d263f6c [Bugfix] Eliminate tuple inputs to submodules in graph partitioning (#28533)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-11-13 15:09:05 -05:00
Nick Hill
8832fff972 [BugFix] Fix mm_encoder_attn_backend arg type checking (#28599)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-13 03:06:03 +00:00
Harry Mellor
51c599f0ec Skip models that cannot currently init on Transformers v5 (#28471)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 23:43:57 +00:00
Harry Mellor
a742134cc5 Remove deprecated fields from CompilationConfig (#27593)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 16:10:28 +00:00
Huamin Li
c748355e0d [CI] Introduce autorun_on_main feature (#27836)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-12 08:51:19 +00:00
Andreas Karatzas
9f0247cfa4 VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation (#27611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>
2025-11-11 18:34:36 -08:00
Li, Jiang
7f829be7d3 [CPU] Refactor CPU attention backend (#27954)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-12 09:43:06 +08:00
wangxiyuan
e1710393c4 [[V0 deprecation]]Remove VLLM_USE_V1 env (#28204)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-11 18:22:16 -07:00
zhrrr
68c09efc37 [Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Model (#27165)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
2025-11-11 12:00:31 -05:00
usberkeley
3143eb23fc [BugFix] Add test_outputs.py to CI pipeline (#28466)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-11 16:01:30 +00:00
Matthew Bonanni
b30dfa03c5 [Attention] Refactor CUDA attention backend selection logic (#24794)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
Adrian Abeyta
a5a790eea6 [Bugfix] Ensure calculated KV scales are applied in attention. (#27232)
Signed-off-by: adabeyta <aabeyta@redhat.com>
2025-11-10 23:42:37 +00:00
Ilya Markov
d17ecc6b19 [PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-10 18:33:11 -05:00
Zhewen Li
a65a934ebe [CI/Build] Temporary fix to LM Eval Small Models (#28324)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-09 21:08:38 +00:00
Simon Mo
d0ceb38ae8 [Build] Fix release pipeline failing annotation (#28272)
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-07 10:06:45 -08:00
Copilot
a736e5ff77 [CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly (#28074)
2025-11-07 15:58:16 +08:00
Alexis MacAskill
a47d94f18c Add runai model streamer e2e test for GCS (#28079)
Signed-off-by: Alexis MacAskill <amacaskill@google.com>
2025-11-07 03:07:54 +00:00
Michael Goin
f32229293e Disable nm-testing models with issues in CI (#28206)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-06 06:19:07 -08:00
gmagogsfm
bde5039325 [CI] Add compile/test_multimodal_compile.py to CI (#28151)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-06 05:41:47 +00:00
Samuel Shen
40db194446 [CI]: Add LMCacheConnector Unit Tests (#27852)
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
2025-11-05 09:45:57 -08:00
Alexei-V-Ivanov-AMD
80c9275348 Enabling cooperative multi-gpu tests on multi-gpu nodes (#27986)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-11-05 10:35:49 -05:00
Ilya Markov
e50c454672 [BugFix] Support EP/DP + EPLB with MTP (#25311)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-11-05 15:22:17 +00:00
Zhewen Li
878fd5a16f [CI/Build] Enable some fixed tests in AMD CI (#28078)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-05 03:15:59 +00:00
Zhewen Li
53f6e81dfd [CI/Build] Fix OpenAI API correctness on AMD CI (#28022)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-04 07:20:50 +00:00
QiliangCui
7956b0c0bc Remove the tpu docker image nightly build. (#27997)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-11-04 00:35:54 +00:00
Matthew Bonanni
01baefe674 Add TP parameter to attention tests (#27683)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-03 13:04:40 -08:00
Lucas Wilkinson
4bc400f47e [CI/Testing] Add basic single node dual batch overlap test (#27235)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-03 17:00:46 +00:00
Matthew Bonanni
f29aeb5a25 Add FLASHINFER_MLA to test_mla_backends and add B200 CI run (#27663)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-31 11:12:19 -07:00
Jee Jee Li
0384aa7150 [CI/Build] Add gpt-oss LoRA test (#27870)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-31 22:17:21 +08:00
Wentao Ye
2bf0bcc1fc [CI Test] Add Scheduled Integration Test (#27765)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-30 17:29:26 -07:00
Jakub Sochacki
697f507a8e [CI/Build][Intel] Enable performance benchmarks for Intel Gaudi 3 (#26919)
Signed-off-by: jakub-sochacki <jakub.sochacki@wp.pl>
2025-10-31 07:57:22 +08:00