liuzhenwei
|
83aea2147f
|
[XPU][UT] update UTs in CI (#39296)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-09 09:38:16 +08:00 |
|
Roberto L. Castro
|
b55d830ec7
|
[Perf][Kernel] Persistent TopK scheduler: unified CUDAGraph-safe kernel with dynamic per-row dispatch - DeepSeek-V3.2 DSA decode (#37421)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2026-04-08 13:35:57 -04:00 |
|
Andrey Talman
|
2111997f96
|
[release 2.11] Update to torch 2.11 (#34644)
|
2026-04-07 18:55:48 -07:00 |
|
Flora Feng
|
5af684c319
|
[CI] Add reasoning parser tests to CI (#37025)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-04-08 00:57:36 +00:00 |
|
Giancarlo Delfin
|
5daf62271d
|
[Model Runner V2] Fuse probabilistic rejection sample kernels (#38496)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-04-07 17:37:37 -07:00 |
|
bnellnm
|
f01482408c
|
[MoE Refactor][Test] FusedMoE layer test (#24675)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-06 17:17:23 +00:00 |
|
Andreas Karatzas
|
780ba37458
|
[ROCm][Quantization] Add asymmetric INT8 quantization support to TritonInt8ScaledMMLinearKernel (#38501)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-06 09:42:10 +08:00 |
|
Micah Williamson
|
9570654c6d
|
[ROCm][CI] Run Kernels Core Operation Test On MI325 and mitigate flakiness (#38184)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-04-06 09:42:02 +08:00 |
|
Kevin H. Luu
|
56de443db1
|
[ci] Switch some CI jobs to H200 MIG slices (#38956)
|
2026-04-05 13:26:11 -07:00 |
|
Kevin H. Luu
|
f0d3ad9f3e
|
[ci] Remove soft fail for AMD image build job (#38941)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-04-03 20:42:33 +00:00 |
|
Divin Honnappa
|
121ea5a21f
|
Removed GPU state confirmation and cleanup steps. (#38238)
Signed-off-by: Divin Honnappa <divin.honnappa@amd.com>
|
2026-04-03 13:11:08 -07:00 |
|
Jeffrey Wang
|
ab79863e6c
|
Remove MQ multi-node tests (#38934)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-03 20:00:08 +00:00 |
|
Nick Hill
|
5f1de2b14b
|
[Model Runner V2] Add config validation for not-yet-supported features (#38758)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-04-03 12:08:08 -07:00 |
|
xiangdong
|
40ee64c00e
|
[XPU][CI] Skip test_topp_only and test_topk_and_topp cases on Intel GPU in CI (#38904)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
|
2026-04-03 20:44:52 +08:00 |
|
Anton Ivanov
|
abebd9323d
|
[CPU] Replace OMP initialization (#36487)
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
|
2026-04-03 18:42:43 +08:00 |
|
xiangdong
|
cb4ff07f8b
|
[XPU][CI] Skip test_topk_only cases on Intel GPU in CI (#38899)
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
|
2026-04-03 09:50:41 +00:00 |
|
Varun Sundar Rabindranath
|
2ad7c0335f
|
[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2026-04-02 21:14:57 -07:00 |
|
Carl Y
|
1f5ec2889c
|
[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-02 21:16:11 -04:00 |
|
Stefano Castagnetta
|
188defbd0b
|
[CI] Add flashinfer.py to attention test source deps (#38792)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-04-02 19:24:29 +00:00 |
|
Yanan Cao
|
ecd5443dbc
|
Bump helion dependency from 0.3.2 to 0.3.3 (#38062)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-02 10:59:33 -07:00 |
|
Jeffrey Wang
|
de5e6c44c6
|
[Feat][Executor] Introduce RayExecutorV2 (#36836)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-01 14:34:29 -07:00 |
|
Luka Govedič
|
40bb175027
|
[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
|
2026-03-31 22:15:05 -04:00 |
|
Yi Liu
|
0dd25a44ea
|
[Quantization][Autoround][XPU] Add W4A16 Support (#37986)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
|
2026-03-31 16:48:24 +00:00 |
|
wenjun liu
|
e8057c00bc
|
[CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues (#38594)
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
|
2026-03-31 22:23:18 +08:00 |
|
Ilya Markov
|
abdbb68386
|
[EPLB] Add alternative communication for EPLB weight exchange (#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
|
2026-03-31 08:17:12 -04:00 |
|
Yintong Lu
|
f09daea261
|
[CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
|
2026-03-31 15:27:37 +08:00 |
|
Kevin H. Luu
|
42318c840b
|
[ci] Remove benchmarks job (#38611)
|
2026-03-31 06:46:21 +00:00 |
|
Louie Tsai
|
44eef0ca1e
|
vLLM Benchmark Suite perf regression after PR#32723 (#38576)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
|
2026-03-31 05:23:17 +00:00 |
|
Li, Jiang
|
6557f4937f
|
[Bugfix][CPU] Skip set_num_threads after thread binding (#38535)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-30 20:13:00 +08:00 |
|
Kevin H. Luu
|
fec5aeca12
|
[ci] Soft fail and disable retry for AMD build image job (#38505)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-03-29 23:05:26 -07:00 |
|
Andreas Karatzas
|
4f2ed5fddb
|
[ROCm][CI] Enable hybrid chunked prefill test (#38317)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 10:30:26 +08:00 |
|
Kyle Sayers
|
d28d86e8a3
|
[QeRL] Fix online quantized reloading (#38442)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-29 14:56:41 -06:00 |
|
TJian
|
58a249bc61
|
[ROCm] [Release] Update ROCm variant from rocm700 to rocm721 (#38413)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-28 06:07:03 +00:00 |
|
Sage Moore
|
497e234d38
|
[EPLB] Cleanup the transfer logic for the various eplb maps (#34520)
Signed-off-by: Sage Moore <sagmoore@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-27 10:18:46 +01:00 |
|
Shengqi Chen
|
84e439a9cb
|
[CI/Build] Move nightly wheel index generation to a single post-build step (#38322)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2026-03-27 07:44:18 +00:00 |
|
wenjun liu
|
d86060122a
|
[CI/Build] enable Intel XPU test flow with prebuilt image (#37447)
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
|
2026-03-26 18:16:04 -07:00 |
|
Giancarlo Delfin
|
c32e97602d
|
[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-26 13:38:12 -07:00 |
|
TJian
|
bc9c6fbbe6
|
[ROCm] [Bugfix] [Release] Fix nightly rocm release pipeline (#38263)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-26 18:47:10 +00:00 |
|
Andreas Karatzas
|
bff9a1c266
|
[ROCm][CI] Override PYTORCH_ROCM_ARCH with detected GPU arch in test containers (#38165)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 18:33:45 +00:00 |
|
Andreas Karatzas
|
9c3ae04bfe
|
[ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 16:51:18 +00:00 |
|
TJian
|
60af7b967b
|
[Releases] [ROCm] Enable Nightly Docker Image and Wheel Releases for ROCm (#37283)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-26 16:32:25 +00:00 |
|
Wentao Ye
|
e054f152fa
|
[CI] Add batch invariant test for b200 (#38014)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-26 11:54:54 -04:00 |
|
Fadi Arafeh
|
71161e8b63
|
[cpu][ci] remove soft-fail for Arm CI and add quant model tests (#37691)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-03-26 07:03:31 +00:00 |
|
Richard Zou
|
6e37c46b35
|
[compile] Add some more startup tests for top models (#38046)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-25 12:02:22 -04:00 |
|
Andreas Karatzas
|
04417ecd5f
|
[ROCm][CI] Rename filepath test to point to correct file (#38102)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 20:05:46 +08:00 |
|
Gregory Shtrasberg
|
189ddefbfd
|
[ROCm] Attention selector reordering (#36702)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-25 17:42:56 +08:00 |
|
Andreas Karatzas
|
679c6a3ecc
|
[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 08:17:33 +08:00 |
|
Andreas Karatzas
|
8bbb7c7f20
|
[ROCm][CI][PD] Add Hybrid SSM integration tests to CI (#37924)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 07:58:39 +08:00 |
|
Kevin H. Luu
|
af945615b5
|
[release] Move the rest of release jobs to release queue (#38044)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-24 16:40:58 -07:00 |
|
amey asgaonkar
|
0c1809c806
|
Add Ubuntu 24.04 support for Docker builds (#35386)
Signed-off-by: aasgaonkar <aasgaonkar@nvidia.com>
|
2026-03-24 13:34:44 -07:00 |
|