Varun Sundar Rabindranath
|
2ad7c0335f
|
[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2026-04-02 21:14:57 -07:00 |
|
Carl Y
|
1f5ec2889c
|
[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-02 21:16:11 -04:00 |
|
Stefano Castagnetta
|
188defbd0b
|
[CI] Add flashinfer.py to attention test source deps (#38792)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-04-02 19:24:29 +00:00 |
|
Yanan Cao
|
ecd5443dbc
|
Bump helion dependency from 0.3.2 to 0.3.3 (#38062)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-02 10:59:33 -07:00 |
|
Jeffrey Wang
|
de5e6c44c6
|
[Feat][Executor] Introduce RayExecutorV2 (#36836)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-01 14:34:29 -07:00 |
|
Luka Govedič
|
40bb175027
|
[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: Luka Govedic <luka.govedic@gmail.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
|
2026-03-31 22:15:05 -04:00 |
|
Yi Liu
|
0dd25a44ea
|
[Quantization][Autoround][XPU] Add W4A16 Support (#37986)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
|
2026-03-31 16:48:24 +00:00 |
|
wenjun liu
|
e8057c00bc
|
[CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues (#38594)
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
|
2026-03-31 22:23:18 +08:00 |
|
Ilya Markov
|
abdbb68386
|
[EPLB] Add alternative communication for EPLB weight exchange (#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
|
2026-03-31 08:17:12 -04:00 |
|
Yintong Lu
|
f09daea261
|
[CPU] Support int8 compute mode in CPU AWQ (#35697)
Signed-off-by: Yintong Lu <yintong.lu@intel.com>
|
2026-03-31 15:27:37 +08:00 |
|
Kevin H. Luu
|
42318c840b
|
[ci] Remove benchmarks job (#38611)
|
2026-03-31 06:46:21 +00:00 |
|
Louie Tsai
|
44eef0ca1e
|
vLLM Benchmark Suite perf regression after PR#32723 (#38576)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
|
2026-03-31 05:23:17 +00:00 |
|
Li, Jiang
|
6557f4937f
|
[Bugfix][CPU] Skip set_num_threads after thread binding (#38535)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-30 20:13:00 +08:00 |
|
Kevin H. Luu
|
fec5aeca12
|
[ci] Soft fail and disable retry for AMD build image job (#38505)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-03-29 23:05:26 -07:00 |
|
Andreas Karatzas
|
4f2ed5fddb
|
[ROCm][CI] Enable hybrid chunked prefill test (#38317)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-30 10:30:26 +08:00 |
|
Kyle Sayers
|
d28d86e8a3
|
[QeRL] Fix online quantized reloading (#38442)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-29 14:56:41 -06:00 |
|
TJian
|
58a249bc61
|
[ROCm] [Release] Update ROCm variant from rocm700 to rocm721 (#38413)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-28 06:07:03 +00:00 |
|
Sage Moore
|
497e234d38
|
[EPLB] Cleanup the transfer logic for the various eplb maps (#34520)
Signed-off-by: Sage Moore <sagmoore@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-27 10:18:46 +01:00 |
|
Shengqi Chen
|
84e439a9cb
|
[CI/Build] Move nightly wheel index generation to a single post-build step (#38322)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2026-03-27 07:44:18 +00:00 |
|
wenjun liu
|
d86060122a
|
[CI/Build] enable Intel XPU test flow with prebuilt image (#37447)
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
|
2026-03-26 18:16:04 -07:00 |
|
Giancarlo Delfin
|
c32e97602d
|
[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-26 13:38:12 -07:00 |
|
TJian
|
bc9c6fbbe6
|
[ROCm] [Bugfix] [Release] Fix nightly rocm release pipeline (#38263)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-26 18:47:10 +00:00 |
|
Andreas Karatzas
|
bff9a1c266
|
[ROCm][CI] Override PYTORCH_ROCM_ARCH with detected GPU arch in test containers (#38165)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 18:33:45 +00:00 |
|
Andreas Karatzas
|
9c3ae04bfe
|
[ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 16:51:18 +00:00 |
|
TJian
|
60af7b967b
|
[Releases] [ROCm] Enable Nightly Docker Image and Wheel Releases for ROCm (#37283)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-26 16:32:25 +00:00 |
|
Wentao Ye
|
e054f152fa
|
[CI] Add batch invariant test for b200 (#38014)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-26 11:54:54 -04:00 |
|
Fadi Arafeh
|
71161e8b63
|
[cpu][ci] remove soft-fail for Arm CI and add quant model tests (#37691)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-03-26 07:03:31 +00:00 |
|
Richard Zou
|
6e37c46b35
|
[compile] Add some more startup tests for top models (#38046)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-25 12:02:22 -04:00 |
|
Andreas Karatzas
|
04417ecd5f
|
[ROCm][CI] Rename filepath test to point to correct file (#38102)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 20:05:46 +08:00 |
|
Gregory Shtrasberg
|
189ddefbfd
|
[ROCm] Attention selector reordering (#36702)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-25 17:42:56 +08:00 |
|
Andreas Karatzas
|
679c6a3ecc
|
[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 08:17:33 +08:00 |
|
Andreas Karatzas
|
8bbb7c7f20
|
[ROCm][CI][PD] Add Hybrid SSM integration tests to CI (#37924)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 07:58:39 +08:00 |
|
Kevin H. Luu
|
af945615b5
|
[release] Move the rest of release jobs to release queue (#38044)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-24 16:40:58 -07:00 |
|
amey asgaonkar
|
0c1809c806
|
Add Ubuntu 24.04 support for Docker builds (#35386)
Signed-off-by: aasgaonkar <aasgaonkar@nvidia.com>
|
2026-03-24 13:34:44 -07:00 |
|
Li, Jiang
|
352b90c4a4
|
[Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU (#37987)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-24 07:00:20 -07:00 |
|
Kevin H. Luu
|
7281199a8c
|
[release] Move agent queue to Release cluster queues (#37783)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-23 20:36:47 -07:00 |
|
Kevin H. Luu
|
b2dd75eb48
|
Downsize CPU jobs to use small queue (#37913)
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-03-23 20:36:37 -07:00 |
|
Andreas Karatzas
|
de99d91ece
|
[ROCm][CI] Split Entrypoints Integration (API Server 1) into 3 jobs (#37906)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-24 09:48:37 +08:00 |
|
Wentao Ye
|
83c9d525b6
|
[CI] Add batch invariant test: Block FP8 + small MOE (#37895)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-23 21:16:14 -04:00 |
|
roikoren755
|
56777b5c89
|
[Test] E2E Nemotron-3-Super tests (#36803)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-23 17:49:56 -07:00 |
|
Kevin H. Luu
|
2488a82f89
|
[CI] Split V1 Others into 3 separate jobs (#37016)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-24 06:44:38 +08:00 |
|
Kyle Sayers
|
38364a7e32
|
[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-23 16:03:29 -04:00 |
|
Kunshang Ji
|
91fd695b75
|
[CI] split Entrypoints Integration (API Server 1) into 3 jobs (#37882)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-23 10:37:56 -07:00 |
|
Nicolò Lucchesi
|
1cbbcfe8a3
|
[CI][PD] Add Hybrid SSM integration tests to CI (#37657)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-23 23:58:19 +08:00 |
|
Jee Jee Li
|
1f0d210641
|
[CI/Build][LoRA] Update Qwen35 LoRA testing (#37816)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-23 12:55:49 +08:00 |
|
Woosuk Kwon
|
43877a620b
|
[MRV2] Enable PP CUDA graph test (#37830)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-22 16:30:25 -07:00 |
|
Andreas Karatzas
|
6eedec6e36
|
[ROCm][CI] Make some duplicated tests optional so that they are only evaluated in our nightly (#37780)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 16:03:18 +08:00 |
|
Andreas Karatzas
|
e78bc74268
|
[ROCm][CI] close missing quote in kernels/moe block in run-amd-test.sh (#37774)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 09:42:34 +08:00 |
|
Robert Shaw
|
eeee5b262d
|
[Quantization][Deprecation] Remove PTPC FP8 (#32700)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-21 22:10:16 +00:00 |
|
Andreas Karatzas
|
02eec7ecbe
|
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (MI355) (#37721)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 15:27:12 +08:00 |
|