463 Commits

Author SHA1 Message Date
Andreas Karatzas
4cde2e0159 [ROCm][Bugfix] Resolve Dynamo tracing crash from amdsmi calls in on_gfx* arch detection (#34108)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-09 20:50:20 -08:00
ZhengHongming888
cb62e86f83 Add NUMA Core binding in nixl_connector for CPU xPyD (#32365)
Signed-off-by: Hongming Zheng <hongming.zheng@intel.com>
Signed-off-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-09 15:39:12 +00:00
kourosh hakhamaneshi
4a2d00eafd [bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2026-02-06 16:17:55 -06:00
Cyrus Leung
cd8b405bd0 [Refactor] Consolidate sequence normalization and enc-dec parsing (#33928)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-06 15:43:47 +00:00
Luka Govedič
ac32e66cf9 [torch.compile] Reorganize vllm/compilation and tests/compile (0/N for vLLM IR) (#33731)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2026-02-06 04:19:49 -08:00
kourosh hakhamaneshi
2f6d17cb2f [rocm][ray] Fix: Unify Ray device visibility handling across CUDA and ROCm (#33308)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2026-02-04 10:09:14 -08:00
Kunshang Ji
061da6bcf7 [XPU] remove common path warning log (#33769)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-02-04 16:40:17 +08:00
Kunshang Ji
e10604480b [XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform (#33379)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-02-02 22:46:10 -08:00
Matthew Bonanni
5d1aef3004 [UX] Format attention backend log line (#33570)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-02-02 18:57:12 +00:00
Wentao Ye
010ec0c30e [Deprecation] Deprecate seed_everything and scatter_mm_placeholders in v0.15 (#33362)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-31 02:54:16 +00:00
杨朱 · Kiki
1a7894dbdf [Misc] Replace Optional[X] with X | None syntax (#33332)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 01:56:59 -08:00
Kunshang Ji
8bb6271c77 [Intel GPU] refine xpu worker (#32894)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-01-29 12:26:52 +00:00
Paco Xu
157caf511b [Perf] avoid duplicate mem_get_info() call in get_current_memory_usage (#33064)
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2026-01-27 03:45:45 +00:00
monajafi-amd
97ef11dd34 [ROCm][ViT] Enable Flash Attention Triton backend on RDNA3/RDNA4 (#32944)
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
2026-01-24 10:03:07 +08:00
Wentao Ye
37c9859fab [Refactor] Clean up unused variables & func (#32692)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-23 17:04:25 -05:00
Markus / Mark
586a57ad7e fix: Add glm4_moe_lite to MLA detection (#32614)
Signed-off-by: marksverdhei <marksverdhei@hotmail.com>
Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2026-01-23 12:38:57 -08:00
Matthew Bonanni
955b43a5a5 [Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-22 19:05:18 +00:00
Matt
c517d8c934 [Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-23 00:59:15 +08:00
Alex Sun
49a1262267 [AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664)
Signed-off-by: Alex Sun <alex.s@amd.com>
2026-01-22 16:33:18 +08:00
Wentao Ye
6437ff1fb9 [Deprecation] Remove deprecated environment variables (#32812)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-22 02:25:16 +00:00
Pleaplusone
6c20e89c02 [ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287)
Signed-off-by: ganyi <ygan@amd.com>
2026-01-21 23:16:30 +08:00
Matthew Bonanni
1a1fc3bbc0 [Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32615)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-19 18:41:34 -05:00
Matthew Bonanni
2e7c89e708 Revert "[Attention][MLA] Make FLASHINFER_MLA the default MLA backen… (#32484)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-17 04:42:39 +00:00
Matthew Bonanni
8ebfacaa75 [Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32339)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-15 09:49:57 -05:00
Shanshan Shen
ce0946249d [Misc] Make mem utils can be reused by other platforms (#32322)
Signed-off-by: shen-shanshan <467638484@qq.com>
2026-01-14 03:46:01 -08:00
Hongxia Yang
048bb59728 AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
2026-01-13 23:25:10 -08:00
Matt
bde57ab2ed [Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-10 23:19:46 -08:00
Matthew Bonanni
2612ba9285 [1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-09 13:10:24 -08:00
Ning Xie
c907d22158 [refactor] refactor memory constants usage (#31865)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-07 18:37:31 +00:00
sihao_li
59fe6f298e [XPU]fallback to TRITON_ATTN on xpu when use float32 dtype (#31762)
Signed-off-by: sihao.li <sihao.li@intel.com>
2026-01-07 08:10:29 +00:00
weiyu
e7596371a4 [Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808)
Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>
2026-01-07 16:07:16 +08:00
Cyrus Leung
db318326a5 [Misc] Use deprecated for seed_everything (#31780)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-06 11:29:55 +00:00
Isotr0py
6aa5b18e1d [v1] Add encoder-only/cross attention support to Triton Attention backend (#31406)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-06 00:00:23 +08:00
zzzzwwjj
caaa482aca [platform] Support additional forward context for OOT (#31674)
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: zzzzwwjj <34335947+zzzzwwjj@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-01-05 10:25:13 +00:00
wangxiyuan
bb4337b34c [Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-04 18:34:04 -08:00
SameerAsal
70e1acefcd [BugFix] Fix NUMA node validation in CPU platform (#31520)
Signed-off-by: SameerAsal <SameerAsal@users.noreply.github.com>
Co-authored-by: SameerAsal <SameerAsal@users.noreply.github.com>
2025-12-31 04:06:49 +00:00
Li, Jiang
7157596103 [CPU] Disable async schedule on CPU (#31525)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-30 12:34:08 +00:00
Pleaplusone
1a834df2d4 [ROCm][Bugfix] Fix accuracy issue on fmoe when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS enabled (#31523)
Signed-off-by: ganyi <ygan@amd.com>
2025-12-30 09:21:49 +00:00
sihao_li
471ddb99a0 [XPU] Remove distributed_executor_backend check (#30760)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-12-23 21:34:33 -08:00
Yan Ma
f1c2c20136 [XPU] decrease IGC_ForceOCLSIMDWidth for speculative decoding triton-xpu kernel compilation (#30538)
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-12-23 05:22:15 +00:00
Kevin McKay
cf8eed7bef [Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled (#31109)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 21:14:58 -08:00
Andreas Karatzas
7b43db210c [ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements (#30270)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-19 02:17:27 +00:00
Fanli Lin
058926d48c [XPU] allow custom workers (e.g. vllm-omni workers) to be used on XPU (#30935)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-12-18 10:16:36 -08:00
TJian
d0fb572929 [ROCm] [AITER] [DOC] Add usage description about check functions in _aiter_ops (#30586)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-16 13:50:47 +00:00
Isotr0py
ec154c36ee [Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform (#30212)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 17:36:07 +00:00
Shanshan Shen
87b4d1557d [CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-15 11:13:32 +08:00
Wentao Ye
6e78ed6ba7 [Logs] Optimize startup logs 4 (#29903)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-13 16:12:53 -05:00
Roberto L. Castro
4fa7ce46f3 [Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-12 19:34:23 -08:00
Fadi Arafeh
f355ad5412 [CPU][FIX] Fix build failures on Arm CPUs with torch nightly (#30481)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-12 02:09:25 +00:00
Andreas Karatzas
b51255f369 [ROCm] Fix broken import in platform attention backend dispatching (#30432)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-11 01:12:58 +00:00