monajafi-amd
|
97ef11dd34
|
[ROCm][ViT] Enable Flash Attention Triton backend on RDNA3/RDNA4 (#32944)
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
|
2026-01-24 10:03:07 +08:00 |
|
Wentao Ye
|
37c9859fab
|
[Refactor] Clean up unused variables & func (#32692)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-23 17:04:25 -05:00 |
|
Markus / Mark
|
586a57ad7e
|
fix: Add glm4_moe_lite to MLA detection (#32614)
Signed-off-by: marksverdhei <marksverdhei@hotmail.com>
Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-01-23 12:38:57 -08:00 |
|
Matthew Bonanni
|
955b43a5a5
|
[Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-22 19:05:18 +00:00 |
|
Matt
|
c517d8c934
|
[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-23 00:59:15 +08:00 |
|
Alex Sun
|
49a1262267
|
[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664)
Signed-off-by: Alex Sun <alex.s@amd.com>
|
2026-01-22 16:33:18 +08:00 |
|
Wentao Ye
|
6437ff1fb9
|
[Deprecation] Remove deprecated environment variables (#32812)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-22 02:25:16 +00:00 |
|
Pleaplusone
|
6c20e89c02
|
[ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287)
Signed-off-by: ganyi <ygan@amd.com>
|
2026-01-21 23:16:30 +08:00 |
|
Matthew Bonanni
|
1a1fc3bbc0
|
[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32615)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-19 18:41:34 -05:00 |
|
Matthew Bonanni
|
2e7c89e708
|
Revert "[Attention][MLA] Make FLASHINFER_MLA the default MLA backen… (#32484)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-17 04:42:39 +00:00 |
|
Matthew Bonanni
|
8ebfacaa75
|
[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32339)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-15 09:49:57 -05:00 |
|
Shanshan Shen
|
ce0946249d
|
[Misc] Make mem utils can be reused by other platforms (#32322)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-01-14 03:46:01 -08:00 |
|
Hongxia Yang
|
048bb59728
|
AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
|
2026-01-13 23:25:10 -08:00 |
|
Matt
|
bde57ab2ed
|
[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-10 23:19:46 -08:00 |
|
Matthew Bonanni
|
2612ba9285
|
[1/N][Attention] Restructure attention: move files (#31916)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-09 13:10:24 -08:00 |
|
Ning Xie
|
c907d22158
|
[refactor] refactor memory constants usage (#31865)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-07 18:37:31 +00:00 |
|
sihao_li
|
59fe6f298e
|
[XPU]fallback to TRITON_ATTN on xpu when use float32 dtype (#31762)
Signed-off-by: sihao.li <sihao.li@intel.com>
|
2026-01-07 08:10:29 +00:00 |
|
weiyu
|
e7596371a4
|
[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808)
Signed-off-by: Wei-Yu Lin <weiyulin@google.com>
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>
|
2026-01-07 16:07:16 +08:00 |
|
Cyrus Leung
|
db318326a5
|
[Misc] Use deprecated for seed_everything (#31780)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-06 11:29:55 +00:00 |
|
Isotr0py
|
6aa5b18e1d
|
[v1] Add encoder-only/cross attention support to Triton Attention backend (#31406)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-06 00:00:23 +08:00 |
|
zzzzwwjj
|
caaa482aca
|
[platform] Support additional forward context for OOT (#31674)
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: zzzzwwjj <34335947+zzzzwwjj@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-01-05 10:25:13 +00:00 |
|
wangxiyuan
|
bb4337b34c
|
[Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2026-01-04 18:34:04 -08:00 |
|
SameerAsal
|
70e1acefcd
|
[BugFix] Fix NUMA node validation in CPU platform (#31520)
Signed-off-by: SameerAsal <SameerAsal@users.noreply.github.com>
Co-authored-by: SameerAsal <SameerAsal@users.noreply.github.com>
|
2025-12-31 04:06:49 +00:00 |
|
Li, Jiang
|
7157596103
|
[CPU] Disable async schedule on CPU (#31525)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-12-30 12:34:08 +00:00 |
|
Pleaplusone
|
1a834df2d4
|
[ROCm][Bugfix] Fix accuracy issue on fmoe when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS enabled (#31523)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-12-30 09:21:49 +00:00 |
|
sihao_li
|
471ddb99a0
|
[XPU] Remove distributed_executor_backend check (#30760)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-12-23 21:34:33 -08:00 |
|
Yan Ma
|
f1c2c20136
|
[XPU] decrease IGC_ForceOCLSIMDWidth for speculative decoding triton-xpu kernel compilation (#30538)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2025-12-23 05:22:15 +00:00 |
|
Kevin McKay
|
cf8eed7bef
|
[Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled (#31109)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2025-12-21 21:14:58 -08:00 |
|
Andreas Karatzas
|
7b43db210c
|
[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements (#30270)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-19 02:17:27 +00:00 |
|
Fanli Lin
|
058926d48c
|
[XPU] allow custom workers (e.g. vllm-omni workers) to be used on XPU (#30935)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
|
2025-12-18 10:16:36 -08:00 |
|
TJian
|
d0fb572929
|
[ROCm] [AITER] [DOC] Add usage description about check functions in _aiter_ops (#30586)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-12-16 13:50:47 +00:00 |
|
Isotr0py
|
ec154c36ee
|
[Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform (#30212)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-15 17:36:07 +00:00 |
|
Shanshan Shen
|
87b4d1557d
|
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-12-15 11:13:32 +08:00 |
|
Wentao Ye
|
6e78ed6ba7
|
[Logs] Optimize startup logs 4 (#29903)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-13 16:12:53 -05:00 |
|
Roberto L. Castro
|
4fa7ce46f3
|
[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-12-12 19:34:23 -08:00 |
|
Fadi Arafeh
|
f355ad5412
|
[CPU][FIX] Fix build failures on Arm CPUs with torch nightly (#30481)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-12-12 02:09:25 +00:00 |
|
Andreas Karatzas
|
b51255f369
|
[ROCm] Fix broken import in platform attention backend dispatching (#30432)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-11 01:12:58 +00:00 |
|
vllmellm
|
ee14644ba9
|
[ROCm] Aiter Quant Kernels (#25552)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-12-09 14:27:37 +00:00 |
|
Lucas Wilkinson
|
0044c4038c
|
[BugFix][DeepSeek-V3.2] Fix backend selection logic for Blackwell (#30195)
|
2025-12-07 10:53:51 -05:00 |
|
Isotr0py
|
b952f4d3c3
|
[v1] Add PrefixLM support to FlexAttention backend (#27938)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-07 15:51:36 +00:00 |
|
Wentao Ye
|
17eb25e327
|
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-07 04:44:50 +00:00 |
|
Matthew Bonanni
|
66e674cdd5
|
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2025-12-05 09:48:43 -08:00 |
|
Qiu
|
0098a6e3da
|
[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms (#29952)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-04 21:40:51 -05:00 |
|
gausah01
|
28097d5638
|
[Bugfix][CPU] Fix CPU KV cache fallback memory allocation (#29604)
Signed-off-by: Gauri Sahnan <gauri.sahnan@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-12-04 13:01:15 +08:00 |
|
Li, Jiang
|
e2f56c309d
|
[CPU] Update torch 2.9.1 for CPU backend (#29664)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-11-28 13:37:54 +00:00 |
|
Isotr0py
|
38658ec6f3
|
[Bugfix][MM encoder] Fix ViT attention backend resolving for Turing GPU (#29614)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-27 19:17:37 +00:00 |
|
Matthew Bonanni
|
fc1d8be3dc
|
[Attention] Update attention imports (#29540)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-27 11:19:09 -05:00 |
|
Johnny Yang
|
3ecabd06ee
|
Fix tpu-inference platform path (#29554)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
|
2025-11-26 23:25:21 -08:00 |
|
Johnny Yang
|
ba1fcd84a7
|
[TPU] add tpu_inference (#27277)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
|
2025-11-26 14:46:36 -08:00 |
|
Matthew Bonanni
|
430dd4d9eb
|
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-26 10:53:15 -07:00 |
|