biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Andreas Karatzas	4cde2e0159	[ROCm][Bugfix] Resolve Dynamo tracing crash from amdsmi calls in on_gfx* arch detection (#34108 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-09 20:50:20 -08:00
kourosh hakhamaneshi	4a2d00eafd	[bugfix] [ROCm] Fix premature CUDA initialization in platform detection (#33941 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2026-02-06 16:17:55 -06:00
kourosh hakhamaneshi	2f6d17cb2f	[rocm][ray] Fix: Unify Ray device visibility handling across CUDA and ROCm (#33308 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2026-02-04 10:09:14 -08:00
杨朱 · Kiki	1a7894dbdf	[Misc] Replace Optional[X] with X | None syntax (#33332 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 01:56:59 -08:00
Paco Xu	157caf511b	[Perf] avoid duplicate mem_get_info() call in get_current_memory_usage (#33064 ) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-01-27 03:45:45 +00:00
monajafi-amd	97ef11dd34	[ROCm][ViT] Enable Flash Attention Triton backend on RDNA3/RDNA4 (#32944 ) Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>	2026-01-24 10:03:07 +08:00
Wentao Ye	37c9859fab	[Refactor] Clean up unused variables & func (#32692 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-23 17:04:25 -05:00
Matt	c517d8c934	[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-23 00:59:15 +08:00
Alex Sun	49a1262267	[AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664 ) Signed-off-by: Alex Sun <alex.s@amd.com>	2026-01-22 16:33:18 +08:00
Wentao Ye	6437ff1fb9	[Deprecation] Remove deprecated environment variables (#32812 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-22 02:25:16 +00:00
Pleaplusone	6c20e89c02	[ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287 ) Signed-off-by: ganyi <ygan@amd.com>	2026-01-21 23:16:30 +08:00
Hongxia Yang	048bb59728	AMD CI Test - unskip moe_sum test and moe_align_block_size tests (#32039 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2026-01-13 23:25:10 -08:00
Matt	bde57ab2ed	[Hardware][AMD][CI][Bugfix] Fix AMD Quantization test group (#31713 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-10 23:19:46 -08:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Isotr0py	6aa5b18e1d	[v1] Add encoder-only/cross attention support to Triton Attention backend (#31406 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-06 00:00:23 +08:00
Pleaplusone	1a834df2d4	[ROCm][Bugfix] Fix accuracy issue on fmoe when `VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS` enabled (#31523 ) Signed-off-by: ganyi <ygan@amd.com>	2025-12-30 09:21:49 +00:00
Kevin McKay	cf8eed7bef	[Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled (#31109 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-21 21:14:58 -08:00
Andreas Karatzas	7b43db210c	[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements (#30270 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-19 02:17:27 +00:00
TJian	d0fb572929	[ROCm] [AITER] [DOC] Add usage description about check functions in `_aiter_ops` (#30586 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-12-16 13:50:47 +00:00
Isotr0py	ec154c36ee	[Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform (#30212 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-15 17:36:07 +00:00
Shanshan Shen	87b4d1557d	[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125 ) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-12-15 11:13:32 +08:00
Andreas Karatzas	b51255f369	[ROCm] Fix broken import in platform attention backend dispatching (#30432 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-11 01:12:58 +00:00
vllmellm	ee14644ba9	[ROCm] Aiter Quant Kernels (#25552 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-12-09 14:27:37 +00:00
Isotr0py	b952f4d3c3	[v1] Add PrefixLM support to FlexAttention backend (#27938 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-07 15:51:36 +00:00
Qiu	0098a6e3da	[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms (#29952 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-04 21:40:51 -05:00
Matthew Bonanni	fc1d8be3dc	[Attention] Update attention imports (#29540 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-27 11:19:09 -05:00
Micah Williamson	ef1f7030f0	[ROCm][CI] Fix test_cudagraph_mode failure in AMD CI (#29367 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-11-25 07:55:09 +00:00
vllmellm	64deead719	[Bugfix] [ROCm] [UX]: revert Flex attention backend (#29371 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-25 06:56:06 +00:00
vllmellm	e48b2e6848	[Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic (#26980 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-24 15:24:49 +00:00
Matthew Bonanni	11857a00b0	[Attention] Add ROCM_AITER_MLA_SPARSE to attention backend registry (#29103 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-20 20:24:43 -08:00
Pleaplusone	06c20c9904	[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-20 02:54:01 -08:00
Aleksandr Malyshev	ac10fd3c69	Upstreaming aiter triton attention backend as a new backend (#28701 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-11-19 19:59:30 +00:00
Strahinja Stamenkovic	814843e021	Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307 ) Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>	2025-11-19 03:12:31 +00:00
Huamin Li	07a606aa7e	[CI Failure] Fix backend selection for encoder-only models (#28534 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-13 10:11:27 -05:00
wangxiyuan	2dacd57394	[platform] Move get_cu_count to utils (#27005 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-13 08:48:47 +08:00
vllmellm	d8140b9833	[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel naming for consistency in `_aiter_ops.py` (#28464 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-12 21:46:57 +00:00
wangxiyuan	10138c92a5	[V0 deprecation] Deprecate use_v1 parameter (#28112 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-12 14:03:52 +00:00
Andreas Karatzas	9f0247cfa4	`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>	2025-11-11 18:34:36 -08:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
JartX	c5f685b3ae	[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP (#28279 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2025-11-09 23:09:36 +00:00
Pleaplusone	6cae1e5332	[ROCm][MLA] Support block-size > 1 for AITER MLA backend (#27224 ) Signed-off-by: ganyi <ygan@amd.com> Co-authored-by: wuhuikx <hattie.wu@amd.com>	2025-11-05 10:43:02 -05:00
wangxiyuan	30a14b034f	[V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module (#27798 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-01 10:17:45 +00:00
Xiake Sun	ded24e3e54	[ROCm][Platform] Add MI308X device id in _ROCM_DEVICE_ID_NAME_MAP (#27623 ) Signed-off-by: Xiake Sun <xiake.sun@amd.com>	2025-10-29 14:44:03 +00:00
Zhewen Li	83fd49b1fc	[CI/Build][Bugfix]Fix Quantized Models Test on AMD (#27712 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-29 06:27:30 +00:00
JartX	65d2cf9511	[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-10-26 15:08:52 +08:00
Luciano Martins	e05a6754a8	[Model] Revert PR #26715 : Restore custom PaliGemma and Gemma3-MM impl… (#27309 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>	2025-10-22 10:05:34 -07:00
wangxiyuan	f6027b2855	[1/N][Platform] Cleanup useless function (#26982 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-22 09:04:57 +00:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Harry Mellor	6c9fdbf725	[Docs] Replace `rst` style double-backtick with `md` single-backtick (#27091 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:47:34 -07:00


156 Commits