biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Cyrus Leung	638e4196d1	[Misc] Make `SchedulerConfig.max_model_len` init-only (#28733 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-15 01:59:31 -08:00
QiliangCui	9fc81ec765	[TPU] Fix import error in tpu launch (#28758 ) Signed-off-by: Qiliang Cui <derrhein@gmail.com>	2025-11-15 00:58:32 +00:00
Cyrus Leung	e2741f6cbc	[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 18:39:57 +00:00
Cyrus Leung	511a6b611d	[Config] Clean up SchedulerConfig initialization (#28665 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 22:41:02 +08:00
Huamin Li	07a606aa7e	[CI Failure] Fix backend selection for encoder-only models (#28534 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-13 10:11:27 -05:00
Fanli Lin	dbbe0c756a	[XPU] Support Triton path for LoRA operations on XPU (#28511 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com>	2025-11-13 05:31:42 +00:00
wangxiyuan	2dacd57394	[platform] Move get_cu_count to utils (#27005 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-13 08:48:47 +08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4ca5cd5740	[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695 ) Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-11-12 15:24:12 -08:00
vllmellm	d8140b9833	[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel naming for consistency in `_aiter_ops.py` (#28464 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-12 21:46:57 +00:00
Harry Mellor	54aecd9ed5	Fix pre-commit (and XPU) on `main` (#28556 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 06:13:41 -08:00
wangxiyuan	10138c92a5	[V0 deprecation] Deprecate use_v1 parameter (#28112 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-12 14:03:52 +00:00
Chaojun Zhang	a4730c1b4f	[XPU]Fix crash due to removed VLLM_USE_V1 attribute (#28520 ) Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>	2025-11-12 10:20:55 +00:00
Andreas Karatzas	9f0247cfa4	`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>	2025-11-11 18:34:36 -08:00
Li, Jiang	7f829be7d3	[CPU] Refactor CPU attention backend (#27954 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-12 09:43:06 +08:00
Kyuyeun Kim	df4d3a44a8	[TPU] Rename path to tpu platform (#28452 ) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2025-11-11 19:16:47 +00:00
Matthew Bonanni	684f254585	Prefer FlashAttention MLA as default over FlashMLA (#27363 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-11 17:13:51 +00:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
JartX	c5f685b3ae	[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP (#28279 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2025-11-09 23:09:36 +00:00
StanHatko	e52e4da971	[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores (#27953 ) Signed-off-by: Stan Hatko <stan_hatko@live.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-11-06 23:47:11 +08:00
Wentao Ye	d79d9f0780	[Bug] Fix cpu disable shared_experts `VLLM_DISABLE_SHARED_EXPERTS_STREAM` (#28157 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:03:09 -08:00
Pleaplusone	6cae1e5332	[ROCm][MLA] Support block-size > 1 for AITER MLA backend (#27224 ) Signed-off-by: ganyi <ygan@amd.com> Co-authored-by: wuhuikx <hattie.wu@amd.com>	2025-11-05 10:43:02 -05:00
wangxiyuan	30a14b034f	[V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module (#27798 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-01 10:17:45 +00:00
Yan Ma	7e2729b57e	[Multimodal][XPU]Enable vision attn backend for xpu platform (#27525 ) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yejing Lai <yejing.lai@intel.com> Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-01 04:45:02 +00:00
Akash kaothalkar	36960501d3	[Hardware][Powerpc] Fix VLLM_CPU_OMP_THREADS_BIND="auto" low CPU utilization for Power (#27734 ) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>	2025-10-31 07:45:26 +00:00
Wentao Ye	5b0448104f	[Bug] Raise error explicitly if using incompatible backend (#27424 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-29 13:29:20 -04:00
Xiake Sun	ded24e3e54	[ROCm][Platform] Add MI308X device id in _ROCM_DEVICE_ID_NAME_MAP (#27623 ) Signed-off-by: Xiake Sun <xiake.sun@amd.com>	2025-10-29 14:44:03 +00:00
Zhewen Li	83fd49b1fc	[CI/Build][Bugfix]Fix Quantized Models Test on AMD (#27712 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-29 06:27:30 +00:00
Cyrus Leung	6ebffafbb6	[Misc] Clean up more utils (#27567 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-27 15:30:38 +00:00
Fadi Arafeh	a663f6ae64	[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 (#27415 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-10-27 11:14:55 +00:00
Shanshan Shen	a3e8611da5	[Bugfix] Limit the default value of `max_model_len` when it is not specified by users (#27556 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-10-27 10:16:20 +00:00
Yeshwanth N	71b1c8b667	[Chore]:Extract math and argparse utilities to separate modules (#27188 ) Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com> Signed-off-by: Yeshwanth N <yeshsurya@gmail.com> Signed-off-by: yeshsurya <yeshsurya@gmail.com>	2025-10-26 04:03:32 -07:00
JartX	65d2cf9511	[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-10-26 15:08:52 +08:00
Wentao Ye	52efc34ebf	[Log] Optimize Startup Log (#26740 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-24 19:27:04 -04:00
Luciano Martins	e05a6754a8	[Model] Revert PR #26715 : Restore custom PaliGemma and Gemma3-MM impl… (#27309 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>	2025-10-22 10:05:34 -07:00
Chendi.Xue	7c4767f1eb	[NIXL] use Host buffer to support TP_ratio > 1 for XPU (#27140 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-10-22 15:28:13 +00:00
Li, Jiang	843af7f7fc	[Bugfix][CPU] Disable dual stream execution for experts on CPU (#27320 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-10-22 11:02:27 +00:00
wangxiyuan	f6027b2855	[1/N][Platform] Cleanup useless function (#26982 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-22 09:04:57 +00:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
iAmir97	1d165d6d85	[Chore] Separate out `vllm.utils.mem_utils` (#27143 ) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-18 10:06:59 +00:00
Harry Mellor	6c9fdbf725	[Docs] Replace `rst` style double-backtick with `md` single-backtick (#27091 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:47:34 -07:00
Cyrus Leung	8c017b3490	[Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 05:03:35 +00:00
Cyrus Leung	4d4d6bad19	[Chore] Separate out `vllm.utils.importlib` (#27022 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 00:48:59 +00:00
wangxiyuan	8f4b313c37	[Misc] rename torch_dtype to dtype (#26695 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-15 12:11:48 +00:00
wangxiyuan	db1764e4e0	[Platform] allow platform to init dp group (#22243 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-15 02:32:17 -07:00
Morrison Turnansky	96b9aa5aa0	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): name change compilation level to compilation mode, deprecation compilation level (#26355 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 02:51:16 +00:00
wangxiyuan	577d498212	[Plugin] Make plugin group clear (#26757 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-14 07:49:59 +00:00
Michael Goin	3e051bda82	[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-13 18:12:52 -07:00
Morrison Turnansky	e3fdb627d9	[FrontEnd] UNREVERT CompilationConfig overhaul (#20283 ): deprecate use_inductor in favor of backend, simplify custom_ops (#26502 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>	2025-10-13 22:47:16 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00

388 Commits