biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Michael Goin	db5d0719e1	[Kernel] Add MXFP8 to Marlin GEMM/MoE and refactor Mxfp8LinearOp (#34664 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-04-01 09:41:42 -07:00
wliao2	4dfad17ed1	replace cuda_device_count_stateless() to current_platform.device_count() (#37841 ) Signed-off-by: Liao, Wei <wei.liao@intel.com> Signed-off-by: wliao2 <wei.liao@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 22:32:54 +08:00
EdalatiAli	e5b807607c	[Quant][Feature] Support online MXFP8 quantization for MoE and dense models (#35448 ) Signed-off-by: EdalatiAli <aliedalati@cohere.com>	2026-03-16 18:07:39 -04:00
Marc Sun	c973ecdead	[bnb] Skip moe + bnb test (#36896 ) Signed-off-by: Marc Sun <marc@huggingface.co>	2026-03-12 18:03:25 +00:00
Micah Williamson	fc4657756f	[ROCm][CI] Enable AITER for failing `test_gpt_oss` test case on MI355 (#36174 )	2026-03-07 13:50:17 -08:00
Micah Williamson	e7213003cb	[ROCm][CI] Fix TP size issue for `test_gpt_oss` (#35887 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-03 20:57:34 +00:00
Micah Williamson	8b9e8b7454	[ROCm][CI] Fix Assertion Logic For `test_gpt_oss` (#35806 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-03 05:08:04 +00:00
xuebwang-amd	b129136c7a	[ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-10 10:08:05 -05:00
Cyrus Leung	edb359cce4	[Renderer] Define `render_cmpl` and `render_chat` (#34039 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 05:24:40 -08:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Robert Shaw	247d1a32ea	[Quantization][Deprecation] Remove BitBlas (#32683 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-28 11:06:22 +00:00
Roberto L. Castro	8ef50d9a6b	[Kernel][Performance] Enable smaller Scaling Factor tiling for NVFP4 small-batch decoding (#30885 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-13 15:22:53 -08:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Robert Shaw	5825bbc1f7	[Quantization] Deprecate Long Tail of Schemes (#31688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-08 19:07:45 -05:00
Matthew Bonanni	7eb6cb6c18	[Attention] Update tests to remove deprecated env vars (#30563 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-17 09:49:59 -08:00
Tsukasa OI	42c1949643	[Bugfix][Quantization] Support BF16 tensors on GGUF (#29948 ) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>	2025-12-03 10:33:46 +00:00
Cyrus Leung	33b06a6f24	[Misc] Remove redundant attention var constants (#29650 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 04:35:19 -08:00
Julien Denize	57430fc95c	Default model load/config/tokenizer to `mistral` format if relevant files exist (#28659 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-21 13:58:59 -08:00
Strahinja Stamenkovic	814843e021	Enable bitsandbytes quantization on AMD GPUs that use warp size 32 (#27307 ) Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>	2025-11-19 03:12:31 +00:00
Luciano Martins	c2612371ad	[Model] Add Gemma3 GGUF multimodal support (#27772 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-18 08:56:29 -08:00
xuebwang-amd	5a1271d83a	[Quantization] fix attention quantization of gpt_oss model (#27334 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com>	2025-11-11 12:06:00 -05:00
Zhewen Li	83fd49b1fc	[CI/Build][Bugfix]Fix Quantized Models Test on AMD (#27712 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-29 06:27:30 +00:00
dongbo910220	3ae082c373	[Chore] Separate out optional dependency checks from vllm.utils (#27207 ) Signed-off-by: dongbo910220 <1275604947@qq.com> Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-22 10:44:21 -04:00
wangxiyuan	f6027b2855	[1/N][Platform] Cleanup useless function (#26982 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-22 09:04:57 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x \| None` and `Union[x, y]` to `x \| y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Woosuk Kwon	72dd1595b4	[CI] Skip tests failing on main (#25326 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 19:57:46 -07:00
Woosuk Kwon	52c2a8d4ad	[V0 Deprecation] Remove LLMEngine (#25033 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 17:56:30 -07:00
Cyrus Leung	bef180f009	[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-20 17:50:58 +00:00
Cyrus Leung	3d9a1d2de5	[V1] Support `LLM.apply_model` (#18465 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-20 07:14:35 +00:00
Woosuk Kwon	14006840ea	[V0 Deprecation] Remove V0 FlashInfer attention backend (#22776 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-18 19:54:16 -07:00
Michael Goin	4fc722eca4	[Kernel/Quant] Remove AQLM (#22943 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-16 19:38:21 +00:00
Roger Wang	067c34a155	docs: remove deprecated disable-log-requests flag (#22113 ) Signed-off-by: Roger Wang <hey@rogerw.me>	2025-08-02 00:19:48 -07:00
Cyrus Leung	86ae693f20	[Deprecation][2/N] Replace `--task` with `--runner` and `--convert` (#21470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-27 19:42:40 -07:00
Ning Xie	d97841078b	[Misc] unify variable for LLM instance (#20996 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-21 12:18:33 +01:00
Jee Jee Li	8020e98c9f	[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-11 08:01:13 +00:00
Isotr0py	77f77a951e	[Misc] Clean up mark to fork process in BNB tests (#20692 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-10 13:59:40 +00:00
Cyrus Leung	9fb52e523a	[V1] Support any head size for FlexAttention backend (#20467 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-06 09:54:36 -07:00
Michael Goin	1e473b3010	[CI] Disable failing GGUF model test (#19454 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-06-11 05:12:38 +00:00
Isotr0py	d2f0e7e615	[CI/Build] Improve Llama GGUF test robustness (#19287 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-07 17:23:28 +08:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Isotr0py	1f1b1bc03b	[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-27 04:40:28 +00:00
Michael Goin	f4a8a37465	[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-20 09:08:37 -07:00
hissu-hyvarinen	f6518b2b48	[ROCm] Skip tests for quantizations incompatible with ROCm (#17905 ) Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>	2025-05-12 18:39:28 -06:00
Bowen Bao	db593aa67f	[Quantization] Quark MXFP4 format loading (#16943 )	2025-05-07 15:05:05 -04:00
Cyrus Leung	d7543862bd	[Misc] Rename assets for testing (#17575 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 03:29:25 -07:00
Cyrus Leung	48e925fab5	[Misc] Clean up test docstrings and names (#17521 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 05:19:32 -07:00
Cyrus Leung	afb4429b4f	[CI/Build] Reorganize models tests (#17459 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-30 23:03:08 -07:00

48 Commits