biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Luka Govedič	bbbd696af9	[torch.compile][CI] Add back attn fusion on hopper/ada (#32940 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-01-23 16:49:20 +00:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414 )	2026-01-21 08:22:33 -05:00
vllmellm	148117ea2e	[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-20 14:48:20 +08:00
Lucas Wilkinson	14ce524249	[CI] Breakup h200 tests (#30499 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-16 06:23:22 +00:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Angela Yi	9a1d20a89c	[CI] Add warmup run in test_fusion_attn (#31183 ) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-07 00:31:52 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Arpit Khandelwal	d7284a2604	[Core] Rename PassConfig flags as per RFC #27995 (#29646 ) Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-12-03 03:38:55 +00:00
Harry Mellor	951445a52d	Remove default values from `InitVar`s so that they're not stored (#29859 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 12:16:37 +00:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
Luka Govedič	bd7157a071	[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-17 08:10:23 -06:00
Adrian Abeyta	0a9ef0cfce	Move query quantization to attention layer for Flashinfer & Triton. (#26534 ) Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Adrian Abeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 19:01:38 -04:00
Morrison Turnansky	96b9aa5aa0	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): name change compilation level to compilation mode, deprecation compilation level (#26355 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 02:51:16 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x \| None` and `Union[x, y]` to `x \| y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Cyrus Leung	1e4ecca1d0	[V0 Deprecation] Remove `VLLM_USE_V1` from tests (#26341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:42:31 +00:00
Gregory Shtrasberg	f231e5bc21	[ROCm] Split AITER unified attention into its own backend (#25507 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-06 22:49:23 +00:00
Harry Mellor	6c04638214	Fix per file ruff ignores related to line length (#26262 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 05:12:40 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Matthew Bonanni	2aaa423842	[Attention] Move Backend enum into registry (#25893 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-02 20:32:24 -07:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Luka Govedič	d5e0fca264	[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091 ), fix test (#24376 ), and prep for custom op matching (#24604 ) (#24542 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: luka <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-22 12:30:05 -07:00
Woosuk Kwon	bc6e542d9f	Remove V0 attention backends (#25351 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-21 16:03:28 -07:00
Woosuk Kwon	52c2a8d4ad	[V0 Deprecation] Remove LLMEngine (#25033 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 17:56:30 -07:00
Boyuan Feng	8945b001db	[torch.compile] CUDAGraph Inductor partition integration (#24281 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Signed-off-by: boyuanfeng <boyuan@meta.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-20 01:02:15 +00:00
Gregory Shtrasberg	9a161307f5	[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-10 13:59:55 -07:00
elvischenv	24d0c9e6ed	[NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel (#22703 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-08-22 22:09:05 +00:00
elvischenv	03752dba8f	[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-08-19 08:22:15 -04:00
Luka Govedič	31d5c1797f	[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830 ) Signed-off-by: Luka Govedic <lgovedic@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 04:56:28 +00:00
Jee Jee Li	1caca5a589	[Misc] Add SPDX-FileCopyrightText (#20428 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-04 07:40:42 +00:00
Luka Govedič	f98548b9da	[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com>	2025-06-12 08:31:04 -07:00

34 Commits