biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Francesco Fusco	5b55c0bea7	[Attention] Clarify comment explaining attn_logits +1 dimension (#33427 ) Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>	2026-01-31 04:50:30 +00:00
Russell Bryant	a2ef06e1b3	[Misc] offest -> offset in comments and variable names (#33444 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-01-30 20:19:22 -08:00
Lucas Wilkinson	0a3c71e7e5	[BugFix] Fix whisper FA2 + full cudagraphs (#33360 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-31 12:15:06 +08:00
Matthew Bonanni	aaa901ad55	[Attention] Move MLA `forward` from backend to layer (#33284 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-30 19:30:00 -08:00
杨朱 · Kiki	1a7894dbdf	[Misc] Replace Optional[X] with X | None syntax (#33332 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 01:56:59 -08:00
Gregory Shtrasberg	ab597c869a	[Bugfix] Add missing encoder only guard for do_kv_cache_update (#33269 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-01-28 21:25:07 +00:00
Gregory Shtrasberg	22ad649501	[ROCm] Enabling forward_includes_kv_cache on ROCm MHA backends (#33106 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-01-28 14:36:14 +08:00
Harry Mellor	2eb673a088	Add flake8-implicit-str-concat rules to Ruff (#33191 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 04:56:10 +00:00
Wentao Ye	3a6d5cbefd	[Perf] Optimize dcp allocate tensor (#33102 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-27 17:24:41 -05:00
Matthew Bonanni	a608b4c6c2	[5/N][Attention] Finish eliminating `vllm/attention` folder (#32064 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 10:02:51 -05:00
Nicolò Lucchesi	1f3a2c2944	[Bugfix] Disable CG for Whisper+FA2 (#33164 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-27 21:46:51 +08:00
Strahinja Stamenkovic	c568581ff3	Fix IndexError with encoder-decoder models when using Custom Paged Attention (#33112 ) Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>	2026-01-27 10:33:37 +08:00
ElizaWszola	a28b94e6ef	[Performance] Split FlashAttn attention and cache update (#25954 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2026-01-23 17:28:06 -08:00
Markus / Mark	586a57ad7e	fix: Add glm4_moe_lite to MLA detection (#32614 ) Signed-off-by: marksverdhei <marksverdhei@hotmail.com> Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2026-01-23 12:38:57 -08:00
Harry Huang	5206e5e28c	[V1][Hybrid] Mamba Prefix Caching with align mode (#30877 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2026-01-23 09:56:48 -08:00
tianshu-Michael-yu	13d8746c54	[Feature]: Remove DtoH Copy for lfm2_vl On Default Stream (#32815 ) Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>	2026-01-23 13:20:30 +00:00
Nicolò Lucchesi	160c6fa387	[Misc] Add `get_name` to missing AttentionBackends (#32698 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-23 10:35:44 +00:00
Andreas Karatzas	a8eb1182f1	[CI][Models] Add VLM Support for Sequence Classification Conversion (#32885 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-23 16:22:51 +08:00
Wentao Ye	7ef5873752	[CI] Fix mypy for `vllm/v1/structured_output` (#32722 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-23 11:55:51 +08:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
Matthew Bonanni	955b43a5a5	[Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-22 19:05:18 +00:00
Lucas Wilkinson	889722f3bf	[FlashMLA] Update FlashMLA to expose new arguments (#32810 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 22:02:39 -07:00
Nick Hill	24dc30f7ff	[ModelRunner V2] Don't pin reused flashinfer tensors (#32799 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-21 13:17:43 -08:00
elvischenv	808d6fd7b9	Bump Flashinfer to v0.6.1 (#30993 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-01-21 08:49:50 -08:00
Pleaplusone	6c20e89c02	[ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287 ) Signed-off-by: ganyi <ygan@amd.com>	2026-01-21 23:16:30 +08:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414 )	2026-01-21 08:22:33 -05:00
Lucas Wilkinson	b4f64e5b02	Update FlashMLA (#32491 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 13:03:37 +08:00
Tomas Ruiz	4a5299c93f	feat: spec decode with draft models (#24322 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-01-19 16:05:46 -05:00
Vadim Gimpelson	6101a26dc9	[BUGFIX] Fix degenerate strides in TRTLLM query tensors for FlashInfer backend. Fixes issue #32353 (#32417 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-01-18 16:57:32 -08:00
Wentao Ye	16de822c71	[Refactor] Remove unused file `pallas_kv_cache_update.py` (#32433 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-18 12:46:39 -08:00
Li Xie	c826c72a96	[Model] Support Step1 Model (#32511 ) Signed-off-by: xieli <xieli@stepfun.com>	2026-01-18 10:20:46 +00:00
Isotr0py	8cc26acd8b	[Performance] Improve Triton prefill attention kernel's performance (#32403 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-17 20:19:59 -08:00
Guofang.Tang	2b99f210f5	[Misc] Fix typo: seperator -> separator in flashmla_sparse.py (#32411 ) Signed-off-by: Guofang Tang <tinggofun@gmail.com> Co-authored-by: Guofang Tang <tinggofun@gmail.com>	2026-01-17 12:18:30 +00:00
Matthias Gehre	047413375c	[Attention][AMD] Make flash-attn optional (#30361 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-01-15 17:18:24 +00:00
Pleaplusone	130d6c9514	[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for `AiterFlashAttentionBackend` (#29887 ) Signed-off-by: ganyi <ygan@amd.com>	2026-01-15 15:29:53 +00:00
vllmellm	e27078ea80	[Bugfix][ROCm][performance] Resolve the performance regression issue of the Qwen3-Next-80B-A3B-Thinking under rocm_atten (#32336 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-14 19:32:48 +00:00
Matthew Bonanni	2263d44b68	[4/N][Attention] Move MLA common to model_executor (#32060 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-13 09:08:45 -08:00
Matthew Bonanni	98f60e5acb	[6/N][Attention] Move utils to more appropriate locations (#32215 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-13 05:38:52 -08:00
Mickaël Seznec	a5bbbd2f24	[Quantization] fix: overflow with static per-tensor scaling (#29867 ) Signed-off-by: Mickael Seznec <mickael@mistral.ai> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-13 12:56:01 +00:00
cjackal	15b33ff064	[Misc] improve warning/assert messages (#32226 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2026-01-13 03:11:23 +00:00
Matthew Bonanni	20228cb851	[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-12 09:13:56 -08:00
Asaf Joseph Gardin	8fb2c135be	[Bugfix] Fix stale SSM state for new Mamba requests scheduled as decode (#32118 ) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-01-12 17:02:38 +00:00
Isotr0py	9dbe1fe960	[Bugfix] Fix missing scale passing for encoder Triton Attention implementation (#32149 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-12 11:13:41 +00:00
Vadim Gimpelson	e15a5ff07b	[MISC] Add strict contiguity check for FlashInfer attention tensors (#32008 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>	2026-01-10 12:40:05 -08:00
jvlunteren	b8bf5c45bb	[Kernel] Optimize Sliding Window Attention in 3D Triton Kernel (#31984 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>	2026-01-10 18:13:44 +00:00
Lucas Wilkinson	da6709c9fe	[Misc] Delay deprecation of CommonAttentionMetadata properties (#32074 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-09 21:06:44 -08:00
Lucas Kabela	ea6d067a2a	[Misc][LLaMa4] Compile LLaMa Vision Encoder (#30709 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-09 22:01:38 -05:00
Matthew Bonanni	0308901975	[2/N][Attention] Fix pre-commit errors (#32052 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-10 00:27:15 +00:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
R3hankhan	8e27663b6a	[CPU] Add head sizes 80 and 112 with vec16 fallback (#31968 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-01-09 22:14:46 +08:00


612 Commits