biondizzle/vllm - vllm

Author	SHA1	Message	Date
haosdent	116ed130f4	[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-16 10:30:23 +01:00
Harry Mellor	17dc9c7fc9	[CI] Bump `mypy` version (#34950) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 20:55:11 +00:00
Woosuk Kwon	0916e7960b	[GDN] Use CPU tensors to build GDN metadata (#34498) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-13 01:24:45 -08:00
Vadim Gimpelson	000214c4bb	[BUGFIX] Fix accuracy bugs in Qwen3-Next MTP (#34077) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-10 10:57:11 -05:00
Harry Huang	5206e5e28c	[V1][Hybrid] Mamba Prefix Caching with align mode (#30877) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2026-01-23 09:56:48 -08:00
tianshu-Michael-yu	13d8746c54	[Feature]: Remove DtoH Copy for lfm2_vl On Default Stream (#32815) Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>	2026-01-23 13:20:30 +00:00
Nicolò Lucchesi	160c6fa387	[Misc] Add `get_name` to missing AttentionBackends (#32698) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-23 10:35:44 +00:00
Matthew Bonanni	20228cb851	[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-12 09:13:56 -08:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Cyrus Leung	b665bbc2d4	[Chore] Migrate V0 attention utils (#31891) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-07 13:44:36 +00:00
Jack Yang	0a2c2dc3f1	fixed mypy warnings for files vllm/v1/attention with TEMPORARY workaround (#31465) Signed-off-by: Zhuohao Yang <zy242@cornell.edu> Co-authored-by: Zhuohao Yang <zy242@cornell.edu> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-07 04:08:47 +00:00
Lucas Wilkinson	4c73be14e0	[Attention][2/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (#31774) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-01-06 17:32:14 +00:00
Benjamin Chislett	85aff45e24	[Perf] Remove blocking copy in GDN Attention (#31167) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-12-22 14:25:22 -08:00
drslark	add1b9d3de	[main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched inferring (#30632) Signed-off-by: drslark <slarksblood@qq.com>	2025-12-14 01:32:16 -08:00
Lucas Wilkinson	abe93bce59	[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-12-09 17:18:10 -08:00
Matthew Bonanni	1d93f11675	[Attention][CUDAGraph] Remove CG padding from attention backends (#29352) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-02 13:48:08 -05:00
Benjamin Chislett	304419576a	[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer (#28479) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-13 01:56:40 +09:00
fhl2000	284cc92275	[MISC] `cudagraph_capture_sizes` related improvements (#26016) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-24 05:11:05 -07:00
Vadim Gimpelson	785d8b6410	[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) (#26437) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-10-16 12:18:31 +08:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Roger Wang	43c146ca42	[Misc] Clean up unnecessary E501 ignore (#26274) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-10-06 07:29:18 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Tao He	99b3a504c5	[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. (#25743) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-09-26 01:18:58 -07:00
Benjamin Chislett	c30b405b8f	[Spec Decode] Enable FlashInfer Spec Decoding (#25196) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: lhsjohn <huashuoli@tencent.com>	2025-09-23 22:29:58 -04:00
Thomas Parnell	a903669e10	[V1] Remove V0 code paths for Hybrid models (#25400) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-23 08:26:13 -07:00
Vadim Gimpelson	072d7e53e5	[PERF] Add `conv1d` metadata to GDN attn (#25105) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-09-18 14:27:49 +00:00
Tao He	dd6a910aac	[Bugfix][Qwen3-Next] fixes the varlen issue in qwen3-next's MTP implementation. (#24957) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-09-17 21:59:09 +08:00
Tao He	8226dd56bf	[Qwen3Next] Fixes the cuda graph capture conditions under large batch sizes (#24660) (#24667) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-09-12 22:31:32 +00:00
Tao He	e93f4cc9e3	Add the support for the qwen3 next model (a hybrid attention model). (#24526) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-11 15:32:09 +08:00

29 Commits