biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Nicolò Lucchesi	66e86f1dbd	[Kernel] Mamba support different layout for Conv state (#37416 )	2026-04-03 01:50:09 +02:00
roikoren755	8e6293e838	[Mamba] Add stochastic rounding support (#35753 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-03-30 12:33:49 -04:00
Wentao Ye	c59a132f96	[V0 Deprecation] Refactor kv cache from list to element (#37487 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-23 20:10:11 -07:00
Wentao Ye	0d81a1fe61	[V0 Deprecation] Deprecate virtual engine (#37195 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 14:30:14 -07:00
Benjamin Chislett	f5972a872f	[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726 ) Signed-off-by: Shahar Mor <smor@nvidia.com> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Shahar Mor <smor@nvidia.com> Co-authored-by: Roi Koren <roik@nvidia.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-24 09:49:56 -08:00
Vadim Gimpelson	604b9eaec5	[BUGFIX] Fix accuracy regression for NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 with TP>1 (#34476 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-14 23:25:17 -08:00
whx	ce9b3cd3e9	[PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-02-07 05:26:05 -08:00
Vadim Gimpelson	a372f3f40a	[MISC] Fix Tensor Parallelism for Quantized Mamba Models with n_groups=1 (#33257 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-03 15:10:31 -05:00
danisereb	f4a0921c9c	[Performance] Tune Mamba selective scan kernel for B200 (#32873 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-26 05:56:54 -08:00
Harry Huang	5206e5e28c	[V1][Hybrid] Mamba Prefix Caching with align mode (#30877 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2026-01-23 09:56:48 -08:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Shanshan Shen	08d954f036	[Doc] Add developer guide for CustomOp (#30886 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-01-09 16:21:11 +00:00
Cyrus Leung	aab0102a26	[V0 deprecation] Remove more V0 references (#29088 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 11:56:59 +00:00
Shanshan Shen	d44e9df7d4	[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device (#26487 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-11-19 16:24:55 +00:00
tomeras91	1395461f5f	[Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-11-18 16:49:36 -08:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x \| None` and `Union[x, y]` to `x \| y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Thomas Parnell	778f554157	[V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching (#26222 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-06 10:40:30 +08:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Stan Wozniak	ea507c3a93	[V1] [Hybrid] Mamba2 Automatic Prefix Caching (#25752 ) Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-04 06:34:22 +02:00
Thomas Parnell	fea3e476aa	[Kernel] Chunk-aligned mamba2 (#24683 )	2025-09-29 23:18:25 +02:00
Chih-Chieh Yang	2b6b1d7809	[Model] Mamba2 varlen refactor (#21467 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>	2025-09-26 11:31:14 +00:00
Michael Goin	7361ab379f	Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 22:48:40 +00:00
Thomas Parnell	a903669e10	[V1] Remove V0 code paths for Hybrid models (#25400 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-23 08:26:13 -07:00
Chih-Chieh Yang	73cfb3c5ee	[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 (#24331 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-09-16 14:53:43 +00:00
tomeras91	27fcfe7bcf	[Mamba] Support TP>1 with quantization for mamba2 mixer in case `n_groups % tp_size == 0` (#24593 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-16 10:51:01 +00:00
Didier Durand	bcb06d7baf	[Doc]: fix typos in various files (#24726 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-12 06:43:12 -07:00
tomeras91	08abfa78ec	[Bugfix] fix modelopt exclude_modules name mapping (#24178 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-10 10:20:46 -07:00
Ayush Satyam	5c4b6e66fe	[Attention] Unify mamba and attention backend selection (#23171 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>	2025-08-25 09:09:36 +00:00
Thomas Parnell	75531a6c13	[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928 ) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-08-15 12:57:06 +00:00
Asaf Joseph Gardin	3d232dbd19	[Mamba] - refactor: Renamed mamba_attn to mamba2_attn (#22818 ) Signed-off-by: asafg <asafg@ai21.com> Co-authored-by: asafg <asafg@ai21.com>	2025-08-15 06:38:05 +00:00
Chen Zhang	fceafaf582	[Bugfix][mamba] Fix type annotation of Mamba2Metadata (#22787 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-13 06:07:09 -07:00
Asaf Joseph Gardin	46a13949d5	[v1] - Mamba1 Attention Metadata (#21249 ) Signed-off-by: asafg <asafg@ai21.com> Co-authored-by: asafg <asafg@ai21.com>	2025-08-06 17:03:42 -07:00
Chih-Chieh Yang	b690e34824	[Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead (#21075 ) Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com> Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-08-02 01:59:34 -07:00
Asaf Joseph Gardin	a6c050286a	[v1][mamba] Added mamba_type into MambaSpec (#21715 ) Signed-off-by: asafg <asafg@ai21.com> Co-authored-by: asafg <asafg@ai21.com>	2025-07-28 08:15:55 +00:00
Chih-Chieh Yang	eab2f3980c	[Model] Replace Mamba2 RMSNorm Gated with Fused Triton Kernel (#20839 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Signed-off-by: Yu Chin Fabian Lim <fabian.lim@gmail.com> Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: Yu Chin Fabian Lim <fabian.lim@gmail.com>	2025-07-25 06:49:36 -07:00
Thomas Parnell	881e3cbe3b	[V1] [Hybrid] Enable piecewise CUDA Graph for mamba layers (#21194 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-19 19:27:21 +00:00
Tuan, Hoang-Trong	f29fd8a7f8	[BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 (#20838 ) Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com> Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>	2025-07-15 16:08:26 -04:00
Thomas Parnell	3534c39a20	[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-15 04:04:35 -07:00
nopperl	5d09152ff1	[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine (#20660 ) Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>	2025-07-11 05:53:31 +00:00
Tuan, Hoang-Trong	47043eb678	[Kernel] Triton implementation of causal-conv1d for Mamba-based models (#18218 ) Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com> Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-07-09 12:53:55 -07:00
Chen Zhang	a89209b78d	[v1] Support mamba2 (#19327 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-06-18 20:34:15 +00:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Dhia Eddine Rhaiem	20bd6f4d2e	[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (#18500 ) Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>	2025-05-21 19:23:59 -07:00
Dhia Eddine Rhaiem	eca18691d2	[MODEL] FalconH1 (#18406 ) Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>	2025-05-21 04:59:06 -07:00
omahs	a9944aabfa	fix: typos (#18151 ) Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>	2025-05-15 02:16:15 -07:00
Harry Mellor	6223dd8114	Update deprecated type hinting in `model_executor/layers` (#18056 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 04:17:23 -07:00
Chih-Chieh Yang	18dd5e01f2	[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels (#17146 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-05-06 17:59:30 -07:00
Chih-Chieh Yang	daefed052c	[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>	2025-04-10 19:07:07 +00:00

1 2

59 Commits