biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Cyrus Leung	389aa1b2eb	[Doc] Update more docs with respect to V1 (#29188 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-23 10:58:48 +08:00
Julien Denize	57430fc95c	Default model load/config/tokenizer to `mistral` format if relevant files exist (#28659 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-21 13:58:59 -08:00
Cyrus Leung	511a6b611d	[Config] Clean up SchedulerConfig initialization (#28665 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 22:41:02 +08:00
Li, Jiang	7f829be7d3	[CPU] Refactor CPU attention backend (#27954 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-12 09:43:06 +08:00
Asaf Joseph Gardin	00b31a36a2	[V1] [Hybrid] Mamba1 Automatic Prefix Caching (#26377 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-11-02 04:16:23 -08:00
Luciano Martins	e05a6754a8	[Model] Revert PR #26715 : Restore custom PaliGemma and Gemma3-MM impl… (#27309 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>	2025-10-22 10:05:34 -07:00
Cyrus Leung	8c017b3490	[Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 05:03:35 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x \| None` and `Union[x, y]` to `x \| y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Cyrus Leung	0f29dca988	[CI/Build] Fix model nightly tests (#26466 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-08 23:44:16 -07:00
Harry Mellor	6c04638214	Fix per file ruff ignores related to line length (#26262 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 05:12:40 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Stan Wozniak	ea507c3a93	[V1] [Hybrid] Mamba2 Automatic Prefix Caching (#25752 ) Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-04 06:34:22 +02:00
Thomas Parnell	0e93ac0b3a	[CI] Fix distributed hybrid tests in CI (#26155 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-03 09:14:18 +00:00
Thomas Parnell	a903669e10	[V1] Remove V0 code paths for Hybrid models (#25400 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-23 08:26:13 -07:00
Woosuk Kwon	52c2a8d4ad	[V0 Deprecation] Remove LLMEngine (#25033 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 17:56:30 -07:00
Andrew Sansom	9a4600e4dc	[CORE] Prompt Embeddings Support for v1 Engine (#24278 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-09-19 08:03:09 +08:00
Asaf Joseph Gardin	66072b36db	[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support (#24883 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-09-18 12:21:17 +00:00
Woosuk Kwon	759ef49b15	Remove V0 Encoder-Decoder Support (#24907 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-15 21:17:14 -07:00
afeldman-nm	c8c42597ab	[CI] Speed up model unit tests in CI (#24253 ) Signed-off-by: Andrew Feldman <afeldman@redhat.com>	2025-09-12 10:36:50 -07:00
Li, Jiang	59d5d2c736	[CI/Build] Skip prompt embeddings tests on V1-only CPU backend (#24721 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-12 18:51:01 +08:00
Andrew Sansom	ddcec289c7	Fix implementation divergence for BLOOM models between vLLM and HuggingFace when using prompt embeds (#24686 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-12 04:35:48 +00:00
Russell Bryant	37e8182bfe	[v1] Add Whisper model support (encoder-decoder) (#21088 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com>	2025-09-10 13:53:35 -07:00
Didier Durand	46876dff32	[Doc]: fixing typos to improve docs (#24480 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-08 23:06:04 -07:00
Cyrus Leung	948dd3443b	[Bugfix] Fix Apertus HF repo name (#24447 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-08 21:40:29 -07:00
nopperl	fa4311d85f	[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998 ) Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp> Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com> Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>	2025-09-03 08:24:02 -07:00
Thomas Parnell	d328f7894f	[CI] Enable all hf transformers baselines in test_hybrid (#23936 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-02 20:15:06 +00:00
Didier Durand	fad73be1a5	[Doc]: fix typos in Python comments (#24077 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 02:38:55 -07:00
Asaf Joseph Gardin	2b41cbbf03	[V1][Mamba1] - FP32 SSM Kernel Support (#23506 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-09-01 20:53:00 -07:00
EduardDurech	1cf3753b90	[MODEL] `Apertus` and `XIELU` (#23068 ) Signed-off-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com> Co-authored-by: AllenHaoHuang <allenhuangdd@gmail.com>	2025-08-29 20:29:18 +08:00
Asaf Joseph Gardin	853c371fc3	[V1][Mamba] - Enable V1 by default for Mamba Models (#23650 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-08-27 20:53:30 +00:00
Chen Zhang	2b4fc9bd9b	Support FlashAttention Backend for Hybrid SSM Models (#23299 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-26 12:41:52 +00:00
Paul Pak	2e2000f352	[Model] Add LFM2 architecture (#22845 ) Signed-off-by: Paul Pak <paulpak58@gmail.com>	2025-08-21 09:35:07 +02:00
Asaf Joseph Gardin	3663870c72	[V1][Mamba1] - Full CUDA and Piecewise CUDA Graphs Support (#23035 ) Signed-off-by: asafg <asafg@ai21.com> Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com> Co-authored-by: asafg <asafg@ai21.com>	2025-08-20 20:08:51 -07:00
汪志鹏	829bbd7882	[New Model]mBART model (#22883 ) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-08-16 12:16:58 +00:00
Thomas Parnell	75531a6c13	[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928 ) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-08-15 12:57:06 +00:00
amirai21	fe91ce9591	[V1] - Split Prefill and Decode for Mamba1 models (#22653 ) Signed-off-by: amirk <amirk@ai21.com> Signed-off-by: asafg <asafg@ai21.com> Co-authored-by: asafg <asafg@ai21.com> Co-authored-by: Asaf Joseph Gardin <39553475+Josephasafg@users.noreply.github.com>	2025-08-15 08:59:52 +00:00
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Thomas Parnell	61f67d8acd	[V1] [Hybrid] Enable Full CUDA Graph (decode-only) for Mamba layers (#21401 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-09 20:16:11 -07:00
Thomas Parnell	1bf5e1f25b	[CI] [Hybrid] Speed up hybrid models test by removing large models (#22563 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-09 02:04:42 -07:00
Thomas Parnell	8a0ffd6285	Remove mamba_ssm from vLLM requirements; install inside test container using `--no-build-isolation` (#22541 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-08 23:05:32 -07:00
Asaf Joseph Gardin	46a13949d5	[v1] - Mamba1 Attention Metadata (#21249 ) Signed-off-by: asafg <asafg@ai21.com> Co-authored-by: asafg <asafg@ai21.com>	2025-08-06 17:03:42 -07:00
Jee Jee Li	a7b8788d2c	[Misc] Modify the organization of GLM series (#22171 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-03 23:51:20 -07:00
Reza Barazesh	37efc63b64	[V0 deprecation] Guided decoding (#21347 ) Signed-off-by: Reza Barazesh <rezabarazesh@meta.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-29 03:15:30 -07:00
Ning Xie	0df4d9b06b	[Misc] unify variable for LLM instance v2 (#21356 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-22 06:32:36 -07:00
Ning Xie	d97841078b	[Misc] unify variable for LLM instance (#20996 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-21 12:18:33 +01:00
Thomas Parnell	881e3cbe3b	[V1] [Hybrid] Enable piecewise CUDA Graph for mamba layers (#21194 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-19 19:27:21 +00:00
Thomas Parnell	3534c39a20	[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-07-15 04:04:35 -07:00
Isotr0py	0d21b2664c	[Bugfix] Fix OOM in language generation test (#20814 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-11 11:21:52 -07:00
Li, Jiang	7721ef1786	[CI/Build][CPU] Fix CPU CI and remove all CPU V0 files (#20560 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-07 22:13:44 -07:00

1 2

68 Commits