biondizzle/vllm - vllm - Gitea

Author	SHA1	Message	Date
Harry Mellor	d6249d0699	Fix typing for `safetensors_load_strategy` (#24641) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-11 10:41:39 +00:00
shengshiqi-google	41329a0ff9	[Core] feat: Add --safetensors-load-strategy flag for faster safetensors loading from Lustre (#24469) Signed-off-by: Shiqi Sheng <shengshiqi@google.com> Signed-off-by: shengshiqi-google <160179165+shengshiqi-google@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-10 23:10:01 -07:00
Hanjie Qiu	dcb28a332b	[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078) Signed-off-by: hjjq <hanjieq@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-10 15:31:10 -07:00
Xingyu Liu	9fb74c27a7	[Core] Support configuration parsing plugin (#24277) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-10 11:32:43 -07:00
pwschuurman	4377b1ae3b	[Bugfix] Update Run:AI Model Streamer Loading Integration (#23845) Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai> Signed-off-by: Peter Schuurman <psch@google.com> Co-authored-by: Omer Dayan (SW-GPU) <omer@run.ai> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-09 21:37:17 -07:00
Zebing Lin	82dfb12e52	[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673) Signed-off-by: linzebing <linzebing1995@gmail.com>	2025-09-08 21:34:37 -07:00
Didier Durand	f4962a6d55	[Doc]: fix typos in Python comments (#24417) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-08 00:22:16 -07:00
Woosuk Kwon	4172235ab7	[V0 deprecation] Deprecate V0 Neuron backend (#21159) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-06 16:15:18 -07:00
yzds	ac201a0eaf	[Feature] Support Decode Context Parallel (DCP) for MLA (#23734) Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-06 13:24:05 +08:00
Lucas Wilkinson	402759d472	[Attention] FlashAttn MLA (#14258) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-04 02:47:59 -07:00
Isotr0py	d7fbc6ddac	[Misc] Enable V1 FP16 inference on pre-Ampere GPUs (#24022) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-01 08:12:22 +00:00
Christian Pinto	1cb39dbcdd	[Misc] IO Processor plugins for pooling models (#22820) Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-31 23:07:12 -07:00
Maximilien de Bayser	2554b27baa	[V0 Deprecation] Remove pooling model support in V0 (#23434) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-29 00:04:02 -07:00
Didier Durand	d3da2eea54	[Doc]: fix typos in Python scripts (#23828) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-28 05:37:38 -07:00
Asaf Joseph Gardin	853c371fc3	[V1][Mamba] - Enable V1 by default for Mamba Models (#23650) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-08-27 20:53:30 +00:00
Harry Mellor	513c1fe255	Only run `get_attr_docs` if generating help text (#23723) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-27 13:55:12 +00:00
Cyrus Leung	69244e67e6	[Core] Use key-only cache for `BaseMultiModalProcessor` (#23018) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-27 14:19:13 +08:00
Cyrus Leung	50fede6634	[V1] Enable V1 for compute capability < 8.0 + FP32 (#23614) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-26 03:00:18 -07:00
rongfu.leng	1b9b16649c	[Misc] update dict parse to EPLBConfig from json dumps to dict unpacking (#23305) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-24 08:06:34 +00:00
Didier Durand	22cf679aad	[Doc]: fix various typos in multiple files (#23179) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-22 10:38:46 -07:00
Matthew Bonanni	19fe1a0510	[Kernel] Add FP8 support with FlashMLA backend (#22668) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-08-22 02:26:32 +00:00
22quinn	480bdf5a7b	[Core] Support custom executor qualname (#23314) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-22 09:40:54 +08:00
Robert Shaw	c8e33c72c6	[V1] Remove unnecessary check for main thread (#23298) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-08-21 14:08:35 +00:00
22quinn	f571ff8eb6	[Sampler] Support returning final logprobs (#22387) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-20 21:28:32 -07:00
Michael Goin	bbea1cefdd	[CI Bugfix] Fix CI by fully removing --enable-prompt-adapter (#23284) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-20 17:18:12 -07:00
rongfu.leng	4fbda0b20c	[Feature] use --eplb_config to set eplb param (#20562) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: rongfu.leng <lenronfu@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-20 14:07:28 -07:00
Cyrus Leung	5efd6905bc	[CLI][Doc] Formalize `--mm-encoder-tp-mode` (#23190) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-20 23:42:28 +08:00
rongfu.leng	38217877aa	[Fix] fix offline env use local mode path (#22526) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-20 13:34:49 +00:00
Nikhil Suryawanshi	78dba404ad	[Hardware][IBM Z]Enable v1 for s390x and s390x dockerfile fixes (#22725) Signed-off-by: Nikhil Suryawanshi <suryawanshin74@gmail.com>	2025-08-19 04:40:37 +00:00
afeldman-nm	bf7f470b22	[V1] Logits processors extensibility (#19912) Signed-off-by: Andrew Feldman <afeldman@redhat.com> Signed-off-by: Andrew Feldman <afeld2012@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-16 12:59:17 -07:00
Maximilien de Bayser	52ce1420e9	Fix handling of `max_num_batched_tokens` for pooling tasks (#23004) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-16 17:36:30 +00:00
Thomas Parnell	75531a6c13	[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) (#22928) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com> Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-08-15 12:57:06 +00:00
Roger Wang	49252cf59e	[MM] Allow skipping memory profiling for multimodal models. (#22950) Signed-off-by: Roger Wang <hey@rogerw.me> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-15 11:41:38 +00:00
Cyrus Leung	dbe298046c	[Bugfix] Fix parsing of `--disable-mm-preprocessor-cache` (#22909) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-14 08:09:44 -07:00
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Harry Mellor	4fbd8bb597	Fix passing `SpeculativeConfig` from the CLI (#22652) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 22:13:32 -07:00
wang.yuqi	84cf78acee	[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-11 09:41:37 -07:00
Harry Mellor	c49848396d	Refactor sliding window configuration to Transformers best practice (#21927) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-09 20:50:48 -07:00
Harry Mellor	56186474f6	[Docs] Reduce noise in docs and `--help` from the JSON tip (#22567) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-09 08:31:32 -07:00
Harry Mellor	e3edc0a7a8	Extract `CompilationConfig` from `config.py` (#22524) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-08 16:34:25 -07:00
Harry Mellor	7e3a8dc906	Remove `from_dict` from `SpeculativeConfig` (#22451) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-07 10:13:04 -07:00
Cyrus Leung	139d155781	[Frontend] Use engine argument to control MM cache size (#22441) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 09:47:10 -07:00
Cyrus Leung	766bc8162c	[Core] Store only the keys for multi-modal data in P0 (#22198) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 01:45:04 -07:00
Giancarlo Delfin	469b3ffaaa	[V1] port xformers backend to v1 (#21342) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-05 10:04:46 -07:00
Giancarlo Delfin	aa7012eb6d	Add tree attention backend for v1 (part 1) (#20401) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-03 22:13:26 -07:00
H	24d1dffbeb	[executor] feat: add supports_pp attr to executors (#21786) Signed-off-by: Haibin Lin <haibin.lin@bytedance.com>	2025-08-03 18:04:45 +08:00
Rui Qiao	4ac8437352	[Misc] Getting and passing ray runtime_env to workers (#22040) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-08-01 23:54:40 -07:00
Harry Mellor	2d7b09b998	Deprecate `--disable-log-requests` and replace with `--enable-log-requests` (#21739) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 17:16:37 +01:00
Harry Mellor	fb0e0d46fc	Fix `get_kwargs` for case where type hint is `list[Union[str, type]]` (#22016) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 05:26:42 -07:00
Dipika Sikka	dfbc1f8880	[Speculative Decoding] Add `speculators` config support (#21345)	2025-08-01 08:25:18 -04:00


469 Commits