biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
7mile	b2ea5ba677	[Bugfix][Spec Decode] Fix wrong valid_mask for padded speculation when chunked prefill occurs (#26231) Signed-off-by: seven-mile <i@7li.moe> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 18:24:22 +00:00
Karan Goel	824a3f403f	[Misc] auto_tune: kill specific vllm process (#26304) Signed-off-by: Karan Goel <karangoel@google.com>	2025-10-06 18:02:51 +00:00
Rahul Tuli	05f6846ede	Support llama3 eagle3 head with llama4 verifier (#25961) Signed-off-by: rahul-tuli <rtuli@redhat.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-10-06 13:56:08 -04:00
Michael Goin	20db99cc69	[CI Bugfix] Make sure TRTLLM attention is available in test_blackwell_moe (#26188) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-06 13:50:11 -04:00
Yannick Schnider	6431be808f	[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-06 17:19:34 +00:00
Matthew Bonanni	4727a8afa7	[Attention] Remove unused reorder_batch method (#24463) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-06 13:13:39 -04:00
tomeras91	b8f603cebe	[Model] EVS support for nano_nemotron_vl (#26269) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>	2025-10-07 00:23:37 +08:00
Chatcharin Sangbutsarakum	fc679696f8	Fix `DotsOCR` tensor type (#26281) Signed-off-by: what_in_the_nim <chatcharinsang@gmail.com>	2025-10-06 12:23:43 +00:00
Raushan Turganbay	ab5e7d93f4	[Bugfix] Fix mrope in Transformers Backend (#26087) Signed-off-by: raushan <raushan@huggingface.co> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 11:40:50 +00:00
Harry Mellor	0340f45553	Support expert parallel load balancing in Transformers backend (#26287) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 11:20:16 +00:00
Cyrus Leung	19a00eb210	[Model] Use `merge_by_field_config` for MM models (Llava family) (#26280) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 09:45:26 +00:00
Cyrus Leung	391612e78b	[Frontend] Consolidate tokenizer init code (#26276) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 09:34:52 +00:00
abhisheksheth28	77c95f72f7	[Doc] add KAITO to integrations (#25521) Signed-off-by: "Abhishek Sheth" <absheth@microsoft.com>	2025-10-06 17:30:03 +08:00
Aritra Roy Gosthipaty	59f30d0448	[Docs] Edit HF Inference Endpoints documentation (#26275) Signed-off-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>	2025-10-06 10:13:09 +01:00
Roger Wang	43c146ca42	[Misc] Clean up unnecessary E501 ignore (#26274) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-10-06 07:29:18 +00:00
Yasmin Moslem	7c2ec0fe87	[Benchmarking] Add disable_shuffle option for dataset loading (#26258) Signed-off-by: Yasmin Moslem <48152713+ymoslem@users.noreply.github.com>	2025-10-06 07:05:44 +00:00
dependabot[bot]	039b6bade3	Bump actions/stale from 10.0.0 to 10.1.0 (#26272) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-10-06 07:01:21 +00:00
Harry Mellor	6c04638214	Fix per file ruff ignores related to line length (#26262) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 05:12:40 +00:00
wuhang	91ac7f764d	[CI][gpt-oss] Enable python tool tests in CI (#24315) Signed-off-by: wuhang <wuhang6@huawei.com>	2025-10-06 04:20:06 +00:00
Chen Zhang	4be7d7c1c9	[MISC] Add heheda12345 to CODEOWNERS of vllm/config/cache.py (#26270) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-06 10:58:59 +08:00
orangeng	59b477645c	[Doc] Edited minor typo (#26266) Signed-off-by: Orange Ng <ngquanhao@outlook.com>	2025-10-05 19:53:09 -07:00
Thomas Parnell	778f554157	[V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching (#26222) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-06 10:40:30 +08:00
Thomas Parnell	d3c84297c3	[CI] Add comment about the single cudagraph capture size that is used (#26252)	2025-10-06 02:35:37 +00:00
Elieser Pereira	f509a20846	[DOC] Update production-stack.md (#26177) Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com>	2025-10-05 21:32:48 +00:00
Michael Goin	60bc25e74c	[CI] Add Blackwell LM Eval Small Models test to nightly (#26052) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-05 14:59:50 -06:00
Harry Mellor	b893d661b1	Fix per file ruff ignores related to simplification (#26259) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 20:31:53 +00:00
Jason Li	6b6e98775f	[NVIDIA] flashinfer TRTLLM attention prefill token limit (#25998) Signed-off-by: jasonlizhengjian <jason.li@centml.ai> Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>	2025-10-05 14:24:37 -06:00
Jiangyun Zhu	9c3c21c519	[CI] fix mamba kernel test (#26250) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-05 18:26:59 +00:00
Harry Mellor	512b8affa4	Update `ruff` pre-commit hooks version (#26255) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-05 09:50:50 -07:00
Harry Mellor	1c0c68202c	Fix per file ruff ignores related to typing (#26254) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 16:37:55 +00:00
ihb2032	5f317530ec	fix(tests): Resolve late binding of loop variable in assert message lambda (#26249) Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com>	2025-10-05 09:18:22 -07:00
Harry Mellor	557b2e961d	Remove all cases of `fmt: on/off` (#26253) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:14 -07:00
Harry Mellor	4e256cadc2	Remove all references to `yapf` as it's no longer used (#26251) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:11 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Hank_	17edd8a807	[Platform][Kernel] platform-specific kernel loading (#25823) Signed-off-by: Hank <hcc.mayday@gmail.com>	2025-10-05 13:25:15 +02:00
ihb2032	3303cfb4ac	[Bugfix][Hardware][RISC-V] Limit supported dtypes to float32 to avoid scheduler segfault (#26228) Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com>	2025-10-05 10:36:54 +00:00
Cyrus Leung	b7e8e4e6be	[Bugfix] Always apply MM processor even when no MM items are passed (#26240) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-05 10:10:20 +00:00
Simon Danielsson	432e1cbc23	[Bugfix]: Assertion error when using FlashInfer backend (#25933) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-05 16:46:36 +08:00
Jialin Ouyang	201c971e96	[Perf][Easy] Early stop in request_block_hasher (#26112) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-05 16:46:03 +08:00
Maximilien de Bayser	e0986ea07b	Add documentation for granite 4 tool calling (#26175) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-10-05 07:35:42 +00:00
Cyrus Leung	a964e5e6c3	[Bugfix] Allow `--skip-tokenizer-init` with `echo and return_token_ids` (#26238) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-05 05:38:53 +00:00
22quinn	78c1d5bfd2	[Easy] Add str repr for IterationStats (#26232) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-10-05 05:00:21 +00:00
Cyrus Leung	59a85c366e	[Model] Use `merge_by_field_config` for MM models (H-L) (#26230) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-05 11:54:17 +08:00
Cyrus Leung	119f00630b	[Renderer] Clean up renderer code (#26216) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-04 17:05:29 +00:00
Isotr0py	a42d2df75f	[Frontend] Cache chat template kwargs resolution (#26227) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-04 15:32:30 +00:00
Li, Jiang	5c057e068f	[CPU] Refine batch reorder of CPU attention backend (#26096) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-10-04 21:54:35 +08:00
Thomas Parnell	ed3aeb25a4	[V1] [Hybrid] Remove code to override default CUDA graph configuration (#26226) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-04 13:47:48 +00:00
yuafng	86ee949128	Fix tensor device and dtype placement in Qwen2VL model (#26219) Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Yuanfeng Li <yuanfengli@meta.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-10-04 06:41:39 -07:00
Cyrus Leung	4570535ec4	[Model] CLIP Embedding Support (#26010) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-04 06:21:42 -07:00
Nicolò Lucchesi	2a6dc67eb5	[Bugfix] Fix `_reqs_to_process` leak on abort (#26012) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-04 11:39:31 +00:00

10184 Commits