biondizzle/vllm - vllm

Author	SHA1	Message	Date
yugong333	ffe1fc7a28	Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005) Signed-off-by: Yu Gong <yu3.gong@gmail.com>	2026-02-02 12:30:06 -05:00
Harry Mellor	8b7346d5f1	Update huggingface-hub again (#33567) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-02 09:20:54 -08:00
Harry Mellor	6141ebe0dd	Remove incorrect tokenizer info test (#33565) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-02 17:11:44 +00:00
Yang Liu	199e3cb476	[Model] Use mm_position to compute mrope positions for GLM-4.xV (#33039) Signed-off-by: Yang <lymailforjob@gmail.com>	2026-02-02 16:55:48 +00:00
Matthew Bonanni	9f8cb81b44	[CI] Add DeepSeek V3.2 nightly eval (#33566) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-02 16:10:02 +00:00
Cyrus Leung	d7e17aaacd	[Refactor] Move profiling methods to MM budget (#33559) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-02 23:27:00 +08:00
Kebe	528e9b1490	[Feature][Core] Support Fabric detection to adapt the MNNVL protocol for the GB series (#33540) Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Thomas Vegas <tvegas@nvidia.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2026-02-02 22:55:46 +08:00
shanjiaz	d95b4be47a	move spec decode slow test to test_areas.yaml (#33365) Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>	2026-02-02 06:28:36 -08:00
Isotr0py	4061dcf4c5	[Bugfix] Enable Kimi k25 processor test (#33562) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-02 14:25:25 +00:00
danielafrimi	0aca8b8c62	[MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) (#32790) Signed-off-by: dafrimi <dafrimi@nvidia.com>	2026-02-02 09:18:50 -05:00
Rabi Mishra	9eb58f8cf1	fix[ROCm]: Remove unconditional aiter import (#32902) Signed-off-by: rabi <ramishra@redhat.com>	2026-02-02 22:10:02 +08:00
Cyrus Leung	b10d05b8a8	[Model] Use explicit types in `get_generation_prompt` (#33551) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-02 12:38:49 +00:00
Borushiki	b398e5c819	Update get_expert_mapping to include self parameter (#33525) Signed-off-by: Borushiki <38628261+Otsutsukii@users.noreply.github.com>	2026-02-02 20:29:07 +08:00
Grzegorz K. Karch	78061ef584	Fix accessing hidden_act from model config (#32686) Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>	2026-02-02 11:11:33 +00:00
Nicolò Lucchesi	528b3076af	[CI][Bugfix] Fix flaky `tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_example_connector_consistency` (#33555) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-02 03:01:29 -08:00
Cyrus Leung	a502831d36	[Chore] Remove redundant input parsing methods (#33542) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-02 10:50:47 +00:00
Komal Kumar Teru	ba871fb788	[Misc] support arbitrary MM datasets in spec dec bench (#33486) Signed-off-by: kkt-cohere <komal@cohere.com> Signed-off-by: Komal Kumar Teru <162363718+kkt-cohere@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-02 08:49:48 +00:00
R3hankhan	ab374786c7	[CPU][IBM Z][Dockerfile] Fix IBM Z builds (#33243) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-01 23:41:29 -08:00
RED	808dd87b30	[Model] Support DeepSeek-OCR-2 (#33165) Signed-off-by: liuli <ll407707@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: liuli <ll407707@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-02 06:24:10 +00:00
Andy Lo	beb8899482	Fix mistral sliding window parsing (#33521) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-02-02 05:08:04 +00:00
Sawyer Bowerman	ce88756b96	[Doc]: update paths for Offline/Online/Others example sections (#33494) Signed-off-by: Sawyer Bowerman <sbowerma@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-02 03:56:53 +00:00
Paco Xu	a3154a6092	[Doc] add missing model entries in supported_models.md (#33220) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-02-02 03:37:25 +00:00
jack	7c036432fc	[Bugfix] GLM-4 tool parser: incremental string streaming (#33218) Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com> Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>	2026-02-02 11:13:31 +08:00
Robert Shaw	318b120766	[Nightly CI] Remove CT Model (#33530) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-01 19:09:09 -08:00
csy0225	c3b40dc3e7	[Models] Step-3.5-Flash (#33523) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com> Co-authored-by: xiewuxun <xiewuxun@stepfun.com> Co-authored-by: zetaohong <i-hongzetao@stepfun.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-02 10:21:18 +08:00
Yifan Qiao	a01ef3fa51	[Fix] prefix cache hit rate == 0 bug with gpt-oss style models (#33524) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-02-02 01:59:58 +00:00
Runkai Tao	7320ca3942	Add unpermute-aware fused MoE LoRA path (#32655) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>	2026-02-02 09:46:09 +08:00
Nick Hill	cf0a99f84d	[ModelRunner V2] Support spec decode with structured outputs (#33374) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-02 00:19:59 +00:00
Nick Hill	e535d90deb	[ModelRunner V2] Misc minor simplifications and optimizations (#33467) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-01 22:17:14 +00:00
Komal Kumar Teru	0b225fb7b2	[Misc] skip target model mm emb in draft proposal step when draft is text-only (#33437) Signed-off-by: kkt-cohere <komal@cohere.com>	2026-02-01 21:13:35 +00:00
will b.	46b4a02794	Fix DeepSeek V2 RoPE initialization error (#33501) Signed-off-by: Eduardo Salinas <edus@microsoft.com> Signed-off-by: catswe <212922539+catswe@users.noreply.github.com> Co-authored-by: Eduardo Salinas <edus@microsoft.com>	2026-02-01 21:00:56 +00:00
shaharmor98	8869cd8ec1	Add MoE config for Super B200 TP2 (#33510)	2026-02-01 18:48:37 +00:00
JartX	cd86fff38f	[BUGFIX] Fix hipErrorIllegalState in Qwen3-Omni during startup profiling allow inference Omni on ROCM (#33077) Signed-off-by: JartX <sagformas@epdcenter.es>	2026-02-01 13:36:25 +00:00
Maral	b5f8c3092d	[W8A8 Block Linear Refactor][1/N] Keep all quantization types into `QuantFP8` class. (#33047) Signed-off-by: maral <maralbahari.98@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-01 09:28:01 +00:00
Cyrus Leung	21997f45b1	[Redo] #33110 with threading limit (#33502) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: YunzhuLu <lucia.yunzhu@gmail.com>	2026-02-01 09:18:11 +00:00
Luka Govedič	672023877b	Change defaults for vllm bench startup (#33489)	2026-01-31 23:46:01 -08:00
Zack Yu	754a8ca942	fix: only include Authorization header when OPENAI_API_KEY is set (#33488) Signed-off-by: zack041 <zackyu041@gmail.com>	2026-01-31 23:35:09 -08:00
Eduardo Salinas	302ecf64ff	[Models]: lfm2_siglip2 return intermediate encoder layers (#33370) Signed-off-by: Eduardo Salinas <edus@microsoft.com>	2026-02-01 06:17:49 +00:00
Cyrus Leung	b6bb2842cf	[Critical] Revert #33110 (#33500)	2026-01-31 21:06:42 -08:00
Cyrus Leung	79b6ec6aab	[Bugfix] Fix inconsistent handling of cache reset (#33481) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-31 20:23:41 -08:00
Greg Pereira	d6416fdde9	pin LMCache to v0.3.9 or greater with vLLM v0.15.0 (#33440) Signed-off-by: greg pereira <grpereir@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-31 20:50:38 -07:00
Andreas Karatzas	0fb3157267	[ROCm][CI] Update huggingface-hub pin (#33492) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-01 02:51:54 +00:00
Cyrus Leung	a358e4dffe	[Refactor] Make Renderer an abstract class (#33479) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-01 10:36:30 +08:00
René Honig	079781177a	fix: Add SM120 (RTX Blackwell) support for FlashInfer CUTLASS NVFP4 MoE kernels (#33417) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2026-01-31 14:06:42 -08:00
Roy Wang	63c0889416	[Misc] Fix flashinfer related tests (#33462) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2026-01-31 16:10:24 -05:00
smashyalts	1e86c802d4	Fix grammar (#33121) Signed-off-by: smashyalts <smashyalts@gmail.com>	2026-01-31 09:59:34 -08:00
linhaifeng	fedf64332e	[Bugfix]: Fix display errors in TORCH_CHECK messages (#32942) Signed-off-by: linhaifeng <1371675203@qq.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-31 09:48:48 -08:00
Xiao Yang	2238a12c13	[Misc] support collect_env for endpoint /server_info (#33246) Signed-off-by: yang.xiao <yang.xiao@daocloud.io>	2026-02-01 01:42:59 +08:00
Harry Mellor	ce0afe2451	Update `huggingface-hub` pin for the last time before Transformers v5 (#33473) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-31 09:14:24 -08:00
Cyrus Leung	88c3e114d8	[Refactor] Move MM data parsing outside processor (#33408) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-31 16:46:14 +00:00

13728 Commits