biondizzle/vllm

Author	SHA1	Message	Date
Radu Salavat	e69c990c21	[Feature][CPU Backend]: Optimize ARM vectorization backend (#30329 ) Signed-off-by: Radu Salavat <radu.salavat@arm.com>	2026-02-02 20:17:56 -08:00
Richard Zou	5eac9a1b34	[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624 ) Signed-off-by: Richard Zou <zou3519@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-03 03:38:49 +00:00
Nathan Weinberg	1b60b45d0d	[CI/Build] add directions for CPU image upload to Docker Hub (#32032 ) Signed-off-by: Nathan Weinberg <nweinber@redhat.com> Signed-off-by: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2026-02-03 02:48:06 +00:00
Dezhan	4b3803d180	[BugFix] DPMetadata raises assert error for dense model (#32739 ) Co-authored-by: Dezhan Tu <dztu@meta.com>	2026-02-03 00:56:44 +00:00
Patrick von Platen	5019c59dd2	[Voxtral Realtime] Introduce global log mel max (#33574 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-02 17:01:47 -05:00
Lain	089cd4f002	fix cutlass_3x_gemm_fp8_blockwise on sm103a (#32224 ) Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-02-02 11:47:46 -08:00
Vasiliy Kuznetsov	0130223bd9	fix memory for online fp8 quantization with streaming weight load (#31914 ) Signed-off-by: vasiliy <vasiliy@fb.com>	2026-02-02 14:17:42 -05:00
Matthew Bonanni	5d1aef3004	[UX] Format attention backend log line (#33570 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-02 18:57:12 +00:00
yugong333	ffe1fc7a28	Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005 ) Signed-off-by: Yu Gong <yu3.gong@gmail.com>	2026-02-02 12:30:06 -05:00
Harry Mellor	8b7346d5f1	Update huggingface-hub again (#33567 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-02 09:20:54 -08:00
Harry Mellor	6141ebe0dd	Remove incorrect tokenizer info test (#33565 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-02 17:11:44 +00:00
Yang Liu	199e3cb476	[Model] Use mm_position to compute mrope positions for GLM-4.xV (#33039 ) Signed-off-by: Yang <lymailforjob@gmail.com>	2026-02-02 16:55:48 +00:00
Matthew Bonanni	9f8cb81b44	[CI] Add DeepSeek V3.2 nightly eval (#33566 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-02 16:10:02 +00:00
Cyrus Leung	d7e17aaacd	[Refactor] Move profiling methods to MM budget (#33559 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-02 23:27:00 +08:00
Kebe	528e9b1490	[Feature][Core] Support Fabric detection to adapt the MNNVL protocol for the GB series (#33540 ) Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Thomas Vegas <tvegas@nvidia.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2026-02-02 22:55:46 +08:00
shanjiaz	d95b4be47a	move spec decode slow test to test_areas.yaml (#33365 ) Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>	2026-02-02 06:28:36 -08:00
Isotr0py	4061dcf4c5	[Bugfix] Enable Kimi k25 processor test (#33562 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-02 14:25:25 +00:00
danielafrimi	0aca8b8c62	[MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) (#32790 ) Signed-off-by: dafrimi <dafrimi@nvidia.com>	2026-02-02 09:18:50 -05:00
Rabi Mishra	9eb58f8cf1	fix[ROCm]: Remove unconditional aiter import (#32902 ) Signed-off-by: rabi <ramishra@redhat.com>	2026-02-02 22:10:02 +08:00
Cyrus Leung	b10d05b8a8	[Model] Use explicit types in `get_generation_prompt` (#33551 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-02 12:38:49 +00:00
Borushiki	b398e5c819	Update get_expert_mapping to include self parameter (#33525 ) Signed-off-by: Borushiki <38628261+Otsutsukii@users.noreply.github.com>	2026-02-02 20:29:07 +08:00
Grzegorz K. Karch	78061ef584	Fix accessing hidden_act from model config (#32686 ) Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>	2026-02-02 11:11:33 +00:00
Nicolò Lucchesi	528b3076af	[CI][Bugfix] Fix flaky `tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_example_connector_consistency` (#33555 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-02 03:01:29 -08:00
Cyrus Leung	a502831d36	[Chore] Remove redundant input parsing methods (#33542 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-02 10:50:47 +00:00
Komal Kumar Teru	ba871fb788	[Misc] support arbitrary MM datasets in spec dec bench (#33486 ) Signed-off-by: kkt-cohere <komal@cohere.com> Signed-off-by: Komal Kumar Teru <162363718+kkt-cohere@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-02 08:49:48 +00:00
R3hankhan	ab374786c7	[CPU][IBM Z][Dockerfile] Fix IBM Z builds (#33243 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-01 23:41:29 -08:00
RED	808dd87b30	[Model] Support DeepSeek-OCR-2 (#33165 ) Signed-off-by: liuli <ll407707@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: liuli <ll407707@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-02 06:24:10 +00:00
Andy Lo	beb8899482	Fix mistral sliding window parsing (#33521 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-02-02 05:08:04 +00:00
Sawyer Bowerman	ce88756b96	[Doc]: update paths for Offline/Online/Others example sections (#33494 ) Signed-off-by: Sawyer Bowerman <sbowerma@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-02 03:56:53 +00:00
Paco Xu	a3154a6092	[Doc] add missing model entries in supported_models.md (#33220 ) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-02-02 03:37:25 +00:00
jack	7c036432fc	[Bugfix] GLM-4 tool parser: incremental string streaming (#33218 ) Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com> Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>	2026-02-02 11:13:31 +08:00
Robert Shaw	318b120766	[Nightly CI] Remove CT Model (#33530 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-01 19:09:09 -08:00
csy0225	c3b40dc3e7	[Models] Step-3.5-Flash (#33523 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com> Co-authored-by: xiewuxun <xiewuxun@stepfun.com> Co-authored-by: zetaohong <i-hongzetao@stepfun.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-02 10:21:18 +08:00
Yifan Qiao	a01ef3fa51	[Fix] prefix cache hit rate == 0 bug with gpt-oss style models (#33524 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-02-02 01:59:58 +00:00
Runkai Tao	7320ca3942	Add unpermute-aware fused MoE LoRA path (#32655 ) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>	2026-02-02 09:46:09 +08:00
Nick Hill	cf0a99f84d	[ModelRunner V2] Support spec decode with structured outputs (#33374 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-02 00:19:59 +00:00
Nick Hill	e535d90deb	[ModelRunner V2] Misc minor simplifications and optimizations (#33467 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-01 22:17:14 +00:00
Komal Kumar Teru	0b225fb7b2	[Misc] skip target model mm emb in draft proposal step when draft is text-only (#33437 ) Signed-off-by: kkt-cohere <komal@cohere.com>	2026-02-01 21:13:35 +00:00
will b.	46b4a02794	Fix DeepSeek V2 RoPE initialization error (#33501 ) Signed-off-by: Eduardo Salinas <edus@microsoft.com> Signed-off-by: catswe <212922539+catswe@users.noreply.github.com> Co-authored-by: Eduardo Salinas <edus@microsoft.com>	2026-02-01 21:00:56 +00:00
shaharmor98	8869cd8ec1	Add MoE config for Super B200 TP2 (#33510 )	2026-02-01 18:48:37 +00:00
JartX	cd86fff38f	[BUGFIX] Fix hipErrorIllegalState in Qwen3-Omni during startup profiling allow inference Omni on ROCM (#33077 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2026-02-01 13:36:25 +00:00
Maral	b5f8c3092d	[W8A8 Block Linear Refactor][1/N] Keep all quantization types into `QuantFP8` class. (#33047 ) Signed-off-by: maral <maralbahari.98@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-01 09:28:01 +00:00
Cyrus Leung	21997f45b1	[Redo] #33110 with threading limit (#33502 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: YunzhuLu <lucia.yunzhu@gmail.com>	2026-02-01 09:18:11 +00:00
Luka Govedič	672023877b	Change defaults for vllm bench startup (#33489 )	2026-01-31 23:46:01 -08:00
Zack Yu	754a8ca942	fix: only include Authorization header when OPENAI_API_KEY is set (#33488 ) Signed-off-by: zack041 <zackyu041@gmail.com>	2026-01-31 23:35:09 -08:00
Eduardo Salinas	302ecf64ff	[Models]: lfm2_siglip2 return intermediate encoder layers (#33370 ) Signed-off-by: Eduardo Salinas <edus@microsoft.com>	2026-02-01 06:17:49 +00:00
Cyrus Leung	b6bb2842cf	[Critical] Revert #33110 (#33500 )	2026-01-31 21:06:42 -08:00
Cyrus Leung	79b6ec6aab	[Bugfix] Fix inconsistent handling of cache reset (#33481 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-31 20:23:41 -08:00
Greg Pereira	d6416fdde9	pin LMCache to v0.3.9 or greater with vLLM v0.15.0 (#33440 ) Signed-off-by: greg pereira <grpereir@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-31 20:50:38 -07:00
Andreas Karatzas	0fb3157267	[ROCm][CI] Update huggingface-hub pin (#33492 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-01 02:51:54 +00:00

14386 Commits