biondizzle/vllm - vllm

Author	SHA1	Message	Date
Nick Hill	876a16f4fb	[ModelRunner V2] Fix spec decoding + logprobs (#33391 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-31 03:33:26 +00:00
Matthew Bonanni	aaa901ad55	[Attention] Move MLA `forward` from backend to layer (#33284 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-30 19:30:00 -08:00
Wentao Ye	010ec0c30e	[Deprecation] Deprecate `seed_everything` and `scatter_mm_placeholders` in v0.15 (#33362 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-31 02:54:16 +00:00
Alberto Ferrer	64a40a7ab4	[Bugfix] Fix typo in read_offset variable name (#33426 ) Signed-off-by: Alberto Ferrer <albertof@barrahome.org>	2026-01-31 01:26:15 +00:00
Gregory Shtrasberg	31aedfe7d6	[Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 (#33366 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-01-30 19:05:23 -06:00
Michael Goin	67ebaff528	Refactor NVFP4 Linear utils for ModelOpt and CT (#33201 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-30 16:37:42 -08:00
Chendi.Xue	2b465570e6	[CI][HPU]accelerate hpu test by skip python re-install and clean container name (#33286 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2026-01-30 21:36:29 +00:00
Huy Do	9ca66ecc10	Indicate compile mode in the benchmark results (#32990 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2026-01-30 15:34:36 -05:00
Pavani Majety	c3a9752b0c	[Hardware][SM100] Add TRTLLM Kernel for INT4 W4A16 Kernel. (#32437 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-01-30 10:30:46 -08:00
xuebwang-amd	f451b4558b	[Quantization][ROCm] Fix MoE weight loading to be robust (Qwen3_MoE/Qwen3_next as example models) (#33173 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com>	2026-01-30 17:50:23 +00:00
Vasiliy Kuznetsov	3f96fcf646	fix QERL attention import path (#33432 ) Signed-off-by: vasiliy <vasiliy@fb.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-30 09:29:09 -08:00
Yanan Cao	6c1f9e4c18	[Kernel] [Helion] [1/N] Add Helion ConfigManager (#32740 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-30 12:19:19 -05:00
Harry Mellor	67239c4c42	Fix encoder-decoder model disabling mm processor cache (#33236 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-30 16:30:10 +00:00
Nicolò Lucchesi	8ece60768f	[CI] Qwen3-ASR transcriptios tests (#33414 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-30 16:17:56 +00:00
Michael Goin	fd0e377244	Support FP8 block quant for CompressedTensorsW8A16Fp8 (#33280 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-30 11:15:20 -05:00
Kyle Sayers	f857a03f6b	[QeRL] Layerwise Reloading (#32133 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-01-30 08:50:05 -07:00
Danielle Robinson	74898a7015	[BugFix][LoRA] TritonExperts is ModularMoEPath for FP8 models (#33393 ) Signed-off-by: Danielle Robinson <dmmaddix@amazon.com> Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>	2026-01-30 15:27:42 +00:00
Frank Wang	8f5d51203b	Disable Cascade Attention for Batch Invariance (#32561 ) Signed-off-by: frankwang28 <frank.wbb@hotmail.com> Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-30 10:00:46 -05:00
Julien Denize	ae5b7aff2b	Improve Mistral format checks. (#33253 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: juliendenize <julien.denize@mistral.ai> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-30 06:23:33 -08:00
Harry Mellor	a11bc12d53	Fix `test_moe.py` for Transformers v5 (#33413 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-30 14:03:25 +00:00
Nathan Weinberg	58cb55e4de	[Doc] Enhance documentation around CPU container images (#32286 ) Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2026-01-30 13:36:20 +00:00
杨朱 · Kiki	cf896ae0e3	[Misc] Clean up HIDDEN_DEPRECATED_METRICS after metric removal (#33323 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 13:31:17 +00:00
Harry Mellor	c5113f60f2	Remove deprecated `reasoning_content` message field (#33402 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-30 11:48:15 +00:00
vllmellm	174f16700b	[Doc] [ROCm] Update Documentation to reflect v0.15.0 release (#33388 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-30 19:06:08 +08:00
Julien Denize	8e2ad97ad0	[BUGFIX] Pixtral cannot be loaded with --limit-mm-per-prompt 0 (#33406 ) Signed-off-by: juliendenize <julien.denize@mistral.ai>	2026-01-30 02:52:02 -08:00
Patrick von Platen	10152d2194	[Realtime API] Adds minimal realtime API based on websockets (#33187 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-30 18:41:29 +08:00
杨朱 · Kiki	1a7894dbdf	[Misc] Replace Optional[X] with X | None syntax (#33332 ) Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 01:56:59 -08:00
Cyrus Leung	c87eac18f7	[Refactor] Move MM item count validation outside of processor (#33396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-30 09:27:31 +00:00
tianshu-Michael-yu	f45870b53f	fix: allow LFM2 MoE prefix caching (align) (#33376 ) Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>	2026-01-30 08:23:14 +00:00
hujiaxin0	ba45bedfd1	[model] Add support for openPangu7B-VL (#32449 ) Signed-off-by: hujiaxin <524446785@qq.com> Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com> Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>	2026-01-30 15:54:27 +08:00
Harry Mellor	9432ed8c7e	Explicitly set `return_dict` for `apply_chat_template` (#33372 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-30 07:27:04 +00:00
Lucas Kabela	726d89720c	[CI] Enable mypy import following for `vllm/spec_decode` (#33282 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-30 06:43:32 +00:00
Harry Mellor	d334dd26c4	Move decode context parallel validationn to `ParallelConfig` (#33239 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-30 06:18:41 +00:00
Ryan Rock	070c811d6f	[CI][AMD] Skip 4 GPUs testgroup ray tests (#33305 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-01-29 21:39:53 -08:00
Isotr0py	8bfc8d5600	[Models] Refactor Kimi-K2.5 weight loading (#33346 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-30 05:31:20 +00:00
Harry Huang	ec51831a22	[BugFix] Disable async scheduling for Mamba prefix caching (#33352 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-01-30 04:40:19 +00:00
Harry Mellor	80b918f2bd	Fix `tie_word_embeddings` for multimodal models in Transformers v5 (#33359 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-30 03:37:39 +00:00
Wang Haoyu	c46b0cd0af	[Model][Multimodal] Add explicit MusicFlamingo adapter (#32696 ) Signed-off-by: WangHaoyuuu <mailwhaoyu@gmail.com>	2026-01-30 11:01:29 +08:00
Aidan Reilly	133765760b	[Docs] Adding links and intro to Speculators and LLM Compressor (#32849 ) Signed-off-by: Aidan Reilly <aireilly@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> (tag: v0.16.0rc0)	2026-01-29 14:12:35 -08:00
Michael Goin	bfb9bdaf3f	[Bugfix] Enable Triton MoE for FP8 per-tensor dynamic (#33300 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-29 12:15:17 -08:00
Kevin H. Luu	2284461d02	[release] Minor fixes to release annotation and wheel upload (#33129 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-01-29 12:09:35 -08:00
danisereb	8e2a469b3b	Add Triton fused MoE config for B200 (Nemotron Nano) (#32804 )	2026-01-29 19:21:33 +00:00
CarstyYou	23591e631e	[Bugfix][Kernel] Fix negative memory offset in GDN Triton kernel (#33326 ) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2026-01-29 10:40:11 -08:00
Linda	0493d897c4	[NVIDIA] [feat] Integrate flashinfer Trtllmgen bf16 moe (#32954 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-01-29 10:00:13 -08:00
Chendi.Xue	8c8ebeb941	[BUGFIX][XPU] fix memory check after XPU reuse GPU_worker (#33358 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2026-01-29 09:56:30 -08:00
Cyrus Leung	831453fcef	[Chore] Move `MediaConnector` to `vllm.multimodal.media` (#33324 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-29 16:54:31 +00:00
Angela Yi	5a66c9cc76	[ez] Delete torch25_custom_graph_pass (#33287 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-29 16:47:05 +00:00
Isotr0py	5e73e4900c	[Bugfix] Fix broken GLM-OCR initialization (#33350 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-29 07:56:05 -08:00
Cyrus Leung	c6e7404cc5	[Multimodal] Simplify MM input definitions (#33331 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-29 13:32:04 +00:00
sthWrong	17b17c0684	[Backport] [Kimi-K2.5] Replace torch.cuda with current_platform for d… (#33320 )	2026-01-29 12:29:17 +00:00

13450 Commits