biondizzle/vllm - vllm

Author	SHA1	Message	Date
tianshu-Michael-yu	741f4e046b	fix: align lfm2 thumbnail token counting with HF (#36707)	2026-03-11 10:28:38 -07:00
Julien Denize	a5d06dc557	Add 320 dimension size support to MLA (#36161) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2026-03-11 10:21:22 -07:00
Harry Mellor	5efa206a8c	Fix `ExaoneMoeMTP` test that never ran in Transformers v4 (#36792) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 17:10:23 +00:00
Cyrus Leung	196802dfa6	[Misc] Clean up renderers (#36770) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-11 16:39:29 +00:00
Isotr0py	c84b519cf3	[Bugfix] Fix negative max_tokens when input prompt is too long (#36789) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-11 16:30:51 +00:00
Flora Feng	741ecf0630	[CI] Add bfcl tool call correctness eval (#36560) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-03-11 12:27:36 -04:00
Robert Shaw	b7e5a588d8	[Bugfix] Fix DP/EP Shared Expert With Monolithic Kernels (#36061) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-11 16:07:14 +00:00
Richard Zou	822e250ab7	[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation (#36093) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-11 16:07:09 +00:00
Hongxin Xu	bea02cdf93	Fix routed experts capture for hybrid models (Mamba + Attention) (#35744) Signed-off-by: arlenxu <arlenxu@tencent.com> Signed-off-by: xhx1022 <1737006628@qq.com> Co-authored-by: arlenxu <arlenxu@tencent.com>	2026-03-11 08:53:10 -07:00
Julien Denize	a3ea760ea5	Add 'none' reasoning effort to ChatCompletionRequest (#36238) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2026-03-11 15:45:34 +00:00
Harry Mellor	35db669f1d	Correct link to supported hardware on vllm.ai (#36798) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 08:43:28 -07:00
Julien Denize	afebeffbfb	Add support to Mistral large 3 eagle with dense layers (#36163) Signed-off-by: juliendenize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-11 15:42:56 +00:00
Jhao-Ting Chen	5573894737	Kimi k2.5 MLA based eagle3 (#36361) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Izzy Putterman <iputterman@nvidia.com>	2026-03-11 11:36:11 -04:00
Harry Mellor	d5816c8c2f	Fix tied weights in weight mapping test for Transformers v5 (#36788) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 15:10:26 +00:00
Woosuk Kwon	8ccbcda5c0	[Model Runner V2] Remove unused warmup_for_prefill method (#36762) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-11 08:02:44 -07:00
tvirolai-amd	a9e532afe2	[ROCm][Perf] Allow MTP lens > 1 in Sparse MLA (#36681) Signed-off-by: Teemu Virolainen <teemu.virolainen@amd.com>	2026-03-11 14:43:03 +00:00
Harry Mellor	f3163bba67	Disable docs build skipping until a better solution is found (#36790) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 13:53:23 +00:00
Martin Hickey	700a1ddc65	[Misc] Use envs module to get VLLM_DISABLED_KERNELS (#35776) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2026-03-11 13:37:46 +00:00
Silvia Colabrese	f33251ffc8	[Bugfix] Fix Mistral-small `--format` (#36782) Signed-off-by: 12010486 <silvia.colabrese@intel.com>	2026-03-11 04:47:52 -07:00
Wuxun Zhang	e584dce52b	Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230) Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>	2026-03-11 19:19:15 +08:00
Ning Xie	40c0461f24	[openapi] refactor render related openapi [3/N] (#36749) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-03-11 03:14:34 -07:00
Weiguang Li	724759684c	[Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 03:13:06 -07:00
Michael Goin	9c34e9d24f	Disable cascade attention by default (#36318)	2026-03-11 03:12:23 -07:00
Richard Zou	09b6f99852	[compile] aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#36358) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-11 03:12:03 -07:00
Ethan T.	c87fb515ed	fix(lora): use replaced_module_name in pooling model name check (#36402) Signed-off-by: gambletan <ethanchang32@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 03:11:27 -07:00
Itay Alroy	5353c9b016	platforms: Fix Ray DP startup crash (#36665) Signed-off-by: Itay Alroy <ialroy@nvidia.com>	2026-03-11 03:08:55 -07:00
Angela Yi	13e79fc811	[ci] Update rtol for test_classification (#36556) Signed-off-by: angelayi <yiangela7@gmail.com> Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>	2026-03-11 03:08:16 -07:00
Rahul Tuli	9d07a3d6e4	Add: Eagle3 support for Qwen3.5 (#36658) Signed-off-by: Rahul-Tuli <rtuli@redhat.com>	2026-03-11 03:07:42 -07:00
Cyrus Leung	646b85544b	[Refactor] Remove Molmo2 processor wrapper (#36667) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-11 03:07:20 -07:00
tc-mb	4286cc5ec2	fix(minicpmv): fix audio inference by handling meta device in init_re… (#36751) Signed-off-by: caitianchi <caitianchi@modelbest.cn>	2026-03-11 03:06:28 -07:00
LoganJane	545d18d81b	[Bugfix] Support other quantization methods in glm41v (#36321) Signed-off-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com> Co-authored-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-11 09:48:05 +00:00
roikoren755	e661b9ee83	[NemotronH] Small fix reasoning parser (#36635) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-03-11 02:44:41 -07:00
YiSheng5	c910eeb125	[XPU]Bug fix for some unexpected error when use AgRs backend on XPU device. (#36593) Signed-off-by: yisheng <yi.sheng@intel.com>	2026-03-11 09:17:46 +00:00
Harry Mellor	f4ae58b38b	Remove unused config field from Gemma2 (#36672) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 01:51:19 -07:00
Isotr0py	e568cf88bc	[UX] Infer dtype for local checkpoint (#36218) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-11 08:50:04 +00:00
Nicolò Lucchesi	098d844731	[NIXL][1/N] Refactor `kernel_block_size` detection (#35752) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-11 01:11:23 -07:00
JartX	a40ee486f2	[Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 (#35923) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: akaratza <akaratza@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-03-11 07:45:57 +00:00
pschlan-amd	eac2dc2b41	AITER MLA backend: Avoid CPU sync in _build_decode (#35765) Signed-off-by: Patrick Schlangen <pschlan@amd.com>	2026-03-11 07:25:00 +00:00
Flora Feng	d5080aeaa4	[Refactor] Remove deadcode in Responses API serving (#36726) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-11 07:11:41 +00:00
liuzhenwei	f22d6e0267	[Hardware][NIXL] set default kv buffer type for different platform (#36438) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-11 05:19:28 +00:00
Kunshang Ji	76c6e6da08	[XPU] Support block fp8 moe by fallback to TritonExpert on XPU (#36458) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-10 21:54:09 -07:00
typer-J	4184653775	feat: add RISC-V support for CPU backend (v2) (#36578) Signed-off-by: typer-J <2236066784@qq.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2026-03-10 21:51:39 -07:00
Sladyn	4aaaf8c8ce	feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503) Signed-off-by: sladynnunes <snunes@usc.edu>	2026-03-11 04:35:33 +00:00
Hongbin Guo	4bf533623b	[Doc] Fix duplicate words in comments (#36713) Signed-off-by: Hongbin10 <jdmjdm1998@163.com>	2026-03-10 21:28:31 -07:00
Matthew Bonanni	5f77ef15ae	[Misc][Attention] Clean up unused method in `CPU_ATTN` (#36673) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-10 21:27:22 -07:00
elvischenv	7d6abdd022	[Fix] Use torch.empty for output in attention+quant fusion (#31785) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-03-10 21:26:14 -07:00
Wentao Ye	a8ff2cca92	[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-10 21:25:30 -07:00
tunglinwood	42fadebecb	[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127) Signed-off-by: tunglinwood <tunglinwood@gmail.com> Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com> Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>	2026-03-10 21:24:48 -07:00
tianshu-Michael-yu	a197eda9c3	Add tuned H100 MoE configs for LFM2 8B and 24B (#36699)	2026-03-10 21:22:02 -07:00
Kevin H. Luu	82b110d50e	[ci] Bound nvidia-cudnn-frontend version (#36719) Signed-off-by: khluu <khluu000@gmail.com>	2026-03-11 12:17:35 +08:00

14730 Commits