biondizzle/vllm - vllm - Gitea

Author	SHA1	Message	Date
Nishidha Panpaliya	bd2b52fc2d	[CPU][Bugfix] Fix ppc64le CPU build (#30871 ) Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>	2025-12-19 12:26:35 +00:00
Li, Jiang	420ba2dbb6	Enable aarch64 CPU performance benchmarks (#26494 ) Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com> Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Ioana Ghiban <ioana.ghiban@arm.com> Co-authored-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-12-19 12:16:18 +00:00
Marko Rosenmueller	455949675d	[Frontend][Bug] allow tool calls in analysis channel (#28139 ) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-12-19 10:47:44 +00:00
lif	086b96339f	[Bugfix] Add validation for tool requests when tool_parser is unavailable (#30613 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-19 18:23:28 +08:00
Jinzhen Lin	9187de9fac	[Quantization] enable compressed-tensors marlin support for turing (2) (#31008 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>	2025-12-19 08:56:35 +00:00
Isotr0py	ac1c934276	[Bugfix] Fix incorrect tiles creation for mm prefix triton attention (#30974 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-19 16:00:33 +08:00
Wenqi Glantz	4924ac582c	Add hidden dimension validation for multimodal embedding inputs (#30968 ) Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>	2025-12-19 07:59:36 +00:00
Li, Jiang	096b25c9ed	[Doc][CPU] Fix index link for CPU regular release wheels (#31015 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-12-19 07:29:52 +00:00
Jinzhen Lin	de08b8f61b	[Quantization] enable compressed-tensors marlin support for turing (#31000 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>	2025-12-18 20:29:48 -08:00
Nick Hill	2ac85a4544	[BugFix] Fix logprobs with spec decode and modified logits (#30846 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-18 19:58:28 -08:00
Andreas Karatzas	7b43db210c	[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements (#30270 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-19 02:17:27 +00:00
PlatinumGod	6a09612b2e	[Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models (#30867 ) Signed-off-by: yujiepu <pyjapple@gmail.com> Signed-off-by: PlatinumGod <pyjapple@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> [tag: v0.14.0rc0]	2025-12-19 09:34:27 +08:00
Nick Hill	45c0526ac9	[BugFix] Handle errors when preprocessing added requests (#30895 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-19 01:29:11 +00:00
Benjamin Chislett	d6b3d39b6d	[Cleanup] Refactor FlashInferMetadataBuilder (#29128 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-18 14:45:30 -08:00
Chendi.Xue	6ca74bc11a	[NIXL][BUG FIX] Fix both failing issue and accuracy issue with nixl + host_buffer on CUDA (#30419 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-12-18 22:10:02 +00:00
Harry Mellor	19c583398a	Check for truthy `rope_parameters` not the existence of it (#30983 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-18 13:59:10 -08:00
Nick Hill	b0b77c4655	[BugFix] Fix spec decode + structured outputs + preemption edge case (#30916 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-18 12:59:55 -08:00
Kayvan Mivehnejad	634a14bd7d	Strengthen input validation and tests for `parse_raw_prompts`. (#30652 ) Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>	2025-12-18 19:51:58 +00:00
Chen Zhang	24b65eff0d	[BugFix] Spec decode with VLLM_ENABLE_V1_MULTIPROCESSING=0 (#30319 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-12-18 19:47:56 +00:00
Elizabeth Thomas	41b6f9200f	Remove all2all backend envvar (#30363 ) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-18 19:46:28 +00:00
Wentao Ye	97000a2be7	[Bug] Fix compressed tensor not using deepgemm (#30820 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-18 14:45:55 -05:00
Isotr0py	d2dc5dfc6e	[Bugfix] Remove `tile_size=64` for mm_prefix triton attention (#30973 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-18 20:42:32 +01:00
navmarri14	b8c477c115	tuned fused configs for B300 (#30629 )	2025-12-18 11:41:59 -08:00
jiahanc	53ad423f26	[Perf] enable flashinfer rotary_embedding custom ops in DeepSeek rotary (#30729 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-12-18 14:31:18 -05:00
wz1qqx	889f8bb250	[BugFix]Reclaim resources to prevent memory leaks when use LMCacheMPConnector (#30745 ) Signed-off-by: wz1qqx <ziqi.wang@novita.ai> Co-authored-by: wz1qqx <ziqi.wang@novita.ai>	2025-12-18 19:09:51 +00:00
Fanli Lin	058926d48c	[XPU] allow custom workers (e.g. vllm-omni workers) to be used on XPU (#30935 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com>	2025-12-18 10:16:36 -08:00
Isotr0py	700a5ad6c6	[MM Encoder]: Migrate legacy ViT `MultiHeadAttention` to new `MMEncoderAttention` interface (#30684 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-19 02:04:19 +08:00
Alec	62be3670cb	[BugFix] Add sleep to fix tight loop and release GIL (#29476 ) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-18 09:52:55 -08:00
inkcherry	500f26e6d3	[Bugfix] fix DP-aware routing in OpenAI API requests (#29002 ) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2025-12-18 09:50:42 -08:00
Nick Hill	686cbaac64	[Cleanup] Remove unused ModelRunner V1 `InputBatch.num_tokens` field (#30218 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-18 09:17:00 -08:00
Vasiliy Kuznetsov	f4ee2c3d90	fix fp8 online quantization streaming with tp > 1 (#30900 ) Signed-off-by: vasiliy <vasiliy@fb.com>	2025-12-18 11:45:15 -05:00
Xin Yang	9a5e96523b	[LoRA] Set default MXFP4 LoRA backend to Marlin (#30598 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-18 08:42:22 -08:00
wzyrrr	326e7c3105	[Doc] Add Sophgo TPU Support (#30949 ) Co-authored-by: zhaoyang.wang <zhaoyang.wang@sophgo.com>	2025-12-18 16:29:33 +00:00
Lucas Kabela	0db5439ded	[Bugfix][torch2.10] Fix test_qwen2_5_vl_compilation with 2.10 RC (#30822 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-18 08:23:31 -08:00
sarathc-cerebras	28d15ab56b	adds jais 2 support (#30188 ) Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-18 15:46:58 +00:00
Wentao Ye	6628758233	[Bug] Fix batch invariant in torch 2.10 (#30907 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-18 07:27:51 -08:00
zhrrr	eee600c34f	[Misc] support nsys profile for bench latency (#29776 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2025-12-18 14:52:20 +00:00
Michael Goin	100f93d2be	Filter safetensors files to download if .safetensors.index.json exists (#30537 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-18 14:51:17 +00:00
vllmellm	96bf50a2c0	[ROCm] Serving Fails on Radeon Due to AITER Dtype Import (#30952 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-12-18 11:47:46 +00:00
Li, Jiang	f90d3636e2	[Bugfix][CPU] Fix Mac CPU build (#30955 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-12-18 01:38:22 -08:00
Ming Yang	8372be2828	[moe] Use enable_chunking func (to support disabling chunking) (#29935 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-18 09:02:38 +00:00
Andreas Karatzas	8da6ae49c3	[ROCm][Bugfix] Fix `fa_version` argument error in `flash_attn_maxseqlen_wrapper` for ROCm without aiter (#30909 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-18 16:45:51 +08:00
Lucas Wilkinson	30bb19a760	[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-17 23:50:15 -08:00
Chauncey	aa7e836055	[Bugfix] Fix Unicode issues in GLM-4 tool calling (#30920 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-18 07:12:17 +00:00
Andreas Karatzas	be2ad5f920	[ROCm][Bugfix] fix(structured_output): Skip guidance backend for schemas with patternProperties (#30730 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-18 07:04:57 +00:00
wangxiyuan	a85724bd6e	[Platform] Let EPD work with non-cuda platform (#30225 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-18 06:45:29 +00:00
Yifan Qiao	11a89cf95c	[Fix][FlexAttention] return max logical block index to handle reused blocks (#30915 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-18 06:42:21 +00:00
Li, Jiang	e3ab93c896	[CPU] Refactor CPU fused MOE (#30531 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-12-18 14:36:49 +08:00
Nathan Price	fc2ae6d617	fix: add warmup for audio preprocessing (#30706 ) Signed-off-by: Nathan Price <nathan@abridge.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-18 06:12:29 +00:00
Yihua Cheng	ec965569d9	[KV connector][LMCache] Only record the cuda event when there are request to store/load (#30814 ) Signed-off-by: ApostaC <yihua98@uchicago.edu>	2025-12-18 05:31:34 +00:00

12412 Commits