biondizzle/vllm

Author	SHA1	Message	Date
caozuoba	9e19f8338b	[Perf] add packed recurrent fast path for decode (#36596 ) Signed-off-by: hdj <1293066020@qq.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-12 04:01:57 -07:00
Sage	06e0bc21d2	[Frontend] Split `OpenAIServingModels` into `OpenAIModelRegistry` + `OpenAIServingModels` (#36536 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-12 03:29:37 -07:00
Chauncey	5a71cdd76e	[Bugfix] Fix crash when tool_choice=required exceeds max_tokens (#36841 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-12 03:28:45 -07:00
Shanshan Shen	f0d3658c0f	[MM][OOT] Support CPU `seq_lens` for OOT MMEncoderAttention kernels (#36605 ) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-12 03:28:23 -07:00
Michael Goin	57431d8231	[UX] Only show FP4 Marlin fallback warning for w4a4 models (#36806 ) Co-authored-by: Claude <noreply@anthropic.com>	2026-03-12 05:19:35 -04:00
Xu Jinyang	3e64fe4a18	[Bugfix] Warm up Triton autotuner for GDN layers during V1 profiling (#36599 ) Signed-off-by: AuYang <459461160@qq.com>	2026-03-12 00:51:09 -07:00
sfeiqiang	8cb24d3aed	[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328 ) Signed-off-by: phaedonsun <phaedonsun@tencent.com> Co-authored-by: phaedonsun <phaedonsun@tencent.com>	2026-03-12 00:46:20 -07:00
István Ketykó	00726c74c9	[Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop (#36670 ) Signed-off-by: István Ketykó <istvan.ketyko@gmail.com>	2026-03-12 15:35:54 +08:00
Chauncey	9fe404ed04	[Frontend] OpenAI Responses API supports Tool/Function calling with streaming (#29947 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-12 15:03:50 +08:00
Sage	802f306cd1	[Tests] Skip model weight download for render-only test server (#36813 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-12 06:24:42 +00:00
Yan Ma	894843eb25	replace `with torch.cuda.device` with `with torch.accelerator.device_index` (#36144 ) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-03-11 23:12:57 -07:00
Yanan Cao	584a3f56de	[Kernel][Helion][13/N] Force static_shapes=False in helion register (#36677 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 05:35:29 +00:00
Nick Hill	36735fd772	[BugFix] Fix multiple/duplicate stdout prefixes (#36822 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-12 12:23:21 +08:00
wang.yuqi	6ecabe4936	[CI Failure] Fix Language Models Test (Extended Pooling) daily CI Failure (#36761 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-12 12:22:05 +08:00
Woosuk Kwon	2f8b4ce0c0	[Model Runner V2] Do not initialize sampler for non-last PP ranks (#36824 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-12 03:55:28 +00:00
Yuwei An	2ef69456f5	[LMCache] Fault Tolerance Mechanism (#36586 ) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2026-03-12 03:54:39 +00:00
Louie Tsai	17852aa503	more models for vLLM Benchmark Suite (#35086 ) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-03-12 11:36:51 +08:00
Flora Feng	8647c6cf51	[Bugfix] Fix minimax_m2 tool parser when stream interval > 1 (#35895 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-12 10:25:14 +08:00
Kunshang Ji	513949f95f	[XPU][Doc] Remove manual OneAPI install step, now handled by torch-xpu (#36831 ) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-03-12 01:46:02 +00:00
Nick Hill	262b76a09f	[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829 ) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-12 01:20:34 +00:00
Wentao Ye	c34ba6b961	[Perf] Optimize compute maxsim using batched version, 3.2% E2E throughput improvement (#36710 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-12 08:37:01 +08:00
Matthias Gehre	24062b704f	[ROCm][CI/Build] Add gfx1152/gfx1153 (Krackan) to HIP supported architectures (#36499 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-03-11 23:14:40 +00:00
Aaron Hao	d6b61e5166	[BUG] Fix async rlhf tests (#35811 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-03-11 18:06:10 -04:00
Yanan Cao	cf632499ee	[Kernel] [Helion] [15/N] Split config files into per-platform files (#36698 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 17:25:29 -04:00
Yanan Cao	a3774a8198	[Kernel] [Helion] [12/N] Use FakeTensorMode to avoid GPU allocation during config key computation (#36563 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 17:25:16 -04:00
Yanan Cao	0ce21c46a0	[Kernel] [Helion] [14/N] Set autotune_ignore_errors=True during autotuning (#36683 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 17:25:04 -04:00
Woosuk Kwon	55eed6b7a5	[Model Runner V2] Add WhisperModelState [6/N] (#35790 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-11 14:20:38 -07:00
Giancarlo Delfin	c77181e534	[Model Runner V2] Add probabilistic rejection sampling for spec decoding (#35461 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-11 14:04:32 -07:00
maobaolong	12001f2ebc	[LMCache] Pass TP size in lookup for MLA multi-reader locking (#36129 ) Signed-off-by: baoloongmao <baoloongmao@tencent.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>	2026-03-11 20:45:20 +00:00
Or Ozeri	7ee5d5093b	[BugFix][kv_offload] Fix offloading decodes with async scheduling (#33881 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 20:43:40 +00:00
jennyyyyzhen	428bc718bd	[Bugfix][ROCm] Strip block_size before attention backend validation (#36274 ) Signed-off-by: jennyyyyzhen <yzhen@hmc.edu> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-03-11 13:37:31 -07:00
汪志鹏	ff1e3d9c63	[BugFix]: add bagel to MM_PREFIX_LM_MODELS (#36316 ) Signed-off-by: princepride <wangzhipeng628@gmail.com>	2026-03-11 19:55:59 +00:00
Wentao Ye	35bdca5431	[Refactor] Remove dead code in KV connector (#36424 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-11 19:40:17 +00:00
Amanzhol Salykov	8a24842765	[ROCm] add tuned moe_wna16_triton kernel configs for CDNA4 (#35093 ) Signed-off-by: salykova <amsalykov@gmail.com> Signed-off-by: amd-asalykov <asalykov@amd.com>	2026-03-11 19:00:08 +00:00
Harry Mellor	65986db6ba	Make Gemma and Gemma 2 accept `inputs_embeds` like Gemma 3 (#36787 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 18:12:43 +00:00
Luka Govedič	9556af87d5	[torch.compile] Add support for non-contiguous fused RMSNorm + group quant (#36551 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>	2026-03-11 10:56:55 -07:00
Or Ozeri	a1a3523a56	[KVConnector] Support worker -> scheduler metadata (#31964 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-11 17:36:37 +00:00
tianshu-Michael-yu	741f4e046b	fix: align lfm2 thumbnail token counting with HF (#36707 )	2026-03-11 10:28:38 -07:00
Julien Denize	a5d06dc557	Add 320 dimension size support to MLA (#36161 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2026-03-11 10:21:22 -07:00
Harry Mellor	5efa206a8c	Fix `ExaoneMoeMTP` test that never ran in Transformers v4 (#36792 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 17:10:23 +00:00
Cyrus Leung	196802dfa6	[Misc] Clean up renderers (#36770 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-11 16:39:29 +00:00
Isotr0py	c84b519cf3	[Bugfix] Fix negative max_tokens when input prompt is too long (#36789 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-11 16:30:51 +00:00
Flora Feng	741ecf0630	[CI] Add bfcl tool call correctness eval (#36560 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-03-11 12:27:36 -04:00
Robert Shaw	b7e5a588d8	[Bugfix] Fix DP/EP Shared Expert With Monolithic Kernels (#36061 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-11 16:07:14 +00:00
Richard Zou	822e250ab7	[torch.compile] Use FakeTensors instead of real GPU tensors for single-size compilation (#36093 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-11 16:07:09 +00:00
Hongxin Xu	bea02cdf93	Fix routed experts capture for hybrid models (Mamba + Attention) (#35744 ) Signed-off-by: arlenxu <arlenxu@tencent.com> Signed-off-by: xhx1022 <1737006628@qq.com> Co-authored-by: arlenxu <arlenxu@tencent.com>	2026-03-11 08:53:10 -07:00
Julien Denize	a3ea760ea5	Add 'none' reasoning effort to ChatCompletionRequest (#36238 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2026-03-11 15:45:34 +00:00
Harry Mellor	35db669f1d	Correct link to supported hardware on vllm.ai (#36798 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 08:43:28 -07:00
Julien Denize	afebeffbfb	Add support to Mistral large 3 eagle with dense layers (#36163 ) Signed-off-by: juliendenize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-11 15:42:56 +00:00
Jhao-Ting Chen	5573894737	Kimi k2.5 MLA based eagle3 (#36361 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Izzy Putterman <iputterman@nvidia.com>	2026-03-11 11:36:11 -04:00


15117 Commits