biondizzle/vllm - vllm

Author	SHA1	Message	Date
Nick Hill	5bdd155277	[CI] Fix async scheduling + spec decoding test flake (#28902 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-18 05:26:32 +00:00
Ning Xie	0168f69e50	[Misc] Remove unnecessary parentheses from log statements (#28897 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-17 20:33:46 -08:00
Didier Durand	083cf326dc	[Doc]: fix typos in various files (#28863 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-17 20:32:14 -08:00
Cyrus Leung	bf9e1e8767	[Bugfix] Fix wrong CLI defaults for dynamic `SchedulerConfig` fields (#28872 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-17 20:30:29 -08:00
Wentao Ye	3ddcf46011	[Refactor] Remove Unused Func in Batch Invariant (#28881 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-17 20:29:29 -08:00
xuebwang-amd	d0a73620cc	[ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss (#28638 ) Signed-off-by: xuebwang-amd <xuebwang@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 11:16:45 +08:00
Michael Goin	88ab591f0b	Run macos smoke test workflow on main commit (#28752 ) Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-18 11:16:03 +08:00
Benjamin Bartels	b6e04390d3	[Bugfix] Fix Kimi-K2 tool parser concatenated tool calls parsing (#28831 ) Signed-off-by: Thomas Mao <yiyeguhu@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev> Co-authored-by: Thomas Mao <yiyeguhu@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-17 19:13:25 -08:00
Zhuohan Li	552cac95b5	[Misc] Fix wrong comment in scheduler (#28880 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-11-17 15:32:22 -08:00
Bangsheng Tang	61485844fc	[BugFix] Corner case that could cause out-of-sync with external launcher mode and dp >1 (#28774 )	2025-11-17 15:22:11 -08:00
Pranav	f77bce001a	[Model] Add Afmoe architecture implementation (#28332 ) Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr> Signed-off-by: Pranav <veldurthipranav@gmail.com> Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>	2025-11-17 15:11:20 -08:00
Wentao Ye	a289cc1dde	[Test] Batch Invariant: Rename and organize tests (#27421 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-17 18:09:47 -05:00
Shreyas Kulkarni	95ae50b7d1	[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435 ) Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>	2025-11-17 15:01:34 -08:00
Nick Hill	7765e5ba75	[BugFix] Fix PP performance and PP kv connector output regression (#28768 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-17 14:08:50 -08:00
Ronald	d8874c61a5	[Core] Async Scheduling X Spec Decoding Compatibility (#24799 ) Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-11-17 12:16:20 -08:00
Zhewen Li	f8b19c0ffd	[Bugfix] Fix GPT-OSS on AMD after #28603 (#28816 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-17 13:15:26 -05:00
tiehexue	e42bd8c2e3	Cast return value to int64_t for cache size (#28814 ) Signed-off-by: tiehexue <tiehexue@hotmail.com>	2025-11-17 16:02:32 +00:00
Roger Wang	7f064491f8	[Bugfix][Perf] Revert applying HF processor on text-only inputs for multimodal models (#28858 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-17 14:49:25 +00:00
Lucas Wilkinson	64e39d667c	[BugFix] Temporary fix for IMA with MTP = 2 and full-cg (#28315 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-17 09:41:22 -05:00
Kunshang Ji	1b82fb0ad3	[XPU] work around for sp, avoid custom op import error (#28822 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-17 13:16:44 +00:00
Jae-Won Chung	d4acf518d0	[Metrics] Fix KV cache usage percent metric multiproc (#28792 ) The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"` and ends up returning one sample per process: `vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0`; `vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0`; `vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035`; ... The deprecated `vllm:gpu_cache_usage_perc` Gauge metric has `multiprocess_mode="mostrecent"`. Signed-off-by: Jae-Won Chung <jwnchung@umich.edu>	2025-11-17 09:54:15 +00:00
wuyaoxuehun	ab01cd14e5	[BugFix] Fix glm4_moe_mtp load weights bug (#28805 ) Signed-off-by: wuyaoxuehun <798143193@qq.com>	2025-11-17 17:13:11 +08:00
Li, Jiang	577bb34fff	[CPU][Bugfix] Fix _to_list in CPU model runner (#28824 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-17 07:47:24 +00:00
Jee Jee Li	3380ed5e11	[Doc] Add llama4 LoRA tag (#28825 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-17 14:08:48 +08:00
Jay Caldwell	6f37419244	[Bugfix][Model] Prevent special token leakage in KimiK2ToolParser streaming mode (#28543 ) Signed-off-by: Jscaldwell55 <jay.s.caldwell@gmail.com>	2025-11-17 13:54:46 +08:00
Xiake Sun	60e089f0b9	[ROCm][Qwen3-32B] Fix AITER MHA accuracy issue cause by #25763 (#28670 ) Signed-off-by: Xiake Sun <xiake.sun@amd.com>	2025-11-16 20:52:11 -08:00
liuzhenwei	d64429bb36	[NIXL][XPU] update install script of NIXL (#28778 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2025-11-17 03:01:33 +00:00
jiahanc	561253b37f	[Performance][Fix] update nvfp4 code to support renorm routing (#28569 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-16 18:02:42 -08:00
Nick Hill	80b6080ddc	[BugFix] Fix async scheduling + chunked prefill + preemption (#28787 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-17 06:46:46 +08:00
amirkl94	03ee48111d	Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261 )	2025-11-16 13:39:44 -05:00
Lukas Geiger	5a87076d6e	[Model][QwenVL] Optimize `Qwen2_5_VisionAttention` q,k preparation (#28769 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-16 17:37:15 +00:00
Ning Xie	ac1daf3233	fix comment typo (#28802 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-16 17:03:21 +00:00
Didier Durand	63fed55506	[Doc]: fix typos in various files (#28811 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-16 14:30:06 +00:00
Anna Shors	8d259fad6c	Fix gpt oss weight loading with EP + bf16 (#28765 ) Signed-off-by: ashors1 <ashors@nvidia.com>	2025-11-16 13:12:45 +00:00
scottzh8	3bc1175798	[Bugfix] Fix host and port join for ipv6 in bench serve (#28679 ) Signed-off-by: Scott Zhang <scottzh@fb.com> Co-authored-by: Scott Zhang <scottzh@fb.com>	2025-11-16 10:20:57 +00:00
Dezhan	af02c40970	Fixed gpt-oss _load_weights_other() parameter position bug (#28715 ) Co-authored-by: Dezhan Tu <dztu@meta.com>	2025-11-16 09:46:29 +00:00
Lucia Fang	b316ac6589	[V1] Support MP Executor for multi node distributed inference (#23691 ) Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-11-16 09:01:21 +00:00
wang.yuqi	a55b64635c	[Model] Allow users to control skip reading cache per request. (#28194 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2025-11-16 00:04:50 -08:00
ai-jz	d231876ce3	[Benchmark] Fix client seed synchronization in multi-turn benchmark (#28512 ) Signed-off-by: ai-jz <aijz.xplr@gmail.com>	2025-11-16 15:04:32 +08:00
Bram Wasti	f849ee739c	Adding a benchmark for batch invariance (#28161 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-16 13:22:17 +08:00
Lucas Wilkinson	be263f7645	[BugFix] Fix `AssertionError: DCP not support reorder_batch_threshold > 1 now.` (#28751 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-15 22:35:06 +00:00
Didier Durand	2bb4435cb7	[Doc]: fix typos in various files (#28567 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-15 19:27:50 +00:00
Lukas Geiger	07cadab27a	[Model][Qwen3VL] Cache positional embedding indices (#28475 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-15 19:03:09 +00:00
Nick Hill	637f292196	[CI] Fix broken pipeline (#28781 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-15 08:44:14 -08:00
Eldar Kurtić	e439c784fa	Add support for Eagle with separate lm-head and embed_tokens layers (#28549 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-15 06:12:02 -08:00
hwhaokun	085a525332	[Model] Fix lmhead init bug of bailing_moe (#28777 ) Signed-off-by: hwhaokun <haokun0405@163.com> Co-authored-by: zhaozx-cn <zhaozx2116@163.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-15 05:44:12 -08:00
Cyrus Leung	89d3679221	[Doc] Fix failing doc build (#28772 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-15 05:33:27 -08:00
tingtinggithub	cb15ee28db	Allow Gemma3 to take image embeddings (#28483 ) Signed-off-by: tingtinggithub <streamttt@gmail.com>	2025-11-15 04:18:08 -08:00
Angela Yi	f36292dbee	[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126 ) Signed-off-by: angelayi <yiangela7@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-11-15 11:46:12 +00:00
Vadim Gimpelson	173b356abf	[PERF] Remove TRTLLM Gen attn kernel limitation `max_seq_len <=131072` (#28755 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-15 15:43:41 +05:30

11384 Commits