biondizzle/vllm - vllm

Author	SHA1	Message	Date
Aaron Pham	c29fb540ff	[gpt-oss] tool parser supports for /chat/completions [1/n] (#22386 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-04 20:39:12 -07:00
Nicolò Lucchesi	65e038931d	[Frontend] Skip unnecessary detokenization when token_id is requested (#24236 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-04 23:04:12 +00:00
Zhuohan Li	886ccbe5ba	[CI/Build] Reduce the number of redundant cases to test for LoRA (#24276 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-09-04 21:58:44 +00:00
elvischenv	adc3ddb430	[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 14:25:45 -07:00
Seiji Eicher	60b755cbcb	[Misc] Have AsyncLLM `custom_stat_loggers` extend default logger list (#20952 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-04 14:25:30 -07:00
Saman A. Pour	482e52f56c	QWEN3 Coder Fused MoE kernels Optimization configs (#24266 ) Signed-off-by: Saman Keon <samanamp@outlook.com>	2025-09-04 20:33:43 +00:00
Po-Han Huang (NVIDIA)	78336a0c3e	Upgrade FlashInfer to v0.3.0 (#24086 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-04 09:49:20 -07:00
Jee Jee Li	94866d7c93	[Misc] Slight improve deepgemm print (#24085 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-04 16:06:51 +00:00
Didier Durand	83609ca91d	[Doc]: fix typos in Python comments (#24173 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-04 08:52:17 -07:00
Nick Hill	e41a0fa377	[Perf] Freeze core engine proc heap after init (#24008 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-04 22:55:23 +08:00
nvjullin	37241077d5	[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 09:25:40 -04:00
Yash Pratap Singh	c9f7081f9c	[LoRA]: Add lora support to qwen-2.5-omni (#24231 )	2025-09-04 05:50:50 -07:00
Kunshang Ji	16ded21eeb	[XPU] support Triton Attention backend on Intel GPU (#24149 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 20:41:08 +08:00
nopperl	2b30afa442	Use hidden_size_per_head as head_size fallback (#24221 ) Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>	2025-09-04 12:59:16 +01:00
Jiangyun Zhu	eafa8dcde6	[Model] Add pp support for hunyuan (#24212 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-04 03:58:26 -07:00
TJian	6c7af8110a	[Doc] Update vLLM Singapore Meetup info (#24234 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-09-04 02:58:18 -07:00
Kebe	8f423e5f43	[Feature][Response API] Add streaming support for non-harmony (#23741 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-09-04 17:49:06 +08:00
Ignacio Sica	369a079568	[Hardware][Apple-CPU] Disable OneDNN build for Apple Silicon (#24200 ) Signed-off-by: ignaciosica <mignacio.sica@gmail.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-09-04 02:48:25 -07:00
Lucas Wilkinson	402759d472	[Attention] FlashAttn MLA (#14258 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-04 02:47:59 -07:00
Fanli Lin	2c301ee2eb	[Bugfix] Fix Incremental Detokenization with `tokenizers == 0.22.0` (#24159 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com> Signed-off-by: Fanli Lin <fanli0116@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-04 02:47:08 -07:00
whx	3efb9f4d95	[Attention][Platform] Refactor MLA to support Custom Op (#23332 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2025-09-04 02:46:37 -07:00
anthonsu	04f3c35cff	Improve flexibility of auto_tune.sh execution. (#23766 ) Signed-off-by: Anthony Su <50185138+anthonsu@users.noreply.github.com> Signed-off-by: anthonsu <50185138+anthonsu@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-04 09:41:41 +00:00
mgazz	51d5e9be7d	[Core][Model] Terratorch backend integration (#23513 ) Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com> Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-04 00:22:41 -07:00
bingchen-mi	e7fc70016f	[Model] Add MiDashengLM model support (#23652 ) Signed-off-by: chenbing8 <chenbing8@xiaomi.com> Signed-off-by: bingchen-mi <chenbing8@xiaomi.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-04 00:08:09 -07:00
Weida Hong	12e1e63cc5	[Misc] Enhance output readability of helper script (#24214 ) Signed-off-by: Weida Hong <wdhongtw@google.com>	2025-09-04 06:38:26 +00:00
Li, Jiang	57b1ce94f7	[CPU] Refactor CPU unquantized linear (#24150 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-04 14:28:45 +08:00
Benji Beck	cb55ad86fe	Migrate ultravox inputs to TensorSchema (#23503 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-09-04 06:09:11 +00:00
Flora Feng	712b273f65	[Refactor] Introduce basic Renderer for completion-style request (#24010 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-09-04 05:21:12 +00:00
Qiming Zhang	e919d6f549	[Kernel][Bugfix] Fix grouped topk cu (#24146 ) Signed-off-by: mayuyuace <qiming1.zhang@intel.com>	2025-09-04 12:37:37 +08:00
wuhang	a38f8bd54c	[Feature][Responses API]Support MCP tools with streaming mode + background mode (#23927 ) Signed-off-by: wuhang <wuhang6@huawei.com>	2025-09-04 04:05:10 +00:00
Peter Pan	b5ee1e3261	Remove deprecated `PyNcclConnector` (#24151 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2025-09-03 22:49:16 +00:00
George Nagy II	36c260dad6	[Feature][gpt-oss] Add support for num_cached_tokens and num_reasoning_tokens tracking (#23460 ) Signed-off-by: George Nagy II <george.nagy0969@gmail.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-03 21:08:47 +00:00
Kebe	a43a3f1770	[Bugfix][DP] DP distribution does not require ray[default] (#23822 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-09-03 13:21:36 -07:00
WeiQing Chen	6adaed42f4	[Feature][P/D]: Optimize NIXL Connector xfer Launch (#23887 ) Signed-off-by: ycyaw66 <497410282@qq.com> Co-authored-by: ycyaw66 <497410282@qq.com>	2025-09-03 19:14:30 +00:00
Matthew Bonanni	a742322092	[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-03 14:05:24 -04:00
Benji Beck	731a6940e3	Migrate whisper inputs to TensorSchema (#23505 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-09-03 18:04:00 +00:00
bnellnm	e9b92dcd89	[Kernels] Overlap shared experts with send/recv (#23273 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-03 12:35:18 -04:00
nopperl	fa4311d85f	[V1] v1 engine + full CUDA graph support for PLaMo2 (#23998 ) Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp> Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com> Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>	2025-09-03 08:24:02 -07:00
Burkhard Ringlein	6d80ae83e1	[Bugfix] Fixing division by zero in triton_attn if query_heads/kv_heads > 16 (#23424 ) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>	2025-09-03 15:01:09 +00:00
dongbo910220	4ba0c587ba	FIX: Add libnuma-dev to Dockerfile for dev stage (#20388 ) Signed-off-by: dongbo910220 <1275604947@qq.com>	2025-09-03 07:17:20 -07:00
qscqesze	6997a25ac6	[Model] Remove useless code from MiniMax implementation (#23982 ) Signed-off-by: QscQ <qscqesze@gmail.com> Signed-off-by: qingjun <qingjun@minimaxi.com>	2025-09-03 11:27:04 +00:00
Jakub Smid	28f350e147	Support add_generation_prompt in embeddings endpoint with chat request (#23931 ) Signed-off-by: biba10 <jaksmid@seznam.cz>	2025-09-03 10:47:55 +00:00
wang.yuqi	51383bd472	[CI] Accelerate mteb test by setting SentenceTransformers mteb score to a constant (#24088 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-03 17:23:56 +08:00
Isotr0py	9c99e4871f	[Misc] Clean up deadcode for legacy processing pipeline (#24153 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-03 08:34:29 +00:00
dsinghvi	70549c1245	[CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907 ) Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com> Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-09-03 16:13:11 +08:00
Nicolò Lucchesi	f0c503f66e	[Nixl] Heterogeneous TP support FlashInfer (#20189 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-03 15:19:54 +08:00
youkaichao	f38035c123	[distributed][rl] remove nccl cumem env var override (#24141 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-03 06:45:25 +00:00
Yong Hoon Shin	426cc8629f	[BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (#24132 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-09-03 04:57:59 +00:00
Jiangyun Zhu	e81d4e69c1	[Misc] Add check for dual_chunk_attention (#24070 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-03 04:19:14 +00:00
Didier Durand	02d411fdb2	[Doc]: fix typos in Python comments (#24115 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 21:14:07 -07:00

9263 Commits