biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Lucas Wilkinson	636efd10a5	[Core] Separate out attention metadata building logic from prepare inputs (#26764) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-09 13:51:43 -05:00
Nick Hill	289eb6c537	[Core] Simplify async KV output aggregation (#28327) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-09 09:44:13 -08:00
Nicolò Lucchesi	19d91ece4b	[CI] Fix flaky `test_eagle_correctness` test (#28364) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-09 16:04:59 +00:00
Jiangyun Zhu	7ae5a5fb11	[Misc] Add some comments in qwen3-next (#28267) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-08 23:59:24 -08:00
Yong Hoon Shin	de2b78305f	[ROCm] Add env to enable/disable aiter triton gemm (#28321) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-08 22:27:00 -08:00
Ning Xie	e5e9067e61	[Misc] fix typo and add detailed log (#28178) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-11-09 05:33:46 +00:00
yihong	3a7d580343	fix: close issue 28338 by fixed python version (#28339) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-11-09 05:07:26 +00:00
Kevin H. Luu	05f8d69077	[chore] Move some wikimedia images to S3 (#28351) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-11-09 01:58:26 +00:00
Mohammad Miadh Angkad	404d7a9d14	[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345) Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>	2025-11-08 15:50:10 -07:00
ElizaWszola	171133f929	[Bugfix] Fix test fused quant layernorm tests (#27865) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-08 14:31:33 -08:00
Cole Murray	32787d0644	Remove setuptools upper bound constraint (<80) (#28337) Signed-off-by: Cole Murray <colemurray.cs@gmail.com>	2025-11-08 22:30:18 +00:00
Benjamin Chislett	975676d174	[Feat] Drop-in Torch CUDA Profiler (#27841) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-08 14:07:37 -08:00
Ev Lacey	77d702a22b	Enhance run_cluster.sh for multi-NIC support (#28328) Signed-off-by: Ev Lacey <elacey@nvidia.com>	2025-11-08 22:04:16 +00:00
zhangsicheng5	2108a571d7	[DCP] Support dcp kv_cache interleave size > 1 (#26696) Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: Qiu <qiuchunshuo@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-09 04:45:27 +09:00
Andy Lo	47604137a2	[Bugfix] Spec decode + structured output + spec model max len edge case (#28298) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-11-08 19:44:25 +00:00
Robert Shaw	26990d25dc	[Bugfix] Update device name for H200 detection (#28349) Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-08 19:01:11 +00:00
Harry Mellor	d9ab1ad9d1	`reasoning_content` -> `reasoning` (#27752) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-08 12:15:08 +00:00
22quinn	608bb14462	[Attention] Remove max cudagraph size limit of 992 (#27840) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-07 22:33:27 -08:00
Xiaozhu Meng	4a36681f85	[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins (#27990) Signed-off-by: Xiaozhu <mxz297@gmail.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2025-11-07 22:25:21 -08:00
Abolfazl Shahbazi	d15afc1fd0	Refactor CPU/GPU extension targets for CMake build (#28026) Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>	2025-11-08 14:17:35 +08:00
Isotr0py	934a9c3b79	[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 05:01:27 +00:00
gnovack	70af44fd10	[bugfix] support eagle with lora cudagraph specialization (#28318) Signed-off-by: gnovack <gnovack@amazon.com>	2025-11-08 03:25:45 +00:00
Aurick Qiao	781f5ebf52	Bump arctic-inference requirement (#28174) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-07 18:31:18 -08:00
Michael Goin	0852527647	[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-07 18:20:55 -08:00
Hamid Mukhtar	61d25dc44b	Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) (#28308) Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com>	2025-11-08 02:09:21 +00:00
Xiaohong (Sean) Chen	d0c7792004	[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding (#21068) Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Danielle Robinson <dcmaddix@gmail.com> Co-authored-by: Haipeng Li <li2haipeng@gmail.com> Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>	2025-11-08 01:58:22 +00:00
Boyuan Feng	b158df2813	remove resolve_op_overloads and use splitting_ops directly (#28081) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-08 01:13:13 +00:00
Kunshang Ji	1aaecda078	[XPU] Enable Expert parallel for MoE models (#28263) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 00:33:11 +00:00
Harry Mellor	811df41ee9	Update Flashinfer from `v0.4.1` to `v0.5.2` (#27952) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-07 16:24:42 -08:00
Nick Hill	67a2da890e	[PerfFix] Avoid separate thread for MP executor shm spin (take 2) (#28319) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 22:11:03 +00:00
Nick Hill	da786e339e	[Core] Rework handling of async scheduling config (#28250) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-07 20:01:23 +00:00
Benjamin Chislett	18903216f5	[Bugfix] Fix and add tests for GptOss reasoning parser (#28000) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-07 19:28:04 +00:00
Simon Mo	d0ceb38ae8	[Build] Fix release pipeline failing annotation (#28272) Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: Simon Mo <simon.mo@hey.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-07 10:06:45 -08:00
youkaichao	155ad56d7b	[doc] add guide about the provided PTX was compiled with an unsupported toolchain (#28305) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-11-08 00:26:34 +08:00
Fadi Arafeh	5fb4137c99	[README] Add Arm CPUs to the list of supported targets (#28290) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-11-07 15:41:47 +00:00
Nicolò Lucchesi	68a72a5cc1	Revert "[PerfFix] Avoid separate thread for MP executor shm spin (#28012)" (#28289) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-07 15:07:01 +00:00
Boyuan Feng	0f872b7977	[Log] update shm wait time msg (#28255)	2025-11-07 09:43:30 -05:00
Wentao Ye	4b1ff13221	[Feature] Default `ignore_eos` True for `random` dataset (#28227) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-07 07:35:33 -05:00
Iceber Gu	e0d6b4a867	[CLI] add --max-tokens to `vllm complete` (#28109) Signed-off-by: Iceber Gu <caiwei95@hotmail.com>	2025-11-07 12:21:40 +00:00
Pavani Majety	72b1c2ae2c	[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-11-07 04:18:39 -08:00
Lukas Geiger	e0919f331d	[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-07 12:14:29 +00:00
Kevin H. Luu	8e19d470af	[fix] Revert "fixing mm placeholder replacement issue with gemma3" (#28285) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-11-07 12:09:09 +00:00
Mengqing Cao	1958bda9b4	[Misc][Model][Refactor] Pass the prefix into Linear layers (#28259) Signed-off-by: MengqingCao <cmq0113@163.com>	2025-11-07 19:38:38 +08:00
Zhang Xiangze	7bdb42b2f2	[CPU]Avoid repeated random sample compile (#28260) Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>	2025-11-07 11:03:57 +00:00
汪志鹏	315068eb4a	[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark (#28265) Signed-off-by: princepride <wangzhipeng628@gmail.com>	2025-11-07 09:35:22 +00:00
Jialin Ouyang	ccd98b59c1	[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead (#28171) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-07 00:27:12 -08:00
Jee Jee Li	21b82f4ea2	[Kernel] LoRA triton kernels support PDL (#27402) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-07 08:05:48 +00:00
Copilot	a736e5ff77	[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly (#28074)	2025-11-07 15:58:16 +08:00
baonudesifeizhai	9da9208b20	[Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 (#28256)	2025-11-07 07:31:58 +00:00
smit kadvani	11fd69dd54	[amd][gptoss] Perf gain because of block alignment (#28024) Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com> Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>	2025-11-07 05:27:42 +00:00

13302 Commits