biondizzle/vllm - vllm

Author	SHA1	Message	Date
Amir Samani	030fc44914	use the same stream for cuda graph capture and replay for NCCL (#29207 ) Signed-off-by: Amir Samani <asamani@nvidia.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-25 19:10:03 +08:00
Isotr0py	2532f437ee	[Doc] Add troubleshooting for Triton PTX error about undefined gpu-name (#31338 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-25 02:26:34 -08:00
Louie Tsai	f15185fbdb	[Benchmark Suite] improve cpu Benchmark Suite tests and comparison report for 0.12.0 (#30994 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-12-25 08:51:45 +00:00
Mark Gatere	ba25a65992	[Frontend] add FunctionGemma tool parser support (#31218 ) Signed-off-by: gateremark <gateremg@gmail.com>	2025-12-25 15:29:25 +08:00
Amith KK	42826bbccd	[Doc] Add tool call parser documentation for GPT-OSS models (#31212 ) Signed-off-by: Amith KK <amithkumaran@gmail.com>	2025-12-25 05:29:10 +00:00
Richard Zou	254f6b9867	[Bugfix] Fix eagle dp tests on A100 (#31241 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-12-25 00:05:04 +00:00
Michael Goin	bc5ef333e0	[Perf] Add skip_clone to SamplingParams for internal request handling (#31041 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-24 14:35:57 -08:00
Cyrus Leung	09dc7c690c	[Chore][1/2] Drop `v0.14` deprecations (#31285 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-24 09:54:01 -08:00
ゆり	506eb0f454	[Bugfix] Remove dead `block_quant_to_tensor_quant` function (#31294 ) Co-authored-by: yurekami <yurekami@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-24 17:22:48 +00:00
Ning Xie	5d93089686	[cli] complete vllm cli help message (#31226 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-12-24 15:45:47 +00:00
Kevin McKay	66c9887440	[Bugfix][Hardware][AMD] Fix FP8 dtype in silu_mul quantization (#31179 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2025-12-24 10:37:11 -05:00
wang.yuqi	1ff67df182	[CI] Reorganization pooling_mteb_test (#31265 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-12-24 23:36:20 +08:00
skaraban3807	7cd288a4b3	[PERF] Add interleaved memory allocation to NUMA module (#30800 )	2025-12-24 13:47:49 +00:00
Cyrus Leung	d201807339	[Chore] Bump `lm-eval` version (#31264 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-24 05:39:13 -08:00
Cyrus Leung	aa3868ecfe	[Chore] Remove unused `noqa`s (#31263 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-24 05:38:46 -08:00
Cyrus Leung	7adeb4bfa8	[Bugfix] Fix `max_model_len="auto"` handling (#31260 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-24 19:15:27 +08:00
wang.yuqi	bd89ce16d2	[Model] Introduce verify_and_update_model_config for VerifyAndUpdateConfig. (#31131 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2025-12-24 09:54:57 +00:00
Pleaplusone	b41aeb3468	[Bugfix][ROCm] Fix load issue on deepseek quark quantization when shared expert enabled (#31261 ) Signed-off-by: ganyi <ygan@amd.com>	2025-12-24 16:47:44 +08:00
Ryan Rock	ddfac7034e	[CI/Build] Ignore data_parallel_size_local (#30281 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2025-12-24 07:40:54 +00:00
Micah Williamson	6559d96796	[ROCm][CI] Set TORCH_NCCL_BLOCKING_WAIT Distributed Tests On ROCm (#31259 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-24 07:19:07 +00:00
kliuae	1c74150bca	[ROCm][CI] Fix "Distributed Tests (H200)" Test (#31227 ) Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-12-24 06:56:30 +00:00
Andreas Karatzas	0247a91e00	[ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm (#28979 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-23 22:42:30 -08:00
Michael Goin	8ee90c83f8	Add `--max-model-len auto` to auto-fit context to available memory (#29431 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-23 21:37:14 -08:00
Nick Cao	d7e05ac743	[docker] Fix downloading sccache on aarch64 platform (#30070 ) Signed-off-by: Nick Cao <nickcao@nichi.co>	2025-12-23 21:36:33 -08:00
sihao_li	471ddb99a0	[XPU] Remove distributed_executor_backend check (#30760 ) Signed-off-by: sihao.li <sihao.li@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-12-23 21:34:33 -08:00
Xiong Wang	bb24592d13	[Qwen3-Omni] fixed _get_feat_extract_output_lengths function (#31007 ) Signed-off-by: Xiong Wang <wangxiongts@163.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-12-23 21:33:54 -08:00
Matthew Bonanni	369f47aa0f	[DeepSeek v3.2] Remove unnecessary syncwarps (#31047 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-23 21:33:30 -08:00
zejunchen-zejun	dabff12ed3	[Bugfix][ROCm][Dynamo][DS 3.1][FP8] fix unsupported hasattr call when Dynamo tracing for ROCm device (#31149 ) Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>	2025-12-23 21:32:19 -08:00
Ming Yang	3bb9561928	Revert "[bench] Support common prefix len config (for decode-only bench)" (#31240 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-23 21:17:23 -08:00
Micah Williamson	3ce791ac77	[ROCm][CI] Set VLLM_FLOAT32_MATMUL_PRECISION="tf32" For terratorch Tests In AMD CI (#31242 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-24 03:21:50 +00:00
Andreas Karatzas	e42894f5b5	[ROCm][CI][Bugfix] Fix Siglip2 rotary embedding dispatch and InternVL video test tolerance (#31235 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-24 02:56:58 +00:00
Wentao Ye	76e6a95192	[Bug] Fix `Number of dimensions of tensors must match.` for Deepseek V3.2 (#31160 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-24 10:41:09 +08:00
Chao Lei	8b59753cdb	[P/D] Mooncake connector support more protocols (#30133 ) Signed-off-by: LCAIZJ <leichao139636@163.com>	2025-12-24 10:24:07 +08:00
Chen Zhang	538e830caa	[KVEvent] User request.block_hash for parent block_hash (#30544 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-23 18:23:43 -08:00
rongfu.leng	4ed11105d7	[Misc] Remove unused custom ops `copy_blocks` and `copy_blocks_mla` (#30967 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-12-23 18:22:35 -08:00
Cyrus Leung	dd424571c8	[Bugfix] Enable `dynamic_dims` for different embeds shape (#31223 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-24 10:15:47 +08:00
Cyrus Leung	ca6a95ba25	[Chore] Simplify logic of `_execute_mm_encoder` (#31222 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-23 18:15:16 -08:00
Vadim Gimpelson	bc0a5a0c08	[CI] Add Qwen3-Next-FP8 to Blackwell model tests (#31049 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-12-23 17:21:50 -08:00
Andreas Karatzas	bfa2c0bbb9	[ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() (#31203 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-23 21:48:01 +00:00
Mark McLoughlin	f790068600	[Core] Add a random suffix to frontend-provided request IDs (#27987 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-23 13:05:39 -08:00
Asaf Joseph Gardin	34916ae37f	[Mamba] - Consolidate Mambas Attention Logic (#28133 )	2025-12-23 21:57:00 +01:00
Yuan Tang	0736f901e7	docs: Add llm-d integration to the website (#31234 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-12-23 20:27:22 +00:00
Harry Mellor	c016c95b45	Use helper function instead of looping through attribute names (#29788 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-23 17:31:56 +00:00
Harry Mellor	1339878e13	Only patch `original_max_position_embeddings` for Transformers v4 (#31214 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-23 16:46:32 +00:00
danielafrimi	b94f80ffb8	[FIX] FP4 quantization kernel padding initialization bug (#31097 ) Signed-off-by: <> Co-authored-by: root <root@gpu-193.slurm-workers-slurm.slurm.svc.cluster.local> Co-authored-by: root <root@gpu-951.slurm-workers-slurm.slurm.svc.cluster.local>	2025-12-23 08:45:18 -08:00
Joachim Studnia	38c361f99d	Fix edge case Mistral tool parser (#30724 ) Signed-off-by: Joachim Studnia <joachim@mistral.ai> Signed-off-by: Joachim Studnia <studniajoachim@gmail.com> Signed-off-by: juliendenize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: juliendenize <julien.denize@mistral.ai> Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-12-23 14:19:58 +00:00
Cyrus Leung	bb62dda2c3	[Misc] Introduce `encode_*_url` utility function (#31208 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-23 13:45:21 +00:00
Patrick von Platen	3faa8bee57	adapt voxtral (#31095 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-12-23 05:31:55 -08:00
Harry Mellor	b10d47e0e0	Add util function for checking nesting of rope parameters (#31146 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-23 11:41:49 +00:00
R3hankhan	769f27e701	[OpenAI] Add parameter metadata to validation errors (#30134 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2025-12-23 11:30:12 +00:00

12527 Commits