biondizzle/vllm - vllm

Author	SHA1	Message	Date
Julien Denize	434f3d3eb8	Fix mistral config (#29172) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-11-21 14:01:20 +00:00
sfbemerk	2092ce8c39	Tool Call Parser logs should not contain user input / model output except on DEBUG (#29160) Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com> Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-21 20:57:19 +08:00
who who who	fc9f821d20	fix cross attention (#28346) Signed-off-by: fsx950223 <fsx950223@outlook.com>	2025-11-21 04:55:43 -08:00
Russell Bryant	cca2d2cdbe	[Core] Align whisper closer to other multimodal models (#27292) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-11-21 12:01:54 +00:00
Cyrus Leung	aab0102a26	[V0 deprecation] Remove more V0 references (#29088) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 11:56:59 +00:00
Huamin Li	8ac3a41487	[CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers (#29111) Signed-off-by: Huamin Li <3ericli@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 23:53:30 -08:00
Canlin Guo	7d6da483b0	[Minor][Clean] Remove the legacy assertion in video (#29150) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-20 23:52:34 -08:00
Chenheli Hua	e4c3182c68	[Small] Capture AttributeError when checking ray dependency. (#29024) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-11-20 22:54:10 -08:00
Alex Brooks	b4734b9550	[Bugfix] Fix default MM LoRA alignment for single str prompts (#29140) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-11-21 13:32:30 +08:00
Jialin Ouyang	30b9c67743	Revert "[Redo] #26368 (#28771)" (#29121) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 21:27:45 -08:00
Matthew Bonanni	11857a00b0	[Attention] Add ROCM_AITER_MLA_SPARSE to attention backend registry (#29103) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-20 20:24:43 -08:00
Boyuan Feng	8c25f9cfb6	[BugFix] skip combo kernel on cpu (#29129) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-11-21 11:50:59 +08:00
Cyrus Leung	56e96b37e4	[V0 Deprecation] Remove `best_of` (#29090) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-21 11:40:40 +08:00
jeremyteboul	0730414999	[Core] Add audio_embeds support to chat completions (#29059) Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>	2025-11-21 11:39:47 +08:00
zhrrr	a982f5b5ea	[kernel][perf] support uncontiguous input for rms_norm kernel (#28103) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-20 19:39:09 -08:00
Cyrus Leung	0e741c12e3	[Bugfix] Fix Plamo3 rope handling (#29092) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-21 11:38:35 +08:00
Wentao Ye	56669c1f29	[CI] Fix mypy for `vllm/v1/worker` (#29037) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-21 11:36:07 +08:00
Hongxia Yang	3f5f36da3f	[ROCm] Fix for import when building with upstream triton for gfx1100 for gpt-oss serving (#29127) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2025-11-21 03:30:07 +00:00
Wentao Ye	e1eefa4c40	[Bug] Fix torch warning of tf32 usage (#29112) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-21 01:54:59 +00:00
Xiao Li	ed6ae1e36a	[AITER] [ROCm] Fix crash when loading llama4 model with old aiter version installed, fallback to forward_native implementation (#29124) Signed-off-by: Xiao Li <ilx@meta.com>	2025-11-20 17:54:35 -08:00
Jee Jee Li	9875be6431	[LoRA][2/2]Remove LoRA extra vocab (#28545) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-21 09:46:43 +08:00
Wentao Ye	df44df0143	[Feature] Shared Experts Overlap with FI deepgemm swap kernel, 2.2% throughput improvement and 3.6% TTFT improvement (#28879) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-20 18:41:49 -07:00
Driss Guessous	3fd74189db	Fixes bench (#29058) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-11-20 21:21:54 +00:00
Software Developer	4d01b64284	[Bugfix] - Add Trace Headers to Beam Search Path (#29100) Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>	2025-11-20 20:00:33 +00:00
Or Ozeri	647464719b	[KVConnector][Core] Support cross-layer KV blocks (#27743) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-11-20 19:09:59 +01:00
rookie	56f45eddaf	[Frontend] Optimize beam search loop by sorting and then splicing (#19347) Signed-off-by: zhangguozhu <zhangguozhu@360.cn> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: zhangguozhu <zhangguozhu@360.cn> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-20 09:02:30 -08:00
Fanli Lin	a2e9ebe9e2	[BugFix] Fix flash_attn import in `siglip2navit.py` (#29082) Signed-off-by: Fanli Lin <fanli.lin@intel.com>	2025-11-20 12:14:29 +00:00
Zhewen Li	93c8672ceb	[Bugfix] Fix spec decode memory regression after #28549 (#28819) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-11-20 19:05:50 +08:00
Samit	371b1d4c61	[RL] Add Pause and Resume Generation for Asynchronous RL Training (#28037) Signed-off-by: SamitHuang <285365963@qq.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: samithuang <285365963@qq.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-11-20 03:01:03 -08:00
Shinichi Hemmi	c9e093116c	[MODEL] Implement plamo3 (#28834) Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>	2025-11-20 03:00:19 -08:00
Or Ozeri	c0c2dd1e0b	[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 18:55:10 +08:00
Pleaplusone	06c20c9904	[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670) Signed-off-by: ganyi <ygan@amd.com>	2025-11-20 02:54:01 -08:00
Anna Shors	6eb745d9bd	Add truncate arg to yarn to match openai implementation of gpt-oss (#28244) Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-11-20 18:53:50 +08:00
Dezhan	dc45efc8ef	[BugFix] Fix Llama4 Pipeline Parallelism Assert Error (#28577) Co-authored-by: Dezhan Tu <dztu@meta.com>	2025-11-20 02:52:36 -08:00
Wentao Ye	2c52c7fd9a	[Bug] Fix torch dynamo warning Dynamo detected a call to a `functools.lru_cache` (#29038) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-20 16:52:23 +08:00
Pleaplusone	7218f83992	[ROCm][BugFix] Fix shared expert loading error when disable `VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS` (#28633) Signed-off-by: ganyi <ygan@amd.com>	2025-11-20 14:50:23 +07:00
Cyrus Leung	20e4497be2	[V0 Deprecation] Remove `num_lookahead_slots` (#29000) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-20 06:39:10 +00:00
Quentin Gallouédec	1c7bcc55b8	[Frontend] Allow parsed tool arguments (#28820) Signed-off-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-19 22:20:12 -08:00
Lukas Geiger	a9705a290a	[Model][QwenVL] Replace `torch.repeat_interleave` with faster `np.repeat` (#28964) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-19 22:04:23 -08:00
Isotr0py	64192d5624	[Bugfix] Revert custom attention mask for gemma3-mm (#28995) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-20 13:23:22 +08:00
Canlin Guo	fe25772aa9	[Bugfix] Handle broken frames in video loading (#29001) Signed-off-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: 凌葭 <lvjiang.lj@alibaba-inc.com> Co-authored-by: 凌葭 <lvjiang.lj@alibaba-inc.com>	2025-11-20 04:38:12 +00:00
prashanth058	0cca9b4d13	[Bugfix] Fix precision loss in LoRA-wrapped RowParallelLinear by fusing bias into GEMM (#28972) Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>	2025-11-20 03:50:37 +00:00
Shengliang Xu	a8c536829c	Consolidate Nvidia ModelOpt quant config handling for all quantization methods (#28076) Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>	2025-11-19 22:39:36 -05:00
Benjamin Chislett	fcbcba6c70	[Feat] Iteration-level profiling for Torch and CUDA profiler (#28987) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-19 19:17:48 -08:00
Qiang Zhang	3fb0d90999	[AMD] Use Decoupled Kernel Block Size to Support AITER MLA block_size=1 (#27715) Signed-off-by: chiangzhang <chiangzhang@tencent.com>	2025-11-20 02:11:52 +00:00
Kuntai Du	05c2dee7e9	[DeepSeek + LMCache Multiprocess] handle MLA for deepseek model + LMCache Multiprocess connector (#29039) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-11-20 01:40:49 +00:00
liangel-02	1d642872a2	[torchao] fix safetensors for sharding (#28169) Signed-off-by: Angel Li <liangel@meta.com>	2025-11-19 16:39:45 -08:00
Nick Hill	9ccef8e333	[Misc] Colorize logs (#29017) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-19 19:26:04 -05:00
Jialin Ouyang	537cc635c7	[GC Debugger] Simply and improve GC Debugger Utils (#29029) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 00:10:22 +00:00
Wentao Ye	5031cd5d55	[Refactor] Optimize `select_experts` (#28069) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-19 18:53:15 -05:00

8027 Commits