Commit Graph

10002 Commits

Author SHA1 Message Date
Zhou Jiahao
2e1b8bc2b6 [Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not (#25925)
Signed-off-by: zhoukz <me@zhoukz.com>
2025-09-30 08:15:23 +00:00
acisseJZhong
e47433b3c1 [BugFix] Pass config_format via try_get_generation_config (#25912)
2025-09-30 05:09:50 +00:00
Lucas Wilkinson
23194d83e8 [BugFix] Fix DP/EP hang (#25906)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-30 04:18:59 +00:00
Harry Mellor
61aedb5ffe Move VllmConfig from config/__init__.py to config/vllm.py (#25271)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-29 19:49:49 -07:00
Zhuohan Li
d3bd171123 [Benchmark] Support benchmark throughput for external launcher DP (#25913)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-30 01:43:57 +00:00
Wentao Ye
89e4050af4 [Bug] Fix Weight Loading for Block FP8 Cutlass SM90 (#25909)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-30 09:15:19 +08:00
Andrew Sansom
78a47f87ce Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models (#25717)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
2025-09-30 08:10:58 +08:00
Aaron Pham
6a113d9aed [V0 Deprecation] Remove vllm.worker and update according imports (#25901)
2025-09-29 23:26:11 +00:00
Nicolò Lucchesi
2e4fe48c37 [NIXL] Increase default KV block eviction timeout on P (#25897)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-29 21:35:14 +00:00
Zhuohan Li
8eb0a1d906 [Doc] Polish example for torchrun dp (#25899)
2025-09-29 21:31:34 +00:00
Thomas Parnell
fea3e476aa [Kernel] Chunk-aligned mamba2 (#24683)
2025-09-29 23:18:25 +02:00
Gregory Shtrasberg
61a3431613 [Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so (#25605)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-29 17:01:50 -04:00
Naman Lalit
9bedac9623 [Doc] Add documentation for vLLM continuous benchmarking and profiling (#25819)
Signed-off-by: Naman Lalit <nl2688@nyu.edu>
2025-09-29 20:49:49 +00:00
Adrian Abeyta
c42ff4f4fd [BugFix][torch.compile] KV scale calculation issues with FP8 quantization (#25513)
Signed-off-by: adabeyta <aabeyta@redhat.com>
2025-09-29 15:52:04 -04:00
Lee Nau
d5ab28511c [Bugfix] Use correct key "ignore" for config.json non-quantized layers (#25706)
Signed-off-by: Lee Nau <lnau@nvidia.com>
2025-09-29 15:07:29 -04:00
Jee Jee Li
e61eb5e09d [Model] Remove MotifForCausalLM (#25866)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-30 00:36:30 +08:00
Isotr0py
0899ba5b42 [CI/Build] Include Transformers backend test in nightly transformers test (#25885)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-29 09:33:39 -07:00
Rahul Tuli
145ac73317 [Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue (#25883)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
2025-09-29 11:37:20 -04:00
Chenxi Yang
d0d138bc55 [Nixl][P/D] Add cuda2cpu support (HD->DH transfer) (#24690)
Signed-off-by: Chenxi Yang <cxyang@fb.com>
Co-authored-by: Chenxi Yang <cxyang@fb.com>
2025-09-29 14:31:51 +00:00
Jiangyun Zhu
43227236ec [torch.compile] serialize cudagraph_mode as its enum name instead of value (#25868)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-29 13:54:52 +00:00
Zhou Jiahao
8616300ae2 [Model][Bugfix] Fix issues in MiDashengLM implementation for quantized models (#25854)
Signed-off-by: zhoukz <me@zhoukz.com>
2025-09-29 10:59:04 +00:00
Yingjun Mou
edbaadd91f [Bugfix] Fix requirements paths in install instructions (#25827)
Signed-off-by: yingjun-mou <renzomou@gmail.com>
2025-09-29 03:49:35 -07:00
youkaichao
9360d34fa1 update to latest deepgemm for dsv3.2 (#25871)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-29 17:51:43 +08:00
Cyrus Leung
1b67b04656 [Misc] Remove more get_input_embeddings_v0 (#25857)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-29 08:03:37 +00:00
Isotr0py
bd51f78e39 [V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge (#25331)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
2025-09-29 14:09:18 +08:00
Roger Wang
65ecb4f134 [Bugfix] Fallback ViT attn backend to SDPA for blackwell (#25851)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-29 06:03:51 +00:00
Kunshang Ji
143844fa43 [XPU]Fix xpu spec decoding UTs, avoid using cuda graph (#25847)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-29 05:15:10 +00:00
Thomas Parnell
219cfbe7f6 Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS (#25832)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-09-29 05:08:17 +00:00
Robert Shaw
9b44a7d926 [P/D] NIXL Updates (#25844)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-09-29 04:46:30 +00:00
Juechen Liu
a3ae45a38c [Misc] fix tests failure by using current_platform (#25825)
Signed-off-by: Juechen Liu <jueliu@meta.com>
2025-09-29 04:18:57 +00:00
Michael Goin
0307428d65 Remove redundant cudagraph dispatcher warning (#25841)
2025-09-28 17:12:42 -04:00
JJJYmmm
471997adf6 [Bugfix] fix Qwen3VLMoe load when pp > 1 (#25838)
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com>
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>
2025-09-28 17:56:12 +00:00
Yuxuan Zhang
b1ded114b9 Update GLM-4.5 Doc transformers version (#25830)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-09-28 12:05:51 +00:00
weiliang
f4e4088c99 Fix random dataset mismatched token length with config. (#24937)
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-28 08:23:44 +00:00
Isotr0py
0efd540dbc [VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling (#25557)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-28 04:21:01 +00:00
Roger Wang
6144754014 [Bugfix] Fix Qwen3-VL regression from #24982 (#25814)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-28 03:21:09 +00:00
Roger Wang
69311446ba [MM] Optimize memory profiling for scattered multimodal embeddings (#25810)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-09-28 02:17:58 +00:00
Nicolò Lucchesi
da63274d9f [Bugfix][NIXL] Fix Async Scheduler timeout issue (#25808)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-27 15:17:35 -04:00
Jialin Ouyang
c216119d64 [Core] GC Debug callback (#24829)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: Jialin Ouyang <jialino@meta.com>
2025-09-27 17:53:31 +00:00
Clayton Coleman
5546acb463 [Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location (#25766)
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
2025-09-27 13:36:28 -04:00
Jiangyun Zhu
c0ec81836f [torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable (#25651)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-27 16:09:00 +00:00
Patrick C. Toulme
b65e56babe [Core] Refactor self.model() to call a helper for subclassing. (#25084)
Signed-off-by: Patrick Toulme <ptoulme@meta.com>
Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com>
2025-09-27 08:40:59 -07:00
Peter Pan
49996cd597 [env] default nixl side port conflicts with kv-event zmq port (#25056)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-09-27 15:02:40 +00:00
yyzxw
ecb37e276a [docs] transcriptions API audio upload (#25446)
Signed-off-by: zxw <1020938856@qq.com>
2025-09-27 15:00:35 +00:00
Tyler Michael Smith
a5354b3ed2 [Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-27 14:22:28 +00:00
Tyler Michael Smith
f9df8b4ad7 [Bugfix] Fix triton import precommit failure (#25803)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-09-27 07:13:11 -07:00
Harry Mellor
ec152c8748 Fix GPTQ model loading in Transformers backend (#25770)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-27 12:18:20 +00:00
Russell Bryant
7977e5027c Add filtering for chat template kwargs (#25794)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-27 10:46:49 +00:00
Russell Bryant
3f5d902d2a Validate API tokens in constant time (#25781)
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com>
2025-09-27 18:09:26 +08:00
Cyrus Leung
27d7638b94 [Bugfix] Merge MM embeddings by index instead of token IDs (#16229)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-09-27 08:15:12 +00:00