youkaichao
|
9360d34fa1
|
update to latest deepgemm for dsv3.2 (#25871)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-29 17:51:43 +08:00 |
|
Cyrus Leung
|
1b67b04656
|
[Misc] Remove more get_input_embeddings_v0 (#25857)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-29 08:03:37 +00:00 |
|
Isotr0py
|
bd51f78e39
|
[V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge (#25331)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-09-29 14:09:18 +08:00 |
|
Roger Wang
|
65ecb4f134
|
[Bugfix] Fallback ViT attn backend to SDPA for blackwell (#25851)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-29 06:03:51 +00:00 |
|
Kunshang Ji
|
143844fa43
|
[XPU]Fix xpu spec decoding UTs, avoid using cuda graph (#25847)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-09-29 05:15:10 +00:00 |
|
Thomas Parnell
|
219cfbe7f6
|
Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS (#25832)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-09-29 05:08:17 +00:00 |
|
Robert Shaw
|
9b44a7d926
|
[P/D] NIXL Updates (#25844)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-09-29 04:46:30 +00:00 |
|
Juechen Liu
|
a3ae45a38c
|
[Misc] fix tests failure by using current_platform (#25825)
Signed-off-by: Juechen Liu <jueliu@meta.com>
|
2025-09-29 04:18:57 +00:00 |
|
Michael Goin
|
0307428d65
|
Remove redundant cudagraph dispatcher warning (#25841)
|
2025-09-28 17:12:42 -04:00 |
|
JJJYmmm
|
471997adf6
|
[Bugfix] fix Qwen3VLMoe load when pp > 1 (#25838)
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com>
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>
|
2025-09-28 17:56:12 +00:00 |
|
Yuxuan Zhang
|
b1ded114b9
|
Update GLM-4.5 Doc transformers version (#25830)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-09-28 12:05:51 +00:00 |
|
weiliang
|
f4e4088c99
|
Fix random dataset mismatched token length with config. (#24937)
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-28 08:23:44 +00:00 |
|
Isotr0py
|
0efd540dbc
|
[VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling (#25557)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-28 04:21:01 +00:00 |
|
Roger Wang
|
6144754014
|
[Bugfix] Fix Qwen3-VL regression from #24982 (#25814)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-28 03:21:09 +00:00 |
|
Roger Wang
|
69311446ba
|
[MM] Optimize memory profiling for scattered multimodal embeddings (#25810)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-28 02:17:58 +00:00 |
|
Nicolò Lucchesi
|
da63274d9f
|
[Bugfix][NIXL] Fix Async Scheduler timeout issue (#25808)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-27 15:17:35 -04:00 |
|
Jialin Ouyang
|
c216119d64
|
[Core] GC Debug callback (#24829)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Co-authored-by: Jialin Ouyang <jialino@meta.com>
|
2025-09-27 17:53:31 +00:00 |
|
Clayton Coleman
|
5546acb463
|
[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location (#25766)
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
|
2025-09-27 13:36:28 -04:00 |
|
Jiangyun Zhu
|
c0ec81836f
|
[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable (#25651)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-27 16:09:00 +00:00 |
|
Patrick C. Toulme
|
b65e56babe
|
[Core] Refactor self.model() to call a helper for subclassing. (#25084)
Signed-off-by: Patrick Toulme <ptoulme@meta.com>
Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com>
|
2025-09-27 08:40:59 -07:00 |
|
Peter Pan
|
49996cd597
|
[env] default nixl side port conflicts with kv-event zmq port (#25056)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-09-27 15:02:40 +00:00 |
|
yyzxw
|
ecb37e276a
|
[docs] transcriptions API audio upload (#25446)
Signed-off-by: zxw <1020938856@qq.com>
|
2025-09-27 15:00:35 +00:00 |
|
Tyler Michael Smith
|
a5354b3ed2
|
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-27 14:22:28 +00:00 |
|
Tyler Michael Smith
|
f9df8b4ad7
|
[Bugfix] Fix triton import precommit failure (#25803)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-27 07:13:11 -07:00 |
|
Harry Mellor
|
ec152c8748
|
Fix GPTQ model loading in Transformers backend (#25770)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-27 12:18:20 +00:00 |
|
Russell Bryant
|
7977e5027c
|
Add filtering for chat template kwargs (#25794)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-27 10:46:49 +00:00 |
|
Russell Bryant
|
3f5d902d2a
|
Validate API tokens in constant time (#25781)
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com>
|
2025-09-27 18:09:26 +08:00 |
|
Cyrus Leung
|
27d7638b94
|
[Bugfix] Merge MM embeddings by index instead of token IDs (#16229)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-27 08:15:12 +00:00 |
|
Xiaohan Zou
|
176173989a
|
[Bugfix] Add missing image_size for phi4_multimodal (#25796)
|
2025-09-27 07:59:22 +00:00 |
|
Roger Wang
|
23b8ee672d
|
[Misc] Update openai client example file for multimodal (#25795)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-27 07:57:07 +00:00 |
|
22quinn
|
3939152069
|
[Misc] Fix codeowners override for v1 sample and attention (#25037)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-09-27 07:47:29 +00:00 |
|
Cyrus Leung
|
cd87bfbf37
|
[CI/Build] Reorganize root-level V1 tests (#25767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-27 13:51:15 +08:00 |
|
22quinn
|
b3613e3ace
|
[CI/Build] Add timing to Model Executor Test (#25799)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-09-26 21:57:27 -07:00 |
|
Cyrus Leung
|
d346ec695e
|
[CI/Build] Consolidate model loader tests and requirements (#25765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 21:45:20 -07:00 |
|
Wentao Ye
|
c242c98031
|
[Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL (#25788)
|
2025-09-26 20:44:52 -07:00 |
|
WeiQing Chen
|
f1d53d150c
|
[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
|
2025-09-27 03:35:47 +00:00 |
|
Michael Goin
|
92da847cf5
|
Add flashinfer-build.sh and register precompiled cu128 wheel in Dockerfile (#25782)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-26 18:54:09 -07:00 |
|
Russell Bryant
|
3958b96bf5
|
Add option to restrict media domains (#25783)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-09-27 01:23:52 +00:00 |
|
Zhuohan Li
|
8bf8f45822
|
[Core] Don't count preempted tokens in prefix cache hit rate (#25787)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
|
2025-09-27 00:16:40 +00:00 |
|
Jonas M. Kübler
|
6f5c0931c1
|
[Spec decode] automatically disable mm for text-only draft models (#25667)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
|
2025-09-27 08:10:21 +08:00 |
|
Naman Lalit
|
4e33a7ea85
|
[Bugfix] Optimize CpuGpuBuffer initialization (#25447)
Signed-off-by: Naman Lalit <nl2688@nyu.edu>
|
2025-09-27 08:07:36 +08:00 |
|
Bram Wasti
|
dc48ba0c75
|
Kernel-override Determinism [1/n] (#25603)
Signed-off-by: Bram Wasti <bwasti@meta.com>
|
2025-09-26 16:59:09 -07:00 |
|
Sage Moore
|
4778b42660
|
Reduce the Cuda Graph memory footprint when running with DBO (#25779)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-09-26 22:29:56 +00:00 |
|
qizixi
|
c70ac4b8ff
|
[spec decode] Consolidate speculative decode method name for MTP (#25232)
Signed-off-by: zixi-qi <qizixi@meta.com>
|
2025-09-26 22:27:05 +00:00 |
|
Michael Goin
|
cf89202855
|
[CI] Fix FlashInfer AOT in release docker image (#25730)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-26 14:11:40 -07:00 |
|
fhl2000
|
f075693da7
|
[V1] address post issues related to #20059 (part 1) (#23046)
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-26 15:58:19 -04:00 |
|
Michael Goin
|
f708bd4904
|
[CI] Add E2E Blackwell Quantized MoE Test (#25723)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-26 12:23:00 -07:00 |
|
Michael Goin
|
0002b7f0d1
|
[Docs] Add Toronto Meetup (#25773)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-26 12:00:46 -07:00 |
|
Frank Wang
|
11aafd9886
|
[Bugfix] Improve GLM4 MoE Reasoning Parser's is_reasoning_end Condition (#25355)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-09-26 11:54:00 -07:00 |
|
Clouddude
|
b761df963c
|
[Doc]: improve CPU(x86) build-wheel-from-source section (#25617)
Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com>
Tags: v0.11.0rc1, v0.11.1rc0
|
2025-09-26 10:26:33 -07:00 |
|