Li, Jiang
|
59d5d2c736
|
[CI/Build] Skip prompt embeddings tests on V1-only CPU backend (#24721)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-12 18:51:01 +08:00 |
|
wang.yuqi
|
d21a36f5f9
|
[CI] Add ci_envs for convenient local testing (#24630)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-12 08:52:25 +00:00 |
|
Chen Zhang
|
561a0baee0
|
[CI] Fix flaky test v1/worker/test_gpu_model_runner.py::test_kv_cache_stride_order (#24640)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-12 07:49:09 +00:00 |
|
Nick Hill
|
f592b3174b
|
[BugFix] Fix Qwen3-Next PP (#24709)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-11 23:35:04 -07:00 |
|
Li, Jiang
|
7920de0a2a
|
[Bugfix] Fix MRoPE dispatch on CPU (#24712)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-12 04:56:31 +00:00 |
|
Andrew Sansom
|
ddcec289c7
|
Fix implementation divergence for BLOOM models between vLLM and HuggingFace when using prompt embeds (#24686)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-12 04:35:48 +00:00 |
|
Maximilien de Bayser
|
e090b7b45b
|
Enable conversion of multimodal models to pooling tasks (#24451)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-09-12 03:30:41 +00:00 |
|
Gregory Shtrasberg
|
6a50eaa0d3
|
[DOCs] Update ROCm installation docs section (#24691)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-11 20:02:53 -07:00 |
|
Jee Jee Li
|
12a8414d81
|
[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-12 10:06:26 +08:00 |
|
Tao He
|
880c741bb6
|
[Bugfix] fixes the causal_conv1d_update kernel update non-speculative decoding cases (#24680)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Tag: v0.10.2rc2
|
2025-09-11 18:16:43 -07:00 |
|
RichardoMu
|
40b6c9122b
|
[V1] feat:add engine v1 tracing (#20372)
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
Co-authored-by: Ye Zhang <zhysishu@gmail.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: 瑜琮 <ly186375@antfin.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-09-11 17:10:39 -07:00 |
|
Lucas Wilkinson
|
2e6bc46821
|
[Startup] Make DeepGEMM warmup scale with max-num-batched-tokens (#24693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-09-11 20:10:19 -04:00 |
|
Wentao Ye
|
fcba05c435
|
[Bug] Fix Layer weight_block_size Assertion Issue (#24674)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-11 19:47:59 -04:00 |
|
Zazzle516
|
7a30fa8708
|
[Doc] Clarify cudagraph capture size logic and default behavior in scheduler (#18698)
Signed-off-by: Zazzle516 <2405677060@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 23:18:09 +00:00 |
|
Chen Zhang
|
f82f7a8990
|
[Qwen3-Next] MOE configs for H100 TP4 (#24699)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-11 15:45:52 -07:00 |
|
Michael Goin
|
c3aea10dc8
|
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-11 15:43:14 -07:00 |
|
Matthew Bonanni
|
d4fd2768ef
|
[Bugfix][Attention] Fix FlashInfer MLA block size logic (#24692)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-11 22:39:42 +00:00 |
|
Vadim Gimpelson
|
7a70a71892
|
[Qwen3-Next] Add B200 MoE configs for Qwen3-next (#24698)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-09-11 15:34:58 -07:00 |
|
Zhewen Li
|
7d4651997a
|
[CI/Build] Add bc-linter to vLLM CI (#21234)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-09-11 15:34:36 -07:00 |
|
Woosuk Kwon
|
569bf1c9c0
|
[Qwen3-Next] MoE configs for H200 TP=1,2,4 (#24695)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-11 14:38:16 -07:00 |
|
Wentao Ye
|
1ec20355f5
|
[Bugfix] Set VLLM_ALLREDUCE_USE_SYMM_MEM default to False (#24696)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-11 14:32:27 -07:00 |
|
Xiaozhu Meng
|
e42af78b18
|
[flashinfer] [kernel] support for fp8 kv cache for trtllm prefill attention (#24197)
Signed-off-by: Xiaozhu <mxz297@gmail.com>
|
2025-09-11 14:20:09 -07:00 |
|
Duncan Moss
|
074854b24f
|
[Kernel][B200] mxfp4 fused cutlass moe (#23696)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-11 17:04:56 -04:00 |
|
Andrew Xia
|
79ac59f32e
|
Update Spec Decode metrics to include drafted and accepted token throughput (#24127)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2025-09-11 19:58:43 +00:00 |
|
Nick Hill
|
b971f91504
|
[BugFix] Fix tokenize asyncio task leak (#24677)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-11 19:44:04 +00:00 |
|
Woosuk Kwon
|
c733bd5e87
|
[Qwen3-Next] Add MoE Config for H200 (#24688)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-11 12:40:15 -07:00 |
|
Wentao Ye
|
a892b259b4
|
[Doc] Remove Useless Comments (#24687)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-11 12:25:47 -07:00 |
|
Peter Salas
|
127ded0a9e
|
[Ultravox] Use wrapped_model_config to instantiate inner model (#24679)
Signed-off-by: Peter Salas <peter@fixie.ai>
|
2025-09-11 18:52:24 +00:00 |
|
Isotr0py
|
bb2b5126da
|
[VLM] Migrate remain DP-supported ViT models to use disable_tp (#24363)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-11 18:30:41 +00:00 |
|
Harry Mellor
|
361ae27f8a
|
[Docs] Fix formatting of transcription doc (#24676)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 11:18:06 -07:00 |
|
co63oc
|
e26fef8397
|
fix some typos (#24616)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-11 10:48:46 -07:00 |
|
Harry Mellor
|
c1eda615ba
|
Fix model name included in responses (#24663)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 10:47:51 -07:00 |
|
Konrad Zawora
|
4aa23892d6
|
[Bugfix] Fix platform-specific routing in CustomOp implementations (#24444)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2025-09-11 17:15:01 +00:00 |
|
Ilya Markov
|
1fdd5c42d7
|
[Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-11 09:45:31 -07:00 |
|
Isotr0py
|
bcbe2a4d9e
|
[VLM] Optimize GLM4.5-V-style video processing to only decode necessary frames (#24161)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-11 09:44:34 -07:00 |
|
Harry Mellor
|
51d41265ad
|
[Docs] Fix typos in EP deployment doc (#24669)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 09:07:23 -07:00 |
|
Wentao Ye
|
4984a291d5
|
[Doc] Fix Markdown Pre-commit Error (#24670)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-11 09:05:59 -07:00 |
|
Nicolò Lucchesi
|
404c85ca72
|
[Docs] Add transcription support to model (#24664)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-11 07:39:01 -07:00 |
|
Jee Jee Li
|
817beef7f3
|
[Bugfix] Fix qwen-next packed_modules_mapping (#24656)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-11 22:26:17 +08:00 |
|
Mengqing Cao
|
4f6593b058
|
[HybridKVCache][Platform] Add support_hybrid_kv_cache for platform (#24646)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2025-09-11 21:47:58 +08:00 |
|
Boyuan Feng
|
94e6b2d55f
|
Allow users to specify kv cache memory size (#21489)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 13:41:07 +00:00 |
|
wang.yuqi
|
fd1ce98cdd
|
[CI] Split mteb test from Language Models Test (#24634)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-11 06:37:51 -07:00 |
|
Jee Jee Li
|
d11ec124a0
|
[Bench] Add qwen-next in benchmark_moe.py (#24661)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-11 21:29:43 +08:00 |
|
youkaichao
|
f510715882
|
[build] add torch to tool.uv no-build-isolation-package (#24303)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 13:19:44 +00:00 |
|
Tao He
|
f946197473
|
[Docs] Fixes a typo in the qwen3next model name. (#24654)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
|
2025-09-11 19:35:14 +08:00 |
|
Fanli Lin
|
0cd72a7b72
|
[XPU] add missing dependency tblib for XPU CI (#24639)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
|
2025-09-11 11:22:33 +00:00 |
|
Harry Mellor
|
5f5271f1ee
|
Move LoRAConfig from config/__init__.py to config/lora.py (#24644)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 11:01:38 +00:00 |
|
Harry Mellor
|
d6249d0699
|
Fix typing for safetensors_load_strategy (#24641)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 10:41:39 +00:00 |
|
wang.yuqi
|
25bb9e8c65
|
[CI Failure] fix models/language/pooling/test_auto_prefix_cache_support.py (#24636)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-11 03:31:23 -07:00 |
|
Nicolò Lucchesi
|
a1213fae5f
|
[Misc] Add @NickLucche to codeowners (#24647)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-11 17:18:09 +08:00 |
|