Isotr0py
d593cf28fa
[Misc] Add removed encoder-decoder models to previously supported models list (#24961)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-16 10:46:46 -07:00
lianyibo
faa7a5daac
[Bugfix] Fix unable to run encoder model when disable_hybrid_kv_cache_manager is true (#24571)
...
Signed-off-by: lianyibo <lianyibo1@kunlunit.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-09-16 17:36:58 +00:00
Sage Moore
567939953b
[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 12:21:48 -04:00
Lukas Geiger
08369289af
[Core][MultiModalHasher] Don't convert memoryviews to bytes during hashing (#24925)
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-16 15:32:47 +00:00
Chih-Chieh Yang
73cfb3c5ee
[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 (#24331)
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-09-16 14:53:43 +00:00
Harry Mellor
0faf3cc3e8
Move SpeculativeConfig from config/__init__.py to config/speculative.py (#24904)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 12:51:35 +01:00
Chen Bruce
7ea5c73ad7
[Feat][EPLB] A novel static EPLB placement strategy for MoE models. (#23745)
...
Signed-off-by: bruceszchen <bruceszchen@tencent.com>
Signed-off-by: Chen Bruce <bruceszchen@tencent.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Chen Bruce <cszwwdz@vip.qq.com>
Co-authored-by: lemon412 <lemon412@foxmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 10:55:16 +00:00
tomeras91
27fcfe7bcf
[Mamba] Support TP>1 with quantization for mamba2 mixer in case n_groups % tp_size == 0 (#24593)
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-16 10:51:01 +00:00
Cheng Kuan Yong Jason
68dbde5dbb
[Bugfix] remove duplicate tokens streamed in required tool choice streaming (#23312)
...
Signed-off-by: Jason Cheng <jasoncky96@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-09-16 15:16:32 +08:00
Jee Jee Li
04ad0dc275
[benchmark] Add triton version in the moe tuned config (#24769)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-16 14:10:54 +08:00
Saman A. Pour
238c4c1705
[QWEN NEXT] Fused MoE kernels Optimization configs (#24924)
...
Signed-off-by: Saman Keon <samanamp@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-16 13:06:03 +08:00
vllmellm
8c54610265
[Bug] [Spec Dec]: Fix kv_cache dtype mismatch for Eagle3 drafter on FP8 target (#24505)
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-09-16 04:45:38 +00:00
cascade
17871983a2
[Bugfix] Fix sequence parallelism bug when enable pipeline parallelism (#24021)
...
Signed-off-by: cascade812 <cascade812@outlook.com>
2025-09-16 04:32:32 +00:00
Woosuk Kwon
759ef49b15
Remove V0 Encoder-Decoder Support (#24907)
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-15 21:17:14 -07:00
Kunshang Ji
5206ab20ba
[XPU] Fix circular import error. (#24927)
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-16 03:35:36 +00:00
Mark McLoughlin
2942970d44
[Metrics] Hide deprecated metrics with gpu_ prefix (#24245)
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-09-15 20:15:57 -06:00
Wentao Ye
b42566f440
[Bug] Fix is_flashmla_supported Check Error (#24774)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 20:10:55 -06:00
Gregory Shtrasberg
2891603efd
[ROCm][Bugfix] Fix the case where there's bias (#24895)
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-09-15 20:05:12 -06:00
Wentao Ye
de2cc3d867
[Deprecation] Remove DeepGEMM Old Symbol Wrapper (#24902)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 20:03:29 -06:00
Jiangyun Zhu
5bcc153d7b
[Compile] Fix noop_elimination pass and add tests for noop_elimination (#24880)
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-15 23:33:18 +00:00
Simon Mo
fd2f10546c
[ci] fix wheel names for arm wheels (#24898)
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-09-15 14:39:08 -07:00
Alexander Matveev
aae725af7c
[Performance] Remove redundant clone() calls in cutlass_mla (#24891)
2025-09-15 20:21:53 +00:00
Andrew Xia
73df49ef3a
[gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (#24759)
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:08:08 -07:00
Andrew Xia
25aba2b6a3
[gpt-oss] Add IncompleteDetails to ResponsesRepsonse (#24561)
...
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-15 13:07:55 -07:00
Sage Moore
49bfc538e4
Update num_tokens_across_dp to use nccl instead of gloo (#24105)
...
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-09-15 19:05:48 +00:00
Kyle Sayers
a0b26701c9
[Transform] Deterministic Hadacore Transforms (#24106)
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-09-15 12:59:31 -06:00
Harry Mellor
c4afdb69cc
Move MultiModalConfig from config/__init__.py to config/multimodal.py (#24659)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-15 17:43:16 +00:00
Rafael Marcelino Koike
b834b4cbf1
[USAGE] Improve error handling for weight initialization in Unquantized… (#20321)
...
Signed-off-by: Rafael Marcelino Koike <rafael.koike@oracle.com>
Signed-off-by: Rafael Koike <koike.rafael@gmail.com>
2025-09-15 16:45:49 +00:00
xiao-llm
01413e0cf5
Fp8 paged attention update (#22222)
...
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com>
Co-authored-by: Xiao Yu <xiao.yu@metamaterial.com>
Co-authored-by: Xiao Yu <xiao.yu@amd.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
2025-09-15 10:43:26 -04:00
Isotr0py
0e219cd50b
[Bugfix] Fix GLM4.1V multimodal processor with compatability for Transformers v4.56 (#24822)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-15 20:45:06 +08:00
ant-yy
72c99f2a75
[Model]: support Ling2.0 (#24627)
...
Signed-off-by: vito.yy <vito.yy@antgroup.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-15 05:09:30 -07:00
Nicolò Lucchesi
2e41f5abca
[XPU] Set consistent default KV cache layout (#24745)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-15 18:09:34 +08:00
Chao Lei
8de261b04a
[P/D]kv_output_aggregator support P TP > D TP (#23917)
...
Signed-off-by: LCAIZJ <leichao139636@163.com>
Co-authored-by: leichao.lc <leichao.lc@antgroup.com>
2025-09-15 11:36:06 +02:00
Ning Xie
59e17dd4a0
[Misc] rename interval to max_recent_requests (#24229)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-15 09:18:42 +00:00
Didier Durand
4979eb79da
[Doc]: fix typos in various files (#24821)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-15 01:08:52 -07:00
bingchen-mi
a8c0f59973
[Bugfix] MiDashengLM model contact error under concurrent testing (#24738)
...
Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
2025-09-15 06:38:12 +00:00
Ce Gao
f4a948f33f
[Frontend] Skip stop in reasoning content (#14550)
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-09-15 06:04:55 +00:00
Ning Xie
3f3313981c
[kv cache] update num_free_blocks in the end (#24228)
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-09-15 05:15:12 +00:00
Chen Zhang
8e5cdcda4e
[Hybrid Allocator] Support Pipeline Parallel (#23974)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-14 15:55:17 -07:00
wuhang
90f3f7d73e
[Spec Decoding]Support Spec Decoding Metrics in DP Mode (#24049)
...
Signed-off-by: wuhang <wuhang6@huawei.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 21:11:09 +00:00
Robert Shaw
6dc8da5dc1
[Chore] Remove ipex_ops warning (#24835)
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 19:41:53 +00:00
Ye (Charlotte) Qi
ff68035932
[Benchmarks] Throw usage error when using dataset-name random and dataset-path together (#24819)
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-14 17:50:01 +00:00
co63oc
1177dd53e9
fix type of sampling rate for encode_base64 (#24826)
...
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-14 16:17:16 +00:00
Wentao Ye
fc2dbcda8b
[Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (#24783)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-14 11:20:17 -04:00
Hyogeun Oh (오효근)
fec347dee1
[Misc] Improve s3_utils type hints with BaseClient (#24825)
...
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-09-14 12:11:14 +00:00
Wenlong Wang
cc3173ae98
[Multi Modal][Performance] Fused Q,K's apply_rope into one (#24511)
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-14 08:10:21 +00:00
Woosuk Kwon
3e903b6cb4
[Chore] Minor simplification for non-PP path (#24810)
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-13 17:41:36 -07:00
Victor Ziliang Peng
973c9d01da
[Minor] Simplify duplicative device check for cuda (#24793)
...
Signed-off-by: Ziliang Peng <ziliangdotme@gmail.com>
2025-09-13 18:28:38 +00:00
TaoYu Chen
15b8fef453
Remove redundant assignment in xfer_buffers, This is a little fix (#24732)
...
Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
2025-09-13 08:11:59 +00:00
Didier Durand
41ae4a1eab
[Doc]: fix typos in various files (#24798)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-13 00:43:33 -07:00