Woosuk Kwon
136825de75
[Misc] Enhance code formatting in mxfp4.py (#22423)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-07 00:26:24 -07:00
JaceyShao
c2dba2dba8
Add H20-3e fused MoE kernel tuning configs for GLM-4.5 (#22433)
...
Signed-off-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com>
Co-authored-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com>
2025-08-07 00:24:47 -07:00
Ming Yang
82216dc21f
[Misc] Support routing logic simulation (#21990)
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-06 23:06:20 -07:00
vllmellm
cbc8457b26
[Model] Switch to Fused RMS norm in Qwen2.5_VL model. (#22184)
...
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
2025-08-06 23:05:24 -07:00
WeiQing Chen
4be02a3776
[Bugfix] EPLB load statistics problem (#22167)
...
Signed-off-by: ycyaw66 <497410282@qq.com>
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: ycyaw66 <497410282@qq.com>
2025-08-07 04:07:54 +00:00
Syed Muhammad Bin Asif
609b533cb6
[Bugfix] Add proper comparison for package versions (#22314)
...
Signed-off-by: Syed Muhammad Bin Asif <syedmba7@connect.hku.hk>
2025-08-06 20:31:03 -07:00
Cyrus Leung
04cf435d95
[Bugfix] Fix wrong method name in Intern-S1 image processor (#22417)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-06 20:05:20 -07:00
Tao He
7377131a2c
[Qwen3] Enable dual-chunk-attention support for Qwen3 models. (#21924)
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-08-06 19:58:08 -07:00
Lucas Wilkinson
1dc8a70b6d
[Attention] Support multiple attention metadata builders per kv_cache_spec + proper local attention no hybrid kv cache fix (#21588)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-08-06 18:40:52 -07:00
tc-mb
41b67f4263
[model] Support MiniCPM-V 4.0 (#22166)
...
Co-authored-by: imning3 <hbning@pku.edu.cn>
2025-08-06 18:35:46 -07:00
Lain
9a3835aaa9
Fix trtllm-gen attention env and add attention sink (#22378)
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Lain <fusiyuan2000@hotmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-06 18:07:41 -07:00
Yongye Zhu
5c7cc33f4d
[gpt-oss] fix model config with hf_config (#22401)
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-06 18:04:04 -07:00
Wentao Ye
eec890c1c1
[Bug] Fix B200 DeepGEMM E8M0 Accuracy Issue (#22399)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-06 17:03:53 -07:00
Asaf Joseph Gardin
46a13949d5
[v1] - Mamba1 Attention Metadata (#21249)
...
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-08-06 17:03:42 -07:00
Yongye Zhu
31f09c615f
[gpt-oss] flashinfer mxfp4 (#22339)
...
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-08-06 12:37:27 -07:00
Chen Zhang
a47e6ffe93
[GptOss] Add GptOss reasoning parser to support structure output (#22322)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-05 23:39:13 -07:00
Woosuk Kwon
de98252f49
Add GPT-OSS model code and config [1/N] (#22327)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-05 23:26:00 -07:00
Harry Mellor
796bae07c5
Update transformers to v4.55 (#21931)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-05 22:56:14 -07:00
Jee Jee Li
8e6c7e873f
[Bugfix] Fix MoE BNB version (#22260)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-05 19:56:22 -07:00
Benji Beck
05fae02175
Migrate KimiVLImagePixelInputs to TensorSchema (#21769)
...
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-08-05 02:36:18 -07:00
wang.yuqi
586f286789
[Model] Pooling model activation supports per request control by PoolingParams (#20538)
...
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-05 00:37:00 -07:00
ZiTian.Zhao
4b3e4474d7
Optimize configuration access with LRU cache in custom ops (#22204)
...
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>
2025-08-04 21:43:24 -07:00
Wentao Ye
d7b28f3415
[Log] DeepGEMM Update Log for Unaligned Problem Size (#22208)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-04 19:13:19 -07:00
Yuxuan Zhang
6fa41e0c32
self.gate dtype update for GLM-4.5 (#22203)
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-08-04 19:12:38 -07:00
TJian
6ad6b8e115
[FEAT] Refactor ROPE into module (#22192)
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-08-04 19:12:16 -07:00
Po-Han Huang (NVIDIA)
bdcb42e45d
[NVIDIA] Auto detect modelopt quant and fix DSR1-FP4 weight loading (#22073)
2025-08-04 21:02:55 -04:00
Raghav Ravishankar
a5fff3bd49
Fix Arcee model weight loading: Add custom load_weights (#21725)
...
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
2025-08-04 04:09:56 -07:00
Weixiao Huang
c1b4eb048a
[feat] move WEIGHT_SCALE_SUPPORTED into raise block to accelerate RLHF weight loading (#21164)
...
Signed-off-by: huangweixiao <huangweixiao@msh.team>
2025-08-04 15:43:06 +08:00
Jee Jee Li
a7b8788d2c
[Misc] Modify the organization of GLM series (#22171)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-03 23:51:20 -07:00
Chenxi Yang
e5949e5ae0
Remove index_put from MM embeddings merging (#22105)
...
Co-authored-by: Chenxi Yang <cxyang@meta.com>
2025-08-03 22:15:14 -07:00
Yuxuan Zhang
d3c18c9cb0
fuse fp32 for GLM-4.5 e_score_correction_bias (#22143)
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-08-03 09:04:54 -07:00
Li, Jiang
b5dfb94fa0
[CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI via fallback silu_and_mul to torch native implementation (#22145)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-03 05:34:04 -07:00
Isotr0py
3dddbf1f25
[Misc] Add tensor schema test coverage for multimodal models (#21754)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-03 00:52:14 -07:00
jiahanc
337eb23bcc
[Fix] Fix llama4 modelopt weight loading error (#22107)
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-08-03 00:50:34 -07:00
Yan Ma
73e1b9b1d4
[xpu]support moe models on XPU platform (#21643)
...
Signed-off-by: yan <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-08-02 07:49:08 -07:00
Chih-Chieh Yang
b690e34824
[Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead (#21075)
...
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-08-02 01:59:34 -07:00
Yuxuan Zhang
25373b6c6c
for glm-4.1V update (#22000)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-08-02 01:46:57 -07:00
Chih-Chieh Yang
c64861d63c
[Bugfix] Mamba2 remove bugged initial state condition in chunk scan (#22034)
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-08-01 23:55:57 -07:00
vllmellm
d3a6f2120b
[FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. (#22069)
...
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
2025-08-01 23:53:18 -07:00
Dipika Sikka
9f9c38c392
[Speculators][Speculative Decoding] Add Qwen Eagle3 Support (#21835)
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
2025-08-01 19:43:37 -07:00
Varun Sundar Rabindranath
a65f46be5e
[Misc] DeepGemmExperts : Avoid JIT generation in the hot-path (#21955)
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-08-01 19:42:03 -07:00
vllmellm
ee2eb6ecd8
[Model] Qwen2.5 VL SiLU-and-Mul (#22066)
...
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
2025-08-01 19:34:37 -07:00
JartX
3654847db5
feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733)
2025-08-01 21:12:19 -04:00
Harry Mellor
38c8bce8b6
Enable headless models for pooling in the Transformers backend (#21767)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-01 10:31:29 -07:00
Varun Sundar Rabindranath
ac45c44d98
[Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch (#21837)
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-08-01 10:14:38 -07:00
Isotr0py
3f8e952179
[Bugfix] Fix glm4.1v video inference issue (#22067)
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-01 09:33:30 -07:00
Dipika Sikka
dfbc1f8880
[Speculative Decoding] Add speculators config support (#21345)
2025-08-01 08:25:18 -04:00
Harry Mellor
87c94bc879
Revert "Update sampling_metadata.py (#21937)" (#22088)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-01 05:24:46 -07:00
Jee Jee Li
28b18cc741
[Quantization] Enable BNB support for InternS1 (#21953)
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-01 11:09:54 +00:00
Aviad Rossmann
53d7c39271
Update sampling_metadata.py (#21937)
...
Signed-off-by: Aviad Rossmann <aviadr@neureality.ai>
2025-07-31 23:23:18 -07:00