Cyrus Leung
|
04cf435d95
|
[Bugfix] Fix wrong method name in Intern-S1 image processor (#22417)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-06 20:05:20 -07:00 |
|
Tao He
|
7377131a2c
|
[Qwen3] Enable dual-chunk-attention support for Qwen3 models. (#21924)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
|
2025-08-06 19:58:08 -07:00 |
|
Lucas Wilkinson
|
1dc8a70b6d
|
[Attention] Support multiple attention metadata builders per kv_cache_spec + proper local attention no hybrid kv cache fix (#21588)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-08-06 18:40:52 -07:00 |
|
tc-mb
|
41b67f4263
|
[model] Support MiniCPM-V 4.0 (#22166)
Co-authored-by: imning3 <hbning@pku.edu.cn>
|
2025-08-06 18:35:46 -07:00 |
|
Lain
|
9a3835aaa9
|
Fix trtllm-gen attention env and add attention sink (#22378)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Lain <fusiyuan2000@hotmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-06 18:07:41 -07:00 |
|
Yongye Zhu
|
5c7cc33f4d
|
[gpt-oss] fix model config with hf_config (#22401)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-06 18:04:04 -07:00 |
|
Asaf Joseph Gardin
|
46a13949d5
|
[v1] - Mamba1 Attention Metadata (#21249)
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
|
2025-08-06 17:03:42 -07:00 |
|
Chen Zhang
|
a47e6ffe93
|
[GptOss] Add GptOss reasoning parser to support structure output (#22322)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-08-05 23:39:13 -07:00 |
|
Woosuk Kwon
|
de98252f49
|
Add GPT-OSS model code and config [1/N] (#22327)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-05 23:26:00 -07:00 |
|
Harry Mellor
|
796bae07c5
|
Update transformers to v4.55 (#21931)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-05 22:56:14 -07:00 |
|
Benji Beck
|
05fae02175
|
Migrate KimiVLImagePixelInputs to TensorSchema (#21769)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-08-05 02:36:18 -07:00 |
|
wang.yuqi
|
586f286789
|
[Model] Pooling model activation supports per request control by PoolingParams (#20538)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-05 00:37:00 -07:00 |
|
Yuxuan Zhang
|
6fa41e0c32
|
self.gate dtype update for GLM-4.5 (#22203)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-08-04 19:12:38 -07:00 |
|
Raghav Ravishankar
|
a5fff3bd49
|
Fix Arcee model weight loading: Add custom load_weights (#21725)
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
|
2025-08-04 04:09:56 -07:00 |
|
Jee Jee Li
|
a7b8788d2c
|
[Misc] Modify the organization of GLM series (#22171)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-03 23:51:20 -07:00 |
|
Chenxi Yang
|
e5949e5ae0
|
Remove index_put from MM embeddings merging (#22105)
Co-authored-by: Chenxi Yang <cxyang@meta.com>
|
2025-08-03 22:15:14 -07:00 |
|
Yuxuan Zhang
|
d3c18c9cb0
|
fuse fp32 for GLM-4.5 e_score_correction_bias (#22143)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2025-08-03 09:04:54 -07:00 |
|
Isotr0py
|
3dddbf1f25
|
[Misc] Add tensor schema test coverage for multimodal models (#21754)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-03 00:52:14 -07:00 |
|
jiahanc
|
337eb23bcc
|
[Fix] Fix llama4 modelopt weight loading error (#22107)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-08-03 00:50:34 -07:00 |
|
Chih-Chieh Yang
|
b690e34824
|
[Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead (#21075)
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
|
2025-08-02 01:59:34 -07:00 |
|
Yuxuan Zhang
|
25373b6c6c
|
for glm-4.1V update (#22000)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-08-02 01:46:57 -07:00 |
|
vllmellm
|
d3a6f2120b
|
[FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. (#22069)
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
|
2025-08-01 23:53:18 -07:00 |
|
Dipika Sikka
|
9f9c38c392
|
[Speculators][Speculative Decoding] Add Qwen Eagle3 Support (#21835)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
|
2025-08-01 19:43:37 -07:00 |
|
vllmellm
|
ee2eb6ecd8
|
[Model] Qwen2.5 VL SiLU-and-Mul (#22066)
Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
|
2025-08-01 19:34:37 -07:00 |
|
Harry Mellor
|
38c8bce8b6
|
Enable headless models for pooling in the Transformers backend (#21767)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 10:31:29 -07:00 |
|
Isotr0py
|
3f8e952179
|
[Bugfix] Fix glm4.1v video inference issue (#22067)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-01 09:33:30 -07:00 |
|
Dipika Sikka
|
dfbc1f8880
|
[Speculative Decoding] Add speculators config support (#21345)
|
2025-08-01 08:25:18 -04:00 |
|
Kyle Sayers
|
0f46a780d4
|
[Model] [Quantization] Support quantization for Gemma3n (#21974)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-07-31 22:45:15 -07:00 |
|
Cyrus Leung
|
82de9b9d46
|
[Misc] Automatically resolve HF processor init kwargs (#22005)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-31 22:44:10 -07:00 |
|
Benjamin Chislett
|
2dff2e21d9
|
[Bugfix] Fix MTP weight loading (#21941)
|
2025-07-31 16:33:53 -04:00 |
|
zhiweiz
|
9e0726e5bf
|
[Meta] Official Eagle mm support, first enablement on llama4 (#20788)
Signed-off-by: morgendave <morgendave@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-07-31 10:35:07 -07:00 |
|
Song
|
9484641616
|
[Model] Add step3 vl (#21998)
Signed-off-by: oliveryuan <yuansong@step.ai>
Co-authored-by: oliveryuan <yuansong@step.ai>
|
2025-07-31 23:19:06 +08:00 |
|
wang.yuqi
|
2836dd73f1
|
[Model][CI] Let more pooling models support v1 (#21747)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-31 01:51:15 -07:00 |
|
Sanchit Gandhi
|
ec02e536df
|
[Bugfix] Relax lang pin for voxtral (#21833)
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-30 20:38:52 -07:00 |
|
Cyrus Leung
|
004203e953
|
[CI/Build] Fix registry tests (#21934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 09:10:41 -07:00 |
|
Yong Hoon Shin
|
ad510309ee
|
Override attention metadata for fast prefill in some KV sharing setups (#21590)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-30 08:54:15 -07:00 |
|
Isotr0py
|
6e599eebe8
|
[Bugfix] Fix OOM tests in initialization test (#21921)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-30 07:35:47 -07:00 |
|
Po-Han Huang (NVIDIA)
|
ff08e51940
|
[NVIDIA] Fix Llama4 Scout FP4 functionality issues (#21499)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-07-30 07:33:40 -07:00 |
|
aladerran
|
d979dd6beb
|
[Feature][EPLB] Add eplb support for Qwen3 (#20815)
Signed-off-by: aladerran <aladerran@gmail.com>
|
2025-07-30 06:27:57 -07:00 |
|
Jee Jee Li
|
fc91da5499
|
[Model] Remove DSV2 unused code (#21903)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-30 00:55:03 -07:00 |
|
Cyrus Leung
|
2ca5f82c2a
|
[Misc] Remove redundant config definitions (#21891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-29 23:54:18 -07:00 |
|
Areeb Syed
|
fdde18229e
|
[Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization (#21808)
Signed-off-by: sydarb <areebsyed237@gmail.com>
|
2025-07-30 11:35:21 +08:00 |
|
Yong Hoon Shin
|
9266d98048
|
[BugFix] Fix interleaved sliding window not set for Gemma3n (#21863)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-29 16:34:19 -07:00 |
|
Jee Jee Li
|
61a6905ab0
|
[Model] Refactor JambaForCausalLM (#21394)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-29 18:25:07 +08:00 |
|
Isotr0py
|
a4528f0cac
|
[Model]: Fused MoE for nomic-embed-text-v2-moe (#18321)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-29 03:13:27 -07:00 |
|
Benji Beck
|
f1e2c095ec
|
Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema (#21684)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-07-28 22:09:45 -07:00 |
|
Cyrus Leung
|
e17a4d3bf9
|
[Bugfix] Fix granite speech shape validation (#21762)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-28 14:19:21 -04:00 |
|
Anton Vlasjuk
|
656c24f1b5
|
[Ernie 4.5] Name Change for Base 0.3B Model (#21735)
Signed-off-by: vasqu <antonprogamer@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-28 12:22:32 +00:00 |
|
Isotr0py
|
0ae970ed15
|
[Bugfix] Fix glm4.1v video_grid_thw tensor shape scheme (#21744)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-28 04:26:49 -07:00 |
|
Jee Jee Li
|
1b769dccf3
|
[Bugfix] Fix Ernie4_5_MoeForCausalLM shared experts (#21717)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-28 11:02:25 +00:00 |
|