jiahanc
|
34553b9d27
|
[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-11-10 12:34:57 -05:00 |
|
Cyrus Leung
|
d0e186c16f
|
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE (#28395)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-11 00:30:06 +08:00 |
|
vllmellm
|
f080a83511
|
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-10 08:20:53 -08:00 |
|
Ferrebo
|
912744d066
|
[Fix] optimize visual token mask with caching and multi-token support (#28374)
Signed-off-by: Ferrebo <itachi971009@gmail.com>
Signed-off-by: kebo01 <kebo01@baidu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-10 13:23:49 +00:00 |
|
Yu Jiaqi
|
15be507c86
|
[bugfix] fix siglip batch text output error (#28365)
Signed-off-by: piood <2477084691@qq.com>
|
2025-11-10 21:21:15 +08:00 |
|
Jiangyun Zhu
|
c4768dcf47
|
[Kernel] Fix fused_gdn_gating (#28343)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-11-09 14:26:35 -07:00 |
|
Jiangyun Zhu
|
7ae5a5fb11
|
[Misc] Add some comments in qwen3-next (#28267)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-11-08 23:59:24 -08:00 |
|
Mohammad Miadh Angkad
|
404d7a9d14
|
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345)
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
|
2025-11-08 15:50:10 -07:00 |
|
Isotr0py
|
934a9c3b79
|
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-08 05:01:27 +00:00 |
|
Lukas Geiger
|
e0919f331d
|
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-11-07 12:14:29 +00:00 |
|
Kevin H. Luu
|
8e19d470af
|
[fix] Revert "fixing mm placeholder replacement issue with gemma3" (#28285)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2025-11-07 12:09:09 +00:00 |
|
Mengqing Cao
|
1958bda9b4
|
[Misc][Model][Refactor] Pass the prefix into Linear layers (#28259)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2025-11-07 19:38:38 +08:00 |
|
Harry Mellor
|
c0a4b95d64
|
Fix issues from #28242 (#28257)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-07 04:23:17 +00:00 |
|
Lucas Kabela
|
4bf56c79cc
|
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-11-07 00:16:03 +00:00 |
|
Julien Denize
|
7a8375f8a0
|
Add llama 4 scaling support (#28145)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-11-06 18:55:17 +00:00 |
|
Seungduk Kim
|
201dc98acc
|
Fix hard-coded parameter name in gemma3n.py (#27946)
Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-11-05 23:07:36 -08:00 |
|
Isotr0py
|
43ecd0a900
|
[Chore] Clean up deepseek v2/v3 config copy (#28055)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-06 03:46:30 +00:00 |
|
Vadim Gimpelson
|
b6a248bdd7
|
[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-05 17:01:12 -08:00 |
|
wang.yuqi
|
802748bddb
|
[Bugfix] Fix Qwen3-Reranker-8B load (#28117)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-11-05 18:33:50 +00:00 |
|
Chen Zhang
|
c765f0b443
|
[FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell (#27994)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-11-05 09:25:32 -08:00 |
|
Jiangyun Zhu
|
c18f88c6ca
|
[Kernel] Fuse computation of g and beta for Gated Delta Net (#28095)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-11-05 09:14:55 -08:00 |
|
Isotr0py
|
3f5a4b6473
|
[Bugfix] Validate custom logits processor xargs for online serving (#27560)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-05 16:53:33 +00:00 |
|
Ilya Markov
|
e50c454672
|
[BugFix] Support EP/DP + EPLB with MTP (#25311)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2025-11-05 15:22:17 +00:00 |
|
Alex Brooks
|
b7cbc25416
|
[Model, Core] Support Granite Speech & LoRA for STT (#24455)
|
2025-11-05 08:33:48 +01:00 |
|
Isotr0py
|
0ff05e3770
|
[Bugfix] Fix encoder-only model support for transformers backend (#28021)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-04 22:24:41 -08:00 |
|
wangxiyuan
|
428bc7bf1c
|
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-11-04 20:51:16 -08:00 |
|
Kunshang Ji
|
18b39828d9
|
[XPU] Add gpt-oss model support for Intel GPU (#27786)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-05 02:17:23 +00:00 |
|
Vadim Gimpelson
|
d4e547bb7e
|
Revert "[PERF] Decouple projections from GDN custom op" (#28080)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-04 15:58:23 -08:00 |
|
Aleksandr Malyshev
|
2d977a7a9e
|
[ROCm] gemm_a16w16 upstreaming (#26969)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-11-04 16:01:00 -05:00 |
|
yt0428
|
05cae69f0f
|
[model] Add support for openPangu_Ultra_MoE (#27521)
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-04 08:17:20 -08:00 |
|
Vadim Gimpelson
|
5fd8f02ea9
|
[PERF] Decouple projections from GDN custom op (#27512)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-11-04 08:11:41 -08:00 |
|
tomeras91
|
77f8001f53
|
[Model][Bugfix] fix pipeline parallelism support for NemotronH (#27968)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-11-04 12:28:36 +00:00 |
|
vllmellm
|
b13a447546
|
[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm (#27748)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-03 17:12:19 -08:00 |
|
Lucas Kabela
|
55011aef24
|
[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile (#27764)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-11-03 11:12:15 -08:00 |
|
zhang-prog
|
40b69e33e7
|
[Model] Add PaddleOCR-VL Model Support (#27758)
Signed-off-by: zhangyue <zhangyue66@baidu.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-03 19:04:22 +08:00 |
|
Asaf Joseph Gardin
|
00b31a36a2
|
[V1] [Hybrid] Mamba1 Automatic Prefix Caching (#26377)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
|
2025-11-02 04:16:23 -08:00 |
|
Cyrus Leung
|
853a8eb53b
|
[Bugfix] Fix Qwen Omni audio inference (#27920)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-02 05:06:05 +00:00 |
|
TJian
|
e2347dbf58
|
[Bugfix] [Model] Missing MRoPE function definition from KeyeForConditionalGeneration (#27895)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-11-01 13:45:23 +08:00 |
|
Cyrus Leung
|
879a06579e
|
[CI/Build] Bump transformers version (#27528)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-31 22:11:07 -07:00 |
|
Yan Ma
|
7e2729b57e
|
[Multimodal][XPU]Enable vision attn backend for xpu platform (#27525)
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yejing Lai <yejing.lai@intel.com>
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-11-01 04:45:02 +00:00 |
|
ZiTian Zhao
|
bc306fe5e9
|
fix incorrect type annotation in KimiMLP (#27885)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-10-31 17:38:02 +00:00 |
|
Isotr0py
|
7e06c40e63
|
[Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V (#27860)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-31 17:04:51 +00:00 |
|
toncao
|
e5ef4dfc11
|
[Kimi-Linear] Correct prefixes and add compatibility to AWQ quants (#27834)
Signed-off-by: toncao <cpatonn@gmail.com>
Co-authored-by: toncao <cpatonn@gmail.com>
|
2025-10-31 17:36:37 +08:00 |
|
Tyler Michael Smith
|
ab98f6556f
|
[Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) (#27811)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-10-30 11:52:18 -07:00 |
|
Mengqing Cao
|
1004205795
|
[MTP] Refactor mtp predictor to avoid d2h operation (#27643)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2025-10-30 17:27:39 +00:00 |
|
Fan Yin
|
9956aae4ea
|
[Model][Ouro] Support Ouro Model (#27794)
Signed-off-by: yinfan.1024 <yinfan.1024@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-30 22:34:41 +08:00 |
|
Zhiyuan Li
|
4e68cc9b6a
|
[Model] Introduce Kimi Linear to vLLM (#27809)
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
|
2025-10-30 21:02:27 +08:00 |
|
wang.yuqi
|
4464723f22
|
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-30 12:13:05 +00:00 |
|
Zhewen Li
|
e806178d2a
|
[BugFix][VL] Fix FA selection on Qwen2.5-VL (#27790)
Signed-off-by: zhewenli <zhewenli@meta.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-30 07:54:44 +00:00 |
|
Chenheli Hua
|
48eb8eba58
|
[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. (#27760)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-29 23:17:48 +00:00 |
|