Mert Unsal
|
c625d7b1c6
|
[Bugfix] Fix O(n²) multimodal string prompt processing (#29667)
Signed-off-by: mertunsall <mertunsal1905@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-28 16:10:39 -08:00 |
|
Zhengxu Chen
|
6173682b6e
|
[compile] Include enable_sleep_mode into caching factors. (#29696)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-11-29 07:58:38 +08:00 |
|
Augusto Yao
|
9726e64530
|
bugfix: correct attn output with base 2 or e (#28840)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
|
2025-11-29 07:52:12 +08:00 |
|
Huamin Li
|
3fd1fb0b60
|
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-28 15:26:52 -08:00 |
|
Jiangyun Zhu
|
a51f4186f2
|
[Bugfix] fix dots.llm1.inst (#29687)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-28 15:25:26 -08:00 |
|
Cyrus Leung
|
7675ba30de
|
[Misc] Remove redundant ClassRegistry (#29681)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-28 15:24:47 -08:00 |
|
Ralf Gommers
|
7c1ed45848
|
[CI/Build]: make it possible to build with a free-threaded interpreter (#29241)
Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 15:21:46 -08:00 |
|
Benjamin Chislett
|
1986de1375
|
[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-11-28 22:25:05 +00:00 |
|
Yanan Cao
|
3461e7efd8
|
[Frontend] Remap -O to -cc commandline flag (#29557)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2025-11-28 21:51:12 +00:00 |
|
Harry Mellor
|
fecae12cd7
|
Remove all_special_tokens_extended from tokenizer code (#29686)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-28 20:26:51 +00:00 |
|
Cyrus Leung
|
8d9338fae4
|
[Chore] Rename Processor to InputProcessor (#29682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 09:35:41 -08:00 |
|
Isotr0py
|
d40c854009
|
[CI/Build] Rework CPU multimodal processor test (#29684)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-28 17:10:29 +00:00 |
|
Harry Mellor
|
4332955602
|
[Docs] Add CLI reference doc for vllm bench sweep plot_pareto (#29689)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-28 08:10:08 -09:00 |
|
Isotr0py
|
f946a8d743
|
[Chore]: Reorganize model repo operating functions in transformers_utils (#29680)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-28 08:46:51 -08:00 |
|
Isotr0py
|
6f9d81d03b
|
[V0 deprecation] Clean up legacy paged attention helper functions (#28043)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-28 16:44:33 +00:00 |
|
Didier Durand
|
fae6943068
|
[Doc]: fixing typos in multiple files. (#29685)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-28 08:41:41 -08:00 |
|
果冻虾仁
|
3bcbb30cbf
|
add add_truncate_prompt_tokens in repr for PoolingParams (#29683)
|
2025-11-28 08:41:05 -08:00 |
|
Cyrus Leung
|
9e6bcda3ac
|
[mypy] Enable type checking for more directories (#29674)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 08:39:27 -08:00 |
|
Harry Mellor
|
9eec282cb5
|
Guard FlashInfer sampler using the same check as FlashInfer attention backend (#29415)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 08:34:48 -08:00 |
|
Cyrus Leung
|
0808eb813b
|
[Misc] Remove yapf directives (#29675)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 15:07:23 +00:00 |
|
Mingyuan Ma
|
460d8bbf2d
|
Remove upstream fa checks (#29471)
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-28 05:52:42 -08:00 |
|
Li, Jiang
|
e2f56c309d
|
[CPU] Update torch 2.9.1 for CPU backend (#29664)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-11-28 13:37:54 +00:00 |
|
HappyAmazonian
|
f8151b66fa
|
Revert "Supress verbose logs from model_hosting_container_standards (… (#29335)
Signed-off-by: Shen Teng <sheteng@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 05:29:05 -08:00 |
|
Cyrus Leung
|
1168768a2d
|
[Optimization] Early return for _apply_matches and _iter_placeholders (#29668)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 13:26:47 +00:00 |
|
Nick Hill
|
8e7a891602
|
[BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-28 20:52:23 +08:00 |
|
Cyrus Leung
|
953d9c820b
|
[mypy] Pass type checking for vllm/utils and vllm/v1/pool (#29666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 20:40:47 +08:00 |
|
Cyrus Leung
|
33b06a6f24
|
[Misc] Remove redundant attention var constants (#29650)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 04:35:19 -08:00 |
|
Wilson Wu
|
5c2b5cb422
|
[Docs] Add SPLADE and Ultravox models to supported models documentation (#29659)
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-11-28 01:29:28 -09:00 |
|
杰兮
|
3cb32e5d6e
|
[Rocm] Set VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS default is disabled (#28985)
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-11-28 02:08:42 -08:00 |
|
Cyrus Leung
|
ccbdf51bd5
|
[Doc] Reorganize benchmark docs (#29658)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 17:19:25 +08:00 |
|
Filipp Fisin
|
5f5521bd5d
|
Fix parameter order in GPT-OSS weight loading function for non-MXFP4 weights (#29506)
Signed-off-by: Filipp Fisin <48059208+qGentry@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 00:45:10 -08:00 |
|
Julien Denize
|
b2c1d294fa
|
[BUGFIX] MistralTokenizer._call__ adds an invalid EOS token (#29607)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 16:44:47 +08:00 |
|
maang-h
|
cc0f2a0e19
|
[Doc] Improve abnormal information string (#29655)
Signed-off-by: maang <maang_h@163.com>
|
2025-11-28 00:12:20 -08:00 |
|
rongfu.leng
|
480598958e
|
[Feature][Bench] Add pareto visualization (#29477)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-11-27 23:53:20 -08:00 |
|
Cyrus Leung
|
b34e8775a3
|
Revert "[CPU]Update CPU PyTorch to 2.9.0 (#29589)" (#29647)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-27 22:43:18 -08:00 |
|
wang.yuqi
|
f4b76056ee
|
Improve enable chunked_prefill & prefix_caching logic. (#26623)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-27 22:05:48 -08:00 |
|
EanWang211123
|
37b15e97e8
|
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-11-27 22:05:45 -08:00 |
|
maang-h
|
c7ba1f6bc7
|
[BugFix] Fix ValueError in NewRequestData repr methods (#29392)
Signed-off-by: maang <maang_h@163.com>
|
2025-11-28 13:42:30 +08:00 |
|
Wilson Wu
|
18523b87f6
|
[Docs] Update supported models for Olmo 3 in tool calling documentation (#29411)
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
|
2025-11-28 02:53:55 +00:00 |
|
Xin Yang
|
745a3bae1a
|
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-28 10:48:28 +08:00 |
|
scydas
|
35657bcd7a
|
[CPU]Update CPU PyTorch to 2.9.0 (#29589)
Signed-off-by: scyda <scyda@outlook.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-11-28 09:34:33 +08:00 |
|
Lucas Wilkinson
|
be493e0b3c
|
[BugFix] Fix new nightly failures (#29578)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-27 13:45:38 -08:00 |
|
Woosuk Kwon
|
ae0ce1be27
|
[Model Runner V2][BugFix] Keep reference to GPU tensors in AsyncOutput (#29623)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-11-27 12:38:53 -08:00 |
|
Andrii Skliar
|
a5345bf49d
|
[BugFix] Fix plan API Mismatch when using latest FlashInfer (#29426)
Signed-off-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com>
Co-authored-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com>
|
2025-11-27 11:34:59 -08:00 |
|
Nicolò Lucchesi
|
e5a621b724
|
[CI] Add batched audios Whisper test (#29308)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-27 19:31:52 +00:00 |
|
Isotr0py
|
38658ec6f3
|
[Bugfix][MM encoder] Fix ViT attention backend resolving for Turing GPU (#29614)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-27 19:17:37 +00:00 |
|
Cyrus Leung
|
a24ea5414b
|
[Deprecation] Advance deprecation status (#29617)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-27 19:04:58 +00:00 |
|
Cyrus Leung
|
ea228b4491
|
[Misc] Remove unused code from protocol.py (#29616)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-27 18:39:59 +00:00 |
|
果冻虾仁
|
d45269b378
|
add skip_reading_prefix_cache in repr for PoolingParams (#29620)
|
2025-11-27 09:21:00 -08:00 |
|
Cyrus Leung
|
ee9841daa9
|
[Bugfix] Fix doc build on main (#29619)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-27 09:08:08 -08:00 |
|