Jinzhen Lin
|
1656ad3704
|
[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
|
2025-11-29 07:19:33 -08:00 |
|
Cyrus Leung
|
fa59fe417f
|
[Chore] Move detokenizer_utils to vllm/tokenizers (#29727)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 06:25:17 -08:00 |
|
Cyrus Leung
|
fe3398fab2
|
[Chore] Enable passing tokenizer=None into MM processor (#29724)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 06:25:10 -08:00 |
|
Chukwuma Nwaugha
|
ad7f714d62
|
hfrunner.classify should return list[list[float]] not list[str] (#29671)
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>
|
2025-11-29 13:57:00 +00:00 |
|
Cyrus Leung
|
34a984274e
|
[Misc] Refactor tokenizer interface (#29693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 04:02:21 -08:00 |
|
Jee Jee Li
|
39e63dec7c
|
[LoRA] Cleanup LoRA unused code (#29611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 22:52:58 -08:00 |
|
Angela Yi
|
4b17ce6815
|
Add gpu memory wait before test_async_tp (#28893)
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-11-28 20:19:05 -08:00 |
|
Lucas Wilkinson
|
e23f665d83
|
[BugFix] Fix DBO failing with TypeError: 'NoneType' object is not iterable (#29698)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-11-28 20:19:01 -08:00 |
|
Tsukasa OI
|
762a4a6ca9
|
[Frontend] Perform offline path replacement to tokenizer (#29706)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2025-11-28 18:32:08 -08:00 |
|
Cyrus Leung
|
b2c50eda50
|
[Bugfix] Fix wrong mock attribute (#29704)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-29 10:30:41 +08:00 |
|
Andreas Karatzas
|
ea3370b428
|
[ROCm][Bugfix] Patch for the Multi-Modal Processor Test group (#29702)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-11-29 01:31:44 +00:00 |
|
Mert Unsal
|
c625d7b1c6
|
[Bugfix] Fix O(n²) multimodal string prompt processing (#29667)
Signed-off-by: mertunsall <mertunsal1905@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-11-28 16:10:39 -08:00 |
|
Huamin Li
|
3fd1fb0b60
|
Revert "[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)" (#29697)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-28 15:26:52 -08:00 |
|
Cyrus Leung
|
7675ba30de
|
[Misc] Remove redundant ClassRegistry (#29681)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-28 15:24:47 -08:00 |
|
Benjamin Chislett
|
1986de1375
|
[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-11-28 22:25:05 +00:00 |
|
Yanan Cao
|
3461e7efd8
|
[Frontend] Remap -O to -cc commandline flag (#29557)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2025-11-28 21:51:12 +00:00 |
|
Harry Mellor
|
fecae12cd7
|
Remove all_special_tokens_extended from tokenizer code (#29686)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-28 20:26:51 +00:00 |
|
Cyrus Leung
|
8d9338fae4
|
[Chore] Rename Processor to InputProcessor (#29682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 09:35:41 -08:00 |
|
Isotr0py
|
f946a8d743
|
[Chore]: Reorganize model repo operating functions in transformers_utils (#29680)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-28 08:46:51 -08:00 |
|
Nick Hill
|
8e7a891602
|
[BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-28 20:52:23 +08:00 |
|
Cyrus Leung
|
33b06a6f24
|
[Misc] Remove redundant attention var constants (#29650)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 04:35:19 -08:00 |
|
Julien Denize
|
b2c1d294fa
|
[BUGFIX] MistralTokenizer._call__ adds an invalid EOS token (#29607)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-28 16:44:47 +08:00 |
|
wang.yuqi
|
f4b76056ee
|
Improve enable chunked_prefill & prefix_caching logic. (#26623)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-27 22:05:48 -08:00 |
|
EanWang211123
|
37b15e97e8
|
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-11-27 22:05:45 -08:00 |
|
maang-h
|
c7ba1f6bc7
|
[BugFix] Fix ValueError in NewRequestData repr methods (#29392)
Signed-off-by: maang <maang_h@163.com>
|
2025-11-28 13:42:30 +08:00 |
|
Xin Yang
|
745a3bae1a
|
[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#28971)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-28 10:48:28 +08:00 |
|
Nicolò Lucchesi
|
e5a621b724
|
[CI] Add batched audios Whisper test (#29308)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-27 19:31:52 +00:00 |
|
Matthew Bonanni
|
fc1d8be3dc
|
[Attention] Update attention imports (#29540)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-27 11:19:09 -05:00 |
|
Ryan Rock
|
bab438ff3e
|
[CI/Build] Skip ray tests on ROCm (#29556)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2025-11-27 07:01:37 -08:00 |
|
Jee Jee Li
|
2f5f9acd55
|
[LoRA] Continue optimizing MoE LoRA weight loading (#29322)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-11-27 05:56:28 -08:00 |
|
Cyrus Leung
|
e6d4f3c254
|
[Bugfix] Fix pre-commit (#29601)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-27 02:23:06 -08:00 |
|
Morrison Turnansky
|
0838b52e2e
|
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure (#26847)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: adabeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-27 01:55:58 -08:00 |
|
Micah Williamson
|
43c5792592
|
[ROCm][CI] Fix test_cpu_offloading for ROCm (#29548)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-11-27 07:54:44 +00:00 |
|
HDCharles
|
df01eda4dc
|
[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)
Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
|
2025-11-26 21:35:13 -05:00 |
|
Lucas Wilkinson
|
56539cddac
|
[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579)
|
2025-11-26 14:07:13 -05:00 |
|
Matthew Bonanni
|
430dd4d9eb
|
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-26 10:53:15 -07:00 |
|
Wentao Ye
|
0b0aa874e8
|
[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-26 09:38:52 -07:00 |
|
Huamin Li
|
70d5953f82
|
Revert "[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)" (#29483)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-11-26 22:27:26 +08:00 |
|
Yejing Lai
|
bb706d6048
|
Fix TeleChatForCausalLM not register issue (#29473)
Signed-off-by: Lai, Yejing <yejing.lai@intel.com>
|
2025-11-26 05:15:00 -08:00 |
|
Nick Hill
|
4e57c6587f
|
[Core] Support logprobs with spec decode + async scheduling (#29223)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-25 12:55:24 -08:00 |
|
Andrew Xia
|
b07555d26f
|
[responsesAPI][2] parse ResponseFunctionToolCallOutputItem (#29383)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-11-25 10:27:26 -08:00 |
|
Yifan Qiao
|
48ddb02b79
|
[Hybrid Allocator] Support KV cache groups with different block_size (#29143)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-11-25 10:30:57 -05:00 |
|
Injae Ryou
|
794029f012
|
[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type (#29137)
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-11-25 14:28:53 +00:00 |
|
Thomas Parnell
|
516c3f7847
|
[Bugfix] Fix logic for choosing default prefix caching setting (#29393)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-11-25 14:05:10 +00:00 |
|
Harry Mellor
|
51fc9e017a
|
Scheduled removal of CompilationConfig.use_inductor (#29323)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-25 12:55:42 +00:00 |
|
wang.yuqi
|
7a80b01889
|
[CI] Resettle pooling entrypoints tests. (#29370)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-11-25 10:39:10 +00:00 |
|
Ben Browning
|
e1dd706cd1
|
[Frontend] Respect Chat Completion parallel_tool_calls param (#26233)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-11-25 09:56:15 +00:00 |
|
wang.yuqi
|
67fc16cd8c
|
[Bugfix] If chunked_prefill is disabled, end the scheduling early. (#28911)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-11-25 16:06:09 +08:00 |
|
elvischenv
|
6330f9477d
|
[Bugfix] Fix GPT-OSS AR+NORM fusion (#28841)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-11-25 07:59:40 +00:00 |
|
Micah Williamson
|
ef1f7030f0
|
[ROCm][CI] Fix test_cudagraph_mode failure in AMD CI (#29367)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-11-25 07:55:09 +00:00 |
|