Luka Govedič
|
2dcd12d357
|
[torch.compile] Fix tests for torch==2.9 inductor partition (#26116)
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-10-14 19:55:02 -04:00 |
|
Tyler Michael Smith
|
579d2e5458
|
[WideEP][P/D] Add usage stats for DP+EP and KV Connector (#26836)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-10-14 23:51:54 +00:00 |
|
Ye Hu
|
0512c04aee
|
[frontend][gptoss] Add per turn stats into Harmony Context (#25061)
Signed-off-by: lacora <hyelacora@gmail.com>
Co-authored-by: Ye Hu <yehu@fb.com>
|
2025-10-14 16:48:13 -07:00 |
|
Michael Goin
|
7e0ef4084a
|
[CI Failure] Fix torchao dep failure for Quantization Test (#26824)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 16:41:43 -07:00 |
|
Nick Hill
|
4aed506b65
|
[Core] Streamline some structured output related code (#26737)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-14 23:27:44 +00:00 |
|
Boyuan Feng
|
a86b4c58e8
|
remove attn output view kernel (#26680)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-14 22:53:10 +00:00 |
|
Nick Hill
|
ff4810ba73
|
[Minor] Group async_scheduling related fields in model runner init (#26736)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-14 14:46:37 -07:00 |
|
Nan Qin
|
9d6964926e
|
fix: response_format for completion (#23212)
Signed-off-by: Nan2018 <qinnanjoshua@gmail.com>
|
2025-10-14 21:23:22 +00:00 |
|
Dhruvil Bhatt
|
0e65818910
|
Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837)
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
|
2025-10-14 14:21:03 -07:00 |
|
Jialin Ouyang
|
380f17527c
|
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation (#26146)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-14 17:03:21 -04:00 |
|
HDCharles
|
b92ab3deda
|
Notice for deprecation of AutoAWQ (#26820)
Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-14 13:39:59 -07:00 |
|
Jialin Ouyang
|
acaa2c0a4a
|
[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs (#24964)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-14 12:58:43 -07:00 |
|
Matthew Bonanni
|
82af928c41
|
[Attention][Spec Decode] FlashMLA spec decode support (#26541)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-14 19:38:20 +00:00 |
|
Huamin Li
|
87efc681db
|
llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch (#26790)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-10-14 11:54:12 -07:00 |
|
Michael Goin
|
c3a722fcb2
|
[CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e (#26816)
Signed-off-by: mgoin <mgoin64@gmail.com>
v0.11.1rc1
|
2025-10-14 18:38:59 +00:00 |
|
Ze'ev Klapow
|
aba48f7db1
|
[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 (#26818)
|
2025-10-14 11:20:39 -07:00 |
|
Michael Goin
|
04b5f9802d
|
[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 (#26722)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 10:52:05 -07:00 |
|
Reza Barazesh
|
efc8f7d814
|
Update coveragerc and add codecov.yml for path fixes (#26435)
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
|
2025-10-14 09:45:06 -07:00 |
|
Wentao Ye
|
6d87a2838c
|
[Config] Remove Unused Environment Variable VLLM_DISABLE_PAD_FOR_CUDAGRAPH (#26743)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-14 11:47:49 -04:00 |
|
wang.yuqi
|
e6cdbd6792
|
Revert "[issues template] Encourage the author implement their own ideas" (#26814)
|
2025-10-14 08:37:34 -07:00 |
|
Chauncey
|
df850c4912
|
[Feature][Responses API] Stream Function Call - harmony (#24317)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-14 08:31:43 -07:00 |
|
Qier Li
|
720394de43
|
[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats (#26046)
Signed-off-by: Qier Li <kevin44036@gmail.com>
|
2025-10-14 14:38:07 +00:00 |
|
wang.yuqi
|
88a49745af
|
[issues template] Encourage the author implement their own ideas (#26671)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-14 22:32:36 +08:00 |
|
Boyuan Feng
|
ca683a2a72
|
use combo kernel to fuse qk-norm and qk-rope (#26682)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-10-14 09:40:59 -04:00 |
|
汪志鹏
|
e9f1b8c9e9
|
Adjusted the model order of the model registration file (#26798)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-10-14 13:26:11 +00:00 |
|
Jaya Yuan
|
ea97940d6c
|
[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention (#24864)
Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com>
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com>
|
2025-10-14 13:07:50 +00:00 |
|
Jee Jee Li
|
fdd32750f0
|
[CI/Build] Cleanup LoRA test (#26752)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-14 12:06:35 +00:00 |
|
Vladislav Bronzov
|
c715ba3735
|
[Feature] Change vllm.py with pydantic validation (#26726)
Signed-off-by: Vladislav <vladislav.bronzov@gmail.com>
Signed-off-by: Vladislav Bronzov <58587565+VladOS95-cyber@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-14 12:00:54 +00:00 |
|
Cyrus Leung
|
9c4cb68339
|
[Chore] Remove SupportsV0Only interface and update supported models docs (#26783)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 04:55:10 -07:00 |
|
Chauncey
|
780eb03d9b
|
[CI] Fix test_tool_id_kimi_k2 (#26787)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-14 10:27:07 +00:00 |
|
Cyrus Leung
|
ef9676a1f1
|
[Doc] ruff format some Python examples (#26767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 03:21:53 -07:00 |
|
Harry Mellor
|
70b1b330e1
|
Don't allow typos to fix by default (#26785)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-14 03:05:15 -07:00 |
|
Cyrus Leung
|
d1d063a588
|
[Chore] Use max_transformers_version for Qwen-VL test (#26792)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 03:03:46 -07:00 |
|
Chendi.Xue
|
7e6edb1469
|
[NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode (#26556)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-10-14 09:46:05 +00:00 |
|
Cyrus Leung
|
74704d4553
|
[Model] Use merge_by_field_config for MM models (O-P) (#26776)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 09:42:45 +00:00 |
|
Cyrus Leung
|
d2f816d6ff
|
[Bugfix] Standardize merging multimodal embeddings (#26771)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 09:36:21 +00:00 |
|
wangxiyuan
|
577d498212
|
[Plugin] Make plugin group clear (#26757)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-10-14 07:49:59 +00:00 |
|
Max Wittig
|
fd85c9f426
|
[Bugfix][FE]: Always include usage with --enable-force-include-usage (#20983)
Signed-off-by: Max Wittig <max.wittig@siemens.com>
Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com>
Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com>
|
2025-10-14 09:17:39 +02:00 |
|
Ye (Charlotte) Qi
|
d32c611f45
|
[CI/Build] Use 127.0.0.1 instead of localhost in utils (#26750)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-10-14 07:04:00 +00:00 |
|
CSWYF3634076
|
01ad27faff
|
[Model][Bugfix]fix ernie45 load failed due to ernie45 eplb code (#26684)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2025-10-14 06:55:23 +00:00 |
|
Ryan Li
|
481545b397
|
scheduler.py: Update the name of the default scheduler. (#26758)
Signed-off-by: Ryan Li <ryanli@ryanli.org>
|
2025-10-14 06:52:21 +00:00 |
|
Alexei-V-Ivanov-AMD
|
d3cc8427c0
|
[ci] Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) (#26718)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-10-13 23:10:23 -07:00 |
|
vllmellm
|
4821ac1b4d
|
[CI] [ROCm] Automate CC list for ROCm related issue (#26753)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-14 13:57:26 +08:00 |
|
XiongfeiWei
|
4497c8f821
|
Fix lora tests failure in TPU CI due to the removal of LoRA bias (#26723)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-10-14 13:04:23 +08:00 |
|
Michael Yao
|
2e36cdbe2b
|
[Docs] Add a start tag to build.inc.md (#26747)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-10-13 21:51:55 -07:00 |
|
Maximilien de Bayser
|
fe3edb4cf0
|
Add support for the /rerank endpoint in vllm bench serve (#26602)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-10-14 04:25:43 +00:00 |
|
Heng Guo
|
29350922c6
|
[Feature][Quantization] auto_round format add support for regex (#24024)
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-14 03:03:16 +00:00 |
|
Varun Sundar Rabindranath
|
8ae169286f
|
[torch.compile] Unwrap fused_marlin_moe custom op (#26739)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-14 02:22:16 +00:00 |
|
youkaichao
|
8a0af6a561
|
[build][torch.compile] upgrade depyf version (#26702)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-10-14 10:12:09 +08:00 |
|
Jialin Ouyang
|
cfded80793
|
[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-14 01:46:44 +00:00 |
|