Commit Graph

14386 Commits

Author SHA1 Message Date
Isotr0py
444e2e7e1f [Misc] Bump opencv-python dependecy version to 4.13 (#32668)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-22 15:51:15 +00:00
Nick Hill
bc14663e6a [Cleanup] Move scheduler get_routed_experts logic to separate method (#32706)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-22 10:46:00 -05:00
Richard Zou
654a71fc3c [torch.compile] Improve Cold Start for MoEs (#32805)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-22 10:44:40 -05:00
Lucas Kabela
15e302dfce [Misc][BE] Turn on strict type coverage for vllm/compilation (#31756)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-22 15:12:26 +00:00
Cyrus Leung
d117a4d1a9 [Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-22 12:44:22 +00:00
Or Ozeri
421012b63a OffloadingConnector: Support kernel_block_size != block_size (#30692)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-22 12:30:04 +00:00
Chauncey
841d53aaa8 [Frontend] add prompt_cache_key for openresponses (#32824)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-22 11:34:14 +00:00
Shengqi Chen
1752262e96 [CI] refactor release pipeline config into groups (#32833)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2026-01-22 11:27:21 +00:00
Nicolò Lucchesi
ea6102b85d [Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-22 10:50:37 +00:00
wang.yuqi
328cbb2773 [Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-22 10:32:44 +00:00
liranschour
64e3d67ac0 Enable Cross layers KV cache layout at NIXL Connector (#30207)
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2026-01-22 10:12:58 +00:00
Nick Hill
098b2d66fe [Benchmark] Don't default to temperature==0 in vllm bench serve (#32723)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-22 10:03:15 +00:00
Isotr0py
8ebf271bb6 [Misc] Replace urllib's urlparse with urllib3's parse_url (#32746)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-22 16:37:15 +08:00
Alex Sun
49a1262267 [AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664)
Signed-off-by: Alex Sun <alex.s@amd.com>
2026-01-22 16:33:18 +08:00
Cyrus Leung
2b8a38b6d6 [Model] Extend collect_children and no_init_weights contexts (#32757)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-22 08:20:27 +00:00
Kebe
1bf1a34b19 [bench] add start_times field to vllm bench serve json result (#32667)
Signed-off-by: Kebe <mail@kebe7jun.com>
2026-01-22 07:10:14 +00:00
Andreas Karatzas
a810299838 [ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-21 22:11:09 -08:00
Andreas Karatzas
eb1629da24 [ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-22 13:55:25 +08:00
Micah Williamson
019e2c3b7c [ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-22 05:47:33 +00:00
Huy Do
f5fdec8ce2 Upgrade transformers-4.57.5 (#32287)
Signed-off-by: Huy Do <huydhn@gmail.com>
2026-01-22 05:19:19 +00:00
Patrick von Platen
1579c9b5fd [Llama.py -> mistral.py] Extract mistral-only relevant code into separate file (#32780)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-01-22 05:14:57 +00:00
Lucas Wilkinson
889722f3bf [FlashMLA] Update FlashMLA to expose new arguments (#32810)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-21 22:02:39 -07:00
Divakar Verma
49d9653852 [ROCm][CI] fix get_valid_backends (#32787)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-22 04:27:47 +00:00
Ifta khairul Alam Adil
a1d82466ea [Docs] Remove outdated async_scheduling limitation with speculative decoding (#32775)
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
2026-01-21 20:19:25 -08:00
Lucain
24a163ed77 Cleanup some huggingface_hub-related stuff (#32788)
2026-01-22 03:38:17 +00:00
knlnguyen1802
378385b90c [EC Connector] Optimize remote cache check in scheduler (#32585)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
2026-01-22 03:30:59 +00:00
Matt
c5487e2b96 [Bugfix] Fix potential EAGLE spec decode segfault during graph capture (#32818)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
2026-01-22 03:11:55 +00:00
Wentao Ye
6437ff1fb9 [Deprecation] Remove deprecated environment variables (#32812)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-22 02:25:16 +00:00
Woosuk Kwon
5e00b561cd [Model Runner V2] Do not error on attention backends (#32820)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-21 17:02:48 -08:00
Woosuk Kwon
408195ec59 [Model Runner V2] Refactor Prompt Logprobs (#32811)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-21 15:12:20 -08:00
Xin Yang
63227accf5 [Kernel] Add topk_sigmoid kernel (#31246)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-21 22:49:51 +00:00
Yanan Cao
e675dda67b [Misc] Add Helion version check to collect_env (#32797)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-01-21 21:54:46 +00:00
Nick Hill
24dc30f7ff [ModelRunner V2] Don't pin reused flashinfer tensors (#32799)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-21 13:17:43 -08:00
Divakar Verma
180fba653e [ROCm] fix import for on_gfx9 (#32783)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-21 18:41:11 +00:00
danisereb
f999539869 Add missing import of fused_topk to benchmark_moe (#32784)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-01-21 18:30:10 +00:00
Woosuk Kwon
e1da249c93 [Model Runner V2] Minor refactor for compute_slot_mappings (#32794)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-21 10:24:35 -08:00
Nick Hill
9b693d023c [Misc] Omit "disable NCCL for DP sync" startup log when not applicable (#32707)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-21 17:03:39 +00:00
elvischenv
808d6fd7b9 Bump Flashinfer to v0.6.1 (#30993)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-01-21 08:49:50 -08:00
whx
1861ae8aae [PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2026-01-21 11:38:04 -05:00
Robert Shaw
4e31b7f228 [Quantization][Deprecation] Remove RTN (#32697)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 16:34:42 +00:00
Pleaplusone
6c20e89c02 [ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287)
Signed-off-by: ganyi <ygan@amd.com>
2026-01-21 23:16:30 +08:00
Robert Shaw
85f55c943c [Quantization][Deprecation] Deprecate HQQ (#32681)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 09:32:40 -05:00
Robert Shaw
cea3c754c4 [Quantization][Deprecation] Remove DeepSpeedFp8 (#32679)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-21 09:32:12 -05:00
Robert Shaw
42135d6898 [MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)
2026-01-21 08:22:33 -05:00
Divakar Verma
e14467be43 [bugfix] Aria model (#32727)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-01-21 05:11:31 -08:00
Kim Hee Su
7727ce35c2 [Model] Add Eagle2.5-8B Vision-Language Model support (#32456)
Signed-off-by: kimheesu <wlskaka4@gmail.com>
2026-01-21 09:39:53 +00:00
Yanwen Lin
6bb2bc71e2 [Bugfix] Force using spawn multiprocess method when it's the WSL platform (#32749)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
2026-01-21 09:35:55 +00:00
Lucas Kabela
c80f92c14d [Documentation] Fix typo in docs/design/torch_compile_multimodal.md (#32741)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-01-20 23:54:20 -08:00
RickyChen / 陳昭儒
f23fb5a7c1 [Bugfix] Support HF sharded weights for Mistral3/Pixtral models (#32673)
Signed-off-by: ricky-chaoju <ricky.chen@infinirc.com>
Signed-off-by: vllm-dev <ricky.chen@infinirc.com>
2026-01-20 23:27:30 -08:00
Paco Xu
360aa93f8f [Docs] Fix GitHub handle in governance process (#32582)
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2026-01-21 07:07:50 +00:00