knlnguyen1802
|
378385b90c
|
[EC Connector] Optimize remote cache check in scheduler (#32585)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
|
2026-01-22 03:30:59 +00:00 |
|
Matt
|
c5487e2b96
|
[Bugfix] Fix potential EAGLE spec decode segfault during graph capture (#32818)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-22 03:11:55 +00:00 |
|
Wentao Ye
|
6437ff1fb9
|
[Deprecation] Remove deprecated environment variables (#32812)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-22 02:25:16 +00:00 |
|
Woosuk Kwon
|
5e00b561cd
|
[Model Runner V2] Do not error on attention backends (#32820)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-21 17:02:48 -08:00 |
|
Woosuk Kwon
|
408195ec59
|
[Model Runner V2] Refactor Prompt Logprobs (#32811)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-21 15:12:20 -08:00 |
|
Xin Yang
|
63227accf5
|
[Kernel] Add topk_sigmoid kernel (#31246)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-01-21 22:49:51 +00:00 |
|
Yanan Cao
|
e675dda67b
|
[Misc] Add Helion version check to collect_env (#32797)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-01-21 21:54:46 +00:00 |
|
Nick Hill
|
24dc30f7ff
|
[ModelRunner V2] Don't pin reused flashinfer tensors (#32799)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-21 13:17:43 -08:00 |
|
Divakar Verma
|
180fba653e
|
[ROCm] fix import for on_gfx9 (#32783)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-01-21 18:41:11 +00:00 |
|
danisereb
|
f999539869
|
Add missing import of fused_topk to benchmark_moe (#32784)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-01-21 18:30:10 +00:00 |
|
Woosuk Kwon
|
e1da249c93
|
[Model Runner V2] Minor refactor for compute_slot_mappings (#32794)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-21 10:24:35 -08:00 |
|
Nick Hill
|
9b693d023c
|
[Misc] Omit "disable NCCL for DP sync" startup log when not applicable (#32707)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-21 17:03:39 +00:00 |
|
elvischenv
|
808d6fd7b9
|
Bump Flashinfer to v0.6.1 (#30993)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-01-21 08:49:50 -08:00 |
|
whx
|
1861ae8aae
|
[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2026-01-21 11:38:04 -05:00 |
|
Robert Shaw
|
4e31b7f228
|
[Quantization][Deprecation] Remove RTN (#32697)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-21 16:34:42 +00:00 |
|
Pleaplusone
|
6c20e89c02
|
[ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp (#29287)
Signed-off-by: ganyi <ygan@amd.com>
|
2026-01-21 23:16:30 +08:00 |
|
Robert Shaw
|
85f55c943c
|
[Quantization][Deprecation] Deprecate HQQ (#32681)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-21 09:32:40 -05:00 |
|
Robert Shaw
|
cea3c754c4
|
[Quantization][Deprecation] Remove DeepSpeedFp8 (#32679)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-21 09:32:12 -05:00 |
|
Robert Shaw
|
42135d6898
|
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414)
|
2026-01-21 08:22:33 -05:00 |
|
Divakar Verma
|
e14467be43
|
[bugfix] Aria model (#32727)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-01-21 05:11:31 -08:00 |
|
Kim Hee Su
|
7727ce35c2
|
[Model] Add Eagle2.5-8B Vision-Language Model support (#32456)
Signed-off-by: kimheesu <wlskaka4@gmail.com>
|
2026-01-21 09:39:53 +00:00 |
|
Yanwen Lin
|
6bb2bc71e2
|
[Bugfix] Force using spawn multiprocess method when it's the WSL platform (#32749)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
|
2026-01-21 09:35:55 +00:00 |
|
Lucas Kabela
|
c80f92c14d
|
[Documentation] Fix typo in docs/design/torch_compile_multimodal.md (#32741)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-01-20 23:54:20 -08:00 |
|
RickyChen / 陳昭儒
|
f23fb5a7c1
|
[Bugfix] Support HF sharded weights for Mistral3/Pixtral models (#32673)
Signed-off-by: ricky-chaoju <ricky.chen@infinirc.com>
Signed-off-by: vllm-dev <ricky.chen@infinirc.com>
|
2026-01-20 23:27:30 -08:00 |
|
Paco Xu
|
360aa93f8f
|
[Docs] Fix GitHub handle in governance process (#32582)
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
|
2026-01-21 07:07:50 +00:00 |
|
Netanel Haber
|
27ca95b3c9
|
[Bugfix] Fix Nemotron-Nano-v2-vlm static resolution (#32682)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-01-21 06:28:21 +00:00 |
|
Lucas Wilkinson
|
b4f64e5b02
|
Update FlashMLA (#32491)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-21 13:03:37 +08:00 |
|
shanjiaz
|
7ab80a8e37
|
Added qwen3 vision language moe support for speculative decoding (#32048)
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Signed-off-by: shanjiaz <43143795+shanjiaz@users.noreply.github.com>
|
2026-01-21 03:24:05 +00:00 |
|
gopalsarda
|
0900cedb3f
|
Enable Eagle3 speculative decoding for Pixtral (LlavaForConditionalGeneration) (#32542)
Signed-off-by: gopalsarda <gopal.sarda@servicenow.com>
|
2026-01-21 11:18:05 +08:00 |
|
Nick Hill
|
6f067b1fb7
|
[Cleanup] Remove unused KVConnectorModelRunnerMixin methods (#32077)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-21 11:16:37 +08:00 |
|
Alex Brooks
|
27b81e010d
|
[Bugfix] Fix Granite Vision / Don't use Siglip Pooling Head Nested Models by Default (#32299)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2026-01-21 11:11:52 +08:00 |
|
Or Ozeri
|
7013e9ac8f
|
OffloadingConnector: Prevent redundant loads (#29087)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-21 01:15:42 +00:00 |
|
Robert Shaw
|
c78ee240b3
|
Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725)
|
2026-01-21 00:21:06 +00:00 |
|
Vasiliy Kuznetsov
|
d2389c1262
|
fp8 online quant: split out Fp8OnlineLinearMethod (#32189)
|
2026-01-20 18:13:22 -05:00 |
|
Micah Williamson
|
22375f8d13
|
[ROCm][CI] Remove DS async eplb accuracy test from AMD CI (#32717)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-20 13:40:48 -08:00 |
|
TJian
|
9b67338b78
|
[Bugfix] Suppress log on non-ROCm platform (#32703)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-01-20 13:38:20 -08:00 |
|
Lucas Wilkinson
|
2261340806
|
[Misc] Remove pad_for_cudagraphs from config (#30143)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-20 15:05:48 -05:00 |
|
Shinichi Hemmi
|
86c69dc54c
|
[Bugfix] Fix byte fallback handling when using outlines (#31391)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Co-authored-by: Kenichi Maehashi <maehashi@preferred.jp>
|
2026-01-20 19:48:08 +00:00 |
|
dolpm
|
7c5dedc247
|
[AOT compilation] support torch.compile inductor artifacts in VllmCompiledFunction (#25205)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
|
2026-01-20 19:45:59 +00:00 |
|
Cyrus Leung
|
193069d129
|
[5/N] Initialize MM components in context managers (Q-Z) (#32695)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-20 19:10:23 +00:00 |
|
Rahul Tuli
|
f0feb1cf81
|
Test: added acceptance length tests (#32030)
Signed-off-by: rahul-tuli <rtuli@redhat.com>
|
2026-01-20 18:55:15 +00:00 |
|
Cyrus Leung
|
09194b90a5
|
[Doc] Update docs for MM model development with context usage (#32691)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-20 10:37:35 -08:00 |
|
Woosuk Kwon
|
9ab4388cd3
|
[Model Runner V2] Support FLASHINFER_MLA backend (#32709)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-20 10:26:17 -08:00 |
|
JJJYmmm
|
04a9e064db
|
[Bugfix] fix the ima issue of qwen-vit (#32687)
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
|
2026-01-20 17:21:25 +00:00 |
|
TJian
|
c025263ddd
|
[Doc] [ROCm] Update ROCm getting started doc (#32580)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-20 09:20:08 -08:00 |
|
Wentao Ye
|
6c97b9b9b6
|
[Perf] Only clone when needed for moe_permute (#32273)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-20 11:34:39 -05:00 |
|
whx
|
4ca62a0dbd
|
[PluggableLayer][1/N] Define PluggableLayer (#32331)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2026-01-20 16:19:21 +00:00 |
|
linhaifeng
|
7901109ea5
|
[Bugfix] Fix Off-by-one error in _num_tokens_to_min_blocks calculation (#32603)
Signed-off-by: linhaifeng <1371675203@qq.com>
|
2026-01-20 11:13:39 -05:00 |
|
YiSheng5
|
13f6630a9e
|
[XPU]Support AgRsAll2AllManager on XPU device (#32654)
Signed-off-by: yisheng <yi.sheng@intel.com>
|
2026-01-20 14:27:24 +00:00 |
|
Cyrus Leung
|
fda3f03eb2
|
[4/N] Initialize MM components in context managers (M-P) (#32663)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-20 14:06:32 +00:00 |
|