Nick Hill
|
1892993bc1
|
[BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
v0.15.1
v0.15.1rc1
|
2026-02-03 20:28:32 -05:00 |
|
Michael Goin
|
7d98f09b1c
|
cherry pick
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-02-03 20:28:02 -05:00 |
|
Michael Goin
|
daa2784bb9
|
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit e346e2d056)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-02-03 20:17:37 -05:00 |
|
Richard Zou
|
e4bf6ed90d
|
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624)
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
(cherry picked from commit 5eac9a1b34)
|
2026-02-03 01:16:42 -08:00 |
|
Richard Zou
|
611b18757e
|
[torch.compile] Speed up MOE handling in forward_context (#33184)
Signed-off-by: Richard Zou <zou3519@gmail.com>
(cherry picked from commit d9aa39a3bb)
|
2026-02-03 00:24:28 -08:00 |
|
Kiersten Stokes
|
eec3546bba
|
[Misc][Build] Lazy load cv2 in nemotron_parse.py (#33189)
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
(cherry picked from commit 9e138cb01d)
|
2026-02-03 00:03:56 -08:00 |
|
zaristei2
|
7c023baf58
|
Patch Protobuf for CVE-2026-0994 (#33619)
Signed-off-by: Zachary Aristei <zaristei@nvidia.com>
Co-authored-by: Zachary Aristei <zaristei@nvidia.com>
|
2026-02-03 00:03:14 -08:00 |
|
zaristei2
|
099a787ee2
|
Patch aiohttp for CVE-2025-69223 (#33621)
Signed-off-by: Zachary Aristei <zaristei@nvidia.com>
Co-authored-by: Zachary Aristei <zaristei@nvidia.com>
|
2026-02-03 00:02:39 -08:00 |
|
Zhewen Li
|
31a64c63a8
|
[Release] Fix format and cherry-pick (#33618)
Signed-off-by: zhewenli <zhewen@inferact.ai>
Co-authored-by: zhewenli <zhewen@inferact.ai>
|
2026-02-02 16:19:05 -08:00 |
|
Zhewen Li
|
57eae2f891
|
[Release] patch step3p5 attention class in v0.15.1 release (#33602)
Signed-off-by: zhewenli <zhewen@inferact.ai>
Co-authored-by: zhewenli <zhewen@inferact.ai>
|
2026-02-02 14:54:08 -08:00 |
|
Yifan Qiao
|
f0d005864a
|
[Fix] prefix cache hit rate == 0 bug with gpt-oss style models (#33524)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
(cherry picked from commit a01ef3fa51)
|
2026-02-02 10:31:50 -08:00 |
|
Robert Shaw
|
94cbe0a328
|
[Nightly CI] Remove CT Model (#33530)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
(cherry picked from commit 318b120766)
|
2026-02-02 02:17:42 -08:00 |
|
csy0225
|
8b45c58fe9
|
[Models] Step-3.5-Flash (#33523)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com>
Co-authored-by: xiewuxun <xiewuxun@stepfun.com>
Co-authored-by: zetaohong <i-hongzetao@stepfun.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
(cherry picked from commit c3b40dc3e7)
|
2026-02-02 02:16:23 -08:00 |
|
Greg Pereira
|
c7039a80b8
|
pin LMCache to v0.3.9 or greater with vLLM v0.15.0 (#33440)
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
(cherry picked from commit d6416fdde9)
|
2026-02-02 00:17:01 -08:00 |
|
René Honig
|
15ebd0cedf
|
fix: Add SM120 (RTX Blackwell) support for FlashInfer CUTLASS NVFP4 MoE kernels (#33417)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit 079781177a)
|
2026-02-02 00:15:22 -08:00 |
|
Luka Govedič
|
2915268369
|
[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Richard Zou <zou3519@gmail.com>
(cherry picked from commit 15f40b20aa)
|
2026-02-02 00:14:07 -08:00 |
|
Lucas Wilkinson
|
d984d664cc
|
[BugFix] Fix whisper FA2 + full cudagraphs (#33360)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
(cherry picked from commit 0a3c71e7e5)
|
2026-02-02 00:13:57 -08:00 |
|
Gregory Shtrasberg
|
5f45b0b7e0
|
[Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 (#33366)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
(cherry picked from commit 31aedfe7d6)
|
2026-02-02 00:13:45 -08:00 |
|
Kevin H. Luu
|
a2dba556db
|
[release] Minor fixes to release annotation and wheel upload (#33129)
Signed-off-by: khluu <khluu000@gmail.com>
(cherry picked from commit 2284461d02)
|
2026-02-02 00:13:34 -08:00 |
|
Michael Goin
|
6ff16b77f8
|
[Bugfix] Enable Triton MoE for FP8 per-tensor dynamic (#33300)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit bfb9bdaf3f)
|
2026-02-02 00:13:23 -08:00 |
|
wang.yuqi
|
1ed963d43a
|
[Bugfix] Fix Qwen3-VL-Reranker load. (#33298)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit abb34ac43a)
|
2026-02-02 00:13:12 -08:00 |
|
Michael Goin
|
39e8b49378
|
[Bugfix] Register fp8 cutlass_group_gemm as supported for only SM90+SM100 (#33285)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit 1bd47d6e5a)
|
2026-02-02 00:12:58 -08:00 |
|
TJian
|
f176443446
|
[Release] [CI] Optim release pipeline (#33156)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
(cherry picked from commit f9d03599ef)
v0.15.0
|
2026-01-28 22:47:10 -08:00 |
|
Or Ozeri
|
fe18ce4d3f
|
Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
(cherry picked from commit 2e8de86777)
v0.15.0rc3
|
2026-01-28 11:44:59 -08:00 |
|
Jeffrey Wang
|
5f7f9ea884
|
Relax protobuf library version constraints (#33202)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
(cherry picked from commit a97b5e206d)
v0.15.0rc2
|
2026-01-28 02:17:19 -08:00 |
|
Nick Hill
|
7779de34da
|
[BugFix] Fix P/D with non-MoE DP (#33037)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
(cherry picked from commit 0cd259b2d8)
|
2026-01-28 02:17:08 -08:00 |
|
Nicolò Lucchesi
|
0d8ce320a2
|
[Bugfix] Fix DeepseekV32 AssertionError: num_kv_heads == 1 (#33090)
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 492a7983dd)
|
2026-01-28 02:16:56 -08:00 |
|
Nicolò Lucchesi
|
d51e1f8b62
|
[Bugfix] Disable CG for Whisper+FA2 (#33164)
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 1f3a2c2944)
|
2026-01-28 02:16:41 -08:00 |
|
Roger Wang
|
5042815ab6
|
[Models] Kimi-K2.5 (#33131)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit b539f988e1)
|
2026-01-28 02:16:28 -08:00 |
|
Chauncey
|
afb390ab02
|
[CI] Fix AssertionError: MCP tool call not found in output_messages (#33093)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
(cherry picked from commit a2393ed496)
|
2026-01-28 02:16:14 -08:00 |
|
Robert Shaw
|
cf1167e50b
|
[Bugfix] Fix Dtypes for Pynccl Wrapper (#33030)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
(cherry picked from commit 43a013c3a2)
v0.15.0rc0
|
2026-01-26 12:37:16 -08:00 |
|
Cyrus Leung
|
11b556878b
|
[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-26 15:00:28 +08:00 |
|
Danielle Robinson
|
ee484b3f4b
|
Set splitk=1 for fused-moe-lora expand kernel (#32882)
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-25 22:52:34 -08:00 |
|
Woosuk Kwon
|
a9b53dd435
|
[Model Runner V2] Add LoRAState to consolidate lora logic (#33062)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-01-25 22:21:12 -08:00 |
|
Robert Shaw
|
254db42ede
|
[Tests] Remove Duplicates (#33032)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-26 05:23:54 +00:00 |
|
ltd0924
|
105d104576
|
[StepVL] support close img patch (#32923)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
|
2026-01-25 20:56:39 -08:00 |
|
Lucas Wilkinson
|
566cdb6cfb
|
[CI] Fix MHA attention test failure (AttributeError when model_config is None in ViT attention backend) (#33033)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-25 19:49:53 -08:00 |
|
Woosuk Kwon
|
2f0d3ba745
|
[Model Runner V2] Minor simplification for finish_requests (#33048)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-01-25 18:35:02 -08:00 |
|
Woosuk Kwon
|
edf927bc9f
|
[Model Runner V2] Fix slot_mapping after #25954 (#33046)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-01-25 18:29:49 -08:00 |
|
Andreas Karatzas
|
22aeb43007
|
[Bugfix][VLM] Fix transformers backend embed_multimodal for Qwen2.5-VL profiling (#32969)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-26 08:34:05 +08:00 |
|
Itay Etelis
|
a698e8e7ad
|
[Model] Use mm_position to compute mrope positions for Qwen2.5-Omni (#32772)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
|
2026-01-25 20:15:53 +08:00 |
|
zhanqiuhu
|
151e5451c2
|
[Doc] Add Qwen2.5 models to batch invariance tested models (#33016)
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
|
2026-01-25 09:20:46 +00:00 |
|
Jee Jee Li
|
73b243463b
|
[BugFix] Add env variable to control PDL in LoRA (#32836)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-25 16:32:30 +08:00 |
|
JJJYmmm
|
7e67df5570
|
[Bugfix] fix encoder cache hang in Qwen3VL (#32684)
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-25 05:17:31 +00:00 |
|
7. Sun
|
ff6c1da4e6
|
[Docs] Fix Apple silicon include path in CPU installation docs (#32977)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-01-25 01:51:49 +00:00 |
|
Roberto L. Castro
|
fcb9df99bd
|
[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
|
2026-01-24 18:45:27 -07:00 |
|
TJian
|
1ebdff412a
|
[DOC] [ROCm] Update doc for v0.14.1 (#32998)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-01-25 09:13:21 +08:00 |
|
Joshua Deng
|
91601ff478
|
[Feature] add session based streaming input support to v1 (#28973)
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-24 12:06:28 -08:00 |
|
yugong333
|
d4dbb7af63
|
Using max_loras + 1 to construct grid in fused_moe_lora (#32277)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
|
2026-01-24 12:39:30 -05:00 |
|
Maryam Tahhan
|
203d0bc0c2
|
[CPU] Improve CPU Docker build (#30953)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-01-24 17:08:24 +00:00 |
|