Roy Wang
5c86a89805
[docs] Update governance process links ( #32995 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-01-23 23:32:44 -08:00
7. Sun
0ccecf8833
[Tests] Standardize RNG seed utility across test files ( #32982 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-01-24 06:47:14 +00:00
7. Sun
0b9a735e11
[Tests] Clarify pytest skip reasons with actionable context ( #32981 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-01-24 06:38:50 +00:00
7. Sun
14d03b8ddb
[Perf] Cache xpu_get_mem_info() result to avoid duplicate calls ( #32983 )
...
Signed-off-by: 7. Sun <jhao.sun@gmail.com >
2026-01-23 20:56:23 -08:00
Michael Goin
d0cbac5827
[Dev UX] Add auto-detection for VLLM_PRECOMPILED_WHEEL_VARIANT during install ( #32948 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Shengqi Chen <i@harrychen.xyz >
2026-01-23 19:15:17 -08:00
ruizcrp
c0d820457a
Auth_token added in documentation as it is required ( #32988 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-24 03:03:05 +00:00
monajafi-amd
97ef11dd34
[ROCm][ViT] Enable Flash Attention Triton backend on RDNA3/RDNA4 ( #32944 )
...
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com >
2026-01-24 10:03:07 +08:00
Xin Yang
ecc3dd66cc
[Bugfix] Fix FusedMoE LoRA kernel offs_token out of bound value ( #32279 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-24 01:41:35 +00:00
Joe Runde
7e1f10d562
[Core][Bugfix] allow graceful worker termination ( #32965 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2026-01-23 17:28:45 -08:00
ElizaWszola
a28b94e6ef
[Performance] Split FlashAttn attention and cache update ( #25954 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Luka Govedič <luka.govedic@gmail.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <luka.govedic@gmail.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
2026-01-23 17:28:06 -08:00
dolpm
0118cdcc02
[fix] add VLLM_OBJECT_STORAGE_SHM_BUFFER_NAME to compile factors ( #32912 )
...
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com >
2026-01-23 22:53:10 +00:00
Shengqi Chen
136c499f6e
[CI] fix version comparison and exclusion patterns in upload-release-wheels.sh ( #32971 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com >
2026-01-23 22:21:49 +00:00
joninco
ebd0a17e0e
[Bugfix] Fix missing is_layer_skipped check for FusedMoE in AWQConfig ( #32935 )
...
Signed-off-by: jon <joninco@bullpoint.org >
2026-01-23 17:19:56 -05:00
Wentao Ye
37c9859fab
[Refactor] Clean up unused variables & func ( #32692 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-23 17:04:25 -05:00
Michael Goin
4561f13985
[Refactor] Rename gptq_marlin to marlin to match MoE ( #32952 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2026-01-23 16:48:12 -05:00
rasmith
6cc6d92be5
[CI][AMD][BugFix] Update wvSplitK (and other skinny_gemm wrappers) to ensure tensors passed will be made contiguous for the kernel ( #32831 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
2026-01-23 13:35:48 -08:00
Wentao Ye
dfab5f3764
[Bug] Fix benchmark script moe_permute_unpermute ( #32949 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-23 16:18:56 -05:00
Markus / Mark
586a57ad7e
fix: Add glm4_moe_lite to MLA detection ( #32614 )
...
Signed-off-by: marksverdhei <marksverdhei@hotmail.com >
Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2026-01-23 12:38:57 -08:00
Lucas Wilkinson
3a41459501
[cudagraphs] Refactor cudagraph capture loop ( #32946 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2026-01-23 13:22:20 -07:00
Nick Hill
8518b30447
[Model Runner V2] Add KV Connector support ( #32742 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-23 10:49:17 -08:00
Matthew Bonanni
2d6b537157
[Bugfix][CI] Fix pre-commit ( #32956 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-23 10:26:56 -08:00
Orion Reblitz-Richardson
68b0a6c1ba
[CI][torch nightlies] Use main Dockerfile with flags for nightly torch tests ( #30443 )
...
Signed-off-by: Orion Reblitz-Richardson <orionr@meta.com >
Signed-off-by: Orion Reblitz-Richardson <orionr@gmail.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-01-23 10:22:56 -08:00
Harry Huang
5206e5e28c
[V1][Hybrid] Mamba Prefix Caching with align mode ( #30877 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2026-01-23 09:56:48 -08:00
Matteo Fari
fec9da0af4
[Model] Enable LoRA support for internvl2 ( #32397 )
...
Signed-off-by: Matteo Fari <matteofari06@gmail.com >
2026-01-24 01:39:01 +08:00
Luka Govedič
bbbd696af9
[torch.compile][CI] Add back attn fusion on hopper/ada ( #32940 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2026-01-23 16:49:20 +00:00
sangbumlikeagod
9b77bb790d
[Frontend] add logprob, compression_rate to 'verbose_json' features ( #31059 )
...
Signed-off-by: sangbumlikeagod <oironese@naver.com >
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com >
2026-01-23 16:35:13 +00:00
Matt
305e53ade8
[Hardware][AMD][CI][Bugfix] Fix Kernels Attention Cache test ( #32904 )
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
2026-01-23 16:24:26 +00:00
Mark McLoughlin
1cb4341fbc
[ROCm][PD] Remove unused moriio connector proxy code ( #32939 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-01-23 15:59:04 +00:00
baonudesifeizhai
1fb648bf10
[Bugfix] Fix FP8 MoE EP Weight Loading for ModelOpt Llama4 ( #32886 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
2026-01-23 10:31:48 -05:00
Nicolò Lucchesi
7e22309755
[Misc] Postpone torch_profiler deprecation ( #32867 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-23 14:39:48 +00:00
Xin Yang
90c2007932
[Bugfix] Disable tma_aligned_scales in test_fusions_e2e ( #32916 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-23 14:34:30 +00:00
Raushan Turganbay
d95d650762
[Bugfix] Fix getting vision features in Transformer Multimodal backend ( #32933 )
...
Signed-off-by: raushan <raushan@huggingface.co >
2026-01-23 13:34:48 +00:00
tianshu-Michael-yu
13d8746c54
[Feature]: Remove DtoH Copy for lfm2_vl On Default Stream ( #32815 )
...
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com >
2026-01-23 13:20:30 +00:00
Fadi Arafeh
10e94c84f6
[CPU][Feat] Update PyTorch to v2.10 for CPU Backend ( #32869 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-01-23 21:13:06 +08:00
Isotr0py
243e78c20f
[Benchmark][Bugfix] Fix race condition when starting server for sweep benchmark ( #32927 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-01-23 12:11:18 +00:00
Fadi Arafeh
aac0b817fa
[CPU Backend][BugFix] Fix failing CPU MoE test ( #32876 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-01-23 12:06:51 +00:00
wang.yuqi
05f3d714db
[Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest ( #32905 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:03:44 +00:00
Patrick von Platen
3f3f89529d
[Voxtral] Add new streaming arch ( #32861 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 12:41:52 +01:00
Li, Jiang
5da4c7d789
[CI/Build][CPU] Fix failed pooling tests and macos smoke test ( #32907 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Li, Jiang <bigpyj64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-23 10:48:20 +00:00
Nicolò Lucchesi
160c6fa387
[Misc] Add get_name to missing AttentionBackends ( #32698 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-23 10:35:44 +00:00
Andreas Karatzas
a8eb1182f1
[CI][Models] Add VLM Support for Sequence Classification Conversion ( #32885 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-23 16:22:51 +08:00
Karan Bansal
fa6e599a61
[Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set ( #32777 )
...
Signed-off-by: Karan Bansal <karanb192@gmail.com >
2026-01-23 08:22:37 +00:00
Wentao Ye
7ef5873752
[CI] Fix mypy for vllm/v1/structured_output ( #32722 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-23 11:55:51 +08:00
Luka Govedič
5e4e0e51f4
[torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops ( #32806 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-01-22 19:52:26 -08:00
Rishabh Saini
f61c9da711
[BugFix] deepseek_v32_encoding: Replace asserts with proper exceptions ( #32884 )
...
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com >
2026-01-23 03:44:11 +00:00
Nick Hill
7fe255889e
[Misc] Log vLLM logo when starting server ( #32796 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-23 11:15:12 +08:00
bnellnm
dc917cceb8
[MoE Refactor] Move select_experts from FusedMoEQuantMethod -> FusedMoE ( #31996 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-01-22 18:21:35 -05:00
Fadi Arafeh
fc56f4a071
[BugFix] Fix invalid flashinfer_fused_moe_blockscale_fp8 op registration ( #32855 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2026-01-22 22:27:40 +00:00
Xin Yang
d08b356ee0
[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper ( #32619 )
...
Signed-off-by: Xin Yang <xyangx@amazon.com >
2026-01-22 15:47:04 -05:00
Wentao Ye
f744810184
[Refactor] Remove unused tpu files ( #32610 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-22 15:35:18 -05:00