Wentao Ye
c4e744dbd4
[Perf] Optimize moe_permute for CUTLASS FP8 ( #32892 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-28 10:15:24 -08:00
Nicolò Lucchesi
8ebf372e9d
[CI] Whisper tests enforce_eager=False ( #33098 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-28 09:36:56 -08:00
cwazai
f210f0b7b1
[lora/moe] Avoid extra intermediate buffer & Python slicing in expand phase when split_k == 1 ( #32774 )
...
Signed-off-by: 陈建华 <1647430658@qq.com >
2026-01-29 00:22:45 +08:00
Bin Bao
392c5af4fe
[Benchmark] Add startup benchmarking to buildkite run ( #33183 )
...
Signed-off-by: Bin Bao <binbao@meta.com >
2026-01-28 16:03:07 +00:00
Robert Shaw
af9b69f977
[Quantization][Deprecation] Remove Marlin 24 ( #32688 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 15:54:59 +00:00
Chauncey
8e5e40daf4
[Misc] Provide a DeepSeek ReasoningParser with thinking enabled by default ( #33221 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2026-01-28 21:16:53 +08:00
Or Ozeri
2e8de86777
Revert "Enable Cross layers KV cache layout at NIXL Connector ( #30207 )" ( #33241 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: Kevin H. Luu <khluu000@gmail.com >
2026-01-28 04:36:00 -08:00
Robert Shaw
247d1a32ea
[Quantization][Deprecation] Remove BitBlas ( #32683 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-01-28 11:06:22 +00:00
Kevin H. Luu
ecb4f82209
[CI] Update job dependency syntax for Intel and AMD jobs ( #33240 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-01-28 01:33:59 -08:00
Kevin H. Luu
5914090765
[CI] Update job dependency for hardware and CPU jobs ( #33237 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-01-28 01:10:05 -08:00
Harry Mellor
f1acbd68c5
[CI] Enable mypy import following for vllm/compilation ( #33199 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 08:59:54 +00:00
Yan Ma
9581185d51
[XPU]disable test_acceptance_length UT ( #33226 )
2026-01-28 15:24:13 +08:00
Maryam Tahhan
2dd359f953
[Docs] Simplify CPU x86 Docker build documentation ( #33071 )
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
2026-01-28 06:37:09 +00:00
Gregory Shtrasberg
22ad649501
[ROCm] Enabling forward_includes_kv_cache on ROCm MHA backends ( #33106 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-01-28 14:36:14 +08:00
ramos
36d450e3b8
Adds FunAudioChat multimodal audio model support ( #2 ) ( #33058 )
...
Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com >
Signed-off-by: mayufeng <mayufeng@example.com >
Co-authored-by: mayufeng <mayufeng@example.com >
2026-01-28 05:18:09 +00:00
22quinn
a2b877df6c
[Bugfix] Lazy import NgramProposer in GPU model runner ( #32821 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2026-01-27 21:07:16 -08:00
Harry Mellor
35fb0b8613
Don't use min_pixels/max_pixels from Qwen2VL's processor ( #33208 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 05:02:08 +00:00
Harry Mellor
2eb673a088
Add flake8-implicit-str-concat rules to Ruff ( #33191 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-28 04:56:10 +00:00
Jeffrey Wang
a97b5e206d
Relax protobuf library version constraints ( #33202 )
...
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com >
2026-01-28 04:15:53 +00:00
Micah Williamson
911b51b69f
[ROCm][CI] Add TORCH_NCCL_BLOCKING_WAIT For Distributed Tests (A100) ( #32891 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-01-28 11:32:31 +08:00
Xinan Miao
604e3b87e8
[Feature]: Container image WORKDIR consistency ( #33159 )
...
Signed-off-by: SouthWest7 <am1ao@qq.com >
Co-authored-by: SouthWest7 <am1ao@qq.com >
2026-01-28 11:06:48 +08:00
Harry Mellor
706f123b23
[Docs] Use definition lists for CLI reference docs ( #33186 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com >
2026-01-28 02:22:48 +00:00
Angela Yi
fb7abfc1d0
[docs] Improve tlparse section ( #33211 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2026-01-28 02:07:37 +00:00
Kevin H. Luu
5d3d6e44e8
[CI] minor fixes to pipeline generator and tests ( #33151 )
...
Signed-off-by: khluu <khluu000@gmail.com >
2026-01-27 17:04:02 -08:00
Woosuk Kwon
46ec6d71c7
[Model Runner V2] Use a different stream for grammar bitmask h2d copy ( #33059 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Nick Hill <nhill@redhat.com >
2026-01-27 16:37:43 -08:00
Matthew Bonanni
e82fa448c4
Add attention benchmarking tools ( #26835 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
2026-01-28 00:09:20 +00:00
Richard Zou
d9aa39a3bb
[torch.compile] Speed up MOE handling in forward_context ( #33184 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-01-27 15:17:54 -08:00
Wentao Ye
3a6d5cbefd
[Perf] Optimize dcp allocate tensor ( #33102 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-01-27 17:24:41 -05:00
linhaifeng
f5d7049cc1
[Bugfix] Fix display error (inconsistent with context) ( #33020 )
...
Signed-off-by: linhaifeng <1371675203@qq.com >
2026-01-27 20:33:29 +00:00
Alexei-V-Ivanov-AMD
3c3c547ce0
Enabling "2 node" distributed tests in the AMD CI pipeline. ( #32719 )
...
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com >
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-01-27 19:13:21 +00:00
Matthew Bonanni
1cbccb6dba
[Attention] Use has_flashinfer helper ( #33177 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-27 18:33:17 +00:00
Iris
bd92089d33
feature: support eagle3 for HunyuanVL & Hunyuan ( #33035 )
...
Signed-off-by: irisliu10 <601012173@qq.com >
Signed-off-by: Iris <38269816+irisliu10@users.noreply.github.com >
2026-01-27 17:55:48 +00:00
Karan Bansal
a6760f1525
[Doc] Improve serve parameter documentation with meaningful defaults ( #33082 )
...
Signed-off-by: Karan Bansal <karanb192@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-27 09:19:37 -08:00
IriKa
66e601ef79
Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing ( #33076 )
...
Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com >
2026-01-27 11:04:05 -05:00
Nick Hill
0cd259b2d8
[BugFix] Fix P/D with non-MoE DP ( #33037 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
2026-01-27 08:03:47 -08:00
danielafrimi
83fb2d09e8
Support heterogeneous NemotronHPuzzle model ( #32549 )
...
Signed-off-by: <dafrimi@nvidia.com >
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com >
Signed-off-by: root <dafrimi@nvidia.com >
2026-01-27 10:55:54 -05:00
danisereb
f3a5ee705f
[LoRA][Spec Decode] Support LoRA for Nemotron-H MTP models ( #32265 )
...
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-01-27 07:53:26 -08:00
wang.yuqi
7cbbca9aaa
[Frontend] Cleanup api server ( #33158 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
2026-01-27 15:18:10 +00:00
omkhalil
5ec44056f7
[Metrics][MFU] Fix UnembedMetrics FLOP overcounting for prefill ( #33045 )
...
Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer.
The bug: UnembedMetrics used total_num_tokens(), which counts every token in the
batch toward projection FLOPs, whereas in the autoregressive use case the vocab
projection runs only on the last token.
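To illustrate the size of the overcount described above, here is a minimal sketch. It is not the vLLM implementation: `unembed_flops`, the batch shape, and the model dimensions are hypothetical, and it assumes one projected row per request during prefill.

```python
def unembed_flops(num_rows: int, hidden_size: int, vocab_size: int) -> int:
    """FLOPs for a dense (num_rows x hidden) @ (hidden x vocab) projection."""
    # One multiply and one add per (row, hidden, vocab) triple.
    return 2 * num_rows * hidden_size * vocab_size


# Hypothetical prefill batch: two requests with 512 and 128 prompt tokens.
prompt_lens = [512, 128]
hidden_size, vocab_size = 4096, 32000  # illustrative sizes, not a real config

# Counting every token in the batch (the behaviour described as the bug).
overcounted = unembed_flops(sum(prompt_lens), hidden_size, vocab_size)

# Counting only the last token of each request (assumed corrected behaviour).
corrected = unembed_flops(len(prompt_lens), hidden_size, vocab_size)

ratio = overcounted // corrected
print(f"overcount factor: {ratio}x")
```

Under these assumptions the overcount factor equals the average prompt length, so the error grows with longer prefills and vanishes for pure decode steps, where each request contributes a single token.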
Co-authored-by: Omar Mohamed Khalil <omarkhalil@meta.com >
2026-01-27 15:16:49 +00:00
Nicolò Lucchesi
492a7983dd
[Bugfix] Fix DeepseekV32 AssertionError: num_kv_heads == 1 ( #33090 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-27 15:03:20 +00:00
Matthew Bonanni
a608b4c6c2
[5/N][Attention] Finish eliminating vllm/attention folder ( #32064 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-27 10:02:51 -05:00
Nicolò Lucchesi
1f3a2c2944
[Bugfix] Disable CG for Whisper+FA2 ( #33164 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-01-27 21:46:51 +08:00
omerpaz95
7227d06156
[Metrics] [KVConnector] Add Offloading Connector metrics ( #27942 )
...
Added queries and hits metrics for the Offloading Connector.
Also added timing metrics for store and load operations, which report the
average per-token time to load or store.
The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com >
Co-authored-by: Omer Paz <Omer.Paz@ibm.com >
2026-01-27 13:34:49 +00:00
Harry Mellor
14385c80fc
Fix weight mapping test for Transformers v5 ( #33162 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-01-27 12:30:14 +00:00
wang.yuqi
76139d0801
[Frontend] Frontend will only attach supported tasks corresponding entrypoints. ( #33139 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-01-27 12:15:43 +00:00
Lifan Shen
da8d0c441a
[AMD][QWEN3-NEXT] FP8 Tunings ( #32042 )
...
Signed-off-by: Lifan Shen <lifans@meta.com >
2026-01-27 09:34:13 +00:00
rasmith
58996f3589
[AMD][Kernel][BugFix] Use correct scale in concat_and_cache_ds_mla_kernel when on gfx942 ( #32976 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
v0.15.0rc1
2026-01-27 07:16:43 +00:00
Roger Wang
b539f988e1
[Models] Kimi-K2.5 ( #33131 )
...
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn >
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-27 14:50:31 +08:00
Andreas Karatzas
6c00645712
[CI][Pooling] Stabilize ModernBERT test ( #32909 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-01-27 05:26:48 +00:00
Ning Xie
b781eeaa15
[code clean] remove duplicate code ( #33135 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2026-01-27 04:57:16 +00:00