Or Ozeri
a663b218ae
[Misc] Add orozery to CODEOWNERS (core, kv_transfer, kv_offload) (#33227)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-01-29 04:24:20 +00:00
Michael Goin
1bd47d6e5a
[Bugfix] Register fp8 cutlass_group_gemm as supported for only SM90+SM100 (#33285)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-28 18:40:59 -08:00
Michael Goin
141cd43967
[UX] Remove noisy CT UnquantizedLinearMethod warn (#33273)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-28 16:09:30 -08:00
Nick Hill
6bf3b46d78
[ModelRunner V2] Misc code simplification and cleanup (#33266)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-28 14:41:23 -08:00
Matthew Bonanni
77c4f45c6c
[7/N][Attention][Docs] Add documentation for attention backends (#32477)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-28 17:20:22 -05:00
Michael Goin
ca1969186d
[UX] Enable nested configs in config yaml files (#33193)
2026-01-28 16:54:25 -05:00
Gregory Shtrasberg
ab597c869a
[Bugfix] Add missing encoder only guard for do_kv_cache_update (#33269)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2026-01-28 21:25:07 +00:00
Angela Yi
4197168ea5
[ez] Remove checks for torch version <= 2.8 (#33209)
Signed-off-by: angelayi <yiangela7@gmail.com>
2026-01-28 16:03:56 -05:00
Rohan Potdar
59bcc5b6f2
Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-01-28 20:47:47 +00:00
Wentao Ye
3e440786af
[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-28 20:30:32 +00:00
Kevin H. Luu
8bdd3979d8
[CI] Change GPU key to device key for B200 test (#33275)
Signed-off-by: khluu <khluu000@gmail.com>
2026-01-28 19:14:29 +00:00
Wentao Ye
c4e744dbd4
[Perf] Optimize moe_permute for CUTLASS FP8 (#32892)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-28 10:15:24 -08:00
Nicolò Lucchesi
8ebf372e9d
[CI] Whisper tests enforce_eager=False (#33098)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-28 09:36:56 -08:00
cwazai
f210f0b7b1
[lora/moe] Avoid extra intermediate buffer & Python slicing in expand phase when split_k == 1 (#32774)
Signed-off-by: 陈建华 <1647430658@qq.com>
2026-01-29 00:22:45 +08:00
Bin Bao
392c5af4fe
[Benchmark] Add startup benchmarking to buildkite run (#33183)
Signed-off-by: Bin Bao <binbao@meta.com>
2026-01-28 16:03:07 +00:00
Robert Shaw
af9b69f977
[Quantization][Deprecation] Remove Marlin 24 (#32688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-28 15:54:59 +00:00
Chauncey
8e5e40daf4
[Misc] Provide a DeepSeek ReasoningParser with thinking enabled by default (#33221)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-28 21:16:53 +08:00
Or Ozeri
2e8de86777
Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2026-01-28 04:36:00 -08:00
Robert Shaw
247d1a32ea
[Quantization][Deprecation] Remove BitBlas (#32683)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-28 11:06:22 +00:00
Kevin H. Luu
ecb4f82209
[CI] Update job dependency syntax for Intel and AMD jobs (#33240)
Signed-off-by: khluu <khluu000@gmail.com>
2026-01-28 01:33:59 -08:00
Kevin H. Luu
5914090765
[CI] Update job dependency for hardware and CPU jobs (#33237)
Signed-off-by: khluu <khluu000@gmail.com>
2026-01-28 01:10:05 -08:00
Harry Mellor
f1acbd68c5
[CI] Enable mypy import following for vllm/compilation (#33199)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-28 08:59:54 +00:00
Yan Ma
9581185d51
[XPU]disable test_acceptance_length UT (#33226)
2026-01-28 15:24:13 +08:00
Maryam Tahhan
2dd359f953
[Docs] Simplify CPU x86 Docker build documentation (#33071)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
2026-01-28 06:37:09 +00:00
Gregory Shtrasberg
22ad649501
[ROCm] Enabling forward_includes_kv_cache on ROCm MHA backends (#33106)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2026-01-28 14:36:14 +08:00
ramos
36d450e3b8
Adds FunAudioChat multimodal audio model support (#2) (#33058)
Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Co-authored-by: mayufeng <mayufeng@example.com>
2026-01-28 05:18:09 +00:00
22quinn
a2b877df6c
[Bugfix] Lazy import NgramProposer in GPU model runner (#32821)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2026-01-27 21:07:16 -08:00
Harry Mellor
35fb0b8613
Don't use min_pixels/max_pixels from Qwen2VL's processor (#33208)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-28 05:02:08 +00:00
Harry Mellor
2eb673a088
Add flake8-implicit-str-concat rules to Ruff (#33191)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-28 04:56:10 +00:00
Jeffrey Wang
a97b5e206d
Relax protobuf library version constraints (#33202)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
2026-01-28 04:15:53 +00:00
Micah Williamson
911b51b69f
[ROCm][CI] Add TORCH_NCCL_BLOCKING_WAIT For Distributed Tests (A100) (#32891)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-28 11:32:31 +08:00
Xinan Miao
604e3b87e8
[Feature]: Container image WORKDIR consistency (#33159)
Signed-off-by: SouthWest7 <am1ao@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
2026-01-28 11:06:48 +08:00
Harry Mellor
706f123b23
[Docs] Use definition lists for CLI reference docs (#33186)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
2026-01-28 02:22:48 +00:00
Angela Yi
fb7abfc1d0
[docs] Improve tlparse section (#33211)
Signed-off-by: angelayi <yiangela7@gmail.com>
2026-01-28 02:07:37 +00:00
Kevin H. Luu
5d3d6e44e8
[CI] minor fixes to pipeline generator and tests (#33151)
Signed-off-by: khluu <khluu000@gmail.com>
2026-01-27 17:04:02 -08:00
Woosuk Kwon
46ec6d71c7
[Model Runner V2] Use a different stream for grammar bitmask h2d copy (#33059)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Nick Hill <nhill@redhat.com>
2026-01-27 16:37:43 -08:00
Matthew Bonanni
e82fa448c4
Add attention benchmarking tools (#26835)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-01-28 00:09:20 +00:00
Richard Zou
d9aa39a3bb
[torch.compile] Speed up MOE handling in forward_context (#33184)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-27 15:17:54 -08:00
Wentao Ye
3a6d5cbefd
[Perf] Optimize dcp allocate tensor (#33102)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-27 17:24:41 -05:00
linhaifeng
f5d7049cc1
[Bugfix] Fix display error (inconsistent with context) (#33020)
Signed-off-by: linhaifeng <1371675203@qq.com>
2026-01-27 20:33:29 +00:00
Alexei-V-Ivanov-AMD
3c3c547ce0
Enabling "2 node" distributed tests in the AMD CI pipeline. (#32719)
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com>
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-01-27 19:13:21 +00:00
Matthew Bonanni
1cbccb6dba
[Attention] Use has_flashinfer helper (#33177)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-27 18:33:17 +00:00
Iris
bd92089d33
feature: support eagle3 for HunyuanVL & Hunyuan (#33035)
Signed-off-by: irisliu10 <601012173@qq.com>
Signed-off-by: Iris <38269816+irisliu10@users.noreply.github.com>
2026-01-27 17:55:48 +00:00
Karan Bansal
a6760f1525
[Doc] Improve serve parameter documentation with meaningful defaults (#33082)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-01-27 09:19:37 -08:00
IriKa
66e601ef79
Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing (#33076)
Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>
2026-01-27 11:04:05 -05:00
Nick Hill
0cd259b2d8
[BugFix] Fix P/D with non-MoE DP (#33037)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-27 08:03:47 -08:00
danielafrimi
83fb2d09e8
Support heterogeneous NemotronHPuzzle model (#32549)
Signed-off-by: <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
Signed-off-by: root <dafrimi@nvidia.com>
2026-01-27 10:55:54 -05:00
danisereb
f3a5ee705f
[LoRA][Spec Decode] Support LoRA for Nemotron-H MTP models (#32265)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-27 07:53:26 -08:00
wang.yuqi
7cbbca9aaa
[Frontend] Cleanup api server (#33158)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
2026-01-27 15:18:10 +00:00
omkhalil
5ec44056f7
[Metrics][MFU] Fix UnembedMetrics FLOP overcounting for prefill (#33045) (#33045)
Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer.
The bug: UnembedMetrics used total_num_tokens(), which counts every token in the
batch when computing projection FLOPs, whereas in the autoregressive use case the
vocab projection runs only on the last token of each request.
Co-authored-by: Omar Mohamed Khalil <omarkhalil@meta.com>
2026-01-27 15:16:49 +00:00
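A minimal sketch of the counting difference behind the UnembedMetrics fix (#33045), assuming the LM head is a dense `[tokens, hidden_size] x [hidden_size, vocab_size]` matmul costed at 2 FLOPs per multiply-accumulate, and that during prefill only the last token of each request is projected to the vocabulary. The function names and batch sizes are hypothetical illustrations, not vLLM's `UnembedMetrics` code.

```python
def unembed_flops(num_projected_tokens: int, hidden_size: int, vocab_size: int) -> int:
    # Dense matmul: one multiply and one add per (token, hidden, vocab) triple.
    return 2 * num_projected_tokens * hidden_size * vocab_size


def overcounted_flops(prompt_lens: list[int], hidden_size: int, vocab_size: int) -> int:
    # Pre-fix behaviour (as described): every token in the batch is counted.
    return unembed_flops(sum(prompt_lens), hidden_size, vocab_size)


def corrected_flops(prompt_lens: list[int], hidden_size: int, vocab_size: int) -> int:
    # Post-fix behaviour (assumed): one projected token, the last, per request.
    return unembed_flops(len(prompt_lens), hidden_size, vocab_size)


if __name__ == "__main__":
    prompts = [512, 1024, 256]  # hypothetical prefill batch of three requests
    hidden, vocab = 4096, 32000  # hypothetical model dimensions
    print(overcounted_flops(prompts, hidden, vocab))
    print(corrected_flops(prompts, hidden, vocab))
```

For this hypothetical batch, counting all 1792 prompt tokens instead of the 3 last tokens inflates the LM-head FLOP estimate by a factor of roughly 597, which is why the overcount is most visible on long prefills.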