shanjiaz
|
5eeba80c74
|
Adding optional speculator tests for larger models (#32943)
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
|
2026-01-29 16:54:02 +08:00 |
|
whx
|
08b1195e62
|
[PluggableLayer][2/N] Apply PluggableLayer to linear layers (#33152)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2026-01-29 16:53:15 +08:00 |
|
cmunley1
|
3bba2edb0f
|
support returning tokenids in responses api (#33212)
Signed-off-by: Christian Munley <cmunley@nvidia.com>
|
2026-01-29 16:52:39 +08:00 |
|
Ilya Markov
|
53fc166402
|
[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-01-29 16:52:11 +08:00 |
|
Didier Durand
|
31b25f6516
|
[Doc]: fixing multiple typos in diverse files (#33256)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 16:52:03 +08:00 |
|
wang.yuqi
|
abb34ac43a
|
[Bugfix] Fix Qwen3-VL-Reranker load. (#33298)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 08:42:53 +00:00 |
|
Pengchao Wang
|
2515bbd027
|
[CI/Build][BugFix] fix cuda/compat loading order issue in docker build (#33116)
Signed-off-by: Pengchao Wang <wpc@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2026-01-29 00:19:05 -08:00 |
|
TJian
|
c487a8eef4
|
[Release] [ROCm] Remove old build step (#33316)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-01-28 23:35:51 -08:00 |
|
Kiersten Stokes
|
9e138cb01d
|
[Misc][Build] Lazy load cv2 in nemotron_parse.py (#33189)
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
|
2026-01-29 06:55:50 +00:00 |
|
TJian
|
f9d03599ef
|
[Release] [CI] Optim release pipeline (#33156)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-01-28 22:45:42 -08:00 |
|
wangln19
|
39037d258e
|
Fix tool call indexing double-counting (#33141)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
|
2026-01-29 05:57:09 +00:00 |
|
Cyrus Leung
|
51550179fc
|
[Refactor] Define MM data parser in processing info instead of processor itself (#33260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-29 13:55:17 +08:00 |
|
Angela Yi
|
07ea184f00
|
[ez] Delete more torch version checks <= 2.8 (#33288)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-29 05:28:46 +00:00 |
|
Or Ozeri
|
a663b218ae
|
[Misc] Add orozery to CODEOWNERS (core, kv_transfer, kv_offload) (#33227)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-29 04:24:20 +00:00 |
|
Michael Goin
|
1bd47d6e5a
|
[Bugfix] Register fp8 cutlass_group_gemm as supported for only SM90+SM100 (#33285)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-28 18:40:59 -08:00 |
|
Michael Goin
|
141cd43967
|
[UX] Remove noisy CT UnquantizedLinearMethod warn (#33273)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-28 16:09:30 -08:00 |
|
Nick Hill
|
6bf3b46d78
|
[ModelRunner V2] Misc code simplification and cleanup (#33266)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-28 14:41:23 -08:00 |
|
Matthew Bonanni
|
77c4f45c6c
|
[7/N][Attention][Docs] Add documentation for attention backends (#32477)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-28 17:20:22 -05:00 |
|
Michael Goin
|
ca1969186d
|
[UX] Enable nested configs in config yaml files (#33193)
|
2026-01-28 16:54:25 -05:00 |
|
Gregory Shtrasberg
|
ab597c869a
|
[Bugfix] Add missing encoder only guard for do_kv_cache_update (#33269)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-01-28 21:25:07 +00:00 |
|
Angela Yi
|
4197168ea5
|
[ez] Remove checks for torch version <= 2.8 (#33209)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-28 16:03:56 -05:00 |
|
Rohan Potdar
|
59bcc5b6f2
|
Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-01-28 20:47:47 +00:00 |
|
Wentao Ye
|
3e440786af
|
[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-28 20:30:32 +00:00 |
|
Kevin H. Luu
|
8bdd3979d8
|
[CI] Change GPU key to device key for B200 test (#33275)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-01-28 19:14:29 +00:00 |
|
Wentao Ye
|
c4e744dbd4
|
[Perf] Optimize moe_permute for CUTLASS FP8 (#32892)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-28 10:15:24 -08:00 |
|
Nicolò Lucchesi
|
8ebf372e9d
|
[CI] Whisper tests enforce_eager=False (#33098)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-28 09:36:56 -08:00 |
|
cwazai
|
f210f0b7b1
|
[lora/moe] Avoid extra intermediate buffer & Python slicing in expand phase when split_k == 1 (#32774)
Signed-off-by: 陈建华 <1647430658@qq.com>
|
2026-01-29 00:22:45 +08:00 |
|
Bin Bao
|
392c5af4fe
|
[Benchmark] Add startup benchmarking to buildkite run (#33183)
Signed-off-by: Bin Bao <binbao@meta.com>
|
2026-01-28 16:03:07 +00:00 |
|
Robert Shaw
|
af9b69f977
|
[Quantization][Deprecation] Remove Marlin 24 (#32688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 15:54:59 +00:00 |
|
Chauncey
|
8e5e40daf4
|
[Misc] Provide a DeepSeek ReasoningParser with thinking enabled by default (#33221)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-01-28 21:16:53 +08:00 |
|
Or Ozeri
|
2e8de86777
|
Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" (#33241)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-01-28 04:36:00 -08:00 |
|
Robert Shaw
|
247d1a32ea
|
[Quantization][Deprecation] Remove BitBlas (#32683)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-28 11:06:22 +00:00 |
|
Kevin H. Luu
|
ecb4f82209
|
[CI] Update job dependency syntax for Intel and AMD jobs (#33240)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-01-28 01:33:59 -08:00 |
|
Kevin H. Luu
|
5914090765
|
[CI] Update job dependency for hardware and CPU jobs (#33237)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-01-28 01:10:05 -08:00 |
|
Harry Mellor
|
f1acbd68c5
|
[CI] Enable mypy import following for vllm/compilation (#33199)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 08:59:54 +00:00 |
|
Yan Ma
|
9581185d51
|
[XPU]disable test_acceptance_length UT (#33226)
|
2026-01-28 15:24:13 +08:00 |
|
Maryam Tahhan
|
2dd359f953
|
[Docs] Simplify CPU x86 Docker build documentation (#33071)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
|
2026-01-28 06:37:09 +00:00 |
|
Gregory Shtrasberg
|
22ad649501
|
[ROCm] Enabling forward_includes_kv_cache on ROCm MHA backends (#33106)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-01-28 14:36:14 +08:00 |
|
ramos
|
36d450e3b8
|
Adds FunAudioChat multimodal audio model support (#2) (#33058)
Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Co-authored-by: mayufeng <mayufeng@example.com>
|
2026-01-28 05:18:09 +00:00 |
|
22quinn
|
a2b877df6c
|
[Bugfix] Lazy import NgramProposer in GPU model runner (#32821)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2026-01-27 21:07:16 -08:00 |
|
Harry Mellor
|
35fb0b8613
|
Don't use min_pixels/max_pixels from Qwen2VL's processor (#33208)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 05:02:08 +00:00 |
|
Harry Mellor
|
2eb673a088
|
Add flake8-implicit-str-concat rules to Ruff (#33191)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 04:56:10 +00:00 |
|
Jeffrey Wang
|
a97b5e206d
|
Relax protobuf library version constraints (#33202)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-01-28 04:15:53 +00:00 |
|
Micah Williamson
|
911b51b69f
|
[ROCm][CI] Add TORCH_NCCL_BLOCKING_WAIT For Distributed Tests (A100) (#32891)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-28 11:32:31 +08:00 |
|
Xinan Miao
|
604e3b87e8
|
[Feature]: Container image WORKDIR consistency (#33159)
Signed-off-by: SouthWest7 <am1ao@qq.com>
Co-authored-by: SouthWest7 <am1ao@qq.com>
|
2026-01-28 11:06:48 +08:00 |
|
Harry Mellor
|
706f123b23
|
[Docs] Use definition lists for CLI reference docs (#33186)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
|
2026-01-28 02:22:48 +00:00 |
|
Angela Yi
|
fb7abfc1d0
|
[docs] Improve tlparse section (#33211)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-28 02:07:37 +00:00 |
|
Kevin H. Luu
|
5d3d6e44e8
|
[CI] minor fixes to pipeline generator and tests (#33151)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-01-27 17:04:02 -08:00 |
|
Woosuk Kwon
|
46ec6d71c7
|
[Model Runner V2] Use a different stream for grammar bitmask h2d copy (#33059)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2026-01-27 16:37:43 -08:00 |
|
Matthew Bonanni
|
e82fa448c4
|
Add attention benchmarking tools (#26835)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-01-28 00:09:20 +00:00 |
|