Matthias Gehre
|
a889b7f584
|
[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-03-25 11:42:58 +00:00 |
|
Harry Mellor
|
ba2910f73a
|
Fix offline mode test for Transformers v5 (#38095)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 11:39:48 +00:00 |
|
Andreas Karatzas
|
f262a62aa1
|
[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (#37616)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 10:55:51 +00:00 |
|
Andreas Karatzas
|
9ac2fcafbb
|
[CI] Fix realtime WebSocket timeout deadlock and unhandled model validation errors (#37483)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 11:24:33 +01:00 |
|
Andreas Karatzas
|
04cec4f927
|
[ROCm][CI] Increase OpenAPI schema test timeouts (#38088)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 18:06:58 +08:00 |
|
Gregory Shtrasberg
|
189ddefbfd
|
[ROCm] Attention selector reordering (#36702)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-25 17:42:56 +08:00 |
|
vllmellm
|
42e9547976
|
[ROCm][Test] Fix ROCM_AITER_UNIFIED_ATTN attn+quant fusion test (#37640)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-25 05:06:15 +00:00 |
|
Andreas Karatzas
|
679c6a3ecc
|
[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 08:17:33 +08:00 |
|
Terry Gao
|
82580b10ac
|
[Perf] Disable inductor runtime asserts by default for serving perfor… (#37485)
Signed-off-by: tianrengao <terrygao87@gmail.com>
Co-authored-by: Tianren Gao <tianren@fb.com>
|
2026-03-24 19:37:51 -04:00 |
|
liangel-02
|
8c47fdfdb1
|
[FlexAttention] allow custom mask mod (#37692)
Signed-off-by: Angel Li <liangel@meta.com>
|
2026-03-24 16:03:24 -04:00 |
|
Richard Zou
|
89f572dbc0
|
[BugFix] fix VLLM_USE_STANDALONE_COMPILE=0 (#38015)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-24 19:08:26 +00:00 |
|
Nick Cao
|
935c46dd9b
|
[Model] Add Granite 4.0 1B speech to supported models (#38019)
Signed-off-by: Nick Cao <ncao@redhat.com>
|
2026-03-24 18:23:41 +00:00 |
|
Dhruv Singal
|
4df5fa7439
|
[Bugfix] Force continuous usage stats when CLI override is enabled (#37923)
Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: OpenCode <noreply@openai.com>
|
2026-03-24 10:29:50 -07:00 |
|
Sungjae Lee
|
4731884796
|
[Feature] limit thinking tokens (hard limit) (#20859)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-24 09:53:07 -07:00 |
|
wang.yuqi
|
1b6cb920e6
|
[Deprecate] Deprecate pooling multi task support. (#37956)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-24 14:07:47 +00:00 |
|
Ronen Schaffer
|
e3c6c10cad
|
[KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into cpu/ package (#37874)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
|
2026-03-24 07:02:51 +02:00 |
|
Wentao Ye
|
c59a132f96
|
[V0 Deprecation] Refactor kv cache from list to element (#37487)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-23 20:10:11 -07:00 |
|
roikoren755
|
56777b5c89
|
[Test] E2E Nemotron-3-Super tests (#36803)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-23 17:49:56 -07:00 |
|
Ranran
|
dc6908ac6a
|
[Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007)
Signed-off-by: Ranran <1012869439@qq.com>
Signed-off-by: Ranran <hzz5361@psu.edu>
Signed-off-by: ran <hzz5361@psu.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-03-23 18:31:14 -04:00 |
|
Kyle Sayers
|
38364a7e32
|
[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-23 16:03:29 -04:00 |
|
Matthew Bonanni
|
fafe76b4af
|
[Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (#32951)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: zhrrr <43847754+izhuhaoran@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2026-03-23 15:37:22 -04:00 |
|
Nicolò Lucchesi
|
1cbbcfe8a3
|
[CI][PD] Add Hybrid SSM integration tests to CI (#37657)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-23 23:58:19 +08:00 |
|
Baorun (Lauren) Mu
|
f85e479e66
|
[Feature] ViT Full CUDA Graph (#35963)
Signed-off-by: Baorun Mu <bmu@nvidia.com>
|
2026-03-23 13:01:10 +08:00 |
|
Jee Jee Li
|
1f0d210641
|
[CI/Build][LoRA] Update Qwen35 LoRA testing (#37816)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-23 12:55:49 +08:00 |
|
Ben Browning
|
3bbe2e1e6e
|
[Test] Consolidate tool parser unit tests to tests/tool_parsers (#37834)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2026-03-23 04:24:25 +00:00 |
|
Augusto Yao
|
6e04e79326
|
always use embed&token_classify for bge-m3 (#37632)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-23 03:10:57 +00:00 |
|
Lasha Koroshinadze
|
e7767eccae
|
Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
|
2026-03-23 10:29:07 +08:00 |
|
Wentao Ye
|
eaf4978621
|
[Test] Only Run MLA model when user explicitly set for batch invariance (#37719)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-22 09:09:12 -04:00 |
|
Andreas Karatzas
|
cd1242d82a
|
[ROCm][CI] Stabilize ROCm speech-to-text translation test with lower min acc threshold (#37723)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 17:32:08 +08:00 |
|
Robert Shaw
|
4383f1532e
|
[MoE] Move PF Methods to Folder (#35927)
|
2026-03-22 02:42:59 -06:00 |
|
Andreas Karatzas
|
6ecba840d7
|
[ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 (#37764)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 16:02:21 +08:00 |
|
Andreas Karatzas
|
3b06c55c78
|
[ROCm][CI] Fix MEGA_AOT_ARTIFACT fallback when PyTorch < 2.10.0 lacks AOT support (#37763)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 16:02:03 +08:00 |
|
Andreas Karatzas
|
c862481c02
|
[CI] Skip ISAAC multimodal tests due to broken upstream HF model weights (#37781)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 13:23:32 +08:00 |
|
Andreas Karatzas
|
c86b17cfe6
|
[ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm (#37717)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 12:25:16 +08:00 |
|
Andreas Karatzas
|
66f927f205
|
[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing (#37775)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 03:22:24 +00:00 |
|
Robert Shaw
|
6b2fa3a762
|
[MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ (#37759)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
|
2026-03-21 19:15:16 -04:00 |
|
Robert Shaw
|
eeee5b262d
|
[Quantization][Deprecation] Remove PTPC FP8 (#32700)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-21 22:10:16 +00:00 |
|
Brandon Pelfrey
|
80b70884eb
|
Add tensor IPC transfer mechanism for multimodal data (#32104)
Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com>
Signed-off-by: Brandon Pelfrey <brandonpelfrey@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-21 20:10:20 +00:00 |
|
Francesco Fusco
|
298e510848
|
[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318)
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
|
2026-03-21 09:29:43 +00:00 |
|
Andreas Karatzas
|
0d50fa1db6
|
[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 12:57:25 +08:00 |
|
Simon Mo
|
1fa1e53a73
|
Revert "[compile] Initialize passes at VllmBackend init" (#37733)
|
2026-03-20 21:35:49 -07:00 |
|
Andreas Karatzas
|
3ffa52009f
|
[ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collection on cross-platform builds (#37617)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 11:58:58 +08:00 |
|
Yongye Zhu
|
87bd91892f
|
[MoE Refactor] Mxfp4 oracle rebased (#37128)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-21 03:37:04 +00:00 |
|
Isotr0py
|
c7f98b4d0a
|
[Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-21 11:36:15 +08:00 |
|
Santino Ramos
|
85f671b8e1
|
[Model Runner V2] Support Streaming Inputs (#37028)
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
|
2026-03-20 20:42:25 +00:00 |
|
Vadim Gimpelson
|
4f16ebbbd3
|
[Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) (#37605)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-20 12:19:26 -07:00 |
|
Angela Yi
|
12fd17eb51
|
[compile] Initialize passes at VllmBackend init (#35216)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-03-20 11:40:33 -07:00 |
|
Lucas Wilkinson
|
e1d85e5c24
|
[Attention] Support distinguishing between short extends and decodes (#37303)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-03-20 10:49:36 -07:00 |
|
Xin Yang
|
d0532bf38d
|
[Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels (#37683)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-20 11:28:41 -06:00 |
|
Andreas Karatzas
|
fb4e8bf442
|
[ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests (#37613)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 10:16:59 -07:00 |
|