roikoren755
|
56777b5c89
|
[Test] E2E Nemotron-3-Super tests (#36803)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-23 17:49:56 -07:00 |
|
Ranran
|
dc6908ac6a
|
[Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007)
Signed-off-by: Ranran <1012869439@qq.com>
Signed-off-by: Ranran <hzz5361@psu.edu>
Signed-off-by: ran <hzz5361@psu.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-03-23 18:31:14 -04:00 |
|
Kyle Sayers
|
38364a7e32
|
[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-23 16:03:29 -04:00 |
|
Matthew Bonanni
|
fafe76b4af
|
[Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (#32951)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: zhrrr <43847754+izhuhaoran@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2026-03-23 15:37:22 -04:00 |
|
Nicolò Lucchesi
|
1cbbcfe8a3
|
[CI][PD] Add Hybrid SSM integration tests to CI (#37657)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-23 23:58:19 +08:00 |
|
Baorun (Lauren) Mu
|
f85e479e66
|
[Feature] ViT Full CUDA Graph (#35963)
Signed-off-by: Baorun Mu <bmu@nvidia.com>
|
2026-03-23 13:01:10 +08:00 |
|
Jee Jee Li
|
1f0d210641
|
[CI/Build][LoRA] Update Qwen35 LoRA testing (#37816)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-23 12:55:49 +08:00 |
|
Ben Browning
|
3bbe2e1e6e
|
[Test] Consolidate tool parser unit tests to tests/tool_parsers (#37834)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2026-03-23 04:24:25 +00:00 |
|
Augusto Yao
|
6e04e79326
|
always use embed&token_classify for bge-m3 (#37632)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-23 03:10:57 +00:00 |
|
Lasha Koroshinadze
|
e7767eccae
|
Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
|
2026-03-23 10:29:07 +08:00 |
|
Wentao Ye
|
eaf4978621
|
[Test] Only Run MLA model when user explicitly set for batch invariance (#37719)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-22 09:09:12 -04:00 |
|
Andreas Karatzas
|
cd1242d82a
|
[ROCm][CI] Stabilize ROCm speech-to-text translation test with lower min acc threshold (#37723)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 17:32:08 +08:00 |
|
Robert Shaw
|
4383f1532e
|
[MoE] Move PF Methods to Folder (#35927)
|
2026-03-22 02:42:59 -06:00 |
|
Andreas Karatzas
|
6ecba840d7
|
[ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 (#37764)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 16:02:21 +08:00 |
|
Andreas Karatzas
|
3b06c55c78
|
[ROCm][CI] Fix MEGA_AOT_ARTIFACT fallback when PyTorch < 2.10.0 lacks AOT support (#37763)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 16:02:03 +08:00 |
|
Andreas Karatzas
|
c862481c02
|
[CI] Skip ISAAC multimodal tests due to broken upstream HF model weights (#37781)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 13:23:32 +08:00 |
|
Andreas Karatzas
|
c86b17cfe6
|
[ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm (#37717)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 12:25:16 +08:00 |
|
Andreas Karatzas
|
66f927f205
|
[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing (#37775)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 03:22:24 +00:00 |
|
Robert Shaw
|
6b2fa3a762
|
[MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ (#37759)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
|
2026-03-21 19:15:16 -04:00 |
|
Robert Shaw
|
eeee5b262d
|
[Quantization][Deprecation] Remove PTPC FP8 (#32700)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-21 22:10:16 +00:00 |
|
Brandon Pelfrey
|
80b70884eb
|
Add tensor IPC transfer mechanism for multimodal data (#32104)
Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com>
Signed-off-by: Brandon Pelfrey <brandonpelfrey@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-21 20:10:20 +00:00 |
|
Francesco Fusco
|
298e510848
|
[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318)
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
|
2026-03-21 09:29:43 +00:00 |
|
Andreas Karatzas
|
0d50fa1db6
|
[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 12:57:25 +08:00 |
|
Simon Mo
|
1fa1e53a73
|
Revert "[compile] Initialize passes at VllmBackend init" (#37733)
|
2026-03-20 21:35:49 -07:00 |
|
Andreas Karatzas
|
3ffa52009f
|
[ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collection on cross-platform builds (#37617)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 11:58:58 +08:00 |
|
Yongye Zhu
|
87bd91892f
|
[MoE Refactor] Mxfp4 oracle rebased (#37128)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-21 03:37:04 +00:00 |
|
Isotr0py
|
c7f98b4d0a
|
[Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-21 11:36:15 +08:00 |
|
Santino Ramos
|
85f671b8e1
|
[Model Runner V2] Support Streaming Inputs (#37028)
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
|
2026-03-20 20:42:25 +00:00 |
|
Vadim Gimpelson
|
4f16ebbbd3
|
[Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) (#37605)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-20 12:19:26 -07:00 |
|
Angela Yi
|
12fd17eb51
|
[compile] Initialize passes at VllmBackend init (#35216)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-03-20 11:40:33 -07:00 |
|
Lucas Wilkinson
|
e1d85e5c24
|
[Attention] Support distinguishing between short extends and decodes (#37303)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-03-20 10:49:36 -07:00 |
|
Xin Yang
|
d0532bf38d
|
[Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels (#37683)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-20 11:28:41 -06:00 |
|
Andreas Karatzas
|
fb4e8bf442
|
[ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests (#37613)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 10:16:59 -07:00 |
|
Harry Mellor
|
6ade4bc5a5
|
Fix various config related issues for Transformers v5 (#37681)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-20 16:30:12 +00:00 |
|
Zhengxu Chen
|
2e089b96a8
|
[compile] Add compiled artifact counter for VLLM_USE_MEGA_AOT_ARTIFACT=1. (#37589)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-20 16:22:46 +00:00 |
|
Zhengxu Chen
|
c0f5fae601
|
[compile] Fix aot test failures with torch 2.12. (#37604)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-20 16:06:29 +00:00 |
|
L.B.R.
|
1779c09898
|
[ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34709)
Signed-off-by: L.B.R. <lbr@mmonad.com>
Co-authored-by: L.B.R. <lbr@mmonad.com>
|
2026-03-20 10:11:23 -05:00 |
|
Ilya Boytsov
|
8b6c6b9505
|
[Model] Add LFM2-ColBERT-350M support (#37528)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-03-20 14:57:57 +00:00 |
|
Harry Mellor
|
9f6d9dd371
|
Fix attribute error in isaac_patch_hf_runner (#37685)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-20 14:49:40 +00:00 |
|
Flora Feng
|
b4c1aef21c
|
[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 02:50:34 -07:00 |
|
Flora Feng
|
6050b93bed
|
[Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ (#37595)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 02:10:47 -07:00 |
|
Andreas Karatzas
|
5a4a179591
|
[ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible attention backend (#37611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 17:07:26 +08:00 |
|
Andreas Karatzas
|
9cfd4ebb5e
|
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (#37619)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 17:06:53 +08:00 |
|
wang.yuqi
|
ed359c497a
|
[Model] Deprecate the score task (this will not affect users). (#37537)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-20 08:07:56 +00:00 |
|
Flora Feng
|
e2d1c8b5e8
|
[Refactor] Relocate entrypoint tests to match serving code structure (#37593)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 05:31:23 +00:00 |
|
Flora Feng
|
9040151fe1
|
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 11:31:43 +08:00 |
|
Jee Jee Li
|
8fbe3f303f
|
[Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-20 11:09:32 +08:00 |
|
Xiao
|
ea2c148fa7
|
[compile][graph_partition]Add tensor size handling (#36038)
Signed-off-by: Xiao Fu <xiaofu@meta.com>
|
2026-03-19 19:55:25 -07:00 |
|
tianshu-Michael-yu
|
269bf46d99
|
fix: disambiguate multimodal prefix cache keys (#36708)
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>
|
2026-03-20 10:33:20 +08:00 |
|
Flora Feng
|
be12afd284
|
[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056)
|
2026-03-19 19:51:25 -04:00 |
|