Commit Graph

15143 Commits

Author SHA1 Message Date
Ben Browning
3bbe2e1e6e [Test] Consolidate tool parser unit tests to tests/tool_parsers (#37834)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2026-03-23 04:24:25 +00:00
Augusto Yao
6e04e79326 always use embed&token_classify for bge-m3 (#37632)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-23 03:10:57 +00:00
Lasha Koroshinadze
e7767eccae Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
2026-03-23 10:29:07 +08:00
Woosuk Kwon
43877a620b [MRV2] Enable PP CUDA graph test (#37830)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-22 16:30:25 -07:00
zhanqiuhu
63f49b8bd4 [Model Runner V2] Enable piecewise CUDA graphs for pipeline parallelism (#35162)
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-22 20:48:25 +00:00
Woosuk Kwon
a5e9d511de [MRV2] Use FP64 for Gumbel noise (#37798)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-22 12:28:10 -07:00
Yongye Zhu
c058ff44d4 [Bigfix]fix lora test by pass padded size back to the layer (#37811) 2026-03-22 13:20:13 -06:00
Woosuk Kwon
ce9b1d76cf [MRV2] Skip hidden states allocation for PW CUDA graphs (#37818)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-22 11:47:21 -07:00
Netanel Haber
e74c17e153 Enable NemotronHPuzzle + NemotronHMTP (#37803) 2026-03-22 15:13:58 +00:00
Wentao Ye
eaf4978621 [Test] Only Run MLA model when user explicitly set for batch invariance (#37719)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-22 09:09:12 -04:00
Wentao Ye
77d24c4bfe [Bug] Fix fp8 deepgemm batch invariant (#37718)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-22 08:57:20 -04:00
Giancarlo Delfin
b3e846017d [Model Runner V2] Support multi-modal embeddings for spec decode model (#36097)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-22 02:48:43 -07:00
Andreas Karatzas
cd1242d82a [ROCm][CI] Stabilize ROCm speech-to-text translation test with lower min acc threshold (#37723)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 17:32:08 +08:00
Robert Shaw
4383f1532e [MoE] Move PF Methods to Folder (#35927) 2026-03-22 02:42:59 -06:00
Andreas Karatzas
6eedec6e36 [ROCm][CI] Make some duplicated tests optional so that they are only evaluated in our nightly (#37780)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:03:18 +08:00
Andreas Karatzas
ffc8531524 [ROCm][CI] Added missing resampy dependency for MM audio tests (#37778)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:02:41 +08:00
Andreas Karatzas
6ecba840d7 [ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 (#37764)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:02:21 +08:00
Andreas Karatzas
3b06c55c78 [ROCm][CI] Fix MEGA_AOT_ARTIFACT fallback when PyTorch < 2.10.0 lacks AOT support (#37763)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:02:03 +08:00
Yang Liu
b050700462 [Perf] Optimize glm4.xv VIT (#37779)
Signed-off-by: Yang <lymailforjob@gmail.com>
2026-03-22 06:12:34 +00:00
Andreas Karatzas
5dac719b2b [Bugfix] Handle libsndfile sf_error(NULL) race condition in audio fallback (#37782)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 13:37:29 +08:00
Andreas Karatzas
c862481c02 [CI] Skip ISAAC multimodal tests due to broken upstream HF model weights (#37781)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 13:23:32 +08:00
Andreas Karatzas
c86b17cfe6 [ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm (#37717)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 12:25:16 +08:00
Andreas Karatzas
66f927f205 [Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing (#37775)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 03:22:24 +00:00
Andreas Karatzas
e78bc74268 [ROCm][CI] close missing quote in kernels/moe block in run-amd-test.sh (#37774)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 09:42:34 +08:00
Robert Shaw
6b2fa3a762 [MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ (#37759)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
2026-03-21 19:15:16 -04:00
Robert Shaw
eeee5b262d [Quantization][Deprecation] Remove PTPC FP8 (#32700)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-21 22:10:16 +00:00
Robert Shaw
5ad0446572 Revert "Consolidate AWQ quantization into single awq_marlin.py file" (#37768) 2026-03-21 17:20:41 -04:00
Robert Shaw
8cc700dd6a Consolidate AWQ quantization into single awq_marlin.py file
Merge awq.py and awq_marlin.py into a single file, eliminating the
circular import between them. awq.py becomes a backward-compat shim.
Follows the same structure as gptq_marlin.py.

Co-authored-by: Claude

Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
2026-03-21 17:09:17 -04:00
Brandon Pelfrey
80b70884eb Add tensor IPC transfer mechanism for multimodal data (#32104)
Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com>
Signed-off-by: Brandon Pelfrey <brandonpelfrey@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-03-21 20:10:20 +00:00
Mohammad Miadh Angkad
61e381dcf0 [Perf] Add SM 10.3 (B300/GB300) all-reduce communicator tuning (#37756)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
2026-03-21 19:43:47 +00:00
Mohammad Miadh Angkad
88f1b374f5 [Core] Enable allreduce fusion by default for SM 10.3 (B300/GB300) (#37755)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
2026-03-21 19:40:37 +00:00
Francesco Fusco
298e510848 [Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318)
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
v0.18.1rc0
2026-03-21 09:29:43 +00:00
Chaitanya Sri Krishna Lolla
3982bc2cd0 [ROCm] Enable DeepEP ROCm as all2allbackend for AMD GPUs. (#34692)
Signed-off-by: Tej Kiran <vpolamre@amd.com>
Co-authored-by: Tej Kiran <vpolamre@amd.com>
2026-03-21 00:32:31 -07:00
Andreas Karatzas
02eec7ecbe [ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (MI355) (#37721)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 15:27:12 +08:00
Bongwoo Bak
17ee641c45 [Responses API] Add kv_transfer_params for PD disaggregation (#37424)
Signed-off-by: bongwoobak <bongwoobak@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-03-21 13:48:54 +08:00
Andreas Karatzas
0d50fa1db6 [ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 12:57:25 +08:00
Simon Mo
1fa1e53a73 Revert "[compile] Initialize passes at VllmBackend init" (#37733) 2026-03-20 21:35:49 -07:00
Andreas Karatzas
3ffa52009f [ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collection on cross-platform builds (#37617)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 11:58:58 +08:00
Yongye Zhu
87bd91892f [MoE Refactor] Mxfp4 oracle rebased (#37128)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:37:04 +00:00
Isotr0py
c7f98b4d0a [Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-21 11:36:15 +08:00
tmm77
1c472f8fe1 Add get_device_uuid for rocm (#37694)
Signed-off-by: Tiffany Mintz <Tiffany.Mintz@amd.com>
2026-03-21 11:33:16 +08:00
Itay Alroy
c57d38d603 elastic_ep: Fix issues with repeated scale up/down cycles (#37131)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
2026-03-20 23:13:02 +00:00
Kaihang Jiang
e5ed6c6c13 [BugFix] Allow qk_nope_head_dim=192 in FlashInfer MLA backend checks (#37475)
Signed-off-by: Kaihang Jiang <kaihangj@nvidia.com>
2026-03-20 16:14:55 -06:00
Wentao Ye
b3d0b37908 [Refactor] Remove unused dead code (#36171)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-20 16:12:51 -06:00
Santino Ramos
85f671b8e1 [Model Runner V2] Support Streaming Inputs (#37028)
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
2026-03-20 20:42:25 +00:00
Andreas Karatzas
8bc6b5cdb0 [ROCm][CI] Setting some mi325_4 tests back to optional (in parity with upstream) (#37711)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 12:25:08 -07:00
Vadim Gimpelson
4f16ebbbd3 [Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) (#37605)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-03-20 12:19:26 -07:00
Angela Yi
12fd17eb51 [compile] Initialize passes at VllmBackend init (#35216)
Signed-off-by: angelayi <yiangela7@gmail.com>
2026-03-20 11:40:33 -07:00
Cyrus Leung
37aadf6237 [Model] Update Kimi-K25 and Isaac processors to fit HF-style (#37693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-20 18:30:22 +00:00
Le Yang
d7d2b5e405 [Bugfix] Disable --calculate-kv-scales for hybrid GDN/Mamba+Attention… (#37565)
Signed-off-by: Young-Leo <562593859@qq.com>
2026-03-20 18:28:34 +00:00