Andreas Karatzas
6eedec6e36
[ROCm][CI] Make some duplicated tests optional so that they are only evaluated in our nightly (#37780)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:03:18 +08:00
Andreas Karatzas
ffc8531524
[ROCm][CI] Added missing resampy dependency for MM audio tests (#37778)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:02:41 +08:00
Andreas Karatzas
6ecba840d7
[ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 (#37764)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:02:21 +08:00
Andreas Karatzas
3b06c55c78
[ROCm][CI] Fix MEGA_AOT_ARTIFACT fallback when PyTorch < 2.10.0 lacks AOT support (#37763)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 16:02:03 +08:00
Yang Liu
b050700462
[Perf] Optimize glm4.xv VIT (#37779)
Signed-off-by: Yang <lymailforjob@gmail.com>
2026-03-22 06:12:34 +00:00
Andreas Karatzas
5dac719b2b
[Bugfix] Handle libsndfile sf_error(NULL) race condition in audio fallback (#37782)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 13:37:29 +08:00
Andreas Karatzas
c862481c02
[CI] Skip ISAAC multimodal tests due to broken upstream HF model weights (#37781)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 13:23:32 +08:00
Andreas Karatzas
c86b17cfe6
[ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm (#37717)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 12:25:16 +08:00
Andreas Karatzas
66f927f205
[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing (#37775)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 03:22:24 +00:00
Andreas Karatzas
e78bc74268
[ROCm][CI] close missing quote in kernels/moe block in run-amd-test.sh (#37774)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 09:42:34 +08:00
Robert Shaw
6b2fa3a762
[MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ (#37759)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
2026-03-21 19:15:16 -04:00
Robert Shaw
eeee5b262d
[Quantization][Deprecation] Remove PTPC FP8 (#32700)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-03-21 22:10:16 +00:00
Robert Shaw
5ad0446572
Revert "Consolidate AWQ quantization into single awq_marlin.py file" (#37768)
2026-03-21 17:20:41 -04:00
Robert Shaw
8cc700dd6a
Consolidate AWQ quantization into single awq_marlin.py file
Merge awq.py and awq_marlin.py into a single file, eliminating the
circular import between them. awq.py becomes a backward-compat shim.
Follows the same structure as gptq_marlin.py.
Co-authored-by: Claude
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
2026-03-21 17:09:17 -04:00
Brandon Pelfrey
80b70884eb
Add tensor IPC transfer mechanism for multimodal data (#32104)
Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com>
Signed-off-by: Brandon Pelfrey <brandonpelfrey@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-03-21 20:10:20 +00:00
Mohammad Miadh Angkad
61e381dcf0
[Perf] Add SM 10.3 (B300/GB300) all-reduce communicator tuning (#37756)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
2026-03-21 19:43:47 +00:00
Mohammad Miadh Angkad
88f1b374f5
[Core] Enable allreduce fusion by default for SM 10.3 (B300/GB300) (#37755)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
2026-03-21 19:40:37 +00:00
Francesco Fusco
298e510848
[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318)
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
v0.18.1rc0
2026-03-21 09:29:43 +00:00
Chaitanya Sri Krishna Lolla
3982bc2cd0
[ROCm] Enable DeepEP ROCm as all2allbackend for AMD GPUs. (#34692)
Signed-off-by: Tej Kiran <vpolamre@amd.com>
Co-authored-by: Tej Kiran <vpolamre@amd.com>
2026-03-21 00:32:31 -07:00
Andreas Karatzas
02eec7ecbe
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (MI355) (#37721)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 15:27:12 +08:00
Bongwoo Bak
17ee641c45
[Responses API] Add kv_transfer_params for PD disaggregation (#37424)
Signed-off-by: bongwoobak <bongwoobak@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-03-21 13:48:54 +08:00
Andreas Karatzas
0d50fa1db6
[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 12:57:25 +08:00
Simon Mo
1fa1e53a73
Revert "[compile] Initialize passes at VllmBackend init" (#37733)
2026-03-20 21:35:49 -07:00
Andreas Karatzas
3ffa52009f
[ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collection on cross-platform builds (#37617)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 11:58:58 +08:00
Yongye Zhu
87bd91892f
[MoE Refactor] Mxfp4 oracle rebased (#37128)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:37:04 +00:00
Isotr0py
c7f98b4d0a
[Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-21 11:36:15 +08:00
tmm77
1c472f8fe1
Add get_device_uuid for rocm (#37694)
Signed-off-by: Tiffany Mintz <Tiffany.Mintz@amd.com>
2026-03-21 11:33:16 +08:00
Itay Alroy
c57d38d603
elastic_ep: Fix issues with repeated scale up/down cycles (#37131)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
2026-03-20 23:13:02 +00:00
Kaihang Jiang
e5ed6c6c13
[BugFix] Allow qk_nope_head_dim=192 in FlashInfer MLA backend checks (#37475)
Signed-off-by: Kaihang Jiang <kaihangj@nvidia.com>
2026-03-20 16:14:55 -06:00
Wentao Ye
b3d0b37908
[Refactor] Remove unused dead code (#36171)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-20 16:12:51 -06:00
Santino Ramos
85f671b8e1
[Model Runner V2] Support Streaming Inputs (#37028)
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
2026-03-20 20:42:25 +00:00
Andreas Karatzas
8bc6b5cdb0
[ROCm][CI] Setting some mi325_4 tests back to optional (in parity with upstream) (#37711)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 12:25:08 -07:00
Vadim Gimpelson
4f16ebbbd3
[Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) (#37605)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2026-03-20 12:19:26 -07:00
Angela Yi
12fd17eb51
[compile] Initialize passes at VllmBackend init (#35216)
Signed-off-by: angelayi <yiangela7@gmail.com>
2026-03-20 11:40:33 -07:00
Cyrus Leung
37aadf6237
[Model] Update Kimi-K25 and Isaac processors to fit HF-style (#37693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-20 18:30:22 +00:00
Le Yang
d7d2b5e405
[Bugfix] Disable --calculate-kv-scales for hybrid GDN/Mamba+Attention… (#37565)
Signed-off-by: Young-Leo <562593859@qq.com>
2026-03-20 18:28:34 +00:00
SherryC41
6ec5e9fd37
refactor: abstract deepgemm support into platform (#37519)
Co-authored-by: sherryC41 <sherry.c.c41@gmail.com>
2026-03-20 17:54:08 +00:00
Lucas Wilkinson
e1d85e5c24
[Attention] Support distinguishing between short extends and decodes (#37303)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-03-20 10:49:36 -07:00
Peter Pan
79eb9369c5
fix CUDAGraph memory being counted twice (#37426)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
2026-03-20 17:36:32 +00:00
Woosuk Kwon
e80cfe575d
[MRV2] Avoid recompilation of _gather_block_tables_kernel (#37645)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-03-20 10:31:45 -07:00
Xin Yang
d0532bf38d
[Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels (#37683)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-20 11:28:41 -06:00
Andreas Karatzas
fb4e8bf442
[ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests (#37613)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 10:16:59 -07:00
Harry Mellor
6ade4bc5a5
Fix various config related issues for Transformers v5 (#37681)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-20 16:30:12 +00:00
Zhengxu Chen
2e089b96a8
[compile] Add compiled artifact counter for VLLM_USE_MEGA_AOT_ARTIFACT=1. (#37589)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2026-03-20 16:22:46 +00:00
Martin Hickey
880be2b1b8
[Metrics] Some small refactoring for better maintainability (#33898)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2026-03-20 16:11:34 +00:00
Zhengxu Chen
c0f5fae601
[compile] Fix aot test failures with torch 2.12. (#37604)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2026-03-20 16:06:29 +00:00
Rémi Delacourt
aa84e43ccb
[Pixtral] Enable Pixtral language model support Eagle3 (#37182)
Signed-off-by: remi <remi@mistral.ai>
2026-03-20 15:50:15 +00:00
Matthias Gehre
5e806bcf54
[Bugfix] Fix ConchLinearKernel channelwise quantization (group_size=-1) (#37329)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
2026-03-20 10:32:21 -05:00
Matthias Gehre
56a62c310c
[Bugfix] Reject channelwise quantization (group_size <= 0) in ExllamaLinearKernel (#37331)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
2026-03-20 10:31:57 -05:00
L.B.R.
1779c09898
[ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34709)
Signed-off-by: L.B.R. <lbr@mmonad.com>
Co-authored-by: L.B.R. <lbr@mmonad.com>
2026-03-20 10:11:23 -05:00