Mohammad Miadh Angkad
|
61e381dcf0
|
[Perf] Add SM 10.3 (B300/GB300) all-reduce communicator tuning (#37756)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
|
2026-03-21 19:43:47 +00:00 |
|
Mohammad Miadh Angkad
|
88f1b374f5
|
[Core] Enable allreduce fusion by default for SM 10.3 (B300/GB300) (#37755)
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
|
2026-03-21 19:40:37 +00:00 |
|
Francesco Fusco
|
298e510848
|
[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318)
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
v0.18.1rc0
|
2026-03-21 09:29:43 +00:00 |
|
Chaitanya Sri Krishna Lolla
|
3982bc2cd0
|
[ROCm] Enable DeepEP ROCm as all2allbackend for AMD GPUs. (#34692)
Signed-off-by: Tej Kiran <vpolamre@amd.com>
Co-authored-by: Tej Kiran <vpolamre@amd.com>
|
2026-03-21 00:32:31 -07:00 |
|
Andreas Karatzas
|
02eec7ecbe
|
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (MI355) (#37721)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 15:27:12 +08:00 |
|
Bongwoo Bak
|
17ee641c45
|
[Responses API] Add kv_transfer_params for PD disaggregation (#37424)
Signed-off-by: bongwoobak <bongwoobak@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-21 13:48:54 +08:00 |
|
Andreas Karatzas
|
0d50fa1db6
|
[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 12:57:25 +08:00 |
|
Simon Mo
|
1fa1e53a73
|
Revert "[compile] Initialize passes at VllmBackend init" (#37733)
|
2026-03-20 21:35:49 -07:00 |
|
Andreas Karatzas
|
3ffa52009f
|
[ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collection on cross-platform builds (#37617)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 11:58:58 +08:00 |
|
Yongye Zhu
|
87bd91892f
|
[MoE Refactor] Mxfp4 oracle rebased (#37128)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-21 03:37:04 +00:00 |
|
Isotr0py
|
c7f98b4d0a
|
[Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-21 11:36:15 +08:00 |
|
tmm77
|
1c472f8fe1
|
Add get_device_uuid for rocm (#37694)
Signed-off-by: Tiffany Mintz <Tiffany.Mintz@amd.com>
|
2026-03-21 11:33:16 +08:00 |
|
Itay Alroy
|
c57d38d603
|
elastic_ep: Fix issues with repeated scale up/down cycles (#37131)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
|
2026-03-20 23:13:02 +00:00 |
|
Kaihang Jiang
|
e5ed6c6c13
|
[BugFix] Allow qk_nope_head_dim=192 in FlashInfer MLA backend checks (#37475)
Signed-off-by: Kaihang Jiang <kaihangj@nvidia.com>
|
2026-03-20 16:14:55 -06:00 |
|
Wentao Ye
|
b3d0b37908
|
[Refactor] Remove unused dead code (#36171)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-20 16:12:51 -06:00 |
|
Santino Ramos
|
85f671b8e1
|
[Model Runner V2] Support Streaming Inputs (#37028)
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
|
2026-03-20 20:42:25 +00:00 |
|
Andreas Karatzas
|
8bc6b5cdb0
|
[ROCm][CI] Setting some mi325_4 tests back to optional (in parity with upstream) (#37711)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 12:25:08 -07:00 |
|
Vadim Gimpelson
|
4f16ebbbd3
|
[Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) (#37605)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-20 12:19:26 -07:00 |
|
Angela Yi
|
12fd17eb51
|
[compile] Initialize passes at VllmBackend init (#35216)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-03-20 11:40:33 -07:00 |
|
Cyrus Leung
|
37aadf6237
|
[Model] Update Kimi-K25 and Isaac processors to fit HF-style (#37693)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-20 18:30:22 +00:00 |
|
Le Yang
|
d7d2b5e405
|
[Bugfix] Disable --calculate-kv-scales for hybrid GDN/Mamba+Attention… (#37565)
Signed-off-by: Young-Leo <562593859@qq.com>
|
2026-03-20 18:28:34 +00:00 |
|
SherryC41
|
6ec5e9fd37
|
refactor: abstract deepgemm support into platform (#37519)
Co-authored-by: sherryC41 <sherry.c.c41@gmail.com>
|
2026-03-20 17:54:08 +00:00 |
|
Lucas Wilkinson
|
e1d85e5c24
|
[Attention] Support distinguishing between short extends and decodes (#37303)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-03-20 10:49:36 -07:00 |
|
Peter Pan
|
79eb9369c5
|
fix CUDAGraph memory being counted twice (#37426)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-20 17:36:32 +00:00 |
|
Woosuk Kwon
|
e80cfe575d
|
[MRV2] Avoid recompilation of _gather_block_tables_kernel (#37645)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-20 10:31:45 -07:00 |
|
Xin Yang
|
d0532bf38d
|
[Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels (#37683)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-20 11:28:41 -06:00 |
|
Andreas Karatzas
|
fb4e8bf442
|
[ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests (#37613)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 10:16:59 -07:00 |
|
Harry Mellor
|
6ade4bc5a5
|
Fix various config related issues for Transformers v5 (#37681)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-20 16:30:12 +00:00 |
|
Zhengxu Chen
|
2e089b96a8
|
[compile] Add compiled artifact counter for VLLM_USE_MEGA_AOT_ARTIFACT=1. (#37589)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-20 16:22:46 +00:00 |
|
Martin Hickey
|
880be2b1b8
|
[Metrics] Some small refactoring for better maintainability (#33898)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2026-03-20 16:11:34 +00:00 |
|
Zhengxu Chen
|
c0f5fae601
|
[compile] Fix aot test failures with torch 2.12. (#37604)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-20 16:06:29 +00:00 |
|
Rémi Delacourt
|
aa84e43ccb
|
[Pixtral] Enable Pixtral language model support Eagle3 (#37182)
Signed-off-by: remi <remi@mistral.ai>
|
2026-03-20 15:50:15 +00:00 |
|
Matthias Gehre
|
5e806bcf54
|
[Bugfix] Fix ConchLinearKernel channelwise quantization (group_size=-1) (#37329)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-03-20 10:32:21 -05:00 |
|
Matthias Gehre
|
56a62c310c
|
[Bugfix] Reject channelwise quantization (group_size <= 0) in ExllamaLinearKernel (#37331)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-03-20 10:31:57 -05:00 |
|
L.B.R.
|
1779c09898
|
[ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34709)
Signed-off-by: L.B.R. <lbr@mmonad.com>
Co-authored-by: L.B.R. <lbr@mmonad.com>
|
2026-03-20 10:11:23 -05:00 |
|
xuebwang-amd
|
44eea10f68
|
[ROCm][Quantization] make quark ocp mx dtype parser robust for weight-only quantization (#36232)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
2026-03-20 10:10:03 -05:00 |
|
Ilya Boytsov
|
8b6c6b9505
|
[Model] Add LFM2-ColBERT-350M support (#37528)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-03-20 14:57:57 +00:00 |
|
Harry Mellor
|
9f6d9dd371
|
Fix attribute error in isaac_patch_hf_runner (#37685)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-20 14:49:40 +00:00 |
|
Jee Jee Li
|
dd20ee4e3e
|
[UX] Enable torch_profiler_with_stack (#37571)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-20 11:17:26 +00:00 |
|
Chauncey
|
0523449c9c
|
[Misc] Use logger.info_once for auto tool choice log message (#37661)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-20 10:40:36 +00:00 |
|
Flora Feng
|
b4c1aef21c
|
[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 02:50:34 -07:00 |
|
Flora Feng
|
6050b93bed
|
[Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ (#37595)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 02:10:47 -07:00 |
|
Andreas Karatzas
|
5a4a179591
|
[ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible attention backend (#37611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 17:07:26 +08:00 |
|
Andreas Karatzas
|
37cd9fc107
|
[ROCm][CI] Remove deepep DBO tests on gfx90a (#37614)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 17:07:07 +08:00 |
|
Andreas Karatzas
|
9cfd4ebb5e
|
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (#37619)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 17:06:53 +08:00 |
|
wang.yuqi
|
ed359c497a
|
[Model] Deprecate the score task (this will not affect users). (#37537)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-20 08:07:56 +00:00 |
|
Giancarlo Delfin
|
dcee9be95a
|
[Model Runner V2] Fix draft logits not populated during cudagraph replay (#37639)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-20 07:43:47 +00:00 |
|
Andreas Karatzas
|
bd8c4c0752
|
[CI] Removing deprecated rlhf examples reference (#37585)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 15:20:33 +08:00 |
|
Wei Zhao
|
0140eafb15
|
[Bug] Fix FlashInfer allreduce fusion workspace uninitialized error (#37461)
Signed-off-by: root <root@prenyx0169.a51.clusters.nvidia.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: root <root@prenyx0169.a51.clusters.nvidia.com>
Co-authored-by: root <root@prenyx0042.a51.clusters.nvidia.com>
|
2026-03-20 03:09:21 -04:00 |
|
Kunshang Ji
|
bdf6a0a57b
|
[XPU] bump vllm-xpu-kernels to v0.1.4 (#37641)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-20 15:04:38 +08:00 |
|