Kyle Sayers
|
648edcf729
|
[QeRL] Compose online quantization with quantized reloading (#38032)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-27 13:22:33 -07:00 |
|
Gregory Shtrasberg
|
b8665383df
|
[ROCm] Fix GPT-OSS import for triton 3.6 (#37453)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-03-27 18:00:57 +00:00 |
|
Rohan Potdar
|
0e9358c11d
|
{ROCm]: gpt-oss fusion/padding fixes (#38043)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-27 12:19:15 -04:00 |
|
Jonas M. Kübler
|
98e7f223b9
|
enable skipping of SW attention layers when using FP8 KV cache (#33695)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
|
2026-03-27 07:25:02 -06:00 |
|
Bowen Bao
|
0ae89f18fd
|
[Refactor] Move FusedMoE hidden_size roundup to quant_method (#34285)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
|
2026-03-26 23:38:26 -07:00 |
|
Li, Jiang
|
becaed6ec8
|
[CPU] Support CT W4A16 on CPU MP kernel (#38219)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-27 14:15:28 +08:00 |
|
Xiaoshuang Wang
|
a8eab8f30d
|
[Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (#37975)
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Icey <1790571317@qq.com>
|
2026-03-27 14:13:21 +08:00 |
|
Andreas Karatzas
|
a8e48a7b85
|
[CI] Fix conch kernel crash on 3D input by reshaping to 2D before GEMM (#38178)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 11:46:03 -05:00 |
|
Chuan (Richard) Li
|
cb2263218e
|
[Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos (#35886)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-26 11:59:24 -04:00 |
|
zhang-prog
|
0f5b526040
|
[Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility (#38232)
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
|
2026-03-26 15:34:49 +00:00 |
|
Zhewen Li
|
be1a85b7a2
|
Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" (#38050) (#38169)
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
|
2026-03-26 07:59:09 -07:00 |
|
Jared Wen
|
757eafcf37
|
[bug-fix] GLM OCR Patch Merger context_dim (#37962)
Signed-off-by: JaredforReal <w13431838023@gmail.com>
|
2026-03-26 05:11:21 -07:00 |
|
Andreas Karatzas
|
f2d16207c7
|
[ROCm][CI] Fix flaky GPTQ compile correctness test (#38161)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 19:57:00 +08:00 |
|
Cyrus Leung
|
502c41a8f6
|
[Model] Use helper function to run MM processors with token inputs (where applicable) (#38018)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-26 16:44:04 +08:00 |
|
Vadim Gimpelson
|
52069012fe
|
[Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-26 01:21:47 -07:00 |
|
Terry Gao
|
38de822310
|
[Model] Add torch.compile support for InternVL vision encoder (#38049)
Signed-off-by: tianrengao <terrygao87@gmail.com>
|
2026-03-25 23:52:29 -07:00 |
|
BadrBasowid
|
e6bf9f15ec
|
[Bugfix][CI] Fix Marlin FP8 Linear Kernel for Compressed Tensors Format (#38092)
Signed-off-by: BadrBasowid <Badr.Basowid@gmail.com>
Signed-off-by: BadrBasowid <61441185+BadrBasowid@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-25 21:11:43 -07:00 |
|
Xin Yang
|
9704a5c310
|
Disable dual stream execution of input projection for Qwen3 (#38152)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-26 01:20:39 +00:00 |
|
Wei Zhao
|
74056039b7
|
Fix minimax m2.5 nvfp4 kv scales weight loading (#37214)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-26 00:48:06 +00:00 |
|
Jacob Platin
|
d7d51a7ee5
|
[Bugfix] Fix Qwen3.5-FP8 Weight Loading Error on TPU (#37348)
Signed-off-by: Jacob Platin <jacobplatin@google.com>
|
2026-03-26 00:46:01 +00:00 |
|
Harry Mellor
|
3c3c084240
|
Various Transformers v5 fixes (#38127)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-26 00:10:08 +00:00 |
|
Ekagra Ranjan
|
7b54f60db0
|
[Cohere] Enable Cohere-Transcribe (#38120)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-03-25 16:13:51 -07:00 |
|
Andreas Karatzas
|
7d6917bef5
|
[ROCm] Fix MoE kernel test failures on gfx950 (#37833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-03-25 13:46:40 -05:00 |
|
Cyrus Leung
|
ba2f0acc2d
|
[Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-25 10:22:54 -07:00 |
|
Yongye Zhu
|
678b3c99e8
|
[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050)
|
2026-03-25 10:16:40 -07:00 |
|
grYe99
|
7ac48fd357
|
[Model] Add AutoWeightsLoader support for jais (#38074)
Signed-off-by: grYe99 <guorongye99@gmail.com>
Co-authored-by: grYe99 <guorongye99@gmail.com>
|
2026-03-25 12:38:40 +00:00 |
|
Harry Mellor
|
d6bb2a9d9a
|
Fix Plamo 2/3 & LFM2 for Transformers v5 (#38090)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-25 12:29:49 +00:00 |
|
Matthias Gehre
|
a889b7f584
|
[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
|
2026-03-25 11:42:58 +00:00 |
|
Kunshang Ji
|
14771f7150
|
[XPU] support MLA model on Intel GPU (#37143)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-25 17:43:42 +08:00 |
|
Chauncey
|
09c3dc9186
|
[Revert] Remove CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#37968)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-25 06:19:37 +00:00 |
|
Artem Perevedentsev
|
a93a53f8a1
|
[Performance] Auto-enable prefetch on NFS with RAM guard (#37673)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
|
2026-03-24 17:31:14 -07:00 |
|
Andreas Karatzas
|
679c6a3ecc
|
[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 08:17:33 +08:00 |
|
Nick Cao
|
935c46dd9b
|
[Model] Add Granite 4.0 1B speech to supported models (#38019)
Signed-off-by: Nick Cao <ncao@redhat.com>
|
2026-03-24 18:23:41 +00:00 |
|
Harry Mellor
|
b3601da6e7
|
[Mypy] Fix mypy for vllm/model_executor (except vllm/model_executor/layers) (#37904)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-24 17:14:01 +00:00 |
|
Li, Jiang
|
352b90c4a4
|
[Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU (#37987)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-24 07:00:20 -07:00 |
|
Wentao Ye
|
c59a132f96
|
[V0 Deprecation] Refactor kv cache from list to element (#37487)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-23 20:10:11 -07:00 |
|
Ranran
|
dc6908ac6a
|
[Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007)
Signed-off-by: Ranran <1012869439@qq.com>
Signed-off-by: Ranran <hzz5361@psu.edu>
Signed-off-by: ran <hzz5361@psu.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-03-23 18:31:14 -04:00 |
|
yzong-rh
|
e85f8f0932
|
[Bug][MoE] Strengthen _supports_current_device() checks in the TRTLLM FP8, NVFP4, and FlashInfer CuteDSL MoE experts (#36728)
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-03-23 17:02:57 -04:00 |
|
Robert Shaw
|
5bf3c42d4c
|
[Bug][MoE] Fix TRTLLM NVFP4 Routing Kernel Precision (#36725)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-23 20:19:06 +00:00 |
|
Kyle Sayers
|
38364a7e32
|
[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-23 16:03:29 -04:00 |
|
Yufeng He
|
ec2280611a
|
[Bugfix] Fix RoBERTa position_ids accumulation on CUDA graph padding (#37884)
|
2026-03-23 15:15:12 +00:00 |
|
Kunshang Ji
|
debd6e768c
|
[XPU][MoE Refactor] Refactor xpu mxfp4 support into oracle (#37784)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-23 11:10:41 +00:00 |
|
Kunshang Ji
|
27d5ee3e6f
|
[FP8]add FP8 WoQ kernel abstraction. (#32929)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
|
2026-03-23 09:47:47 +00:00 |
|
Chuan (Richard) Li
|
e99fb98867
|
[ROCm] Fix fused_moe_fake signature mismatch and other AITER bugs (#36100)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-23 15:48:31 +08:00 |
|
Artem Perevedentsev
|
a16133a0f1
|
[Perf] [Bugfix] Fix Triton autotuning in inference for Qwen3.5 (#37338)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
|
2026-03-23 00:37:58 -07:00 |
|
Hojin Yang
|
54ab804e87
|
[Bugfix] Store Qwen3Next A_log in fp32 (#37810)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-23 15:36:57 +08:00 |
|
r266-tech
|
02e6efe56d
|
[Bugfix] JAIS: Only apply ALiBi when position_embedding_type='alibi' (#37820)
Co-authored-by: r266-tech <r266-tech@users.noreply.github.com>
|
2026-03-23 07:36:34 +00:00 |
|
Matthias Gehre
|
410d300893
|
[ROCm][Refactor] Enable AWQMarlinConfig on ROCm to use choose_mp_linear_kernel (#36505)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-23 15:36:08 +08:00 |
|
Baorun (Lauren) Mu
|
f85e479e66
|
[Feature] ViT Full CUDA Graph (#35963)
Signed-off-by: Baorun Mu <bmu@nvidia.com>
|
2026-03-23 13:01:10 +08:00 |
|
Lasha Koroshinadze
|
e7767eccae
|
Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
|
2026-03-23 10:29:07 +08:00 |
|