Wentao Ye
45bd5c8e75
[Mypy] Fix mypy for vllm/config ( #37808 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-23 14:33:59 +00:00
Zhaodong Bing
10a1018c12
[ROCm] fix sleep mode not releasing GPU memory problem on ROCm ( #37533 )
...
Signed-off-by: bingzhaodong <aaab8b@gmail.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-03-23 06:07:19 -07:00
Jee Jee Li
aec2dc6c0d
[Bugfix][LoRA] Fix incorrect LoRA Log ( #37877 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-23 11:42:52 +00:00
DorBernsohn
7938d12119
[Bugfix] Fix CPU backend crash in KV cache block zeroing ( #37550 )
...
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com >
2026-03-23 11:35:45 +00:00
Kunshang Ji
debd6e768c
[XPU][MoE Refactor] Refactor xpu mxfp4 support into oracle ( #37784 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-03-23 11:10:41 +00:00
Andrew Xia
9ace378a63
[Frontend][Responses API] Fix arrival_time recording for TTFT on initial request ( #37498 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
2026-03-23 09:58:08 +00:00
Kunshang Ji
27d5ee3e6f
[FP8]add FP8 WoQ kernel abstraction. ( #32929 )
...
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com >
2026-03-23 09:47:47 +00:00
wangxiyuan
35141a7eed
[Misc]Update gitignore ( #37863 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2026-03-23 01:14:10 -07:00
Chuan (Richard) Li
e99fb98867
[ROCm] Fix fused_moe_fake signature mismatch and other AITER bugs ( #36100 )
...
Signed-off-by: Li <chuali@amd.com >
2026-03-23 15:48:31 +08:00
Artem Perevedentsev
a16133a0f1
[Perf] [Bugfix] Fix Triton autotuning in inference for Qwen3.5 ( #37338 )
...
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com >
2026-03-23 00:37:58 -07:00
Hojin Yang
54ab804e87
[Bugfix] Store Qwen3Next A_log in fp32 ( #37810 )
...
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2026-03-23 15:36:57 +08:00
r266-tech
02e6efe56d
[Bugfix] JAIS: Only apply ALiBi when position_embedding_type='alibi' ( #37820 )
...
Co-authored-by: r266-tech <r266-tech@users.noreply.github.com >
2026-03-23 07:36:34 +00:00
Matthias Gehre
410d300893
[ROCm][Refactor] Enable AWQMarlinConfig on ROCm to use choose_mp_linear_kernel ( #36505 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-03-23 15:36:08 +08:00
Yan Ma
d3fe857135
update doc for online fp8 quantization ( #37851 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2026-03-23 05:19:03 +00:00
Baorun (Lauren) Mu
f85e479e66
[Feature] ViT Full CUDA Graph ( #35963 )
...
Signed-off-by: Baorun Mu <bmu@nvidia.com >
2026-03-23 13:01:10 +08:00
Jee Jee Li
1f0d210641
[CI/Build][LoRA] Update Qwen35 LoRA testing ( #37816 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-03-23 12:55:49 +08:00
Ben Browning
3bbe2e1e6e
[Test] Consolidate tool parser unit tests to tests/tool_parsers ( #37834 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2026-03-23 04:24:25 +00:00
Augusto Yao
6e04e79326
always use embed&token_classify for bge-m3 ( #37632 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-03-23 03:10:57 +00:00
Lasha Koroshinadze
e7767eccae
Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling ( #37643 )
...
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com >
2026-03-23 10:29:07 +08:00
Woosuk Kwon
43877a620b
[MRV2] Enable PP CUDA graph test ( #37830 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 16:30:25 -07:00
zhanqiuhu
63f49b8bd4
[Model Runner V2] Enable piecewise CUDA graphs for pipeline parallelism ( #35162 )
...
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 20:48:25 +00:00
Woosuk Kwon
a5e9d511de
[MRV2] Use FP64 for Gumbel noise ( #37798 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 12:28:10 -07:00
Yongye Zhu
c058ff44d4
[Bugfix] Fix LoRA test by passing padded size back to the layer ( #37811 )
2026-03-22 13:20:13 -06:00
Woosuk Kwon
ce9b1d76cf
[MRV2] Skip hidden states allocation for PW CUDA graphs ( #37818 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 11:47:21 -07:00
Netanel Haber
e74c17e153
Enable NemotronHPuzzle + NemotronHMTP ( #37803 )
2026-03-22 15:13:58 +00:00
Wentao Ye
eaf4978621
[Test] Only Run MLA model when user explicitly set for batch invariance ( #37719 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-22 09:09:12 -04:00
Wentao Ye
77d24c4bfe
[Bug] Fix fp8 deepgemm batch invariant ( #37718 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-03-22 08:57:20 -04:00
Giancarlo Delfin
b3e846017d
[Model Runner V2] Support multi-modal embeddings for spec decode model ( #36097 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai >
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-03-22 02:48:43 -07:00
Andreas Karatzas
cd1242d82a
[ROCm][CI] Stabilize ROCm speech-to-text translation test with lower min acc threshold ( #37723 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 17:32:08 +08:00
Robert Shaw
4383f1532e
[MoE] Move PF Methods to Folder ( #35927 )
2026-03-22 02:42:59 -06:00
Andreas Karatzas
6eedec6e36
[ROCm][CI] Make some duplicated tests optional so that they are only evaluated in our nightly ( #37780 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 16:03:18 +08:00
Andreas Karatzas
ffc8531524
[ROCm][CI] Added missing resampy dependency for MM audio tests ( #37778 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 16:02:41 +08:00
Andreas Karatzas
6ecba840d7
[ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 ( #37764 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 16:02:21 +08:00
Andreas Karatzas
3b06c55c78
[ROCm][CI] Fix MEGA_AOT_ARTIFACT fallback when PyTorch < 2.10.0 lacks AOT support ( #37763 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 16:02:03 +08:00
Yang Liu
b050700462
[Perf] Optimize glm4.xv VIT ( #37779 )
...
Signed-off-by: Yang <lymailforjob@gmail.com >
2026-03-22 06:12:34 +00:00
Andreas Karatzas
5dac719b2b
[Bugfix] Handle libsndfile sf_error(NULL) race condition in audio fallback ( #37782 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 13:37:29 +08:00
Andreas Karatzas
c862481c02
[CI] Skip ISAAC multimodal tests due to broken upstream HF model weights ( #37781 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 13:23:32 +08:00
Andreas Karatzas
c86b17cfe6
[ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm ( #37717 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 12:25:16 +08:00
Andreas Karatzas
66f927f205
[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing ( #37775 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 03:22:24 +00:00
Andreas Karatzas
e78bc74268
[ROCm][CI] close missing quote in kernels/moe block in run-amd-test.sh ( #37774 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-22 09:42:34 +08:00
Robert Shaw
6b2fa3a762
[MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ ( #37759 )
...
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
2026-03-21 19:15:16 -04:00
Robert Shaw
eeee5b262d
[Quantization][Deprecation] Remove PTPC FP8 ( #32700 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-03-21 22:10:16 +00:00
Robert Shaw
5ad0446572
Revert "Consolidate AWQ quantization into single awq_marlin.py file" ( #37768 )
2026-03-21 17:20:41 -04:00
Robert Shaw
8cc700dd6a
Consolidate AWQ quantization into single awq_marlin.py file
...
Merge awq.py and awq_marlin.py into a single file, eliminating the
circular import between them. awq.py becomes a backward-compat shim.
Follows the same structure as gptq_marlin.py.
Co-authored-by: Claude
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
2026-03-21 17:09:17 -04:00
Brandon Pelfrey
80b70884eb
Add tensor IPC transfer mechanism for multimodal data ( #32104 )
...
Signed-off-by: Brandon Pelfrey <bpelfrey@nvidia.com >
Signed-off-by: Brandon Pelfrey <brandonpelfrey@gmail.com >
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Nick Hill <nickhill123@gmail.com >
2026-03-21 20:10:20 +00:00
Mohammad Miadh Angkad
61e381dcf0
[Perf] Add SM 10.3 (B300/GB300) all-reduce communicator tuning ( #37756 )
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
2026-03-21 19:43:47 +00:00
Mohammad Miadh Angkad
88f1b374f5
[Core] Enable allreduce fusion by default for SM 10.3 (B300/GB300) ( #37755 )
...
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com >
2026-03-21 19:40:37 +00:00
Francesco Fusco
298e510848
[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() ( #37318 )
...
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com >
v0.18.1rc0
2026-03-21 09:29:43 +00:00
Chaitanya Sri Krishna Lolla
3982bc2cd0
[ROCm] Enable DeepEP ROCm as all2allbackend for AMD GPUs. ( #34692 )
...
Signed-off-by: Tej Kiran <vpolamre@amd.com >
Co-authored-by: Tej Kiran <vpolamre@amd.com >
2026-03-21 00:32:31 -07:00
Andreas Karatzas
02eec7ecbe
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (MI355) ( #37721 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-03-21 15:27:12 +08:00