Giancarlo Delfin
|
8f4824b664
|
[Model Runner V2] Gather multimodal embeddings before draft model postprocess (#37932)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-23 18:14:13 -07:00 |
|
roikoren755
|
56777b5c89
|
[Test] E2E Nemotron-3-Super tests (#36803)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-23 17:49:56 -07:00 |
|
Kevin H. Luu
|
2488a82f89
|
[CI] Split V1 Others into 3 separate jobs (#37016)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-24 06:44:38 +08:00 |
|
Ranran
|
dc6908ac6a
|
[Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007)
Signed-off-by: Ranran <1012869439@qq.com>
Signed-off-by: Ranran <hzz5361@psu.edu>
Signed-off-by: ran <hzz5361@psu.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-03-23 18:31:14 -04:00 |
|
yzong-rh
|
e85f8f0932
|
[Bug][MoE] Strengthen _supports_current_device() checks in the TRTLLM FP8, NVFP4, and FlashInfer CuteDSL MoE experts (#36728)
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-03-23 17:02:57 -04:00 |
|
Robert Shaw
|
5bf3c42d4c
|
[Bug][MoE] Fix TRTLLM NVFP4 Routing Kernel Precision (#36725)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-23 20:19:06 +00:00 |
|
Kyle Sayers
|
38364a7e32
|
[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-03-23 16:03:29 -04:00 |
|
Matthew Bonanni
|
fafe76b4af
|
[Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (#32951)
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: zhrrr <43847754+izhuhaoran@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2026-03-23 15:37:22 -04:00 |
|
Woosuk Kwon
|
ffb5b32b5f
|
[MRV2] Consider spec decoding in warmup (#37812)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-23 17:45:43 +00:00 |
|
Kunshang Ji
|
91fd695b75
|
[CI] split Entrypoints Integration (API Server 1) into 3 jobs (#37882)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-23 10:37:56 -07:00 |
|
Nicolò Lucchesi
|
1cbbcfe8a3
|
[CI][PD] Add Hybrid SSM integration tests to CI (#37657)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-23 23:58:19 +08:00 |
|
Angela Yi
|
aceadb5ee1
|
Use lazy graph module during split_module to defer recompile() (#37609)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-03-23 11:21:29 -04:00 |
|
Yufeng He
|
ec2280611a
|
[Bugfix] Fix RoBERTa position_ids accumulation on CUDA graph padding (#37884)
|
2026-03-23 15:15:12 +00:00 |
|
yanghui1-arch
|
7151ae6528
|
[Bugfix] RoBERTa position_id accumulation in CUDA graph padding region (#37873)
Signed-off-by: dass90 <3053034939@qq.com>
|
2026-03-23 14:59:21 +00:00 |
|
Wentao Ye
|
45bd5c8e75
|
[Mypy] Fix mypy for vllm/config (#37808)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-23 14:33:59 +00:00 |
|
Zhaodong Bing
|
10a1018c12
|
[ROCm] fix sleep mode not releasing GPU memory problem on ROCm (#37533)
Signed-off-by: bingzhaodong <aaab8b@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-03-23 06:07:19 -07:00 |
|
Jee Jee Li
|
aec2dc6c0d
|
[Bugfix][LoRA] Fix incorrect LoRA Log (#37877)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-23 11:42:52 +00:00 |
|
DorBernsohn
|
7938d12119
|
[Bugfix] Fix CPU backend crash in KV cache block zeroing (#37550)
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
|
2026-03-23 11:35:45 +00:00 |
|
Kunshang Ji
|
debd6e768c
|
[XPU][MoE Refactor] Refactor xpu mxfp4 support into oracle (#37784)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-23 11:10:41 +00:00 |
|
Andrew Xia
|
9ace378a63
|
[Frontend][Responses API] Fix arrival_time recording for TTFT on initial request (#37498)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2026-03-23 09:58:08 +00:00 |
|
Kunshang Ji
|
27d5ee3e6f
|
[FP8]add FP8 WoQ kernel abstraction. (#32929)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
|
2026-03-23 09:47:47 +00:00 |
|
wangxiyuan
|
35141a7eed
|
[Misc]Update gitignore (#37863)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2026-03-23 01:14:10 -07:00 |
|
Chuan (Richard) Li
|
e99fb98867
|
[ROCm] Fix fused_moe_fake signature mismatch and other AITER bugs (#36100)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-23 15:48:31 +08:00 |
|
Artem Perevedentsev
|
a16133a0f1
|
[Perf] [Bugfix] Fix Triton autotuning in inference for Qwen3.5 (#37338)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
|
2026-03-23 00:37:58 -07:00 |
|
Hojin Yang
|
54ab804e87
|
[Bugfix] Store Qwen3Next A_log in fp32 (#37810)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-23 15:36:57 +08:00 |
|
r266-tech
|
02e6efe56d
|
[Bugfix] JAIS: Only apply ALiBi when position_embedding_type='alibi' (#37820)
Co-authored-by: r266-tech <r266-tech@users.noreply.github.com>
|
2026-03-23 07:36:34 +00:00 |
|
Matthias Gehre
|
410d300893
|
[ROCm][Refactor] Enable AWQMarlinConfig on ROCm to use choose_mp_linear_kernel (#36505)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-03-23 15:36:08 +08:00 |
|
Yan Ma
|
d3fe857135
|
update doc for online fp8 quantization (#37851)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-03-23 05:19:03 +00:00 |
|
Baorun (Lauren) Mu
|
f85e479e66
|
[Feature] ViT Full CUDA Graph (#35963)
Signed-off-by: Baorun Mu <bmu@nvidia.com>
|
2026-03-23 13:01:10 +08:00 |
|
Jee Jee Li
|
1f0d210641
|
[CI/Build][LoRA] Update Qwen35 LoRA testing (#37816)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-23 12:55:49 +08:00 |
|
Ben Browning
|
3bbe2e1e6e
|
[Test] Consolidate tool parser unit tests to tests/tool_parsers (#37834)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2026-03-23 04:24:25 +00:00 |
|
Augusto Yao
|
6e04e79326
|
always use embed&token_classify for bge-m3 (#37632)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-23 03:10:57 +00:00 |
|
Lasha Koroshinadze
|
e7767eccae
|
Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
|
2026-03-23 10:29:07 +08:00 |
|
Woosuk Kwon
|
43877a620b
|
[MRV2] Enable PP CUDA graph test (#37830)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-22 16:30:25 -07:00 |
|
zhanqiuhu
|
63f49b8bd4
|
[Model Runner V2] Enable piecewise CUDA graphs for pipeline parallelism (#35162)
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-22 20:48:25 +00:00 |
|
Woosuk Kwon
|
a5e9d511de
|
[MRV2] Use FP64 for Gumbel noise (#37798)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-22 12:28:10 -07:00 |
|
Yongye Zhu
|
c058ff44d4
|
[Bugfix] Fix lora test by passing padded size back to the layer (#37811)
|
2026-03-22 13:20:13 -06:00 |
|
Woosuk Kwon
|
ce9b1d76cf
|
[MRV2] Skip hidden states allocation for PW CUDA graphs (#37818)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-22 11:47:21 -07:00 |
|
Netanel Haber
|
e74c17e153
|
Enable NemotronHPuzzle + NemotronHMTP (#37803)
|
2026-03-22 15:13:58 +00:00 |
|
Wentao Ye
|
eaf4978621
|
[Test] Only Run MLA model when user explicitly set for batch invariance (#37719)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-22 09:09:12 -04:00 |
|
Wentao Ye
|
77d24c4bfe
|
[Bug] Fix fp8 deepgemm batch invariant (#37718)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-22 08:57:20 -04:00 |
|
Giancarlo Delfin
|
b3e846017d
|
[Model Runner V2] Support multi-modal embeddings for spec decode model (#36097)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-22 02:48:43 -07:00 |
|
Andreas Karatzas
|
cd1242d82a
|
[ROCm][CI] Stabilize ROCm speech-to-text translation test with lower min acc threshold (#37723)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 17:32:08 +08:00 |
|
Robert Shaw
|
4383f1532e
|
[MoE] Move PF Methods to Folder (#35927)
|
2026-03-22 02:42:59 -06:00 |
|
Andreas Karatzas
|
6eedec6e36
|
[ROCm][CI] Make some duplicated tests optional so that they are only evaluated in our nightly (#37780)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 16:03:18 +08:00 |
|
Andreas Karatzas
|
ffc8531524
|
[ROCm][CI] Added missing resampy dependency for MM audio tests (#37778)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 16:02:41 +08:00 |
|
Andreas Karatzas
|
6ecba840d7
|
[ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 (#37764)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 16:02:21 +08:00 |
|
Andreas Karatzas
|
3b06c55c78
|
[ROCm][CI] Fix MEGA_AOT_ARTIFACT fallback when PyTorch < 2.10.0 lacks AOT support (#37763)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 16:02:03 +08:00 |
|
Yang Liu
|
b050700462
|
[Perf] Optimize glm4.xv VIT (#37779)
Signed-off-by: Yang <lymailforjob@gmail.com>
|
2026-03-22 06:12:34 +00:00 |
|
Andreas Karatzas
|
5dac719b2b
|
[Bugfix] Handle libsndfile sf_error(NULL) race condition in audio fallback (#37782)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-22 13:37:29 +08:00 |
|