Kunshang Ji
|
14771f7150
|
[XPU] support MLA model on Intel GPU (#37143)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-25 17:43:42 +08:00 |
|
Gregory Shtrasberg
|
189ddefbfd
|
[ROCm] Attention selector reordering (#36702)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2026-03-25 17:42:56 +08:00 |
|
Chauncey
|
09c3dc9186
|
[Revert] Remove CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#37968)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-25 06:19:37 +00:00 |
|
vllmellm
|
42e9547976
|
[ROCm][Test] Fix ROCM_AITER_UNIFIED_ATTN attn+quant fusion test (#37640)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-25 05:06:15 +00:00 |
|
Chauncey
|
a32783bb35
|
[Bugfix] Fix IndexError when accessing prev_tool_call_arr in OpenAIToolParser (#37958)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-25 12:06:21 +08:00 |
|
Baorun (Lauren) Mu
|
9d0351c91d
|
[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc (#37914)
Signed-off-by: Baorun Mu <bmu@nvidia.com>
|
2026-03-24 19:53:24 -07:00 |
|
Artem Perevedentsev
|
a93a53f8a1
|
[Performance] Auto-enable prefetch on NFS with RAM guard (#37673)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
|
2026-03-24 17:31:14 -07:00 |
|
Andreas Karatzas
|
679c6a3ecc
|
[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 08:17:33 +08:00 |
|
Andreas Karatzas
|
8bbb7c7f20
|
[ROCm][CI][PD] Add Hybrid SSM integration tests to CI (#37924)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-25 07:58:39 +08:00 |
|
Kevin H. Luu
|
af945615b5
|
[release] Move the rest of release jobs to release queue (#38044)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-24 16:40:58 -07:00 |
|
Terry Gao
|
82580b10ac
|
[Perf] Disable inductor runtime asserts by default for serving perfor… (#37485)
Signed-off-by: tianrengao <terrygao87@gmail.com>
Co-authored-by: Tianren Gao <tianren@fb.com>
|
2026-03-24 19:37:51 -04:00 |
|
Netanel Haber
|
a0d487b2e1
|
nano_nemotron_vl: suppress readonly torch.from_numpy() warning in image and video resize paths (#37903)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-24 23:25:56 +00:00 |
|
Junhao
|
b73b5b0629
|
Make microbatch optimization (DBO) work with general models (#37926)
Signed-off-by: Junhao Li <junhao@ubicloud.com>
|
2026-03-24 14:40:08 -07:00 |
|
Michael Goin
|
0f0e03890e
|
[UX] Add flashinfer-cubin as CUDA default dep (#37233)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-03-24 14:13:08 -07:00 |
|
Woosuk Kwon
|
4b53740d7f
|
[MRV2] Fix for DS v3.2 (#38030)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-24 14:03:24 -07:00 |
|
Nick Hill
|
4e824d1c83
|
[Model Runner V2][Minor] Simplify PP logic (#38031)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-24 13:57:17 -07:00 |
|
amey asgaonkar
|
0c1809c806
|
Add Ubuntu 24.04 support for Docker builds (#35386)
Signed-off-by: aasgaonkar <aasgaonkar@nvidia.com>
|
2026-03-24 13:34:44 -07:00 |
|
liangel-02
|
8c47fdfdb1
|
[FlexAttention] allow custom mask mod (#37692)
Signed-off-by: Angel Li <liangel@meta.com>
|
2026-03-24 16:03:24 -04:00 |
|
Javier De Jesus
|
54b0578ada
|
[Bugfix] Pass hf_token through config loading paths for gated model support (#37920)
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
|
2026-03-24 15:22:05 -04:00 |
|
Richard Zou
|
89f572dbc0
|
[BugFix] fix VLLM_USE_STANDALONE_COMPILE=0 (#38015)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-24 19:08:26 +00:00 |
|
Richard Zou
|
71a4a2fbd0
|
[BugFix] Fix order of compile logging (#38012)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-24 18:58:18 +00:00 |
|
Nick Cao
|
935c46dd9b
|
[Model] Add Granite 4.0 1B speech to supported models (#38019)
Signed-off-by: Nick Cao <ncao@redhat.com>
|
2026-03-24 18:23:41 +00:00 |
|
Willy Hardy
|
057fc94cbd
|
[Bugfix] Fix structured output crash on CPU due to pin_memory=True (#37706)
Signed-off-by: Willy Hardy <whardy@redhat.com>
Signed-off-by: Will Hardy <whardy@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-24 17:44:17 +00:00 |
|
Vineeta Tiwari
|
b58c5f28aa
|
docs: fix broken offline inference paths in documentation (#37998)
Signed-off-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com>
Signed-off-by: Vineeta Tiwari <vineetatiwari2000@gmail.com>
Co-authored-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-24 17:35:14 +00:00 |
|
Ming Yang
|
c07e2ca6e0
|
Fix Mamba state corruption from referencing stale block table entries (#37728)
|
2026-03-24 10:29:59 -07:00 |
|
Dhruv Singal
|
4df5fa7439
|
[Bugfix] Force continuous usage stats when CLI override is enabled (#37923)
Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: OpenCode <noreply@openai.com>
|
2026-03-24 10:29:50 -07:00 |
|
sihao_li
|
a5416bc52e
|
[XPU] Support Intel XPU hardware information collection in usage stats (#37964)
Signed-off-by: sihao.li <sihao.li@intel.com>
|
2026-03-24 10:29:17 -07:00 |
|
Harry Mellor
|
b3601da6e7
|
[Mypy] Fix mypy for vllm/model_executor (except vllm/model_executor/layers) (#37904)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-24 17:14:01 +00:00 |
|
Dan Blanaru
|
dc78c2c933
|
[Core] add option to schedule requests based on full ISL (#37307)
Signed-off-by: Dan Blanaru <48605845+DanBlanaru@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-03-24 13:01:12 -04:00 |
|
Sungjae Lee
|
4731884796
|
[Feature] limit thinking tokens (hard limit) (#20859)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-24 09:53:07 -07:00 |
|
Harry Mellor
|
8de5261e69
|
Update new contributor message (#37999)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-24 16:01:41 +00:00 |
|
wang.yuqi
|
1b6cb920e6
|
[Deprecate] Deprecate pooling multi task support. (#37956)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-24 14:07:47 +00:00 |
|
Li, Jiang
|
352b90c4a4
|
[Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU (#37987)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-24 07:00:20 -07:00 |
|
Sage
|
1c0aabdeb0
|
[Bugfix] Suppress spurious CPU KV cache warning in launch render (#37911)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-24 12:36:18 +00:00 |
|
Ilya Markov
|
14acf429ac
|
[EPLB] Remove main waits in case of slow EPLB (#36271)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-03-24 11:50:44 +00:00 |
|
Harry Mellor
|
ce57fd5557
|
[Docs] Fix build (#37991)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-24 03:20:49 -07:00 |
|
Flora Feng
|
2e67fa756d
|
Fix tool_parser_cls type annotation from Callable to type[ToolParser] (#37957)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-23 22:58:27 -07:00 |
|
Ronen Schaffer
|
e3c6c10cad
|
[KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into cpu/ package (#37874)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
|
2026-03-24 07:02:51 +02:00 |
|
jetxa
|
16a664df24
|
[Frontend][Bugfix] Pass default_chat_template_kwargs to AnthropicServingMessages (#37899)
Signed-off-by: jetxa <jetxzhang@outlook.com>
|
2026-03-24 05:00:12 +00:00 |
|
Kevin H. Luu
|
7281199a8c
|
[release] Move agent queue to Release cluster queues (#37783)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-23 20:36:47 -07:00 |
|
Kevin H. Luu
|
b2dd75eb48
|
Downsize CPU jobs to use small queue (#37913)
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-03-23 20:36:37 -07:00 |
|
Wentao Ye
|
c59a132f96
|
[V0 Deprecation] Refactor kv cache from list to element (#37487)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-23 20:10:11 -07:00 |
|
Andreas Karatzas
|
de99d91ece
|
[ROCm][CI] Split Entrypoints Integration (API Server 1) into 3 jobs (#37906)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-24 09:48:37 +08:00 |
|
Wentao Ye
|
83c9d525b6
|
[CI] Add batch invariant test: Block FP8 + small MOE (#37895)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-23 21:16:14 -04:00 |
|
Giancarlo Delfin
|
8f4824b664
|
[Model Runner V2] Gather multimodal embeddings before draft model postprocess (#37932)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-23 18:14:13 -07:00 |
|
roikoren755
|
56777b5c89
|
[Test] E2E Nemotron-3-Super tests (#36803)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-03-23 17:49:56 -07:00 |
|
Kevin H. Luu
|
2488a82f89
|
[CI] Split V1 Others into 3 separate jobs (#37016)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-24 06:44:38 +08:00 |
|
Ranran
|
dc6908ac6a
|
[Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007)
Signed-off-by: Ranran <1012869439@qq.com>
Signed-off-by: Ranran <hzz5361@psu.edu>
Signed-off-by: ran <hzz5361@psu.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-03-23 18:31:14 -04:00 |
|
yzong-rh
|
e85f8f0932
|
[Bug][MoE] Strengthen _supports_current_device() checks in the TRTLLM FP8, NVFP4, and FlashInfer CuteDSL MoE experts (#36728)
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-03-23 17:02:57 -04:00 |
|
Robert Shaw
|
5bf3c42d4c
|
[Bug][MoE] Fix TRTLLM NVFP4 Routing Kernel Precision (#36725)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-23 20:19:06 +00:00 |
|