biondizzle/vllm - vllm

Author	SHA1	Message	Date
Flora Feng	29e48707e8	[Refactor] Consolidate Tool type alias in tool_parsers/utils.py (#38265 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-31 00:55:51 +00:00
sungsoo ha	4ac227222f	[Bugfix][DCP] Fix CUDA graph capture for Decode Context Parallelism (#36070 ) Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 20:20:43 -04:00
Vadim Gimpelson	bb51d5b40d	Add @vadiklyutiy as committer (#38589 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-03-31 07:50:04 +08:00
Prathmesh Bhatt	93b3ec1585	feat(attention): extract KV-cache update from FlashAttentionDiffKV ba… (#36466 ) Signed-off-by: Prathmesh Bhatt <71340361+Prathmesh234@users.noreply.github.com>	2026-03-30 23:16:09 +00:00
Netanel Haber	e812bf70bd	Restore non-hf processor path for Nano-Nemotron-VL (bypass `call_hf_processor_mm_only`) - fixes #38018 (#38567 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>	2026-03-30 21:56:52 +00:00
SandishKumarHN	bcc6f67447	[Bugfix] Use null block (0) for padded block table entries (#35431 ) Signed-off-by: SandishKumarHN <sandish@fb.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-30 14:02:51 -07:00
Asaf Gardin	1fc69f59bb	[Bug fix][Quantization] Fix dummy weight loading (#38478 ) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-03-30 16:38:02 -04:00
Micah Williamson	d9c7db18da	[ROCm][CI] Pin test_hybrid test to TRITON_ATTN on ROCm (#38381 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-30 20:26:46 +00:00
Ilya Markov	12701e8af2	[EPLB] Optmize eplb mapping and record in router for prefill (#36261 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-03-30 19:48:33 +00:00
Benjamin Chislett	494636b29d	[Feat][Spec Decode] DFlash (#36847 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-30 15:03:15 -04:00
mikaylagawarecki	ab1a6a43fa	[3/n] Migrate cutlass/scaled_mm_entry.cu torch stable ABI (#37221 ) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>	2026-03-30 11:20:13 -07:00
fangyuchu	b5e608258e	[Refactor] Unify engine process monitoring in engine manager and add Ray backend support (#35862 ) Signed-off-by: fangyuchu <fangyuchu@qq.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-03-30 10:16:09 -07:00
Matthew Bonanni	2c734ed0e0	[Bugfix][MLA] Change default SM100 MLA prefill backend back to TRT-LLM (#38562 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-30 09:51:24 -07:00
Chendi.Xue	3b1dbaad4e	[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-30 16:47:30 +00:00
Johnny	b4a2f3ac36	[NVIDIA] Bugfix NVFP4 DGX Spark and RTX50 (#38423 ) Signed-off-by: johnnynunez <johnnynuca14@gmail.com> Signed-off-by: Johnny <johnnynuca14@gmail.com>	2026-03-30 09:36:18 -07:00
roikoren755	8e6293e838	[Mamba] Add stochastic rounding support (#35753 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-03-30 12:33:49 -04:00
Hongxia Yang	dbdd9ae067	[ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4 (#37698 ) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com> Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-03-30 15:49:23 +00:00
Matthias Gehre	e8b055a5ac	[Bugfix] Handle ParallelLMHead in compressed-tensors get_quant_method (#37291 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-03-30 07:30:52 -07:00
tomeras91	246dc7d864	[Misc] Add @tomeras91 as a maintainer of Nemotron related code + mamba block (#38547 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2026-03-30 21:12:17 +08:00
Thomas Parnell	7c3f88b2a8	[Bugfix] Remove false-positive format mismatch warnings in FLA ops (#38255 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2026-03-30 12:32:26 +00:00
Li, Jiang	6557f4937f	[Bugfix][CPU] Skip set_num_threads after thread binding (#38535 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-30 20:13:00 +08:00
Andreas Karatzas	677424c7ac	[Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE (#37123 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-30 04:58:53 -07:00
Collin McCarthy	1031c84c36	Fix ambiguous num_blocks for hybrid attn mamba (#37236 ) Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-30 11:09:45 +00:00
aliialsaeedii	7e76af14fa	[Bugfix][Frontend] Return 400 for corrupt/truncated image inputs instead of 500 (#38253 ) Signed-off-by: aliialsaeedii <ali.al-saeedi@nscale.com>	2026-03-30 10:26:46 +00:00
yzong-rh	3683fe6c06	[Bugfix] Fix shared-object aliasing in n>1 streaming with tool calls (#38158 ) Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Yifan <yzong@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-03-30 10:12:13 +00:00
Nicolò Lucchesi	cc06b4e86b	[Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes (#38270 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-30 09:41:50 +00:00
TJian	03ac6ca895	[ROCm] [DOC] Update the Documentation to include ROCm Nightly Wheel support (#38457 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-03-30 02:25:46 -07:00
haosdent	a08b7733fd	[CI] Fix SPLADE pooler test broken by #38139 (#38495 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-30 07:48:33 +00:00
Tan Pin Siang	85c0950b1f	[ROCm] Enable MORI EP for unquantized MoE with AITER backend (#37529 ) Signed-off-by: Tan Pin Siang <pinsiang.tan@amd.com>	2026-03-30 15:19:33 +08:00
Juan Pérez de Algaba	57861ae48d	(security) Fix SSRF in batch runner download_bytes_from_url (#38482 ) Signed-off-by: jperezde <jperezde@redhat.com>	2026-03-30 07:10:01 +00:00
Jee Jee Li	ac30a8311e	[Bugfix][Model] Fix PixtralForConditionalGeneration LoRA (#36963 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-29 23:59:42 -07:00
PikaPikachu	63babd17f1	[Model][Quantization] Add GGUF support for MiniMax-M2.1 (#36965 ) Signed-off-by: kangletian <Letian.Kang@amd.com>	2026-03-30 14:24:06 +08:00
Kevin H. Luu	fec5aeca12	[ci] Soft fail and disable retry for AMD build image job (#38505 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2026-03-29 23:05:26 -07:00
Jaewon	d816834c1a	[MoE] Add RoutingMethodType.Simulated to TRT-LLM FP8/NVFP4 kernel allowlists (#38329 ) Signed-off-by: Jaewon Lee <jaewon@meta.com>	2026-03-29 22:53:43 -07:00
Roger Wang	92f0db57a8	[Misc] Always use `forward_mulmat` for `Conv3d` on newer versions of torch. (#38487 )	2026-03-30 05:39:41 +00:00
Andreas Karatzas	bea23536f6	[CI] Add temperature=0.0, reduce max_tokens, and add debug prints to audio_in_video tests (#38492 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-30 05:36:45 +00:00
Jiangyun Zhu	c133f33746	Add @ZJY0516 to CODEOWNERS (#38497 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-29 21:10:00 -07:00
Stanislav Kirillov	a6db99ba02	[Bugfix] Support multi-type params parsing for DeepSeek v3.2 (#33703 ) Signed-off-by: Stanislav Kirillov <stas@nebius.com> Co-authored-by: Stanislav Kirillov <stas@nebius.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-03-30 04:07:28 +00:00
Andreas Karatzas	4f2ed5fddb	[ROCm][CI] Enable hybrid chunked prefill test (#38317 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-30 10:30:26 +08:00
Kyle Sayers	d28d86e8a3	[QeRL] Fix online quantized reloading (#38442 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-03-29 14:56:41 -06:00
Wentao Ye	995dea1354	[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-29 18:12:50 +00:00
allgather	8c0b6267d7	[Transformers v5] fix missing pixtral/voxtral multimodal dispatch (#38410 ) Signed-off-by: allgather <all2allops@gmail.com>	2026-03-29 09:59:06 +00:00
Andreas Karatzas	43cc5138e5	[ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-28 22:08:03 -07:00
Shubhra Pandit	5b8c30d62b	[Spec Decode, BugFix] Propagate norm_before_fc from Eagle3 speculator (#38111 ) Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>	2026-03-29 00:42:06 +00:00
haosdent	d39b8daf5f	[Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-29 00:27:52 +00:00
Walter Beller-Morales	fafca38adc	[BugFix][Frontend] apply task instruction as system prompt in cohere v2/embed (#38362 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-03-28 18:30:54 +00:00
Kunshang Ji	aa4eb0db78	[CI]revert initialize_model context manager (#38426 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-28 16:56:50 +00:00
Andreas Karatzas	af89140efc	[ROCm][CI] Fix UV install in Dockerfile.rocm to detect curl failures and retry (#38415 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-29 00:47:42 +08:00
haosdent	b2bc736b12	[CI] Fix Ernie4.5-VL initialization test (#38429 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-28 22:43:24 +08:00
whyiug	58c959a767	[Misc]: clean up non-core lint issues (#37049 ) Signed-off-by: whyiug <whyiug@hotmail.com>	2026-03-28 10:28:16 -04:00

15375 Commits