biondizzle/vllm - vllm

Author	SHA1	Message	Date
Sergey Zinchenko	5a2d420c17	[Bugfix] Use dedicated MM processor cache in /tokenize to prevent sender-cache pollution (#38545 ) Signed-off-by: Sergey Zinchenko <sergey.zinchenko.rnd@gmail.com>	2026-04-01 21:14:49 -07:00
Kevin H. Luu	1785dc5501	Revert "[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831 )" (#38751 )	2026-04-02 06:34:28 +08:00
Jeffrey Wang	de5e6c44c6	[Feat][Executor] Introduce RayExecutorV2 (#36836 ) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2026-04-01 14:34:29 -07:00
Monishver	c09ad767cd	Feature/silu block quant fusion v1 (#32996 ) Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>	2026-04-01 18:50:43 +00:00
Chauncey	cbe7d18096	[Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-04-01 09:56:45 -07:00
Michael Goin	db5d0719e1	[Kernel] Add MXFP8 to Marlin GEMM/MoE and refactor Mxfp8LinearOp (#34664 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-04-01 09:41:42 -07:00
yzong-rh	dc0428ebb8	[NIXL][BUG] Fix Triton heterogeneous TP (#37940 ) Signed-off-by: Yifan <yzong@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-04-01 17:23:15 +02:00
bnellnm	7cf56a59a2	[MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-04-01 09:44:08 -04:00
손세정	582340f273	[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831 ) Signed-off-by: AAISSJ <maze0717@g.skku.edu> Signed-off-by: <> Co-authored-by: 세덩 <saison@sedeong-ui-MacBookAir.local>	2026-04-01 20:22:29 +08:00
Juan Pérez de Algaba	58ee614221	(security) Enforce frame limit in VideoMediaIO (#38636 ) Signed-off-by: jperezde <jperezde@redhat.com>	2026-04-01 10:23:45 +00:00
Zhanda Zhu	c75a313824	[Perf] triton bilinear_pos_embed kernel for ViT (#37948 ) Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>	2026-04-01 01:52:02 -07:00
Lukas Geiger	4f6eed3bd4	[Core] Simplify multimodal masking (#34246 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2026-04-01 01:18:22 -07:00
Li, Jiang	36d7f19897	[CPU] Support head_size 512 in cpu_attn (#38676 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-04-01 05:42:27 +00:00
Augusto Yao	ef53395e2c	[bugfix] do not add extra linebreak for score/rerank with chat template (#38617 ) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-04-01 04:50:07 +00:00
Lucas Wilkinson	eb47454987	[Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-04-01 00:15:53 -04:00
HarshRathva	17b72fd1c8	Fix priority preemption regression test in scheduler (#37051 ) Signed-off-by: HarshRathva <harshrathvaai@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-04-01 06:36:12 +03:00
Ben Browning	cb0b443274	[Misc] Add 20 regression tests for 11 tool parser bug fixes (#38172 ) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-04-01 03:00:31 +00:00
Luka Govedič	40bb175027	[vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com> Signed-off-by: chzhang <chaojun.zhang@intel.com> Signed-off-by: Luka Govedic <luka.govedic@gmail.com> Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com> Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com> Co-authored-by: Luka Govedič <ProExpertProg@h100-01.nemg-001.lab.rdu2.dc.redhat.com>	2026-03-31 22:15:05 -04:00
Yifan Qiao	91e4521f9f	[Feat][v1] Simple yet General CPU KV Cache Offloading (#37160 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu> Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>	2026-03-31 17:58:37 -07:00
Vedant V Jhaveri	2e56975657	Generative Scoring (#34539 ) Signed-off-by: Vedant Jhaveri <vjhaveri@linkedin.com> Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-31 16:02:11 -07:00
Yanan Cao	cc671cb110	[Kernel] [Helion] [17/N] Add Helion kernel torch.compile support (#38592 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>	2026-03-31 17:06:42 -04:00
Wentao Ye	856589ed9a	[Refactor] Remove dead code in kv connector and model runner (#38383 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-31 17:05:23 -04:00
yzong-rh	d9b90a07ac	[MoE Refactor] Migrate Unquantized to Full Oracle Flow (#36286 ) Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: yzong-rh <yzong@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-31 15:43:33 -04:00
Olya Kozlova	598190aac3	[fix] Remove trtllm ragged mla prefills (#36540 ) Signed-off-by: Olya Kozlova <okozlova@nvidia.com>	2026-03-31 12:30:27 -07:00
BadrBasowid	077a9a8e37	[torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373 ) Signed-off-by: BadrBasowid <badr.basowid@gmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-31 14:15:50 -04:00
SandishKumarHN	3896e021a0	[Bugfix] Fix FusedMoE weight loading with padded hidden dimensions (#37010 ) Signed-off-by: SandishKumarHN <sandish@fb.com>	2026-03-31 12:22:26 -04:00
Matthew Bonanni	757068dc65	[Bugfix][Async] Fix async spec decoding with hybrid models (#38556 ) Signed-off-by: SandishKumarHN <sandishkumarhn@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>	2026-03-31 11:08:54 -04:00
wliao2	4dfad17ed1	replace cuda_device_count_stateless() to current_platform.device_count() (#37841 ) Signed-off-by: Liao, Wei <wei.liao@intel.com> Signed-off-by: wliao2 <wei.liao@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 22:32:54 +08:00
Nicolò Lucchesi	7430389669	[Bugfix][CI] Skip flaky `test_eagle` test (#38566 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-03-31 09:42:37 -04:00
Jiangyun Zhu	ea7bfde6e4	[CI] fix LM Eval Qwen3.5 Models (B200) (#38632 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-03-31 13:20:08 +00:00
Ilya Markov	abdbb68386	[EPLB] Add alternative communication for EPLB weight exchange (#33176 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Markov Ilya <markovilya19@gmail.com> Co-authored-by: Markov Ilya <markovilya19@gmail.com>	2026-03-31 08:17:12 -04:00
wang.yuqi	719735d6c5	[CI Failure] pin colmodernvbert revision (#38612 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-31 10:54:54 +00:00
Matthew Bonanni	7d65463528	[WIP][CI][Bugfix] Fix `test_run_eagle_dp` (#38584 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-31 12:30:25 +02:00
wang.yuqi	d9d21eb8e3	[Frontend][3/n] Improve pooling entrypoints \| scoring. (#28631 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-31 07:52:00 +00:00
Yintong Lu	f09daea261	[CPU] Support int8 compute mode in CPU AWQ (#35697 ) Signed-off-by: Yintong Lu <yintong.lu@intel.com>	2026-03-31 15:27:37 +08:00
zhangyiming	1ac6694297	[OOT] Add OOT support for linear kernel. (#37989 ) Signed-off-by: menogrey <1299267905@qq.com>	2026-03-31 14:33:21 +08:00
Flora Feng	d53cb9cb8e	[Tool Parser][2/3] Use self.tools instead of request.tools in tool parsers (#38189 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-31 13:41:36 +08:00
Andreas Karatzas	b9cdc85207	[ROCm][CI] Fix Whisper translation test attention backend selection (#38508 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-31 13:21:49 +08:00
SandishKumarHN	bcc6f67447	[Bugfix] Use null block (0) for padded block table entries (#35431 ) Signed-off-by: SandishKumarHN <sandish@fb.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-30 14:02:51 -07:00
Micah Williamson	d9c7db18da	[ROCm][CI] Pin test_hybrid test to TRITON_ATTN on ROCm (#38381 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-03-30 20:26:46 +00:00
Ilya Markov	12701e8af2	[EPLB] Optmize eplb mapping and record in router for prefill (#36261 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-03-30 19:48:33 +00:00
Benjamin Chislett	494636b29d	[Feat][Spec Decode] DFlash (#36847 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-30 15:03:15 -04:00
Chendi.Xue	3b1dbaad4e	[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-30 16:47:30 +00:00
Johnny	b4a2f3ac36	[NVIDIA] Bugfix NVFP4 DGX Spark and RTX50 (#38423 ) Signed-off-by: johnnynunez <johnnynuca14@gmail.com> Signed-off-by: Johnny <johnnynuca14@gmail.com>	2026-03-30 09:36:18 -07:00
roikoren755	8e6293e838	[Mamba] Add stochastic rounding support (#35753 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-03-30 12:33:49 -04:00
Hongxia Yang	dbdd9ae067	[ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4 (#37698 ) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com> Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-03-30 15:49:23 +00:00
Matthias Gehre	e8b055a5ac	[Bugfix] Handle ParallelLMHead in compressed-tensors get_quant_method (#37291 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-03-30 07:30:52 -07:00
Andreas Karatzas	677424c7ac	[Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE (#37123 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-30 04:58:53 -07:00
Collin McCarthy	1031c84c36	Fix ambiguous num_blocks for hybrid attn mamba (#37236 ) Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-30 11:09:45 +00:00
aliialsaeedii	7e76af14fa	[Bugfix][Frontend] Return 400 for corrupt/truncated image inputs instead of 500 (#38253 ) Signed-off-by: aliialsaeedii <ali.al-saeedi@nscale.com>	2026-03-30 10:26:46 +00:00

5038 Commits