biondizzle/vllm - vllm

Author	SHA1	Message	Date
Vadim Gimpelson	05d96d7991	merge Signed-off-by: khluu <khluu000@gmail.com>	2026-03-26 01:25:41 -07:00
Dimitrios Bariamis	ccbc5ac449	[Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> (cherry picked from commit `1204cf0a9d`)	2026-03-24 17:59:17 -07:00
khluu	bcf2be9612	[cherry-pick][Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591 )#37605 Signed-off-by: khluu <khluu000@gmail.com> v0.18.0	2026-03-19 15:06:38 -07:00
Elvir Crnčević	89138b21cc	[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442 ) Signed-off-by: Elvir Crncevic <elvircrn@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> (cherry picked from commit `ef2c4f778d`) v0.18.0rc2	2026-03-18 18:44:16 -07:00
JartX	6edd43de3c	[Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling (#36720 ) Signed-off-by: JartX <sagformas@epdcenter.es> (cherry picked from commit `e8f9dbc369`)	2026-03-18 18:43:52 -07:00
Andreas Karatzas	16c971dbc7	[CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> (cherry picked from commit `eaf7c9b976`)	2026-03-18 11:04:33 -07:00
khluu	262ddd0d81	[cherry-pick][Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy #37322 Signed-off-by: khluu <khluu000@gmail.com> v0.18.0rc1	2026-03-18 01:48:32 -07:00
Li, Jiang	e60c1674b3	[Bugfix] Avoid OpenMP thread reallocation in CPU torch compile (#37391 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> (cherry picked from commit `261801242f`)	2026-03-18 01:41:42 -07:00
Roy Wang	faa80947f5	[Performance] Add --enable-ep-weight-filter CLI option (#37351 ) Signed-off-by: esmeetu <jasonailu87@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> (cherry picked from commit `761e0aa7a0`)	2026-03-18 01:41:25 -07:00
Terry Gao	eeabf740bb	[Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389 ) Signed-off-by: tianrengao <terrygao87@gmail.com> (cherry picked from commit `3e6a1e1686`)	2026-03-18 01:41:09 -07:00
Elvir Crnčević	cdcffafef8	Fix eplb nvfp4 experts hook (#37217 ) Signed-off-by: Elvir Crncevic <elvircrn@gmail.com> Signed-off-by: Elvir Crncevic <elvir@anthropic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> (cherry picked from commit `fd4d96302a`)	2026-03-18 01:40:57 -07:00
Walter Beller-Morales	4d22667c32	[Feature][Frontend] add support for Cohere Embed v2 API (#37074 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com> (cherry picked from commit `061980c36a`)	2026-03-16 22:05:47 -07:00
Andreas Karatzas	1fe3932c8b	[ROCm] Fix AttributeError for torch.compiler.skip_all_guards_unsafe on older PyTorch (#37219 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> (cherry picked from commit `54a62a79f7`)	2026-03-16 21:03:49 -07:00
zhanqiuhu	2dccb38f73	[Bugfix][MultiConnector] Fix MultiConnector for SupportsHMA sub-connectors (#36549 ) v0.18.0rc0	2026-03-16 20:51:04 +00:00
Kunshang Ji	d157216093	[BUGFIX][Mamba] Use uint64 for address in KVBlockZeroer (#37197 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-16 21:39:56 +01:00
Matthew Bonanni	93f3c8e531	[Misc] Add `float16` to `CacheDType` (#37199 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-16 13:24:48 -07:00
rasmith	2cc26c3a99	[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-03-16 13:22:57 -07:00
Flora Feng	dfa8852db2	[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Flora Feng <4florafeng@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-16 15:53:07 -04:00
Lucas Kabela	714c6e0eab	[torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set (#36288 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-16 19:42:34 +00:00
Sage	0fefd00e6c	[Bugfix] Fix render server crash for quantized models on CPU-only hosts (#37215 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-16 18:59:01 +00:00
Nicolò Lucchesi	f5c081d432	[PD][Nixl] Add support for hybrid SSM-FA models (#36687 )	2026-03-16 19:58:06 +01:00
Matthew Bonanni	c88ea8338b	[MTP][Sparse MLA] Take advantage of native MTP support in indexer when possible (#36982 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-16 13:51:21 -04:00
Max de Bayser	9f9ecff4cd	Add simple granite4 tool parser (#36827 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2026-03-16 10:49:09 -07:00
haosdent	ca1954d58c	[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-16 19:03:10 +02:00
Raushan Turganbay	55e6d3d5c0	[Bugfix] Make siglip/clip compatible with transformers v5 (#37200 ) Signed-off-by: raushan <raushan@huggingface.co>	2026-03-16 16:48:18 +00:00
Chauncey	6682c231fa	[Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing (#37148 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-16 16:27:47 +00:00
Itay Etelis	5ae685c1c8	[Bugfix] Relax TRTLLM KV cache contiguity assertion for cross-layer layout (#34158 ) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com>	2026-03-16 11:20:51 -04:00
Wentao Ye	ce8cf9161d	[Compile] Fix compile warning `st256_cs` in `cuda_vec_utils.cuh` (#36693 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-16 11:12:15 -04:00
xjx	18be11fd59	[BUGFIX]fix CUDA OOM ERROR : invalid argument at cumem_allocator.cpp:119 (#35594 ) Signed-off-by: xjx <493337577@qq.com>	2026-03-16 15:10:42 +00:00
Yuanheng Zhao	8d8855fdae	[Bugfix] Add safety check and fallback for null scaling factor (#36106 ) Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 14:27:29 +00:00
Wentao Ye	e855d380fa	[Compile] Fix compile warning in `moe_permute` (#36529 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-16 10:16:14 -04:00
Benjamin Bartels	0e5a9382af	[Bugfix] accept redacted thinking blocks in Anthropic messages (#36992 ) Signed-off-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local> Signed-off-by: bbartels <benjamin@bartels.dev> Co-authored-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>	2026-03-16 22:01:57 +08:00
Fynn Schmitt-Ulms	04bf5a35fa	[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013 )	2026-03-16 14:53:45 +01:00
Tianyu Guo	43a73f853b	Remove unused EVS functions in qwen3_vl.py (#37183 ) Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>	2026-03-16 13:09:09 +00:00
Julien Denize	ffbc2e5bdb	Patch Mistral config (#37104 ) Signed-off-by: juliendenize <julien.denize@mistral.ai>	2026-03-16 12:22:18 +00:00
Lukas Geiger	f9e6db3034	[Models][Qwen3 ViT] Keep `max_seqlen` on CPU to prevent D2H sync (#37139 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-16 12:11:59 +00:00
elvischenv	d61d2b08e9	[Build] Fix API rate limit exceeded when using `VLLM_USE_PRECOMPILED=1` (#36229 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 12:09:27 +00:00
Artem Perevedentsev	f5e59ee7a6	[Performance] Add prefetch for checkpoints to OS page cache (#36012 ) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>	2026-03-16 11:32:02 +00:00
Harry Mellor	9b005edc48	[Docs] Make the link to hardware plugins clearer (#37174 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 04:12:58 -07:00
Robin Nabel	bf9a185395	GLM4 tool parser: fix streaming mode (#35208 ) Signed-off-by: Robin Nabel <opensource@nabel.co> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-03-16 18:48:52 +08:00
Harry Mellor	ad041c79db	Fix text only inputs for MRoPE models with the Transformers modelling backend (#37055 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 10:31:16 +00:00
Kunshang Ji	747b068136	[Hardware] Replace memory related torch.cuda APIs (#37031 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>	2026-03-16 10:24:48 +00:00
Harry Mellor	122f75d939	Fix pipeline parallel with multimodal models with the Transformers modelling backend (#37057 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 10:20:37 +00:00
SoluMilken	d8f8a7aad2	[Misc] Sync pre-commit to 4.5.1 in workflows and docs (#36675 ) Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 10:03:21 +00:00
Roy Wang	0115e957d4	[Frontend][Misc] Remove unused log in `/is_sleeping` (#37093 ) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2026-03-16 17:46:28 +08:00
haosdent	116ed130f4	[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-16 10:30:23 +01:00
Vadim Gimpelson	8374387bd8	[FlashInfer] Revert block_size 16 + head_size 256 workaround on Blackwell (#36987 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-03-16 09:04:29 +00:00
Isotr0py	912fbe9555	[Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-16 08:56:06 +00:00
Laith Sakka	52131f88d9	use skip_all_guards_unsafe to drop global_state and torch_function_mode_stack guards instead of previous hacks (#36204 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2026-03-16 08:52:31 +00:00
Roy Wang	821eb80c0d	[Performance][Model Loader] Skip non-local expert weights during EP model loading (#37136 ) Signed-off-by: esmeetu <jasonailu87@gmail.com>	2026-03-16 01:33:36 -07:00

14902 Commits