biondizzle/vllm - vllm

Author	SHA1	Message	Date
Lucas Wilkinson	70406eb1dc	[Attention][V0 Deprecation] Deprecate accept output buffer (#39125 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-04-07 17:14:58 -04:00
Yubo Wang	08bfedc152	[Bugfix] Fix extract_hidden_states crash with quantized KV cache dtype (#39160 ) Signed-off-by: Yubo Wang <yubowang2019@gmail.com>	2026-04-07 11:18:33 -07:00
Rishapveer Singh	da4c0e4db9	[Model] Use AutoWeightsLoader for FalconH1 (#39092 ) Signed-off-by: Rishapveer Singh <215205492+rishaps@users.noreply.github.com>	2026-04-07 16:25:17 +08:00
Netanel Haber	a9a0e0551f	nano-nemotron-vl: get_mm_max_tokens_per_item for audio, video, image == seq_len (#38727 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-04-07 00:23:29 -07:00
Netanel Haber	dfa5062a8f	NemotronH default mamba_ssm_cache_dtype=float32; enable auto-hook for NemotronHNanoVLV2Config (#39032 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-04-06 19:47:46 +00:00
bnellnm	93bada494f	[MoE Refactor] Split of DefaultMoERunner class (#35326 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-06 12:41:59 -04:00
Wentao Ye	4ae218c122	[Refactor] Remove unused dead code (#38842 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-04-06 11:52:05 -04:00
Lucas Wilkinson	47e605092b	[Gemma4] Enable Fast Prefill Optimization (#38879 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-04-06 11:19:39 -04:00
bhargav-patel-29	c5e3454e5a	[Model] Add support for BharatGen's Param2MoE model (#38000 ) Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-06 16:19:56 +08:00
liuchenbing2026	f6983f01de	MiniMax-M2: add Eagle3 speculative decoding support (#37512 ) Signed-off-by: liuchenbing <chenliumail@163.com> Signed-off-by: liucb <liuchengbao_work@163.com> Co-authored-by: liuchenbing <chenliumail@163.com>	2026-04-05 19:50:18 -07:00
Netanel Haber	d56e952239	nano_nemotron_vl: fix tensor device mismatch exception when video profiling (#39029 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-04-05 22:23:45 +00:00
Greg Pereira	4dd49b06f8	[Bug] Fix Import paths for `encoder_cudagraph` modules (#38997 ) Signed-off-by: greg pereira <grpereir@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-05 19:11:58 +00:00
lalit10	93726b2a1c	Refactor Arctic loading to use AutoWeightsLoader (#38955 ) Signed-off-by: Lalit Laxminarayan Bangad <lalitbangad@gmail.com> Co-authored-by: Lalit Laxminarayan Bangad <lalitbangad@meta.com>	2026-04-04 05:01:09 +00:00
Yongye Zhu	8617f8676b	[Bugfix] Fix DSV32 weight loading (#38870 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2026-04-03 19:57:52 -07:00
elenalil-aws	81994e1d0e	[Bugfix][LoRA] Fix missing in_proj_z in Qwen3_5ForConditionalGenerati… (#38927 ) Signed-off-by: elenalil-aws <elenalil@amazon.com>	2026-04-03 23:30:09 +00:00
Netanel Haber	fa9e68022d	Fix Nano Nemotron VL regressions (#38655 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-04-03 15:22:06 +08:00
Isotr0py	5506435419	[Misc] Clean up Gemma4 implementation (#38872 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-04-03 05:47:02 +00:00
Varun Sundar Rabindranath	2ad7c0335f	[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2026-04-02 21:14:57 -07:00
Vadim Gimpelson	771913e4a0	[Bugfix] Fix NVFP4+MTP crash: force unquantized mtp.fc for Qwen3.5 (#38832 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-04-03 04:45:57 +04:00
1096125073	71a9125c67	[New Model]: add support for telechat3 (#38510 ) Signed-off-by: xiayongqiang <xiayq1@chinatelecom.cn> Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn>	2026-04-03 08:26:22 +08:00
Nicolò Lucchesi	66e86f1dbd	[Kernel] Mamba support different layout for Conv state (#37416 )	2026-04-03 01:50:09 +02:00
Luciano Martins	08ed2b9688	feat(models): implement Google Gemma 4 architecture support (MoE, Multimodal, Reasoning, Tool-Use) (#38826 ) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Luciano Martins <lucianomartins@google.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2026-04-02 11:13:28 -07:00
bsliu	c0817e4d39	[Model] Add support for Cheers multimodal model (#38788 ) Signed-off-by: bsliu <1187291748@qq.com> Signed-off-by: 吴炳贤 <wubingxian24@mails.ucas.ac.cn>	2026-04-02 21:01:40 +08:00
Harry Mellor	dfe5e31689	Don't compile vision encoder for Transformers backend (#30518 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-04-02 12:42:29 +00:00
Xin Yang	9bd7231106	Revert "[Kernel] Add gpt-oss Router GEMM kernel (#37205 )" (#38778 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-04-01 22:02:32 -07:00
Benjamin Chislett	5f96f9aff1	[Perf] DSV3.2 Indexer Fused Weights Projection (#38684 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-04-02 03:34:49 +00:00
bnellnm	7cf56a59a2	[MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-04-01 09:44:08 -04:00
Zhanda Zhu	c75a313824	[Perf] triton bilinear_pos_embed kernel for ViT (#37948 ) Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>	2026-04-01 01:52:02 -07:00
Lukas Geiger	4f6eed3bd4	[Core] Simplify multimodal masking (#34246 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2026-04-01 01:18:22 -07:00
Matthew Bonanni	116f4be405	[1/N][Cleanup] Standardize on use of `is_quantized_kv_cache` (#38659 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-04-01 04:08:01 +00:00
zhang-prog	b6e636c12c	[Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629 ) Signed-off-by: zhangyue66 <zhangyue66@baidu.com>	2026-03-31 15:50:41 +00:00
Netanel Haber	e812bf70bd	Restore non-hf processor path for Nano-Nemotron-VL (bypass `call_hf_processor_mm_only`) - fixes #38018 (#38567 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>	2026-03-30 21:56:52 +00:00
Benjamin Chislett	494636b29d	[Feat][Spec Decode] DFlash (#36847 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-30 15:03:15 -04:00
Chendi.Xue	3b1dbaad4e	[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-30 16:47:30 +00:00
roikoren755	8e6293e838	[Mamba] Add stochastic rounding support (#35753 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-03-30 12:33:49 -04:00
Jee Jee Li	ac30a8311e	[Bugfix][Model] Fix PixtralForConditionalGeneration LoRA (#36963 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-29 23:59:42 -07:00
PikaPikachu	63babd17f1	[Model][Quantization] Add GGUF support for MiniMax-M2.1 (#36965 ) Signed-off-by: kangletian <Letian.Kang@amd.com>	2026-03-30 14:24:06 +08:00
Wentao Ye	995dea1354	[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-29 18:12:50 +00:00
allgather	8c0b6267d7	[Transformers v5] fix missing pixtral/voxtral multimodal dispatch (#38410 ) Signed-off-by: allgather <all2allops@gmail.com>	2026-03-29 09:59:06 +00:00
haosdent	d39b8daf5f	[Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-29 00:27:52 +00:00
Xiaoshuang Wang	a8eab8f30d	[Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (#37975 ) Signed-off-by: wxsIcey <1790571317@qq.com> Signed-off-by: Icey <1790571317@qq.com>	2026-03-27 14:13:21 +08:00
Chuan (Richard) Li	cb2263218e	[Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos (#35886 ) Signed-off-by: Li <chuali@amd.com>	2026-03-26 11:59:24 -04:00
zhang-prog	0f5b526040	[Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility (#38232 ) Signed-off-by: zhangyue66 <zhangyue66@baidu.com>	2026-03-26 15:34:49 +00:00
Jared Wen	757eafcf37	[bug-fix] GLM OCR Patch Merger context_dim (#37962 ) Signed-off-by: JaredforReal <w13431838023@gmail.com>	2026-03-26 05:11:21 -07:00
Cyrus Leung	502c41a8f6	[Model] Use helper function to run MM processors with token inputs (where applicable) (#38018 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-26 16:44:04 +08:00
Terry Gao	38de822310	[Model] Add torch.compile support for InternVL vision encoder (#38049 ) Signed-off-by: tianrengao <terrygao87@gmail.com>	2026-03-25 23:52:29 -07:00
Xin Yang	9704a5c310	Disable dual stream execution of input projection for Qwen3 (#38152 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-26 01:20:39 +00:00
Wei Zhao	74056039b7	Fix minimax m2.5 nvfp4 kv scales weight loading (#37214 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-26 00:48:06 +00:00
Harry Mellor	3c3c084240	Various Transformers v5 fixes (#38127 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-26 00:10:08 +00:00
Ekagra Ranjan	7b54f60db0	[Cohere] Enable Cohere-Transcribe (#38120 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2026-03-25 16:13:51 -07:00

2522 Commits