biondizzle/vllm - vllm

Author	SHA1	Message	Date
zhang-prog	b6e636c12c	[Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629) Signed-off-by: zhangyue66 <zhangyue66@baidu.com>	2026-03-31 15:50:41 +00:00
Netanel Haber	e812bf70bd	Restore non-hf processor path for Nano-Nemotron-VL (bypass `call_hf_processor_mm_only`) - fixes #38018 (#38567) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>	2026-03-30 21:56:52 +00:00
Benjamin Chislett	494636b29d	[Feat][Spec Decode] DFlash (#36847) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-30 15:03:15 -04:00
Chendi.Xue	3b1dbaad4e	[HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-30 16:47:30 +00:00
roikoren755	8e6293e838	[Mamba] Add stochastic rounding support (#35753) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-03-30 12:33:49 -04:00
Jee Jee Li	ac30a8311e	[Bugfix][Model] Fix PixtralForConditionalGeneration LoRA (#36963) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-29 23:59:42 -07:00
PikaPikachu	63babd17f1	[Model][Quantization] Add GGUF support for MiniMax-M2.1 (#36965) Signed-off-by: kangletian <Letian.Kang@amd.com>	2026-03-30 14:24:06 +08:00
Wentao Ye	995dea1354	[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-29 18:12:50 +00:00
allgather	8c0b6267d7	[Transformers v5] fix missing pixtral/voxtral multimodal dispatch (#38410) Signed-off-by: allgather <all2allops@gmail.com>	2026-03-29 09:59:06 +00:00
haosdent	d39b8daf5f	[Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-29 00:27:52 +00:00
Xiaoshuang Wang	a8eab8f30d	[Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (#37975) Signed-off-by: wxsIcey <1790571317@qq.com> Signed-off-by: Icey <1790571317@qq.com>	2026-03-27 14:13:21 +08:00
Chuan (Richard) Li	cb2263218e	[Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos (#35886) Signed-off-by: Li <chuali@amd.com>	2026-03-26 11:59:24 -04:00
zhang-prog	0f5b526040	[Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility (#38232) Signed-off-by: zhangyue66 <zhangyue66@baidu.com>	2026-03-26 15:34:49 +00:00
Jared Wen	757eafcf37	[bug-fix] GLM OCR Patch Merger context_dim (#37962) Signed-off-by: JaredforReal <w13431838023@gmail.com>	2026-03-26 05:11:21 -07:00
Cyrus Leung	502c41a8f6	[Model] Use helper function to run MM processors with token inputs (where applicable) (#38018) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-26 16:44:04 +08:00
Terry Gao	38de822310	[Model] Add torch.compile support for InternVL vision encoder (#38049) Signed-off-by: tianrengao <terrygao87@gmail.com>	2026-03-25 23:52:29 -07:00
Xin Yang	9704a5c310	Disable dual stream execution of input projection for Qwen3 (#38152) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-26 01:20:39 +00:00
Wei Zhao	74056039b7	Fix minimax m2.5 nvfp4 kv scales weight loading (#37214) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-26 00:48:06 +00:00
Harry Mellor	3c3c084240	Various Transformers v5 fixes (#38127) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-26 00:10:08 +00:00
Ekagra Ranjan	7b54f60db0	[Cohere] Enable Cohere-Transcribe (#38120) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2026-03-25 16:13:51 -07:00
Cyrus Leung	ba2f0acc2d	[Misc] Reorganize inputs (#35182) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-25 10:22:54 -07:00
grYe99	7ac48fd357	[Model] Add AutoWeightsLoader support for jais (#38074) Signed-off-by: grYe99 <guorongye99@gmail.com> Co-authored-by: grYe99 <guorongye99@gmail.com>	2026-03-25 12:38:40 +00:00
Harry Mellor	d6bb2a9d9a	Fix Plamo 2/3 & LFM2 for Transformers v5 (#38090) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-25 12:29:49 +00:00
Matthias Gehre	a889b7f584	[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>	2026-03-25 11:42:58 +00:00
Nick Cao	935c46dd9b	[Model] Add Granite 4.0 1B speech to supported models (#38019) Signed-off-by: Nick Cao <ncao@redhat.com>	2026-03-24 18:23:41 +00:00
Wentao Ye	c59a132f96	[V0 Deprecation] Refactor kv cache from list to element (#37487) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-23 20:10:11 -07:00
Yufeng He	ec2280611a	[Bugfix] Fix RoBERTa position_ids accumulation on CUDA graph padding (#37884)	2026-03-23 15:15:12 +00:00
Artem Perevedentsev	a16133a0f1	[Perf] [Bugfix] Fix Triton autotuning in inference for Qwen3.5 (#37338) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>	2026-03-23 00:37:58 -07:00
Hojin Yang	54ab804e87	[Bugfix] Store Qwen3Next A_log in fp32 (#37810) Signed-off-by: effortprogrammer <yhjhoward7@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-23 15:36:57 +08:00
r266-tech	02e6efe56d	[Bugfix] JAIS: Only apply ALiBi when position_embedding_type='alibi' (#37820) Co-authored-by: r266-tech <r266-tech@users.noreply.github.com>	2026-03-23 07:36:34 +00:00
Baorun (Lauren) Mu	f85e479e66	[Feature] ViT Full CUDA Graph (#35963) Signed-off-by: Baorun Mu <bmu@nvidia.com>	2026-03-23 13:01:10 +08:00
Lasha Koroshinadze	e7767eccae	Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643) Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>	2026-03-23 10:29:07 +08:00
Netanel Haber	e74c17e153	Enable `NemotronHPuzzle` + `NemotronHMTP` (#37803)	2026-03-22 15:13:58 +00:00
Yang Liu	b050700462	[Perf] Optimize glm4.xv VIT (#37779) Signed-off-by: Yang <lymailforjob@gmail.com>	2026-03-22 06:12:34 +00:00
Isotr0py	c7f98b4d0a	[Frontend] Remove librosa from audio dependency (#37058) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-21 11:36:15 +08:00
Santino Ramos	85f671b8e1	[Model Runner V2] Support Streaming Inputs (#37028) Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>	2026-03-20 20:42:25 +00:00
Cyrus Leung	37aadf6237	[Model] Update Kimi-K25 and Isaac processors to fit HF-style (#37693) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-20 18:30:22 +00:00
Le Yang	d7d2b5e405	[Bugfix] Disable --calculate-kv-scales for hybrid GDN/Mamba+Attention… (#37565) Signed-off-by: Young-Leo <562593859@qq.com>	2026-03-20 18:28:34 +00:00
Rémi Delacourt	aa84e43ccb	[Pixtral] Enable Pixtral language model support Eagle3 (#37182) Signed-off-by: remi <remi@mistral.ai>	2026-03-20 15:50:15 +00:00
Ilya Boytsov	8b6c6b9505	[Model] Add LFM2-ColBERT-350M support (#37528) Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>	2026-03-20 14:57:57 +00:00
wang.yuqi	ed359c497a	[Model] Deprecate the score task (this will not affect users). (#37537) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-20 08:07:56 +00:00
Wangbei25	0674d1fee7	[PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder (#37293) Signed-off-by: Wangbei25 <wangbei41@huawie.com> Signed-off-by: Wangbei25 <wangbei41@huawei.com> Co-authored-by: Wangbei25 <wangbei41@huawie.com>	2026-03-20 06:24:07 +00:00
Cyrus Leung	30108fc8b0	[Model] Refactor Step3-VL processor to HF style (#37579) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-20 06:05:08 +00:00
Jee Jee Li	8fbe3f303f	[Bugfix][LoRA] Fix Qwen35 LoRA (#36976) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-20 11:09:32 +08:00
Jim Smith	4120a05ff1	Fix AttributeError in Qwen3.5 GDN layers with quantized models (#37448) Signed-off-by: Jim Smith <jim@joshua8.ai> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>	2026-03-19 19:21:14 -04:00
Lucas Kabela	7769b58307	[torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-19 17:26:12 +00:00
Cyrus Leung	657855ab41	[Misc] Cleanup more configs and processors (#37560) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 15:45:23 +00:00
Cyrus Leung	9515c20868	[Misc] Clean up processing logic (#37541) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 13:30:20 +00:00
Cyrus Leung	7a6ebcbfcf	[Model] Remove unnecessary `get_language_model` (#37545) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 20:00:36 +08:00
Cyrus Leung	765e461065	[Bugfix] Fix Nemotron Parse loading (#37407) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 09:55:29 +00:00

2492 Commits