biondizzle/vllm - vllm

Author	SHA1	Message	Date
Harry Mellor	f4ae58b38b	Remove unused config field from Gemma2 (#36672) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 01:51:19 -07:00
Hongbin Guo	4bf533623b	[Doc] Fix duplicate words in comments (#36713) Signed-off-by: Hongbin10 <jdmjdm1998@163.com>	2026-03-10 21:28:31 -07:00
tunglinwood	42fadebecb	[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127) Signed-off-by: tunglinwood <tunglinwood@gmail.com> Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com> Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>	2026-03-10 21:24:48 -07:00
AllenDou	aefc59f088	FunASR model bugfix (#36633) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>	2026-03-10 08:14:21 -07:00
wang.yuqi	a3189a08b0	[Model] Consolidate score logic by introduce score_type (#36479) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-10 13:32:25 +00:00
Hojin Yang	0836be3b03	[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471) Signed-off-by: effortprogrammer <yhjhoward7@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-10 10:59:19 +08:00
Ajay Anubolu	4e95ec111c	[Bugfix] Fix Qwen3-Next in_proj_ba weight sharding with TP > 1 (#36242) Signed-off-by: AjAnubolu <anuboluajay@gmail.com>	2026-03-09 19:16:26 -07:00
Lucas Kabela	3fd03f1ec2	[BE] Rename `should_torch_compile_mm_vit` to `should_torch_compile_mm_encoder` (#36281) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-09 18:22:05 +00:00
SoluMilken	55d27cca55	[Misc] fix typo: dependant -> dependent (2 lines change) (#36511) Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>	2026-03-09 10:00:12 -07:00
Matthew Bonanni	77a73458e3	Reapply [Attention] Refactor `check_and_update_config` (#35122) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-09 07:17:14 -07:00
Tianyu Guo	5578f2a4d3	Support online use_audio_in_video (#36319) Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-09 07:16:44 -07:00
Xin Yang	dc6b578466	[Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-08 23:41:01 -07:00
Cyrus Leung	d62856b928	[Misc] Move processors to `transformers_utils` (#35953) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-09 11:31:39 +08:00
Alex Brooks	bd2659a566	Increase Flexibility for OOV Multimodal Token Handling (#34858) Signed-off-by: Alex Brooks <albrooks@redhat.com>	2026-03-08 20:30:49 -07:00
nvnbagrov	b7332b058c	[Model] Nano Nemotron VL - fast media preprocessing (#35657) Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>	2026-03-08 03:04:05 -07:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)	2026-03-07 13:51:54 -08:00
rahul-sarvam	85f50eb41f	Adding support to Sarvam's MoE models (#33942) Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>	2026-03-08 01:16:24 +08:00
vllmellm	ee8a29511f	[Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x (#36247) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-03-07 09:26:59 +00:00
Isotr0py	1d0c0d209c	[Misc] Lazy import registered processors (#36024) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-06 06:06:45 -08:00
Andreas Karatzas	2a00d3241f	[CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression (#36206) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 01:17:08 -08:00
Russell Bryant	00bd08edee	[Security] Respect user trust_remote_code setting in NemotronVL and KimiK25 (#36192) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-05 22:15:19 -08:00
Yanhong Li	a911f4dd20	[Model] Add support for OLMo Hybrid (#32550)	2026-03-05 14:51:06 -05:00
Netanel Haber	b93a9e6f6d	ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm (#36133) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-05 17:29:30 +00:00
Avery Miao	e998fa76b9	[BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994) Signed-off-by: Miao, Avery <avery.miao@intel.com>	2026-03-05 09:16:29 -08:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
AllenDou	3ee68590c7	refactor funasr model. (#36108) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-05 08:07:37 -08:00
Cyrus Leung	7196348157	[Bugfix] Fix Qwen-VL tokenizer implementation (#36140) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-05 08:07:19 -08:00
Harry Mellor	ecde7af9c4	Fix import that was moved in Transformers 5.2.0 (#36120) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 13:59:44 +00:00
Hanjun Cho	f600d5192e	[Bugfix] Fix score layer quantization for sequence classification models - Qwen3 (VL) Reranker (#35849) Signed-off-by: Hanjun Cho <gkswns0531@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-04 20:57:20 -08:00
Andrii Skliar	0a12cea25f	Order `config.py` in Lexicographical order (#35866) Signed-off-by: Andrii Skliar <askliar@nvidia.com> Co-authored-by: Andrii Skliar <askliar@nvidia.com>	2026-03-04 20:56:47 -08:00
daje0601	3b23d57c96	[Model] Add LoRA support for Whisper models (#29856) Signed-off-by: daje0601 <englishmt4118@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-05 10:38:25 +08:00
tc-mb	bfdb512f11	fix minicpmo4.5: fix attn_mask in vit attn && fix resampler pos_emb i… (#34127) Signed-off-by: tc-mb <caitianchi@modelbest.cn> Co-authored-by: hezhihui <hezhihui@modelbest.cn>	2026-03-04 17:46:17 +00:00
Yan Ma	58cfe0dc44	Fix phi4-mm and remove cuda binding (#35964) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-03-05 01:08:05 +08:00
Netanel Haber	289fc48ab7	Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py (#35653)	2026-03-04 08:43:13 -08:00
Raghavan	c8c3935b70	[Bugfix][Model] Fix FP8 k_scale/v_scale not loaded for Qwen3-MoE (#35656) Signed-off-by: raghavan <oneraghavan@gmail.com>	2026-03-04 13:15:38 +00:00
Nathan Price	36bf213181	[Bugfix] Add missing dynamic_arg_dims for Qwen3-ASR torch.compile (#35869) Signed-off-by: Nathan Price <nathan@abridge.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-04 08:29:01 +00:00
Andrii Skliar	5d199ac8f2	Support Audio Extraction from MP4 Video for Nemotron Nano VL (#35539) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Andrii <askliar@nvidia.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster> Co-authored-by: Andrii <askliar@nvidia.com> Co-authored-by: root <root@pool0-03748.cm.cluster> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: root <root@pool0-02416.cm.cluster> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: root <root@pool0-04880.cm.cluster>	2026-03-03 23:20:33 -08:00
Andreas Karatzas	edba15045a	[Bugfix] Guard mm_token_type_ids kwarg in get_mrope_input_positions (#35711) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-04 04:12:51 +00:00
Isotr0py	6e9f21e8a2	[Chore] Remove debug code in model implementation (#35883) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-03 19:50:58 -08:00
AllenDou	c1d963403c	[model] support FireRedASR2 (#35727) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-03 19:41:30 -08:00
Shanshan Shen	77e6dcbbfa	[PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-03-03 19:41:27 -08:00
William Zhang	70c73df69e	[Bugfix] Fix EVS implementation for Qwen3 VL (#33607) Signed-off-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>	2026-03-04 02:18:11 +00:00
Isotr0py	8ea8ba275e	[V0 deprecation] Remove Swin model (#35821) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-02 20:03:41 -08:00
Jakub Zakrzewski	c8b678e53e	[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-03-03 08:32:14 +08:00
Robert Shaw	9319044ee9	[MoE][Perf] Wrap DSV3 QKVAProj GEMM in custom op for torch.compile (#35751) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-03-02 23:03:49 +00:00
Ye (Charlotte) Qi	fa6a6be519	[Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker (#35741) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2026-03-02 21:11:56 +00:00
Fynn Schmitt-Ulms	9433acb8df	[Spec Decode] Add hidden states extraction system (#33736) Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>	2026-03-02 14:29:09 -05:00
CSWYF3634076	2a9e3347e9	[BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start (#35587) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2026-03-02 18:56:33 +00:00
lin-shh	a9ec392c86	Fix typo: implictly -> implicitly in isaac.py docstring (#35646)	2026-02-28 23:34:37 -08:00
lailoo	afd089f231	[Bugfix][Model] Fix Qwen3.5/Qwen3Next ignoring --dtype flag on older GPUs (#35617)	2026-03-01 03:27:37 +00:00

2385 Commits