biondizzle/vllm

Author	SHA1	Message	Date
Lucas Kabela	714c6e0eab	[torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set (#36288 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-16 19:42:34 +00:00
Tianyu Guo	43a73f853b	Remove unused EVS functions in qwen3_vl.py (#37183 ) Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>	2026-03-16 13:09:09 +00:00
Lukas Geiger	f9e6db3034	[Models][Qwen3 ViT] Keep `max_seqlen` on CPU to prevent D2H sync (#37139 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-16 12:11:59 +00:00
Harry Mellor	ad041c79db	Fix text only inputs for MRoPE models with the Transformers modelling backend (#37055 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 10:31:16 +00:00
Harry Mellor	122f75d939	Fix pipeline parallel with multimodal models with the Transformers modelling backend (#37057 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 10:20:37 +00:00
Vadim Gimpelson	8374387bd8	[FlashInfer] Revert block_size 16 + head_size 256 workaround on Blackwell (#36987 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-03-16 09:04:29 +00:00
Isotr0py	912fbe9555	[Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-16 08:56:06 +00:00
bigshanedogg	2390d44209	[Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107 ) Signed-off-by: bigshanedogg <bigshane319@gmail.com>	2026-03-16 06:40:05 +00:00
Jiangyun Zhu	697e4ff352	[GDN] add a config for gdn kernel selection (#36647 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-16 00:40:17 +08:00
Isotr0py	a8e8d62dd8	[Misc] Clean up Kimi-audio whisper encoder loading (#36903 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-14 23:37:52 +08:00
Harry Mellor	ffa5d74f15	Enable loading of fused expert weights in the Transformers modelling backend (#36997 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-14 07:01:06 +00:00
Dimitrios Bariamis	367cf5cd3e	[Feat][Bugfix] Enable additional dimension for Flashinfer MLA and fix routing dtype (#36931 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>	2026-03-13 16:41:16 -07:00
Benjamin Chislett	8b346309a5	[Refactor] Consolidate SupportsEagle (#36063 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-13 23:22:40 +00:00
Harry Mellor	0005d2a3c9	Use Transformers v5 `WeightRenaming` for Transformers modeling backend (#31545 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-13 20:49:08 +00:00
Isotr0py	abf61aaa8e	[Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request (#36800 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-13 18:16:05 +00:00
bigmoyan	4508532fbd	[Bugfix] fix paddleocr crash on some image shape (#36959 ) Signed-off-by: wangzhengtao <wangzhengtao@msh.team> Signed-off-by: bigmoyan <moyan_work@foxmail.com> Co-authored-by: wangzhengtao <wangzhengtao@msh.team> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-13 13:46:55 +00:00
Thomas Parnell	f296a1966d	[Bugfix] Fix FlashInfer GDN warmup ValueError on SM90 GPUs (#36876 )	2026-03-13 07:09:39 +01:00
whyiug	1ce13cf992	[Model] Add support for BERT-like Chinese ERNIE pooling models (#36385 ) Signed-off-by: whyiug <whyiug@hotmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-13 03:23:53 +00:00
Nikita	10f08dedfa	[Model] Add ColPali late interaction model for multi-modal retrieval (#36818 ) Signed-off-by: Nikita Sukharev <kaonael@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-13 02:18:57 +00:00
Shubhra Pandit	87985077a4	[Speculative Decoding] Add `norm_before_fc` for gpt-oss draft models (#36545 ) Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-12 23:03:32 +00:00
caozuoba	9e19f8338b	[Perf] add packed recurrent fast path for decode (#36596 ) Signed-off-by: hdj <1293066020@qq.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-03-12 04:01:57 -07:00
Shanshan Shen	f0d3658c0f	[MM][OOT] Support CPU `seq_lens` for OOT MMEncoderAttention kernels (#36605 ) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-12 03:28:23 -07:00
Xu Jinyang	3e64fe4a18	[Bugfix] Warm up Triton autotuner for GDN layers during V1 profiling (#36599 ) Signed-off-by: AuYang <459461160@qq.com>	2026-03-12 00:51:09 -07:00
István Ketykó	00726c74c9	[Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop (#36670 ) Signed-off-by: István Ketykó <istvan.ketyko@gmail.com>	2026-03-12 15:35:54 +08:00
Harry Mellor	65986db6ba	Make Gemma and Gemma 2 accept `inputs_embeds` like Gemma 3 (#36787 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 18:12:43 +00:00
tianshu-Michael-yu	741f4e046b	fix: align lfm2 thumbnail token counting with HF (#36707 )	2026-03-11 10:28:38 -07:00
Cyrus Leung	196802dfa6	[Misc] Clean up renderers (#36770 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-11 16:39:29 +00:00
Julien Denize	afebeffbfb	Add support to Mistral large 3 eagle with dense layers (#36163 ) Signed-off-by: juliendenize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-11 15:42:56 +00:00
Jhao-Ting Chen	5573894737	Kimi k2.5 MLA based eagle3 (#36361 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Izzy Putterman <iputterman@nvidia.com>	2026-03-11 11:36:11 -04:00
Weiguang Li	724759684c	[Bugfix] Fix Qwen3-VL timestamp mismatch when using num_frames without fps (#36136 ) Signed-off-by: OiPunk <codingpunk@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 03:13:06 -07:00
Rahul Tuli	9d07a3d6e4	Add: Eagle3 support for Qwen3.5 (#36658 ) Signed-off-by: Rahul-Tuli <rtuli@redhat.com>	2026-03-11 03:07:42 -07:00
Cyrus Leung	646b85544b	[Refactor] Remove Molmo2 processor wrapper (#36667 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-11 03:07:20 -07:00
tc-mb	4286cc5ec2	fix(minicpmv): fix audio inference by handling meta device in init_re… (#36751 ) Signed-off-by: caitianchi <caitianchi@modelbest.cn>	2026-03-11 03:06:28 -07:00
LoganJane	545d18d81b	[Bugfix] Support other quantization methods in glm41v (#36321 ) Signed-off-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com> Co-authored-by: g00887675/loganJane <g00887675/loganJane73@hotmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-11 09:48:05 +00:00
Harry Mellor	f4ae58b38b	Remove unused config field from Gemma2 (#36672 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 01:51:19 -07:00
Hongbin Guo	4bf533623b	[Doc] Fix duplicate words in comments (#36713 ) Signed-off-by: Hongbin10 <jdmjdm1998@163.com>	2026-03-10 21:28:31 -07:00
tunglinwood	42fadebecb	[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127 ) Signed-off-by: tunglinwood <tunglinwood@gmail.com> Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com> Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>	2026-03-10 21:24:48 -07:00
AllenDou	aefc59f088	FunASR model bugfix (#36633 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>	2026-03-10 08:14:21 -07:00
wang.yuqi	a3189a08b0	[Model] Consolidate score logic by introduce score_type (#36479 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-10 13:32:25 +00:00
Hojin Yang	0836be3b03	[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471 ) Signed-off-by: effortprogrammer <yhjhoward7@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-10 10:59:19 +08:00
Ajay Anubolu	4e95ec111c	[Bugfix] Fix Qwen3-Next in_proj_ba weight sharding with TP > 1 (#36242 ) Signed-off-by: AjAnubolu <anuboluajay@gmail.com>	2026-03-09 19:16:26 -07:00
Lucas Kabela	3fd03f1ec2	[BE] Rename `should_torch_compile_mm_vit` to `should_torch_compile_mm_encoder` (#36281 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-09 18:22:05 +00:00
SoluMilken	55d27cca55	[Misc] fix typo: dependant -> dependent (2 lines change) (#36511 ) Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>	2026-03-09 10:00:12 -07:00
Matthew Bonanni	77a73458e3	Reapply [Attention] Refactor `check_and_update_config` (#35122 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-09 07:17:14 -07:00
Tianyu Guo	5578f2a4d3	Support online use_audio_in_video (#36319 ) Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-09 07:16:44 -07:00
Xin Yang	dc6b578466	[Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-08 23:41:01 -07:00
Cyrus Leung	d62856b928	[Misc] Move processors to `transformers_utils` (#35953 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-09 11:31:39 +08:00
Alex Brooks	bd2659a566	Increase Flexibility for OOV Multimodal Token Handling (#34858 ) Signed-off-by: Alex Brooks <albrooks@redhat.com>	2026-03-08 20:30:49 -07:00
nvnbagrov	b7332b058c	[Model] Nano Nemotron VL - fast media preprocessing (#35657 ) Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>	2026-03-08 03:04:05 -07:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891 )	2026-03-07 13:51:54 -08:00

2419 Commits