biondizzle/vllm - vllm

Author	SHA1	Message	Date
Kata Coder	5719a4e4e6	[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574 ) Signed-off-by: craftsangjae <craftsangjae@gmail.com>	2026-02-20 20:01:40 -08:00
pougetat	11be2c74dc	[Realtime] Add Qwen3-ASR realtime streaming support (#34613 ) Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com> Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-20 19:59:42 -08:00
Lucas Wilkinson	0e22cd618b	Revert "[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers " (#34997 )	2026-02-20 17:19:19 -08:00
Wei Zhao	ea5f903f80	Bump Flashinfer Version and Re-enable DeepSeek NVFP4 AR+Norm Fusion (#34899 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-20 13:37:31 -08:00
tianshu-Michael-yu	ea37530b47	[Models] LFM2: Support LoRA (#34921 ) Co-authored-by: Piotr Mazurek <piotr635@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-19 22:07:23 -08:00
Wentao Ye	c683d11c94	[Refactor] Deprecate `head_first` for `chunk_gated_delta_rule` (#34263 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-19 13:23:49 -05:00
roikoren755	3eff45d793	Revert "[NemotronH] Do not force router to run in fp32 (#34582 )" (#34808 ) Signed-off-by: Roi Koren <roik@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-19 09:47:05 -08:00
Robert Shaw	4685a630a2	[Model Bash][DeepSeekR1] Remove Shared Expert Clone (#34344 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-19 07:56:14 -08:00
Eldar Kurtić	ee1d25f199	[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers (#34471 ) Signed-off-by: Your Name <you@example.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-19 07:55:41 -08:00
Linda	6fff24f30f	[Bugfix] Qwen3.5 kv-scale weight remapping (#34719 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-02-19 04:13:37 -08:00
Tal Nir	f75b61a9e9	[Voxtral Realtime] Fix engine crash on empty multimodal embeddings (#34862 ) Signed-off-by: Tal Nir <tal@nervexneurotech.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 23:21:47 -08:00
Wei Zhao	7f51e93864	[Bug] Fix DeepSeek V3 weight loading caused by incorrect prefix (#34876 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-18 23:20:30 -08:00
Alex Brooks	4611af1663	[Bugfix] Add Quant Config to Llava Next Projector (#34847 ) Signed-off-by: Alex Brooks <albrooks@redhat.com>	2026-02-18 23:18:23 -08:00
Manrique Vargas	ad5aa6bd9f	fix(docs): fix typos in comments and docstrings (#34836 ) Signed-off-by: machov <mv1742@nyu.edu>	2026-02-18 23:17:41 -08:00
Isotr0py	c0bd8b13da	[Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion (#34697 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>	2026-02-18 09:46:53 -08:00
Robert Shaw	6874638bc4	[Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) (#34758 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-18 07:42:36 -08:00
Michael Goin	909b147197	[Bugfix] Fix prefix creation for Qwen3.5 (#34723 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-17 23:39:15 -08:00
Cyrus Leung	a0d8d944e2	[Renderer] Move MM Hash parsing into Renderer (#34711 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-17 19:18:55 -08:00
Cyrus Leung	574fe75245	[Renderer] Move InputPreprocessor into Renderer (2/2) (#34560 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-17 05:29:01 -08:00
Jiangyun Zhu	1d65283e95	Revert "[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj" (#34683 )	2026-02-17 01:29:27 -08:00
roikoren755	3b30e61507	[NemotronH] Do not force router to run in fp32 (#34582 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-02-16 10:15:32 -08:00
Andreas Karatzas	03a8770a6d	[ROCm][CI] Fix plugins test group; updating terratorch and dependencies (#34589 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-16 07:33:42 -08:00
Isotr0py	3bb4e4311c	[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj (#34492 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-16 07:32:51 -08:00
Cyrus Leung	ec17bdd894	[Renderer] Move InputPreprocessor into Renderer (1.5/2) (#34598 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-15 23:46:33 -08:00
Isotr0py	91ac5d9bfd	[CI/Build] Enable tests for recent day-0 new models (#34585 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-15 18:17:04 -08:00
Luka Govedič	23d825aba1	[torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test (#34392 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-15 06:33:57 -08:00
Isotr0py	71cd89264f	[MM Encoder] Add Triton ViT attention backend (#32183 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-15 06:32:47 -08:00
Cyrus Leung	73391a1baa	[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-14 10:14:21 -08:00
Kata Coder	d1ea65d0a1	[new model] add COLQwen3 code & Inference (#34398 ) Signed-off-by: craftsangjae <craftsangjae@gmail.com> Signed-off-by: katacoder <craftsangjae@gmail.com>	2026-02-14 12:15:19 +08:00
Andreas Karatzas	de42abb366	[CI] Heavy refactoring of Voxtral multimodal audio model tests (#34294 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-13 20:04:29 -08:00
Wei Zhao	b37b679770	[Feature][Perf] Support Selective CPU Weight Offloading (#34535 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-13 20:02:24 -08:00
Harry Huang	c027541eaf	[Hybrid] Enable spec decoding in mamba cache align mode (#33705 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-02-13 13:02:28 -08:00
Wei Zhao	59d53066d8	[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-13 08:11:26 -08:00
LoganJane	4a9952ec1b	[Bugfix] Add quant_config in ViT of Kimi-K2.5 (#34501 ) Signed-off-by: LoganJane <LoganJane73@hotmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-13 16:05:34 +00:00
Roger Wang	5885e330ef	[Misc] Port Qwen3.5 Configs (#34512 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-13 05:24:25 -08:00
Ilya Boytsov	071d863e20	Extend ColBERT support to non-standard BERT backbones (#34170 ) Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>	2026-02-13 09:53:09 +00:00
myselvess	bcf0731aa0	[New Model] support new model ovis2.6 (#34426 ) Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com>	2026-02-13 00:12:45 -08:00
Roger Wang	eea3024f43	[Bugfix] Fix mamba state dtype setting for Qwen3-Next and Qwen3.5 (#34489 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-12 22:48:42 -08:00
haosdent	dcf6ee8592	[Bugfix] Fix encoder cache underestimation for GLM-4V/GLM-OCR single image (#34483 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-12 21:04:06 -08:00
Cyrus Leung	372b2e762a	[Bugfix] Standardize getting number of image patches/tokens (#34358 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 20:47:01 -08:00
LoganJane	62788f99a4	[Bugfix] Delete unused redundant code in Kimi-K2.5 (#34427 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-12 18:18:42 -08:00
Patrick von Platen	6c0baee610	[Voxtral Realtime] Refactor & Improve buffering logic (#34428 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-12 09:46:43 -08:00
Patrick von Platen	1100a97621	[Voxstral Realtime] Enable tests (#33803 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2026-02-12 09:43:24 -08:00
Harry Mellor	679ca5d8d3	Fix MoE for the Transformers modelling backend (#34436 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-12 09:29:42 -08:00
AllenDou	386bfe5d08	[bugfix] refactor FunASR's _get_data_parser (#34397 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>	2026-02-12 07:26:49 +00:00
Yichuan Wang	80f2ba6ea6	Fix DeepSeek-OCR tensor validation for all size variants (#34085 ) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-11 22:50:23 -08:00
Michael Goin	ff1f83b056	[Refactor] Replace `activation: str` with `MoEActivation` enum (#33843 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-02-11 17:29:32 -08:00
Raushan Turganbay	527ca32197	[Bugfix] Fix more multimodal tests for transformers V5 (#34334 ) Signed-off-by: raushan <raushan@huggingface.co>	2026-02-11 22:02:05 +01:00
elvischenv	83e26c834e	[GPT-OSS] Remove unnecessary contiguous (#34337 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-02-11 15:29:29 -05:00
Eldar Kurtić	11c7ace340	[Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) (#34243 ) Signed-off-by: Your Name <you@example.com> Co-authored-by: Your Name <you@example.com>	2026-02-11 13:24:22 -05:00

2299 Commits