biondizzle/vllm - vllm

Author	SHA1	Message	Date
simone-dotolo	e86221deb6	[Doc] Fix GPU Worker count in Process Count Summary (#36000 ) Signed-off-by: simone-dotolo <simonedotolo@libero.it> Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 17:03:14 +00:00
AllenDou	c1d963403c	[model] support FireRedASR2 (#35727 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-03 19:41:30 -08:00
Shanshan Shen	77e6dcbbfa	[PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-03-03 19:41:27 -08:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Woosuk Kwon	4f85bae9d6	[Docs][Model Runner V2] Add Design Docs (#35819 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-02 19:58:14 -08:00
Jakub Zakrzewski	c8b678e53e	[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-03-03 08:32:14 +08:00
Lucas Wilkinson	8b5014d3dd	[Attention] FA4 integration (#32974 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-01 23:44:57 +00:00
Augusto Yao	8e75d88554	add io_process_plugin for sparse embedding (#34214 ) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com> Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-28 09:16:37 +00:00
Cyrus Leung	4292e3b807	[Benchmark] Improve UX of sweep scripts (#35600 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-28 00:36:02 -08:00
Cyrus Leung	24d6ea8afd	[Benchmark] Rename SLA Finder to Workload Explorer (#35586 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-27 23:31:55 -08:00
Cyrus Leung	fd68cd132b	[Bugfix] Fixes for SLA finder (#35537 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-27 20:20:55 -08:00
Micah Williamson	0edf101d2b	[ROCm] Add `stablelm` Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-02-28 12:16:34 +08:00
Gregory Shtrasberg	9fa6c68fa6	[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends (#35334 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-27 21:32:55 +00:00
Martin Hickey	b602e4f299	[Doc] Fix link to Llama chat template for usability (#35525 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-27 17:51:09 +00:00
fort726	905d76b51d	[Model] Add huggingface skt/A.X-K1 model (#32407 ) Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com> Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-27 09:26:02 -08:00
Boyuan Feng	5de98abc12	Add @BoyuanFeng to CODEOWNERS (#35317 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2026-02-27 15:53:47 +00:00
Wentao Ye	062b789632	[Bug] Fix outdated links in source code (#35314 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-27 03:50:46 +00:00
Tyler Michael Smith	eb19955c37	[WideEP] Remove pplx all2all backend (#33724 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 14:30:10 -08:00
Jakub Zakrzewski	111d869069	[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model (#35297 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-02-26 14:17:17 +00:00
Jiangyun Zhu	ab87f85231	[Model] Ring 2.5 (#35102 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-02-26 02:17:11 -08:00
Cyrus Leung	d3a51da92a	[Benchmark] Simplify SLA scan (#35306 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-25 22:35:41 -08:00
Seungmin Kim	160424a937	[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992 ) Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com> Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com> Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 18:15:51 -08:00
Joao Gante	709eadbb0b	Doc link typo (#35281 ) Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-25 03:00:31 -08:00
Yanwen Lin	80e60a6133	[Doc] Suggest "--managed-python" flag when installing python using uv (#33069 ) Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>	2026-02-25 08:19:43 +00:00
jonoillar	26e722f906	[DOC][BugFix] Specfiy build dependency installation (#34513 ) Signed-off-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com> Co-authored-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com>	2026-02-25 08:04:06 +00:00
lichuang	2c619e5e3f	[Docs]Fix documentation formatting in architecture overview (#34679 ) Signed-off-by: codedump <lichuang1982@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-25 08:00:15 +00:00
Simon Mo	8a685be8d9	docs: document committer proposal process in governance (#35225 ) Signed-off-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-25 07:58:48 +00:00
Harry Mellor	f7967577f5	Remove requirement to use `--hf-overrides` for `DeepseekVLV2ForCausalLM` (#35203 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-24 22:00:06 -08:00
Nicolò Lucchesi	f91808ae0d	[MM] Allow audio chunking for offline LLM (#34628 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 21:04:28 -08:00
Mark McLoughlin	5cc7c4452e	[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) (#30950 ) Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus. `--enable-mfu-metrics` is required for these to be exposed. Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-02-23 15:01:07 +00:00
Cyrus Leung	987506bca6	[Refactor] Simplify dummy data generation (#35025 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-22 20:55:27 -08:00
Athrael Soju	970861ac0c	[New Model] Add ColModernVBERT (#34558 ) Signed-off-by: Athrael Soju <athrael.soju@gmail.com> Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com>	2026-02-22 12:23:41 +08:00
petrpechman	bebfe55b1c	[Doc] Fix example of eagle3 (#34960 ) Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz> Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz>	2026-02-21 09:57:53 +00:00
Nicolò Lucchesi	ab6f3487a6	[PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer	8dc8a99b56	[ROCm] Enable bitsandbytes quantization support on ROCm (#34688 ) Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>	2026-02-21 00:34:55 -08:00
Kata Coder	5719a4e4e6	[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574 ) Signed-off-by: craftsangjae <craftsangjae@gmail.com>	2026-02-20 20:01:40 -08:00
Kyle Sayers	64ac1395e8	[Docs] Clean up speculators docs (#34065 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2026-02-18 13:48:11 -08:00
Cyrus Leung	a766b30349	[Renderer] Deprecate code paths for old input processing (#34775 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-18 00:35:04 -08:00
Matthew Bonanni	dc5fa77a4e	[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs (#34457 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-17 14:01:27 -05:00
Harry Mellor	a21cedf4ff	Bump `lm-eval` version for Transformers v5 compatibility (#33994 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-16 05:24:35 -08:00
Parth Bansal	5653021094	[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model (#34584 ) Signed-off-by: Parth Bansal <parthbansal127@gmail.com>	2026-02-16 12:09:00 +08:00
Maryam Tahhan	f07a128413	[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… (#33079 ) Signed-off-by: Maryam Tahhan <mtahhan@redhat.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2026-02-15 06:33:08 -08:00
Isotr0py	19fab44152	[Doc] Update Encoder-Decoder models support doc with Florence-2 (#34581 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-15 04:18:57 -08:00
Kata Coder	d1ea65d0a1	[new model] add COLQwen3 code & Inference (#34398 ) Signed-off-by: craftsangjae <craftsangjae@gmail.com> Signed-off-by: katacoder <craftsangjae@gmail.com>	2026-02-14 12:15:19 +08:00
Ilya Boytsov	071d863e20	Extend ColBERT support to non-standard BERT backbones (#34170 ) Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>	2026-02-13 09:53:09 +00:00
myselvess	bcf0731aa0	[New Model] support new model ovis2.6 (#34426 ) Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com>	2026-02-13 00:12:45 -08:00
Matthew Bonanni	f2c47886fd	[Attention] Add FlashInfer Sparse MLA backend (#33451 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2026-02-12 17:21:54 +00:00
Nicolò Lucchesi	334c715e0f	[Docs] Spec decoding docs warning removal (#34439 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-12 09:01:51 -08:00
Louie Tsai	55a1a9563a	Vllm CPU benchmark suite improvement (#34128 ) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-02-12 16:04:44 +08:00
Tianqi Ren	786806dd44	[Doc] Update Marlin support matrix for Turing (#34319 ) Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>	2026-02-11 09:03:41 +00:00

2010 Commits