biondizzle/vllm - vllm

Author	SHA1	Message	Date
Rohan Potdar	c5362c739f	Reenable features for ROCm attention backends (#36185 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-05 20:21:06 -08:00
Yanhong Li	a911f4dd20	[Model] Add support for OLMo Hybrid (#32550 )	2026-03-05 14:51:06 -05:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Sage Moore	8c760b6ab6	[ROCm] Refactor ROCm attention backend selection logic (#35246 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-05 10:51:26 -06:00
Harry Mellor	8df523351f	[Docs] Only build docs if `documentation` or `ready` labels are present (#36135 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 13:58:16 +00:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Paco Xu	7493c51c55	[Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767 ) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-03-05 17:39:50 +08:00
Reagan Lee	ac773bbe80	[Docs] Update docs to include mm processor + encoder benchmarks (#34083 ) Signed-off-by: Reagan <reaganjlee@gmail.com>	2026-03-05 01:38:25 -08:00
zihaoanllm	d106bf39f5	[Doc] Add Parallel Draft Models (#35973 ) Signed-off-by: <zihaoan2@amd.com> Signed-off-by: zihaoanllm <zihaoan2@amd.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 05:44:07 +00:00
Russell Bryant	636ee223ac	[Docs] Document security risks of GPT-OSS Python tool (#35139 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 20:27:31 +00:00
Davina Zaman	138d891d7f	[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441 ) Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 11:44:39 -08:00
Maxime Grenu	32224f568a	docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882 ) Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:31:35 -08:00
Abhishek Mathukiya	f3dc292e9f	docs: add version requirement note for --profiler-config flag (#32454 ) Signed-off-by: abhishkh <mathukiya.a@northeastern.edu>	2026-03-04 18:13:54 +00:00
Chen	138c5fa186	[Docs] Add RunPod GPU deployment guide for vLLM (#34531 ) Signed-off-by: lisperz <zhuchen200245@163.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:11:34 -08:00
Russell Bryant	2f2c1d73a7	[Docs] Upgrade dynamic LoRA warning to admonition block (#35218 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 10:01:42 -08:00
Michael Yao	fd3bfe74c9	[Docs] Update design/multiprocessing.md (#30677 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2026-03-04 17:58:59 +00:00
Sage	d25c1ec3c9	docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-04 17:45:35 +00:00
Xing Liu	7cc6058ac6	[Doc] Add MTP docs and update speculative decoding guidance (#35197 ) Signed-off-by: liuxing <945764858@qq.com>	2026-03-04 17:23:34 +00:00
Manrique Vargas	28028dff2f	fix(docs): use static rdzv backend in multi-node troubleshooting script (#34784 ) Signed-off-by: machov <mv1742@nyu.edu>	2026-03-04 17:15:35 +00:00
simone-dotolo	e86221deb6	[Doc] Fix GPU Worker count in Process Count Summary (#36000 ) Signed-off-by: simone-dotolo <simonedotolo@libero.it> Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 17:03:14 +00:00
AllenDou	c1d963403c	[model] support FireRedASR2 (#35727 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-03 19:41:30 -08:00
Shanshan Shen	77e6dcbbfa	[PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-03-03 19:41:27 -08:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Woosuk Kwon	4f85bae9d6	[Docs][Model Runner V2] Add Design Docs (#35819 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-02 19:58:14 -08:00
Jakub Zakrzewski	c8b678e53e	[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-03-03 08:32:14 +08:00
Lucas Wilkinson	8b5014d3dd	[Attention] FA4 integration (#32974 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-01 23:44:57 +00:00
Augusto Yao	8e75d88554	add io_process_plugin for sparse embedding (#34214 ) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com> Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-28 09:16:37 +00:00
Cyrus Leung	4292e3b807	[Benchmark] Improve UX of sweep scripts (#35600 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-28 00:36:02 -08:00
Cyrus Leung	24d6ea8afd	[Benchmark] Rename SLA Finder to Workload Explorer (#35586 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-27 23:31:55 -08:00
Cyrus Leung	fd68cd132b	[Bugfix] Fixes for SLA finder (#35537 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-27 20:20:55 -08:00
Micah Williamson	0edf101d2b	[ROCm] Add `stablelm` Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-02-28 12:16:34 +08:00
Gregory Shtrasberg	9fa6c68fa6	[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends (#35334 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-27 21:32:55 +00:00
Martin Hickey	b602e4f299	[Doc] Fix link to Llama chat template for usability (#35525 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-27 17:51:09 +00:00
fort726	905d76b51d	[Model] Add huggingface skt/A.X-K1 model (#32407 ) Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com> Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-27 09:26:02 -08:00
Boyuan Feng	5de98abc12	Add @BoyuanFeng to CODEOWNERS (#35317 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2026-02-27 15:53:47 +00:00
Wentao Ye	062b789632	[Bug] Fix outdated links in source code (#35314 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-27 03:50:46 +00:00
Tyler Michael Smith	eb19955c37	[WideEP] Remove pplx all2all backend (#33724 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 14:30:10 -08:00
Jakub Zakrzewski	111d869069	[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model (#35297 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-02-26 14:17:17 +00:00
Jiangyun Zhu	ab87f85231	[Model] Ring 2.5 (#35102 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-02-26 02:17:11 -08:00
Cyrus Leung	d3a51da92a	[Benchmark] Simplify SLA scan (#35306 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-25 22:35:41 -08:00
Seungmin Kim	160424a937	[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992 ) Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com> Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com> Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 18:15:51 -08:00
Joao Gante	709eadbb0b	Doc link typo (#35281 ) Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-25 03:00:31 -08:00
Yanwen Lin	80e60a6133	[Doc] Suggest "--managed-python" flag when installing python using uv (#33069 ) Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>	2026-02-25 08:19:43 +00:00
jonoillar	26e722f906	[DOC][BugFix] Specfiy build dependency installation (#34513 ) Signed-off-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com> Co-authored-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com>	2026-02-25 08:04:06 +00:00
lichuang	2c619e5e3f	[Docs]Fix documentation formatting in architecture overview (#34679 ) Signed-off-by: codedump <lichuang1982@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-25 08:00:15 +00:00
Simon Mo	8a685be8d9	docs: document committer proposal process in governance (#35225 ) Signed-off-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-25 07:58:48 +00:00
Harry Mellor	f7967577f5	Remove requirement to use `--hf-overrides` for `DeepseekVLV2ForCausalLM` (#35203 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-24 22:00:06 -08:00
Nicolò Lucchesi	f91808ae0d	[MM] Allow audio chunking for offline LLM (#34628 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 21:04:28 -08:00
Mark McLoughlin	5cc7c4452e	[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) (#30950 ) Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus. `--enable-mfu-metrics` is required for these to be exposed. Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-02-23 15:01:07 +00:00
Cyrus Leung	987506bca6	[Refactor] Simplify dummy data generation (#35025 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-22 20:55:27 -08:00

2079 Commits