biondizzle/vllm - vllm

Author	SHA1	Message	Date
Kunshang Ji	fde4771bbd	[XPU][Doc] update xpu document about triton dependency/conflict issue. (#36301 ) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-03-09 02:09:22 +00:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891 )	2026-03-07 13:51:54 -08:00
rahul-sarvam	85f50eb41f	Adding support to Sarvam's MoE models (#33942 ) Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>	2026-03-08 01:16:24 +08:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Copilot	ce8546a12b	[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538 ) Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: ProExpertProg <luka.govedic@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-06 23:55:06 +00:00
Andreas Karatzas	807d680337	[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 15:15:12 +08:00
Xiang Shi	e68de8adc0	docs: fix wrong cc in int8.md (#36209 ) Signed-off-by: Xiang Shi <realkevin@tutanota.com>	2026-03-06 06:01:02 +00:00
Rohan Potdar	c5362c739f	Reenable features for ROCm attention backends (#36185 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-05 20:21:06 -08:00
Yanhong Li	a911f4dd20	[Model] Add support for OLMo Hybrid (#32550 )	2026-03-05 14:51:06 -05:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Sage Moore	8c760b6ab6	[ROCm] Refactor ROCm attention backend selection logic (#35246 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-05 10:51:26 -06:00
Harry Mellor	8df523351f	[Docs] Only build docs if `documentation` or `ready` labels are present (#36135 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 13:58:16 +00:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Paco Xu	7493c51c55	[Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767 ) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-03-05 17:39:50 +08:00
Reagan Lee	ac773bbe80	[Docs] Update docs to include mm processor + encoder benchmarks (#34083 ) Signed-off-by: Reagan <reaganjlee@gmail.com>	2026-03-05 01:38:25 -08:00
zihaoanllm	d106bf39f5	[Doc] Add Parallel Draft Models (#35973 ) Signed-off-by: <zihaoan2@amd.com> Signed-off-by: zihaoanllm <zihaoan2@amd.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 05:44:07 +00:00
Russell Bryant	636ee223ac	[Docs] Document security risks of GPT-OSS Python tool (#35139 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 20:27:31 +00:00
Davina Zaman	138d891d7f	[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441 ) Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 11:44:39 -08:00
Maxime Grenu	32224f568a	docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882 ) Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:31:35 -08:00
Abhishek Mathukiya	f3dc292e9f	docs: add version requirement note for --profiler-config flag (#32454 ) Signed-off-by: abhishkh <mathukiya.a@northeastern.edu>	2026-03-04 18:13:54 +00:00
Chen	138c5fa186	[Docs] Add RunPod GPU deployment guide for vLLM (#34531 ) Signed-off-by: lisperz <zhuchen200245@163.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:11:34 -08:00
Russell Bryant	2f2c1d73a7	[Docs] Upgrade dynamic LoRA warning to admonition block (#35218 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 10:01:42 -08:00
Michael Yao	fd3bfe74c9	[Docs] Update design/multiprocessing.md (#30677 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2026-03-04 17:58:59 +00:00
Sage	d25c1ec3c9	docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-04 17:45:35 +00:00
Xing Liu	7cc6058ac6	[Doc] Add MTP docs and update speculative decoding guidance (#35197 ) Signed-off-by: liuxing <945764858@qq.com>	2026-03-04 17:23:34 +00:00
Manrique Vargas	28028dff2f	fix(docs): use static rdzv backend in multi-node troubleshooting script (#34784 ) Signed-off-by: machov <mv1742@nyu.edu>	2026-03-04 17:15:35 +00:00
simone-dotolo	e86221deb6	[Doc] Fix GPU Worker count in Process Count Summary (#36000 ) Signed-off-by: simone-dotolo <simonedotolo@libero.it> Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 17:03:14 +00:00
AllenDou	c1d963403c	[model] support FireRedASR2 (#35727 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-03 19:41:30 -08:00
Shanshan Shen	77e6dcbbfa	[PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-03-03 19:41:27 -08:00
Robert Shaw	97995f6376	[MoE Refactor] Create MK for TRTLLM Kernels (#32564 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-03-03 10:39:50 -08:00
Woosuk Kwon	4f85bae9d6	[Docs][Model Runner V2] Add Design Docs (#35819 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-02 19:58:14 -08:00
Jakub Zakrzewski	c8b678e53e	[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-03-03 08:32:14 +08:00
Lucas Wilkinson	8b5014d3dd	[Attention] FA4 integration (#32974 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-01 23:44:57 +00:00
Augusto Yao	8e75d88554	add io_process_plugin for sparse embedding (#34214 ) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com> Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-28 09:16:37 +00:00
Cyrus Leung	4292e3b807	[Benchmark] Improve UX of sweep scripts (#35600 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-28 00:36:02 -08:00
Cyrus Leung	24d6ea8afd	[Benchmark] Rename SLA Finder to Workload Explorer (#35586 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-27 23:31:55 -08:00
Cyrus Leung	fd68cd132b	[Bugfix] Fixes for SLA finder (#35537 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-27 20:20:55 -08:00
Micah Williamson	0edf101d2b	[ROCm] Add `stablelm` Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-02-28 12:16:34 +08:00
Gregory Shtrasberg	9fa6c68fa6	[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends (#35334 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-27 21:32:55 +00:00
Martin Hickey	b602e4f299	[Doc] Fix link to Llama chat template for usability (#35525 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-27 17:51:09 +00:00
fort726	905d76b51d	[Model] Add huggingface skt/A.X-K1 model (#32407 ) Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com> Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-27 09:26:02 -08:00
Boyuan Feng	5de98abc12	Add @BoyuanFeng to CODEOWNERS (#35317 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2026-02-27 15:53:47 +00:00
Wentao Ye	062b789632	[Bug] Fix outdated links in source code (#35314 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-27 03:50:46 +00:00
Tyler Michael Smith	eb19955c37	[WideEP] Remove pplx all2all backend (#33724 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 14:30:10 -08:00
Jakub Zakrzewski	111d869069	[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model (#35297 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-02-26 14:17:17 +00:00
Jiangyun Zhu	ab87f85231	[Model] Ring 2.5 (#35102 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-02-26 02:17:11 -08:00
Cyrus Leung	d3a51da92a	[Benchmark] Simplify SLA scan (#35306 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-25 22:35:41 -08:00
Seungmin Kim	160424a937	[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992 ) Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com> Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com> Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 18:15:51 -08:00
Joao Gante	709eadbb0b	Doc link typo (#35281 ) Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-25 03:00:31 -08:00
Yanwen Lin	80e60a6133	[Doc] Suggest "--managed-python" flag when installing python using uv (#33069 ) Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>	2026-02-25 08:19:43 +00:00

2136 Commits