biondizzle/vllm - vllm

Author	SHA1	Message	Date
Gregory Shtrasberg	189ddefbfd	[ROCm] Attention selector reordering (#36702) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2026-03-25 17:42:56 +08:00
Baorun (Lauren) Mu	9d0351c91d	[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc (#37914) Signed-off-by: Baorun Mu <bmu@nvidia.com>	2026-03-24 19:53:24 -07:00
Terry Gao	82580b10ac	[Perf] Disable inductor runtime asserts by default for serving perfor… (#37485) Signed-off-by: tianrengao <terrygao87@gmail.com> Co-authored-by: Tianren Gao <tianren@fb.com>	2026-03-24 19:37:51 -04:00
Nick Cao	935c46dd9b	[Model] Add Granite 4.0 1B speech to supported models (#38019) Signed-off-by: Nick Cao <ncao@redhat.com>	2026-03-24 18:23:41 +00:00
Vineeta Tiwari	b58c5f28aa	docs: fix broken offline inference paths in documentation (#37998) Signed-off-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com> Signed-off-by: Vineeta Tiwari <vineetatiwari2000@gmail.com> Co-authored-by: Vineeta Tiwari <vineeta.tiwari2@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-24 17:35:14 +00:00
Sungjae Lee	4731884796	[Feature] limit thinking tokens (hard limit) (#20859) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-24 09:53:07 -07:00
wang.yuqi	1b6cb920e6	[Deprecate] Deprecate pooling multi task support. (#37956) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-24 14:07:47 +00:00
Andrew Xia	9ace378a63	[Frontend][Responses API] Fix arrival_time recording for TTFT on initial request (#37498) Signed-off-by: Andrew Xia <axia@meta.com>	2026-03-23 09:58:08 +00:00
Yan Ma	d3fe857135	update doc for online fp8 quantization (#37851) Signed-off-by: Yan Ma <yan.ma@intel.com>	2026-03-23 05:19:03 +00:00
Lasha Koroshinadze	e7767eccae	Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643) Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>	2026-03-23 10:29:07 +08:00
Robert Shaw	4383f1532e	[MoE] Move PF Methods to Folder (#35927)	2026-03-22 02:42:59 -06:00
Yongye Zhu	87bd91892f	[MoE Refactor] Mxfp4 oracle rebased (#37128) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 03:37:04 +00:00
Ilya Boytsov	8b6c6b9505	[Model] Add LFM2-ColBERT-350M support (#37528) Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>	2026-03-20 14:57:57 +00:00
Jee Jee Li	dd20ee4e3e	[UX] Enable torch_profiler_with_stack (#37571) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-20 11:17:26 +00:00
wang.yuqi	ed359c497a	[Model] Deprecate the score task (this will not affect users). (#37537) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-20 08:07:56 +00:00
Wangbei25	0674d1fee7	[PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder (#37293) Signed-off-by: Wangbei25 <wangbei41@huawie.com> Signed-off-by: Wangbei25 <wangbei41@huawei.com> Co-authored-by: Wangbei25 <wangbei41@huawie.com>	2026-03-20 06:24:07 +00:00
bnellnm	91be5f9be3	[MoE Refactor] Rename "naive" all2all backend (#36294) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-03-19 15:50:34 -04:00
Lucas Kabela	7769b58307	[torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-19 17:26:12 +00:00
Ifta khairul Alam Adil	104605cbf2	Remove deprecated reasoning_content message field(part-2) (#37480) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Philip Ottesen <phiott256@gmail.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: Andy Lo <andy@mistral.ai> Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com> Signed-off-by: sihao.li <sihao.li@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: JartX <sagformas@epdcenter.es> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Philip Ottesen <phiott256@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com> Co-authored-by: Andy Lo <andy@mistral.ai> Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com> Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 15:20:08 +00:00
wang.yuqi	f9e2a38386	[Docs] Reorganize pooling docs. (#35592) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 11:25:47 +00:00
Shwetha Poojary	cef1f302d2	[Model] Enable LoRA support for tower and connector in H2OVL (#31696) Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>	2026-03-18 13:26:47 +00:00
Aaron Hao	47a1f11bff	[docs] Add docs for new RL flows (#36188) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-18 09:04:26 +00:00
Athrael Soju	c0745a851a	[Model] Add ColQwen3.5 4.5B support (#36887) Signed-off-by: Athrael Soju <athrael.soju@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-17 21:17:02 +00:00
Wei Zhao	b36adfa349	[Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-17 20:09:20 +00:00
Bhoomit	3717a4dd47	[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984) Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-17 14:36:41 +00:00
Walter Beller-Morales	061980c36a	[Feature][Frontend] add support for Cohere Embed v2 API (#37074) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-03-16 19:55:53 -04:00
Yuchen Fama	31a458c091	[Doc] Clarify schema enforcement behavior for tool_choice modes (#37064) Signed-off-by: yfama <yuchengu@gmail.com>	2026-03-16 22:27:42 +00:00
Matthew Bonanni	93f3c8e531	[Misc] Add `float16` to `CacheDType` (#37199) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-16 13:24:48 -07:00
Lucas Kabela	714c6e0eab	[torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set (#36288) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-16 19:42:34 +00:00
Max de Bayser	9f9ecff4cd	Add simple granite4 tool parser (#36827) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2026-03-16 10:49:09 -07:00
Harry Mellor	9b005edc48	[Docs] Make the link to hardware plugins clearer (#37174) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 04:12:58 -07:00
SoluMilken	d8f8a7aad2	[Misc] Sync pre-commit to 4.5.1 in workflows and docs (#36675) Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-16 10:03:21 +00:00
leo-cf-tian	2754231ba3	[Kernel] Add FlashInfer MoE A2A Kernel (#36022) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: Leo Tian <lctian@nvidia.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com> Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>	2026-03-15 23:45:32 -07:00
bigshanedogg	2390d44209	[Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107) Signed-off-by: bigshanedogg <bigshane319@gmail.com>	2026-03-16 06:40:05 +00:00
Hari	a3e2e250f0	[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614) Signed-off-by: hasethuraman <hsethuraman@microsoft.com>	2026-03-15 19:38:21 +08:00
arlo	8c29042bb9	[Feature] Add InstantTensor weight loader (#36139)	2026-03-14 18:05:23 +01:00
Li, Jiang	092ace9e3a	[UX] Improve UX of CPU backend (#36968) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-14 09:27:29 +08:00
whyiug	1ce13cf992	[Model] Add support for BERT-like Chinese ERNIE pooling models (#36385) Signed-off-by: whyiug <whyiug@hotmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-13 03:23:53 +00:00
Nikita	10f08dedfa	[Model] Add ColPali late interaction model for multi-modal retrieval (#36818) Signed-off-by: Nikita Sukharev <kaonael@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-13 02:18:57 +00:00
Xinan Miao	2cdf92228c	[Feature]: Remove Chunking From FusedMoE (#34086) Signed-off-by: SouthWest7 <am1ao@qq.com> Signed-off-by: Southwest <1403572259@qq.com> Signed-off-by: southwest <am1ao@qq.com> Signed-off-by: Xinan Miao <1403572259@qq.com> Co-authored-by: SouthWest7 <am1ao@qq.com>	2026-03-12 14:24:38 -04:00
Harry Mellor	e39257a552	Add `AGENTS.md` (#36877) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-12 10:20:50 -07:00
grimulkan	a1257fd1ea	[Kernel] Add FP8 KV cache support to Triton MLA decode attention (#34597) Signed-off-by: grimulkan <grimulkan@gmail.com>	2026-03-12 08:32:34 -07:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Mark McLoughlin	5282c7d4d0	[docs] Add lightweight AI assisted contribution policy (#30947) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-03-12 11:46:13 +00:00
sfeiqiang	8cb24d3aed	[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328) Signed-off-by: phaedonsun <phaedonsun@tencent.com> Co-authored-by: phaedonsun <phaedonsun@tencent.com>	2026-03-12 00:46:20 -07:00
Louie Tsai	17852aa503	more models for vLLM Benchmark Suite (#35086) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-03-12 11:36:51 +08:00
Kunshang Ji	513949f95f	[XPU][Doc] Remove manual OneAPI install step, now handled by torch-xpu (#36831) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-03-12 01:46:02 +00:00
Nick Hill	262b76a09f	[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-12 01:20:34 +00:00
Harry Mellor	35db669f1d	Correct link to supported hardware on vllm.ai (#36798) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 08:43:28 -07:00
Wuxun Zhang	e584dce52b	Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230) Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>	2026-03-11 19:19:15 +08:00


2099 Commits