biondizzle/vllm - vllm

Author	SHA1	Message	Date
whyiug	1ce13cf992	[Model] Add support for BERT-like Chinese ERNIE pooling models (#36385 ) Signed-off-by: whyiug <whyiug@hotmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-13 03:23:53 +00:00
Nikita	10f08dedfa	[Model] Add ColPali late interaction model for multi-modal retrieval (#36818 ) Signed-off-by: Nikita Sukharev <kaonael@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-13 02:18:57 +00:00
Xinan Miao	2cdf92228c	[Feature]: Remove Chunking From FusedMoE (#34086 ) Signed-off-by: SouthWest7 <am1ao@qq.com> Signed-off-by: Southwest <1403572259@qq.com> Signed-off-by: southwest <am1ao@qq.com> Signed-off-by: Xinan Miao <1403572259@qq.com> Co-authored-by: SouthWest7 <am1ao@qq.com>	2026-03-12 14:24:38 -04:00
Harry Mellor	e39257a552	Add `AGENTS.md` (#36877 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-12 10:20:50 -07:00
grimulkan	a1257fd1ea	[Kernel] Add FP8 KV cache support to Triton MLA decode attention (#34597 ) Signed-off-by: grimulkan <grimulkan@gmail.com>	2026-03-12 08:32:34 -07:00
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Mark McLoughlin	5282c7d4d0	[docs] Add lightweight AI assisted contribution policy (#30947 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-03-12 11:46:13 +00:00
sfeiqiang	8cb24d3aed	[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328 ) Signed-off-by: phaedonsun <phaedonsun@tencent.com> Co-authored-by: phaedonsun <phaedonsun@tencent.com>	2026-03-12 00:46:20 -07:00
Louie Tsai	17852aa503	more models for vLLM Benchmark Suite (#35086 ) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-03-12 11:36:51 +08:00
Kunshang Ji	513949f95f	[XPU][Doc] Remove manual OneAPI install step, now handled by torch-xpu (#36831 ) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-03-12 01:46:02 +00:00
Nick Hill	262b76a09f	[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829 ) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-12 01:20:34 +00:00
Harry Mellor	35db669f1d	Correct link to supported hardware on vllm.ai (#36798 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 08:43:28 -07:00
Wuxun Zhang	e584dce52b	Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230 ) Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>	2026-03-11 19:19:15 +08:00
JartX	a40ee486f2	[Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 (#35923 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: akaratza <akaratza@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-03-11 07:45:57 +00:00
tunglinwood	42fadebecb	[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127 ) Signed-off-by: tunglinwood <tunglinwood@gmail.com> Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com> Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>	2026-03-10 21:24:48 -07:00
Hojin Yang	0836be3b03	[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471 ) Signed-off-by: effortprogrammer <yhjhoward7@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-10 10:59:19 +08:00
Lucas Kabela	3fd03f1ec2	[BE] Rename `should_torch_compile_mm_vit` to `should_torch_compile_mm_encoder` (#36281 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-09 18:22:05 +00:00
Simon Mo	fe0c085c28	[Docs] Remove the reo beacon (#36528 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-03-09 11:16:50 -07:00
Russell Bryant	d460a18fc6	[Docs] Expand --allowed-media-domains security guidance with threat details (#36506 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-09 17:43:42 +00:00
Andreas Karatzas	c174d54f86	[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-09 12:02:41 -05:00
Harry Mellor	74a9f54cdb	[CI] Fix edge case that could lead to broken docs builds on main (#36515 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-09 09:06:19 -07:00
Cyrus Leung	f96c3ab08c	[Deprecation][1/2] Remove items deprecated in v0.18 (#36470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-09 03:43:23 -07:00
Alex Brooks	65a4da1504	[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160 ) Signed-off-by: Alex Brooks <albrooks@redhat.com>	2026-03-09 05:46:23 +00:00
wang.yuqi	dcf8862fd4	[Examples][1/n] Resettle basic examples. (#35579 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:22:53 -07:00
Wentao Ye	384425f84e	[Dependency] Remove default ray dependency (#36170 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-08 20:06:22 -07:00
Harry Mellor	a0f44bb616	Allow `markdownlint` to run locally (#36398 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:05:24 -07:00
Kunshang Ji	fde4771bbd	[XPU][Doc] update xpu document about triton dependency/conflict issue. (#36301 ) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-03-09 02:09:22 +00:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891 )	2026-03-07 13:51:54 -08:00
rahul-sarvam	85f50eb41f	Adding support to Sarvam's MoE models (#33942 ) Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>	2026-03-08 01:16:24 +08:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Copilot	ce8546a12b	[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538 ) Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: ProExpertProg <luka.govedic@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-06 23:55:06 +00:00
Andreas Karatzas	807d680337	[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 15:15:12 +08:00
Xiang Shi	e68de8adc0	docs: fix wrong cc in int8.md (#36209 ) Signed-off-by: Xiang Shi <realkevin@tutanota.com>	2026-03-06 06:01:02 +00:00
Rohan Potdar	c5362c739f	Reenable features for ROCm attention backends (#36185 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-05 20:21:06 -08:00
Yanhong Li	a911f4dd20	[Model] Add support for OLMo Hybrid (#32550 )	2026-03-05 14:51:06 -05:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Sage Moore	8c760b6ab6	[ROCm] Refactor ROCm attention backend selection logic (#35246 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-05 10:51:26 -06:00
Harry Mellor	8df523351f	[Docs] Only build docs if `documentation` or `ready` labels are present (#36135 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 13:58:16 +00:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Paco Xu	7493c51c55	[Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767 ) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-03-05 17:39:50 +08:00
Reagan Lee	ac773bbe80	[Docs] Update docs to include mm processor + encoder benchmarks (#34083 ) Signed-off-by: Reagan <reaganjlee@gmail.com>	2026-03-05 01:38:25 -08:00
zihaoanllm	d106bf39f5	[Doc] Add Parallel Draft Models (#35973 ) Signed-off-by: <zihaoan2@amd.com> Signed-off-by: zihaoanllm <zihaoan2@amd.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 05:44:07 +00:00
Russell Bryant	636ee223ac	[Docs] Document security risks of GPT-OSS Python tool (#35139 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 20:27:31 +00:00
Davina Zaman	138d891d7f	[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441 ) Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 11:44:39 -08:00
Maxime Grenu	32224f568a	docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882 ) Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:31:35 -08:00
Abhishek Mathukiya	f3dc292e9f	docs: add version requirement note for --profiler-config flag (#32454 ) Signed-off-by: abhishkh <mathukiya.a@northeastern.edu>	2026-03-04 18:13:54 +00:00
Chen	138c5fa186	[Docs] Add RunPod GPU deployment guide for vLLM (#34531 ) Signed-off-by: lisperz <zhuchen200245@163.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:11:34 -08:00
Russell Bryant	2f2c1d73a7	[Docs] Upgrade dynamic LoRA warning to admonition block (#35218 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 10:01:42 -08:00
Michael Yao	fd3bfe74c9	[Docs] Update design/multiprocessing.md (#30677 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2026-03-04 17:58:59 +00:00
Sage	d25c1ec3c9	docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-04 17:45:35 +00:00

2062 Commits