biondizzle/vllm

Author	SHA1	Message	Date
Kunshang Ji	53ec16a705	[Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145 ) Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-12 07:57:47 -07:00
Mark McLoughlin	5282c7d4d0	[docs] Add lightweight AI assisted contribution policy (#30947 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-03-12 11:46:13 +00:00
sfeiqiang	8cb24d3aed	[KV Connector] Support using FlexKV as KV Cache Offloading option. (#34328 ) Signed-off-by: phaedonsun <phaedonsun@tencent.com> Co-authored-by: phaedonsun <phaedonsun@tencent.com>	2026-03-12 00:46:20 -07:00
Louie Tsai	17852aa503	more models for vLLM Benchmark Suite (#35086 ) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-03-12 11:36:51 +08:00
Kunshang Ji	513949f95f	[XPU][Doc] Remove manual OneAPI install step, now handled by torch-xpu (#36831 ) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-03-12 01:46:02 +00:00
Nick Hill	262b76a09f	[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829 ) Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-12 01:20:34 +00:00
Harry Mellor	35db669f1d	Correct link to supported hardware on vllm.ai (#36798 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-11 08:43:28 -07:00
Wuxun Zhang	e584dce52b	Add XPU MLA Sparse backend for DeepSeek v3.2 (#33230 ) Signed-off-by: Zhang, Wuxun <wuxun.zhang@intel.com>	2026-03-11 19:19:15 +08:00
JartX	a40ee486f2	[Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 (#35923 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: akaratza <akaratza@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-03-11 07:45:57 +00:00
tunglinwood	42fadebecb	[Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127 ) Signed-off-by: tunglinwood <tunglinwood@gmail.com> Signed-off-by: tunglinwood <tomwu.tunglin@gmail.com> Signed-off-by: tunglinwood <113751333+tunglinwood@users.noreply.github.com>	2026-03-10 21:24:48 -07:00
Hojin Yang	0836be3b03	[Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471 ) Signed-off-by: effortprogrammer <yhjhoward7@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-03-10 10:59:19 +08:00
Lucas Kabela	3fd03f1ec2	[BE] Rename `should_torch_compile_mm_vit` to `should_torch_compile_mm_encoder` (#36281 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-09 18:22:05 +00:00
Simon Mo	fe0c085c28	[Docs] Remove the reo beacon (#36528 ) Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-03-09 11:16:50 -07:00
Russell Bryant	d460a18fc6	[Docs] Expand --allowed-media-domains security guidance with threat details (#36506 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-09 17:43:42 +00:00
Andreas Karatzas	c174d54f86	[ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-09 12:02:41 -05:00
Harry Mellor	74a9f54cdb	[CI] Fix edge case that could lead to broken docs builds on main (#36515 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-09 09:06:19 -07:00
Cyrus Leung	f96c3ab08c	[Deprecation][1/2] Remove items deprecated in v0.18 (#36470 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-09 03:43:23 -07:00
Alex Brooks	65a4da1504	[Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160 ) Signed-off-by: Alex Brooks <albrooks@redhat.com>	2026-03-09 05:46:23 +00:00
wang.yuqi	dcf8862fd4	[Examples][1/n] Resettle basic examples. (#35579 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:22:53 -07:00
Wentao Ye	384425f84e	[Dependency] Remove default ray dependency (#36170 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-08 20:06:22 -07:00
Harry Mellor	a0f44bb616	Allow `markdownlint` to run locally (#36398 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-08 20:05:24 -07:00
Kunshang Ji	fde4771bbd	[XPU][Doc] update xpu document about triton dependency/conflict issue. (#36301 ) Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-03-09 02:09:22 +00:00
Wei Zhao	379689d533	[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891 )	2026-03-07 13:51:54 -08:00
rahul-sarvam	85f50eb41f	Adding support to Sarvam's MoE models (#33942 ) Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>	2026-03-08 01:16:24 +08:00
lif	00b814ba5a	[V0 Deprecation] Remove unused swap_space parameter (#36216 ) Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: mcelrath	2026-03-07 22:09:55 +08:00
Copilot	ce8546a12b	[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538 ) Signed-off-by: ProExpertProg <luka.govedic@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: ProExpertProg <luka.govedic@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-03-06 23:55:06 +00:00
Andreas Karatzas	807d680337	[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 15:15:12 +08:00
Xiang Shi	e68de8adc0	docs: fix wrong cc in int8.md (#36209 ) Signed-off-by: Xiang Shi <realkevin@tutanota.com>	2026-03-06 06:01:02 +00:00
Rohan Potdar	c5362c739f	Reenable features for ROCm attention backends (#36185 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-05 20:21:06 -08:00
Yanhong Li	a911f4dd20	[Model] Add support for OLMo Hybrid (#32550 )	2026-03-05 14:51:06 -05:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934 ) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Sage Moore	8c760b6ab6	[ROCm] Refactor ROCm attention backend selection logic (#35246 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-05 10:51:26 -06:00
Harry Mellor	8df523351f	[Docs] Only build docs if `documentation` or `ready` labels are present (#36135 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 13:58:16 +00:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Paco Xu	7493c51c55	[Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767 ) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-03-05 17:39:50 +08:00
Reagan Lee	ac773bbe80	[Docs] Update docs to include mm processor + encoder benchmarks (#34083 ) Signed-off-by: Reagan <reaganjlee@gmail.com>	2026-03-05 01:38:25 -08:00
zihaoanllm	d106bf39f5	[Doc] Add Parallel Draft Models (#35973 ) Signed-off-by: <zihaoan2@amd.com> Signed-off-by: zihaoanllm <zihaoan2@amd.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 05:44:07 +00:00
Russell Bryant	636ee223ac	[Docs] Document security risks of GPT-OSS Python tool (#35139 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 20:27:31 +00:00
Davina Zaman	138d891d7f	[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441 ) Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 11:44:39 -08:00
Maxime Grenu	32224f568a	docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882 ) Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:31:35 -08:00
Abhishek Mathukiya	f3dc292e9f	docs: add version requirement note for --profiler-config flag (#32454 ) Signed-off-by: abhishkh <mathukiya.a@northeastern.edu>	2026-03-04 18:13:54 +00:00
Chen	138c5fa186	[Docs] Add RunPod GPU deployment guide for vLLM (#34531 ) Signed-off-by: lisperz <zhuchen200245@163.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-04 10:11:34 -08:00
Russell Bryant	2f2c1d73a7	[Docs] Upgrade dynamic LoRA warning to admonition block (#35218 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-04 10:01:42 -08:00
Michael Yao	fd3bfe74c9	[Docs] Update design/multiprocessing.md (#30677 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2026-03-04 17:58:59 +00:00
Sage	d25c1ec3c9	docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-04 17:45:35 +00:00
Xing Liu	7cc6058ac6	[Doc] Add MTP docs and update speculative decoding guidance (#35197 ) Signed-off-by: liuxing <945764858@qq.com>	2026-03-04 17:23:34 +00:00
Manrique Vargas	28028dff2f	fix(docs): use static rdzv backend in multi-node troubleshooting script (#34784 ) Signed-off-by: machov <mv1742@nyu.edu>	2026-03-04 17:15:35 +00:00
simone-dotolo	e86221deb6	[Doc] Fix GPU Worker count in Process Count Summary (#36000 ) Signed-off-by: simone-dotolo <simonedotolo@libero.it> Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 17:03:14 +00:00
AllenDou	c1d963403c	[model] support FireRedASR2 (#35727 ) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-03 19:41:30 -08:00
Shanshan Shen	77e6dcbbfa	[PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-03-03 19:41:27 -08:00

2107 Commits