biondizzle/vllm - vllm

Author	SHA1	Message	Date
Richard Zou	fd9c83d0e0	[torch.compile] Document the workaround to standalone_compile failing (#33571 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-03 07:16:55 +00:00
Cyrus Leung	793af538a3	[Doc] Update plugin deprecation notices (#33476 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-31 22:48:28 +08:00
jennyyyyzhen	527bcd14d4	[ROCM] Enable aiter attn backend for qwen3-next model (#32492 ) Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>	2026-01-31 17:03:57 +08:00
Didier Durand	31b25f6516	[Doc]: fixing multiple typos in diverse files (#33256 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-29 16:52:03 +08:00
Matthew Bonanni	77c4f45c6c	[7/N][Attention][Docs] Add documentation for attention backends (#32477 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-28 17:20:22 -05:00
Angela Yi	fb7abfc1d0	[docs] Improve tlparse section (#33211 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-28 02:07:37 +00:00
Matthew Bonanni	a608b4c6c2	[5/N][Attention] Finish eliminating `vllm/attention` folder (#32064 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 10:02:51 -05:00
Robert Shaw	5a93b9162b	[MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>	2026-01-27 01:28:02 +00:00
Alex Brooks	9ac818a551	[Misc] HF Hub LoRA Resolver (#20320 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-26 13:56:32 +00:00
wang.yuqi	328cbb2773	[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-01-22 10:32:44 +00:00
whx	1861ae8aae	[PluggableLayer][1/N] Define PluggableLayer (Fix ci) (#32744 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-21 11:38:04 -05:00
Robert Shaw	42135d6898	[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (#32414 )	2026-01-21 08:22:33 -05:00
Lucas Kabela	c80f92c14d	[Documentation] Fix typo in `docs/design/torch_compile_multimodal.md` (#32741 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-20 23:54:20 -08:00
Robert Shaw	c78ee240b3	Revert "[PluggableLayer][1/N] Define PluggableLayer" (#32725 )	2026-01-21 00:21:06 +00:00
whx	4ca62a0dbd	[PluggableLayer][1/N] Define PluggableLayer (#32331 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-20 16:19:21 +00:00
杨朱 · Kiki	bb9172030e	[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661 ) This PR completes the removal of the deprecated vllm:time_per_output_token_seconds metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13, but delayed until v0.15. Signed-off-by: carlory <baofa.fan@daocloud.io> Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>	2026-01-20 12:28:41 +00:00
lon	73f2a81c75	docs: prefix caching seems quite outdated (#28784 ) Signed-off-by: lon <114724657+longregen@users.noreply.github.com> Signed-off-by: Russell Bryant <russell.bryant@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Russell Bryant <russell.bryant@gmail.com>	2026-01-19 11:49:52 -08:00
Robert Shaw	4a6af8813f	[MoE Refactor] Move Test Impl into Test Dirs (#32129 ) Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-01-18 12:16:59 +08:00
Cyrus Leung	9ea07b41da	[1/N] Reorganize multimodal processing code (#32327 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-14 15:25:31 +00:00
Andrew Bennett	f243abc92d	Fix various typos found in `docs` (#32212 ) Signed-off-by: Andrew Bennett <potatosaladx@meta.com>	2026-01-13 03:41:47 +00:00
Matthew Bonanni	20228cb851	[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-12 09:13:56 -08:00
XlKsyt	899541bdb1	[doc] fix broken links (#32158 ) Signed-off-by: minimAluminiumalism <caixuesen@outlook.com>	2026-01-12 10:18:38 +00:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Shanshan Shen	08d954f036	[Doc] Add developer guide for CustomOp (#30886 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-01-09 16:21:11 +00:00
Lucas Kabela	f16bfbe5bc	[Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders (#31627 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-08 14:33:24 -05:00
Robert Shaw	9f6dcb71ae	[MoE Refactor][16/N] Apply Refactor to NVFP4 (#31692 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-01-08 03:46:27 +00:00
Robert Shaw	5dcd7ef1f2	[MoE Refactor][15/N] Apply Refactor to Fp8 (#31415 )	2026-01-07 19:42:33 -05:00
weiyu	e7596371a4	[Refactor][TPU] Remove torch_xla path and use tpu-inference (#30808 ) Signed-off-by: Wei-Yu Lin <weiyulin@google.com> Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com>	2026-01-07 16:07:16 +08:00
Cyrus Leung	db318326a5	[Misc] Use `deprecated` for `seed_everything` (#31780 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-06 11:29:55 +00:00
wangxiyuan	bb4337b34c	[Platform] Deprecate seed_everything (#31659 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-04 18:34:04 -08:00
Harry Mellor	decc244767	[Docs] Use relative `md` links instead of absolute `html` links for cross referencing (#31494 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-29 13:33:44 +00:00
Steve Westerhouse	9d701e90d8	[Doc] Clarify FP8 KV cache computation workflow (#31071 ) Signed-off-by: westers <steve.westerhouse@origami-analytics.com>	2025-12-22 08:41:37 +08:00
Elizabeth Thomas	41b6f9200f	Remove all2all backend envvar (#30363 ) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-18 19:46:28 +00:00
rongfu.leng	9e67c4ce98	[Docs] fix function name (#30748 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-12-17 12:14:45 +00:00
Didier Durand	1a55cfafcb	[Doc]: fixing typos in various files (#30540 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-12-14 02:14:37 -08:00
Cyrus Leung	5a87d8b9b1	[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:35 -08:00
Mark McLoughlin	2dcbac9077	[Docs] Generate full list of metrics in user docs (#30388 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-10 16:09:34 +00:00
redwrasse	6476382384	prefix caching design doc sha256 now default (#29261 ) Signed-off-by: redwrasse <mail@redwrasse.io>	2025-12-06 07:39:56 +00:00
Yanan Cao	62b3333448	[Frontend] Remove deprecated -O.xx flag (#29991 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-05 00:47:22 -08:00
TimWang	690cc3ef20	docs: update metrics design doc to use new vllm:kv_cache_usage_perc (#30041 ) Signed-off-by: Tim <tim.wang03@sap.com>	2025-12-04 23:37:14 +00:00
CYJiang	fd68e909db	[docs] Remove _total from counter metrics names (#30028 ) In Prometheus, Counters always expose their actual numeric value with a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API. Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com>	2025-12-04 07:46:15 +00:00
bnellnm	2902c34826	[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 20:49:00 +00:00
wang.yuqi	2eb4fe9129	[examples] Resettle pooling examples. (#29365 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 15:54:28 +00:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: `vllm:kv_block_lifetime_seconds` (total lifetime from allocation to free); `vllm:kv_block_idle_before_evict_seconds` (idle duration before eviction); `vllm:kv_block_reuse_gap_seconds` (time between consecutive reuses of the same block). These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: `--kv-cache-metrics` (enable KV cache residency metrics); `--kv-cache-metrics-sample` (control sampling ratio, default: 0.01). Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
wang.yuqi	62de4f4257	[Frontend] Resettle pooling entrypoints (#29634 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-01 15:30:43 +08:00
Cyrus Leung	2afcec4dec	[Misc] Update `TokenizerLike` interface and move `get_cached_tokenizer` (#29730 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 14:59:47 +08:00
Jinzhen Lin	1656ad3704	[Kernel][Quantization] add w4a8 support for marlin kernel (#24722 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-11-29 07:19:33 -08:00
dublc	f4341f45d3	[Doc]: fix code block rendering (#29728 ) Signed-off-by: dublc <jdublc0x@gmail.com>	2025-11-29 13:46:48 +00:00
Yanan Cao	3461e7efd8	[Frontend] Remap -O to -cc commandline flag (#29557 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-11-28 21:51:12 +00:00
Morrison Turnansky	0838b52e2e	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): Set up -O infrastructure (#26847 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: adabeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-27 01:55:58 -08:00

143 Commits