wang.yuqi
328cbb2773
[Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest ( #32574 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2026-01-22 10:32:44 +00:00
whx
1861ae8aae
[PluggableLayer][1/N] Define PluggableLayer (Fix ci) ( #32744 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2026-01-21 11:38:04 -05:00
Robert Shaw
42135d6898
[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority ( #32414 )
2026-01-21 08:22:33 -05:00
Lucas Kabela
c80f92c14d
[Documentation] Fix typo in docs/design/torch_compile_multimodal.md ( #32741 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-01-20 23:54:20 -08:00
Robert Shaw
c78ee240b3
Revert "[PluggableLayer][1/N] Define PluggableLayer" ( #32725 )
2026-01-21 00:21:06 +00:00
whx
4ca62a0dbd
[PluggableLayer][1/N] Define PluggableLayer ( #32331 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2026-01-20 16:19:21 +00:00
杨朱 · Kiki
bb9172030e
[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric ( #32661 )
...
This PR completes the removal of the deprecated vllm:time_per_output_token_seconds
metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13,
but delayed until v0.15.
Signed-off-by: carlory <baofa.fan@daocloud.io >
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com >
2026-01-20 12:28:41 +00:00
lon
73f2a81c75
docs: prefix caching seems quite outdated ( #28784 )
...
Signed-off-by: lon <114724657+longregen@users.noreply.github.com >
Signed-off-by: Russell Bryant <russell.bryant@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <russell.bryant@gmail.com >
2026-01-19 11:49:52 -08:00
Robert Shaw
4a6af8813f
[MoE Refactor] Move Test Impl into Test Dirs ( #32129 )
...
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
2026-01-18 12:16:59 +08:00
Cyrus Leung
9ea07b41da
[1/N] Reorganize multimodal processing code ( #32327 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-14 15:25:31 +00:00
Andrew Bennett
f243abc92d
Fix various typos found in docs ( #32212 )
...
Signed-off-by: Andrew Bennett <potatosaladx@meta.com >
2026-01-13 03:41:47 +00:00
Matthew Bonanni
20228cb851
[3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py ( #32054 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-12 09:13:56 -08:00
XlKsyt
899541bdb1
[doc] fix broken links ( #32158 )
...
Signed-off-by: minimAluminiumalism <caixuesen@outlook.com >
2026-01-12 10:18:38 +00:00
Matthew Bonanni
2612ba9285
[1/N][Attention] Restructure attention: move files ( #31916 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2026-01-09 13:10:24 -08:00
Shanshan Shen
08d954f036
[Doc] Add developer guide for CustomOp ( #30886 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2026-01-09 16:21:11 +00:00
Lucas Kabela
f16bfbe5bc
[Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders ( #31627 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2026-01-08 14:33:24 -05:00
Robert Shaw
9f6dcb71ae
[MoE Refactor][16/N] Apply Refactor to NVFP4 ( #31692 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Pavani Majety <pmajety@nvidia.com >
2026-01-08 03:46:27 +00:00
Robert Shaw
5dcd7ef1f2
[MoE Refactor][15/N] Apply Refactor to Fp8 ( #31415 )
2026-01-07 19:42:33 -05:00
weiyu
e7596371a4
[Refactor][TPU] Remove torch_xla path and use tpu-inference ( #30808 )
...
Signed-off-by: Wei-Yu Lin <weiyulin@google.com >
Signed-off-by: weiyu <62784299+weiyu0824@users.noreply.github.com >
2026-01-07 16:07:16 +08:00
Cyrus Leung
db318326a5
[Misc] Use deprecated for seed_everything ( #31780 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-01-06 11:29:55 +00:00
wangxiyuan
bb4337b34c
[Platform] Deprecate seed_everything ( #31659 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2026-01-04 18:34:04 -08:00
Harry Mellor
decc244767
[Docs] Use relative md links instead of absolute html links for cross referencing ( #31494 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-29 13:33:44 +00:00
Steve Westerhouse
9d701e90d8
[Doc] Clarify FP8 KV cache computation workflow ( #31071 )
...
Signed-off-by: westers <steve.westerhouse@origami-analytics.com >
2025-12-22 08:41:37 +08:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar ( #30363 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-18 19:46:28 +00:00
rongfu.leng
9e67c4ce98
[Docs] fix function name ( #30748 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-12-17 12:14:45 +00:00
Didier Durand
1a55cfafcb
[Doc]: fixing typos in various files ( #30540 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-12-14 02:14:37 -08:00
Cyrus Leung
5a87d8b9b1
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release ( #30396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-10 19:59:35 -08:00
Mark McLoughlin
2dcbac9077
[Docs] Generate full list of metrics in user docs ( #30388 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-10 16:09:34 +00:00
redwrasse
6476382384
prefix caching design doc sha256 now default ( #29261 )
...
Signed-off-by: redwrasse <mail@redwrasse.io >
2025-12-06 07:39:56 +00:00
Yanan Cao
62b3333448
[Frontend] Remove deprecated -O.xx flag ( #29991 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-12-05 00:47:22 -08:00
TimWang
690cc3ef20
docs: update metrics design doc to use new vllm:kv_cache_usage_perc ( #30041 )
...
Signed-off-by: Tim <tim.wang03@sap.com >
2025-12-04 23:37:14 +00:00
CYJiang
fd68e909db
[docs] Remove _total from counter metrics names ( #30028 )
...
In Prometheus, counters always expose their actual numeric value with a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API.
Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com >
2025-12-04 07:46:15 +00:00
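The naming rule in the commit message for #30028 above (a counter is exposed with a `_total` suffix, while the documented name is the base name) can be illustrated with a small helper. This is only a sketch: `counter_base_name` and the metric name `example:requests_total` are hypothetical and are not part of vLLM or of any Prometheus client library.

```python
def counter_base_name(sample_name: str) -> str:
    """Return the documented base name for an exposed counter sample name.

    Hypothetical helper: it only strips a trailing "_total" suffix and
    leaves every other name unchanged.
    """
    suffix = "_total"
    if sample_name.endswith(suffix):
        return sample_name[: -len(suffix)]
    return sample_name


if __name__ == "__main__":
    # "example:requests_total" is an illustrative name, not a real vLLM metric.
    print(counter_base_name("example:requests_total"))
```

Names without the suffix, such as a gauge or histogram name, pass through unchanged, so the helper can be applied to a mixed list of metric names.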
bnellnm
2902c34826
[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton ( #29929 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-12-03 20:49:00 +00:00
wang.yuqi
2eb4fe9129
[examples] Resettle pooling examples. ( #29365 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-02 15:54:28 +00:00
shivampr
cabc77cc86
[Core][Observability] Add KV cache residency metrics ( #27793 )
...
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:
vllm:kv_block_lifetime_seconds — total lifetime from allocation to free
vllm:kv_block_idle_before_evict_seconds — idle duration before eviction
vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block
These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.
Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled.
Two new runtime flags are introduced:
--kv-cache-metrics – enable KV cache residency metrics
--kv-cache-metrics-sample – control sampling ratio (default: 0.01)
Signed-off-by: Shivam <shivamprasad91@gmail.com >
2025-12-01 18:27:53 +00:00
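The residency timings described in the commit message for #27793 above (lifetime, idle time before eviction, and reuse gap, taken from monotonic timestamps on a sampled subset of blocks) can be sketched roughly as follows. This is not the vLLM implementation: the class `KVBlockResidencySampler`, its method names, and the injectable `clock` and `rng` parameters are assumptions made for illustration. The sketch also simplifies by treating "free" and "evict" as one event and by measuring the first reuse gap from allocation time.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class KVBlockResidencySampler:
    """Hypothetical sampled tracker for per-block residency timings."""

    def __init__(
        self,
        sample_rate: float = 0.01,  # mirrors the stated default ratio of 0.01
        clock: Callable[[], float] = time.monotonic,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be within [0, 1]")
        self.sample_rate = sample_rate
        self._clock = clock
        self._rng = rng or random.Random()
        self._allocated_at: Dict[int, float] = {}
        self._last_used_at: Dict[int, float] = {}
        # Observations that a real system would feed into histograms.
        self.lifetimes: List[float] = []
        self.idle_before_evict: List[float] = []
        self.reuse_gaps: List[float] = []

    def on_alloc(self, block_id: int) -> None:
        # Only a sampled fraction of blocks is tracked, to bound overhead.
        if self._rng.random() < self.sample_rate:
            now = self._clock()
            self._allocated_at[block_id] = now
            self._last_used_at[block_id] = now

    def on_reuse(self, block_id: int) -> None:
        last = self._last_used_at.get(block_id)
        if last is None:
            return  # block was not sampled
        now = self._clock()
        self.reuse_gaps.append(now - last)
        self._last_used_at[block_id] = now

    def on_evict(self, block_id: int) -> None:
        allocated = self._allocated_at.pop(block_id, None)
        if allocated is None:
            return  # block was not sampled
        now = self._clock()
        self.lifetimes.append(now - allocated)
        self.idle_before_evict.append(now - self._last_used_at.pop(block_id))
```

Passing a fake `clock` makes the timings deterministic in tests, and a `sample_rate` of `0.0` means no block is ever tracked, so the per-event work reduces to a single dictionary lookup.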
wang.yuqi
62de4f4257
[Frontend] Resettle pooling entrypoints ( #29634 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
2025-12-01 15:30:43 +08:00
Cyrus Leung
2afcec4dec
[Misc] Update TokenizerLike interface and move get_cached_tokenizer ( #29730 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-30 14:59:47 +08:00
Jinzhen Lin
1656ad3704
[Kernel][Quantization] add w4a8 support for marlin kernel ( #24722 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin@redhat.com >
2025-11-29 07:19:33 -08:00
dublc
f4341f45d3
[Doc]: fix code block rendering ( #29728 )
...
Signed-off-by: dublc <jdublc0x@gmail.com >
2025-11-29 13:46:48 +00:00
Yanan Cao
3461e7efd8
[Frontend] Remap -O to -cc commandline flag ( #29557 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude <noreply@anthropic.com >
2025-11-28 21:51:12 +00:00
Morrison Turnansky
0838b52e2e
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): Set up -O infrastructure ( #26847 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Co-authored-by: adabeyta <aabeyta@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-27 01:55:58 -08:00
Lucas Wilkinson
56539cddac
[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building ( #28579 )
2025-11-26 14:07:13 -05:00
Michael Goin
e502098643
[Kernel] Add NVFP4 MoE CUTLASS support for SM120 ( #29242 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-11-25 06:59:07 -08:00
Laith Sakka
7a228b5305
Add option to use unbacked and backed size-oblivious dynamic shapes for more sound compilation. ( #26199 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-11-24 10:12:41 -05:00
Angela Yi
d5dbdbfcb2
[docs] Fix cudagraph mode config ( #29170 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-11-21 17:10:27 -08:00
wangxiyuan
4050bae417
[Doc] Update plugin doc ( #28532 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-21 14:57:26 +00:00
Cyrus Leung
aab0102a26
[V0 deprecation] Remove more V0 references ( #29088 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-21 11:56:59 +00:00
Didier Durand
09540cd918
[Doc]: fix typos in various files ( #29010 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-19 04:56:21 -08:00
Michael Yao
fdf93486d6
[Docs] Clean up moe_kernel_features.md ( #28530 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-11-19 02:35:29 -08:00
Didier Durand
7ed27f3cb5
[Doc]: fix typos in various files ( #28945 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-18 22:52:30 -08:00