vllm/tests/v1/metrics at fa6a6be51978bd4b49ba0da17039e60f96dc5b13 - vllm

Files

zhanqiuhu 4403e3ed4c [Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290)

Add labeled Prometheus metrics to distinguish where prompt tokens come
from in P/D disaggregated deployments.

In P/D disaggregation, decode instances receive KV cache from prefill instances.
Currently, a decode instance reports inflated prompt throughput because it counts
all prompt tokens as "computed", even though most were transferred.

This PR adds labeled metrics so users can distinguish actual compute work from
transferred work:

vllm:prompt_tokens_by_source_total{source="local_compute"}        # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer  
vllm:prompt_tokens_by_source_total{source="local_cache_hit"}      # Tokens from local prefix cache
vllm:prompt_tokens_cached_total                                    # Total cached (local + external, -1 when all 
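
As an illustration of the accounting described above, here is a minimal sketch
of splitting one request's prompt tokens across the three source labels. It is
not the code from this PR: the names PromptTokenBreakdown and
split_prompt_tokens are hypothetical, and the sketch assumes the three sources
are disjoint and sum to the prompt length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTokenBreakdown:
    """Per-request prompt token counts, keyed by the `source` label values."""

    local_compute: int
    local_cache_hit: int
    external_kv_transfer: int


def split_prompt_tokens(
    num_prompt_tokens: int,
    num_local_cache_hit: int,
    num_external_kv_transfer: int,
) -> PromptTokenBreakdown:
    """Attribute each prompt token to exactly one source.

    Assumption (not confirmed by the commit message): whatever was not served
    from the local prefix cache or received via KV transfer was prefilled
    locally.
    """
    counts = (num_prompt_tokens, num_local_cache_hit, num_external_kv_transfer)
    if min(counts) < 0:
        raise ValueError("token counts must be non-negative")
    reused = num_local_cache_hit + num_external_kv_transfer
    if reused > num_prompt_tokens:
        raise ValueError("reused tokens exceed the prompt length")
    return PromptTokenBreakdown(
        local_compute=num_prompt_tokens - reused,
        local_cache_hit=num_local_cache_hit,
        external_kv_transfer=num_external_kv_transfer,
    )


# A decode instance that received 700 of 1000 prompt tokens via KV transfer
# and hit its local prefix cache for another 200 only prefilled 100 itself.
breakdown = split_prompt_tokens(1000, 200, 700)
print(breakdown.local_compute)  # 100
```

Under these assumptions, only the local_compute share reflects prefill work
done by the instance, which is the quantity the unlabeled count overstated.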

Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>

2026-02-04 07:46:48 +00:00

test_engine_logger_apis.py

[V1][Metrics][Plugin] Add plugin support for custom StatLoggerBase implementations (#22456)

2025-10-18 15:12:46 -07:00

test_metrics_reader.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_perf_metrics.py

[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454)

2026-01-02 15:13:15 -08:00

test_ray_metrics.py

Tidy vllm/config/__init__.py to only add classes and functions (#26405)

2025-10-08 07:10:00 -07:00

test_stats.py

[Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290)

2026-02-04 07:46:48 +00:00