[Doc] Update prefix cache metrics to counting tokens (#18138)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Chen Zhang
2025-05-14 23:23:30 +08:00
committed by GitHub
parent 59dd311cf5
commit 964472b966


@@ -415,8 +415,8 @@ The discussion in <gh-issue:10582> about adding prefix cache metrics yielded
some interesting points which may be relevant to how we approach
future metrics.
-Every time the prefix cache is queried, we record the number of blocks
-queried and the number of queried blocks present in the cache
+Every time the prefix cache is queried, we record the number of tokens
+queried and the number of queried tokens present in the cache
(i.e. hits).
However, the metric of interest is the hit rate - i.e. the number of