vllm/vllm/v1/core at a9e532afe2a1ae65c917ae977bf9090806e14721 - vllm

Files

Wentao Ye a8ff2cca92 [Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>

2026-03-10 21:25:30 -07:00

sched

[Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781)

2026-03-10 21:25:30 -07:00

__init__.py

[V1] Implement vLLM V1 [1/N] (#9289)

2024-10-22 01:24:07 -07:00

block_pool.py

[feat] Add per-block extra_keys to KV events (#33304)

2026-02-20 20:11:40 -08:00

encoder_cache_manager.py

[Refactor] Move profiling methods to MM budget (#33559)

2026-02-02 23:27:00 +08:00

kv_cache_coordinator.py

[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387)

2026-02-10 07:41:16 +00:00

kv_cache_manager.py

[BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219)

2026-03-10 03:32:20 -07:00

kv_cache_metrics.py

[Core][Observability] Add KV cache residency metrics (#27793)

2025-12-01 18:27:53 +00:00

kv_cache_utils.py

perf: add __slots__ to KVCacheBlock (#36164)

2026-03-05 22:04:09 -08:00

single_type_kv_cache_manager.py

[BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219)

2026-03-10 03:32:20 -07:00