[Core] Deprecate block manager v1 and make block manager v2 default (#8704)

Remove block manager v1. This is the first piece of the prefix-caching-centric design. To achieve that design, we need to simplify the code path so that only the v2 block manager is used (it performs much better with prefix caching).
Kuntai Du
2024-10-17 11:38:15 -05:00
committed by GitHub
parent 5eda21e773
commit 81ede99ca4
45 changed files with 206 additions and 2109 deletions

@@ -19,9 +19,6 @@ SPEC_MODEL = "JackFram/llama-68m"
[[
# Skip cuda graph recording for fast test.
"--enforce_eager",
# Required for spec decode.
"--use-v2-block-manager",
"--tensor-parallel-size",
"4",
]])
@@ -71,9 +68,6 @@ def test_draft_model_tp_lt_target_model_tp4(common_llm_kwargs,
# Skip cuda graph recording for fast test.
"--enforce-eager",
# Required for spec decode.
"--use-v2-block-manager",
"--tensor-parallel-size",
"4",
]])
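
Judging by the hunk headers (9 lines before, 6 after) and the commit message, the test argument lists above appear to stop passing `--use-v2-block-manager`. A minimal sketch of the same cleanup for a caller's own argument list follows; `drop_v2_block_manager_flag` is a hypothetical helper written for illustration and is not part of vLLM.

```python
from typing import List

# Flag that the hunks above appear to remove from the test arguments.
DEPRECATED_FLAG = "--use-v2-block-manager"


def drop_v2_block_manager_flag(args: List[str]) -> List[str]:
    """Hypothetical helper: return a copy of args without the flag.

    The flag takes no value, so removing the single token is enough.
    """
    return [arg for arg in args if arg != DEPRECATED_FLAG]


# Argument list modelled on the second hunk, before the change.
old_args = [
    "--enforce-eager",
    "--use-v2-block-manager",
    "--tensor-parallel-size",
    "4",
]

new_args = drop_v2_block_manager_flag(old_args)
print(new_args)
```

The helper returns a new list rather than mutating its input, so the original argument list stays available for comparison.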