[CI/Build] Auto-fix Markdown files (#12941)
@@ -14,8 +14,8 @@ The KV cache transfer contains three layers of abstraction:
Why we need a KV lookup buffer: a FIFO pipe by itself is not enough, because the prefill vLLM worker may process requests in a different order than the decode vLLM worker. Say the QPS is really high: the prefill worker may handle requests in the order A -> B -> C, but the decode worker may ask for request C first. A FIFO pipe cannot naturally handle this reordering, so we provide the KV lookup buffer to turn a FIFO pipe into a key-based lookup buffer.
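The reordering problem above can be sketched in a few lines. This is an illustrative toy, not vLLM's actual implementation: `LookupBuffer` and its method names are hypothetical, and a plain `queue.Queue` stands in for the real FIFO pipe. Items that arrive before anyone asks for them are stashed by key until requested.

```python
import queue
from typing import Any, Dict, Tuple


class LookupBuffer:
    """Toy sketch: turn a FIFO pipe into a key-based lookup buffer.

    KV caches that arrive out of order are parked in `_stash` until the
    decode side asks for their request id. Names are illustrative only.
    """

    def __init__(self, pipe: "queue.Queue[Tuple[str, Any]]") -> None:
        self._pipe = pipe                 # FIFO: yields (request_id, kv_cache)
        self._stash: Dict[str, Any] = {}  # out-of-order arrivals, keyed by id

    def get(self, request_id: str) -> Any:
        # Serve from the stash if this key has already arrived.
        if request_id in self._stash:
            return self._stash.pop(request_id)
        # Otherwise drain the FIFO, stashing mismatches, until the key shows up.
        while True:
            key, value = self._pipe.get()
            if key == request_id:
                return value
            self._stash[key] = value
```

With this wrapper, the prefill side can push A -> B -> C while the decode side fetches C first, then A, then B, and every fetch still returns the right KV cache.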
NOTE: The KV pipe layer is bypassable: you can skip this layer if your distributed communication service already supports key-value-based lookup (like Redis or an RDMA database).
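The bypass amounts to implementing the lookup-buffer interface directly on top of a key-value backend. A minimal sketch, assuming a hypothetical `KVLookupBufferBase` interface with `insert`/`drop_select` methods (names illustrative); the in-memory class stands in for a real service such as Redis so the example is self-contained:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class KVLookupBufferBase(ABC):
    """Hypothetical lookup-buffer interface.

    A backend that already speaks key-value (Redis, an RDMA database)
    can implement this directly and skip the FIFO-pipe layer entirely.
    """

    @abstractmethod
    def insert(self, request_id: str, kv_cache: Any) -> None: ...

    @abstractmethod
    def drop_select(self, request_id: str) -> Optional[Any]: ...


class InMemoryKVStore(KVLookupBufferBase):
    """Dict-backed stand-in for a real key-value service."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def insert(self, request_id: str, kv_cache: Any) -> None:
        self._store[request_id] = kv_cache            # e.g. Redis SET

    def drop_select(self, request_id: str) -> Optional[Any]:
        return self._store.pop(request_id, None)      # e.g. Redis GETDEL
```

Because the backend answers by key, no reordering logic is needed: the decode worker simply asks for the request id it wants next.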
NOTE: If you want to not only transfer KV caches but also adjust vLLM's model execution flow (for example, allow vLLM to receive KV caches for some tokens and run prefill on the remaining tokens), you can bypass both the KV pipe layer and the KV lookup buffer layer and implement directly on the KV connector layer. Bear in mind that because vLLM's model input is constantly changing, such an implementation will likely break whenever vLLM is updated.
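The partial-prefill idea in the note can be illustrated with a tiny helper. This is purely a conceptual sketch, not vLLM code: `split_prefill` is a hypothetical name, and a real connector would have to map this split onto vLLM's internal model-input structures, which is exactly why the note warns that such code is fragile across releases.

```python
from typing import List, Tuple


def split_prefill(tokens: List[int], cached_len: int) -> Tuple[List[int], List[int]]:
    """Split a prompt into a prefix whose KV cache is received from the
    prefill worker and a suffix that the decode worker must still prefill.

    Illustrative only: a real connector layer works on vLLM's model input,
    not on a plain token list.
    """
    cached_len = max(0, min(cached_len, len(tokens)))  # clamp to valid range
    return tokens[:cached_len], tokens[cached_len:]
```

For a 4-token prompt with 2 tokens' worth of received KV cache, the first half skips computation while the second half is prefilled locally.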
@@ -27,4 +27,3 @@ The example usage is in [this file](../../../examples/online_serving/disaggregat
Here is the diagram of how we run disaggregated prefilling.