[Docs] Spec decoding docs warning removal (#34439)
Signed-off-by: NickLucche <nlucches@redhat.com>
@@ -1,10 +1,5 @@
 # Speculative Decoding
 
-!!! warning
-    Please note that speculative decoding in vLLM is not yet optimized and does
-    not usually yield inter-token latency reductions for all prompt datasets or sampling parameters.
-    The work to optimize it is ongoing and can be followed here: <https://github.com/vllm-project/vllm/issues/4630>
-
 !!! warning
     Currently, speculative decoding in vLLM is not compatible with pipeline parallelism.
 