[CI/Build] Add markdown linter (#11857)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
@@ -192,11 +192,11 @@ A few important things to consider when using the EAGLE based draft models:
1. The EAGLE draft models available in the [HF repository for EAGLE models](https://huggingface.co/yuhuili) cannot be
   used directly with vLLM due to differences in the expected layer names and model definition.
   To use these models with vLLM, use the [following script](https://gist.github.com/abhigoyal1997/1e7a4109ccb7704fbc67f625e86b2d6d)
   to convert them. Note that this script does not modify the model's weights.

   In the above example, use the script to first convert
   the [yuhuili/EAGLE-LLaMA3-Instruct-8B](https://huggingface.co/yuhuili/EAGLE-LLaMA3-Instruct-8B) model
   and then use the converted checkpoint as the draft model in vLLM, as in the sketch shown after this list.

2. The EAGLE based draft models need to be run without tensor parallelism
@@ -207,7 +207,6 @@ A few important things to consider when using the EAGLE based draft models:
   reported in the reference implementation [here](https://github.com/SafeAILab/EAGLE). This issue is under
   investigation and tracked here: [https://github.com/vllm-project/vllm/issues/9565](https://github.com/vllm-project/vllm/issues/9565).
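Putting the above together, here is a minimal offline-inference sketch. It assumes a checkpoint already converted with the script above is saved at `./eagle-llama3-instruct-8b` (a placeholder path) and that this vLLM build accepts the `speculative_model`, `num_speculative_tokens`, and `speculative_draft_tensor_parallel_size` keyword arguments; verify the argument names against your vLLM version.

```python
# Minimal sketch: EAGLE speculative decoding via offline inference.
# Assumptions: "./eagle-llama3-instruct-8b" is a placeholder for a checkpoint
# already converted with the script above; the speculative_* keyword
# arguments match your vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",      # target (base) model
    tensor_parallel_size=4,                           # TP for the target model only
    speculative_model="./eagle-llama3-instruct-8b",   # converted EAGLE draft model
    speculative_draft_tensor_parallel_size=1,         # draft must run without TP
    num_speculative_tokens=5,                         # draft tokens proposed per step
)

outputs = llm.generate(
    ["The future of AI is"],
    SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```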
A variety of EAGLE draft models are available on the Hugging Face hub:

| Base Model | EAGLE on Hugging Face | # EAGLE Parameters |
@@ -224,7 +223,6 @@ A variety of EAGLE draft models are available on the Hugging Face hub:
| Qwen2-7B-Instruct | yuhuili/EAGLE-Qwen2-7B-Instruct | 0.26B |
| Qwen2-72B-Instruct | yuhuili/EAGLE-Qwen2-72B-Instruct | 1.05B |

## Lossless guarantees of Speculative Decoding

In vLLM, speculative decoding aims to enhance inference efficiency while maintaining accuracy. This section addresses the lossless guarantees of
@@ -250,8 +248,6 @@ speculative decoding, breaking down the guarantees into three key areas:
   same request across runs. For more details, see the FAQ section
   titled *Can the output of a prompt vary across runs in vLLM?* in the [FAQs](#faq).

**Conclusion**

While vLLM strives to ensure losslessness in speculative decoding, variations in generated outputs with and without speculative decoding
can occur due to the following factors:
@@ -259,8 +255,6 @@ can occur due to following factors:
- **Batch Size and Numerical Stability**: Changes in batch size may cause variations in logprobs and output probabilities, potentially
  due to non-deterministic behavior in batched operations or numerical instability.

**Mitigation Strategies**

For mitigation strategies, please refer to the FAQ entry *Can the output of a prompt vary across runs in vLLM?* in the [FAQs](#faq).
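As a quick illustration (not a substitute for the FAQ guidance), one commonly used mitigation is greedy sampling with a fixed seed, which reduces but does not fully eliminate run-to-run variation. The sketch below assumes the standard `LLM`/`SamplingParams` offline API, and the model name is a placeholder.

```python
# Sketch of one common mitigation: greedy decoding plus a fixed seed to reduce
# run-to-run variation. This does not remove nondeterminism that stems from
# batched kernels, as described above. Model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

params = SamplingParams(
    temperature=0.0,  # greedy decoding: always pick the argmax token
    seed=42,          # fix the sampling RNG for reproducibility
    max_tokens=64,
)

print(llm.generate(["The future of AI is"], params)[0].outputs[0].text)
```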
## Resources for vLLM contributors