[doc] add install tips (#17373)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
@@ -44,6 +44,12 @@ To produce performant FP8 quantized models with vLLM, you'll need to install the
 pip install llmcompressor
 ```
+
+Additionally, install `vllm` and `lm-evaluation-harness` for evaluation:
+
+```console
+pip install vllm lm-eval==0.4.4
+```
 
 ## Quantization Process
 
 The quantization process involves three main steps:
@@ -86,7 +92,7 @@ recipe = QuantizationModifier(
 # Apply the quantization algorithm.
 oneshot(model=model, recipe=recipe)
 
-# Save the model.
+# Save the model: Meta-Llama-3-8B-Instruct-FP8-Dynamic
 SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
 model.save_pretrained(SAVE_DIR)
 tokenizer.save_pretrained(SAVE_DIR)
@@ -94,12 +100,6 @@ tokenizer.save_pretrained(SAVE_DIR)
 
 ### 3. Evaluating Accuracy
 
-Install `vllm` and `lm-evaluation-harness`:
-
-```console
-pip install vllm lm-eval==0.4.4
-```
-
 Load and run the model in `vllm`:
 
 ```python
||||
Reference in New Issue
Block a user
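
The diff is cut off at the opening of the `python` block, so the evaluation command itself is not shown here. As a sketch only, the `lm-evaluation-harness` installed above is typically invoked against a local checkpoint like this; the task selection and flags below are illustrative assumptions, not taken from this commit:

```console
lm_eval --model vllm \
  --model_args pretrained=$PWD/Meta-Llama-3-8B-Instruct-FP8-Dynamic \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size auto
```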