[doc] add install tips (#17373)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Authored by Reid on 2025-05-01 01:02:41 +08:00, committed by GitHub.
parent 584f5fb4c6
commit 2ac74d098e
5 changed files with 29 additions and 10 deletions


@@ -44,6 +44,12 @@ To produce performant FP8 quantized models with vLLM, you'll need to install the
pip install llmcompressor
```
Additionally, install `vllm` and `lm-evaluation-harness` for evaluation:
```console
pip install vllm lm-eval==0.4.4
```
## Quantization Process
The quantization process involves three main steps:
@@ -86,7 +92,7 @@ recipe = QuantizationModifier(
# Apply the quantization algorithm.
oneshot(model=model, recipe=recipe)
# Save the model.
# Save the model: Meta-Llama-3-8B-Instruct-FP8-Dynamic
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
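The updated comment spells out the directory name that the save step produces. As a minimal sketch of that naming convention (the `fp8_save_dir` helper is hypothetical, introduced here only for illustration):

```python
def fp8_save_dir(model_id: str) -> str:
    """Mirror SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic" from the snippet above."""
    return model_id.split("/")[1] + "-FP8-Dynamic"

# The Hugging Face model ID used in the surrounding guide:
print(fp8_save_dir("meta-llama/Meta-Llama-3-8B-Instruct"))
# → Meta-Llama-3-8B-Instruct-FP8-Dynamic
```

This strips the organization prefix from the Hugging Face model ID and appends the quantization scheme suffix, so the saved checkpoint is self-describing.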
@@ -94,12 +100,6 @@ tokenizer.save_pretrained(SAVE_DIR)
### 3. Evaluating Accuracy
Install `vllm` and `lm-evaluation-harness`:
```console
pip install vllm lm-eval==0.4.4
```
Load and run the model in `vllm`:
```python