[UX] Use gguf repo_id:quant_type syntax for examples and docs (#33371)

Signed-off-by: mgoin <mgoin64@gmail.com>
Michael Goin authored 2026-01-30 23:14:54 -05:00, committed by GitHub
parent 9df152bbf6
commit 29fba76781
4 changed files with 79 additions and 28 deletions


@@ -56,17 +56,10 @@ Try it yourself with the following argument:
 vLLM supports models that are quantized using GGUF.
-Try one yourself by downloading a quantized GGUF model and using the following arguments:
-```python
-from huggingface_hub import hf_hub_download
-repo_id = "bartowski/Phi-3-medium-4k-instruct-GGUF"
-filename = "Phi-3-medium-4k-instruct-IQ2_M.gguf"
-print(hf_hub_download(repo_id, filename=filename))
-```
+Try one yourself using the `repo_id:quant_type` format to load directly from HuggingFace:
 ```bash
--model {local-path-printed-above} --tokenizer microsoft/Phi-3-medium-4k-instruct
+--model unsloth/Qwen3-0.6B-GGUF:Q4_K_M --tokenizer Qwen/Qwen3-0.6B
 ```
### CPU offload