[Docs] Adding links and intro to Speculators and LLM Compressor (#32849)

Signed-off-by: Aidan Reilly <aireilly@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Author: Aidan Reilly
Date: 2026-01-29 22:12:35 +00:00
Committed by: GitHub
parent bfb9bdaf3f
commit 133765760b
5 changed files with 73 additions and 7 deletions


@@ -2,7 +2,10 @@
Quantization trades off model precision for a smaller memory footprint, allowing large models to be run on a wider range of devices.
Contents:
!!! tip
    To get started with quantization, see [LLM Compressor](llm_compressor.md), a library for optimizing models for vLLM deployment that supports FP8, INT8, INT4, and other quantization formats.
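The precision-for-memory tradeoff behind formats like INT8 can be illustrated with a minimal sketch of symmetric per-tensor quantization. This is illustrative pure Python only, not LLM Compressor's actual implementation:

```python
# Illustrative sketch of symmetric INT8 weight quantization.
# NOT LLM Compressor's real API -- just the underlying arithmetic.

def quantize_int8(weights):
    """Map floats to int8 values using a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.635, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each value now needs 1 byte instead of 4 (FP32): a 4x memory reduction,
# at the cost of a rounding error of at most scale / 2 per weight.
```

Real libraries add per-channel or per-group scales, calibration data, and fused kernels, but the core idea, trading rounding error for a narrower storage type, is the same.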
The following are the supported quantization formats for vLLM:
- [AutoAWQ](auto_awq.md)
- [BitsAndBytes](bnb.md)