[Docs] Adding links and intro to Speculators and LLM Compressor (#32849)
Signed-off-by: Aidan Reilly <aireilly@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -2,7 +2,10 @@
Quantization trades off model precision for a smaller memory footprint, allowing large models to be run on a wider range of devices.
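The memory saving can be sketched with back-of-the-envelope arithmetic (an illustrative estimate only; the 7B parameter count is a hypothetical example, and real serving memory also includes activations, the KV cache, and runtime overheads):

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight-only memory in GiB at a given precision."""
    return num_params * bits_per_weight / 8 / 2**30

params = 7e9  # hypothetical 7B-parameter model
fp16 = weight_memory_gib(params, 16)  # unquantized half precision
int4 = weight_memory_gib(params, 4)   # 4-bit quantized weights
print(f"FP16: {fp16:.1f} GiB, INT4: {int4:.1f} GiB")
```

Going from 16-bit to 4-bit weights cuts weight memory roughly 4x, which is what lets a model that needs a data-center GPU at FP16 fit on a consumer card.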
Contents:
!!! tip
    To get started with quantization, see [LLM Compressor](llm_compressor.md), a library for optimizing models for deployment with vLLM that supports FP8, INT8, INT4, and other quantization formats.
vLLM supports the following quantization formats:
- [AutoAWQ](auto_awq.md)
- [BitsAndBytes](bnb.md)