Stop using title frontmatter and fix doc that can only be reached by search (#20623)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This commit is contained in:
Harry Mellor
2025-07-08 11:27:40 +01:00
committed by GitHub
parent b4bab81660
commit b942c094e3
81 changed files with 82 additions and 238 deletions


@@ -1,6 +1,4 @@
----
-title: INT4 W4A16
----
+# INT4 W4A16
 vLLM supports quantizing weights to INT4 for memory savings and inference acceleration. This quantization method is particularly useful for reducing model size and maintaining low latency in workloads with low queries per second (QPS).
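The W4A16 scheme this doc describes stores weights as 4-bit integers while keeping activations in 16-bit floats, dequantizing weights back to float for the matmul. A minimal sketch of the underlying symmetric per-tensor quantization, with hypothetical helper names (this is an illustration, not vLLM's actual implementation):

```python
def quantize_int4(weights):
    """Symmetric per-tensor quantization to the INT4 range [-8, 7].

    Illustrative only: real W4A16 kernels typically quantize per-group
    or per-channel and pack two 4-bit codes per byte.
    """
    # Map the largest-magnitude weight to the top of the positive range.
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from the INT4 codes."""
    return [qi * scale for qi in q]

weights = [0.12, -0.56, 0.7, -0.33]
q, scale = quantize_int4(weights)
recovered = dequantize_int4(q, scale)
```

The reconstruction error is bounded by half the scale step, which is why 4-bit weights can preserve accuracy well when combined with per-group scales in practice.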