[Hardware][CPU] Support MOE models on x86 CPU (#11831)

Signed-off-by: jiang1.li <jiang1.li@intel.com>
Author: Li, Jiang
Date: 2025-01-11 00:07:58 +08:00
Committed by: GitHub
Parent: 5959564f94
Commit: aa1e77a19c
3 changed files with 43 additions and 4 deletions


@@ -5,7 +5,7 @@
 vLLM initially supports basic model inference and serving on the x86 CPU platform, with data types FP32, FP16 and BF16. The vLLM CPU backend supports the following vLLM features:
 - Tensor Parallel
-- Model Quantization (`INT8 W8A8, AWQ`)
+- Model Quantization (`INT8 W8A8, AWQ, GPTQ`)
 - Chunked-prefill
 - Prefix-caching
 - FP8-E5M2 KV-Caching (TODO)