[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107)

Signed-off-by: yan ma <yan.ma@intel.com>
Author: Yan Ma
Date: 2024-11-19 02:18:05 +08:00
Committed by: GitHub
Parent: 281cc4b3cd
Commit: 6b2d25efc7
7 changed files with 146 additions and 52 deletions

@@ -27,7 +27,7 @@ The table below shows the compatibility of various quantization implementations
   - ✅︎
   - ✅︎
   - ✗
-  -
+  - ✅︎
   - ✅︎
   - ✗
   - ✗
@@ -38,8 +38,8 @@ The table below shows the compatibility of various quantization implementations
   - ✅︎
   - ✅︎
   - ✗
-  -
-  -
+  - ✅︎
+  - ✅︎
   - ✗
   - ✗
 * - Marlin (GPTQ/AWQ/FP8)
@@ -129,4 +129,4 @@ Notes:
 Please note that this compatibility chart may be subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.
-For the most up-to-date information on hardware support and quantization methods, please check the `quantization directory <https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/layers/quantization>`_ or consult with the vLLM development team.
+For the most up-to-date information on hardware support and quantization methods, please check the `quantization directory <https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/layers/quantization>`_ or consult with the vLLM development team.
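The substance of this commit is that the AWQ and GPTQ rows of the compatibility table now show support on the xpu backend. A minimal sketch of that change as a lookup (the map and helper below are hypothetical illustrations, not vLLM's actual API; only the awq/gptq-on-xpu entries reflect this commit):

```python
# Hypothetical compatibility map; the "xpu" entries for awq and gptq
# reflect what this commit adds to the documented table.
SUPPORTED_DEVICES = {
    "awq": {"cuda", "xpu"},
    "gptq": {"cuda", "xpu"},
}


def is_supported(method: str, device: str) -> bool:
    """Return True if the quantization method is documented for the device."""
    return device in SUPPORTED_DEVICES.get(method, set())


print(is_supported("awq", "xpu"))   # True after this commit
print(is_supported("gptq", "xpu"))  # True after this commit
```

Before this commit, the corresponding table cells for the xpu column were empty, i.e. the lookup above would have returned False for both methods.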