[BUGFIX] GPTQ quantization compatibility for Qwen3 MOE models (AutoGPTQ and AutoRound-GPTQ) (#23994)

Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-01 05:33:40 +02:00
parent 14b4326b94
commit 183a70967a
3 changed files with 17 additions and 4 deletions
--- a/vllm/model_executor/layers/quantization/gptq_marlin.py
+++ b/vllm/model_executor/layers/quantization/gptq_marlin.py
@@ -119,6 +119,9 @@ class GPTQMarlinConfig(QuantizationConfig):

        self.quant_type = self.TYPE_MAP[(weight_bits, is_sym)]

+        # used to identify GPTQ model quantized by autoround
+        self.autoround_version = full_config.get("autoround_version", "")
+
    def __repr__(self) -> str:
        return (f"GPTQMarlinConfig(quant_type={self.quant_type}, "
                f"group_size={self.group_size}, "