[Misc][Quark] Upstream Quark format to VLLM (#10765)

Signed-off-by: kewang-xlnx <kewang@xilinx.com>
Signed-off-by: kewang2 <kewang2@amd.com>
Co-authored-by: kewang2 <kewang2@amd.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-16 00:05:15 +08:00
parent 5ecf3e0aaf
commit de0526f668
32 changed files with 1264 additions and 70 deletions
--- a/vllm/model_executor/layers/quantization/base_config.py
+++ b/vllm/model_executor/layers/quantization/base_config.py
@@ -133,3 +133,6 @@ class QuantizationConfig(ABC):
            method.
        """
        raise NotImplementedError
+
+    def get_cache_scale(self, name: str) -> Optional[str]:
+        return None
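
The hunk above adds a `get_cache_scale` hook whose default returns `None`. As a rough sketch of how a subclass might use it, the example below maps a checkpoint parameter name to a KV-cache scale name. The `QuantizationConfig` class here is a reduced stand-in rather than the real vLLM class, and `ExampleKVScaleConfig` and its suffix table are hypothetical names chosen for illustration, not taken from this commit.

```python
from abc import ABC
from typing import Optional


class QuantizationConfig(ABC):
    """Stand-in for the vLLM base class, reduced to the new hook."""

    def get_cache_scale(self, name: str) -> Optional[str]:
        # Default behavior from the diff: no remapping is offered.
        return None


class ExampleKVScaleConfig(QuantizationConfig):
    """Hypothetical subclass; the suffixes below are assumptions."""

    # Assumed mapping from checkpoint suffixes to cache-scale suffixes.
    _SUFFIX_MAP = {
        ".k_proj.output_scale": ".attn.k_scale",
        ".v_proj.output_scale": ".attn.v_scale",
    }

    def get_cache_scale(self, name: str) -> Optional[str]:
        # Return the remapped name for a known suffix, else None.
        for src, dst in self._SUFFIX_MAP.items():
            if name.endswith(src):
                return name[: -len(src)] + dst
        return None
```

A weight loader could call the hook for each checkpoint name and keep the original name whenever the result is `None`, so configs that do not override the method are unaffected.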