[FEAT][ROCm]: Support AITER MLA (#15893)

Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
vllmellm
2025-04-23 00:31:13 +08:00
committed by GitHub
parent f34410715f
commit 30bc3e0f66
9 changed files with 668 additions and 30 deletions

@@ -1248,7 +1248,7 @@ class ModelConfig:
or getattr(self.hf_config, "is_matryoshka", False))
-BlockSize = Literal[8, 16, 32, 64, 128]
+BlockSize = Literal[1, 8, 16, 32, 64, 128]
CacheDType = Literal["auto", "fp8", "fp8_e4m3", "fp8_e5m2"]
PrefixCachingHashAlgo = Literal["builtin", "sha256"]