[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>
This commit is contained in:
@@ -78,8 +78,12 @@ DOLPHIN_CONFIG = GGUFTestConfig(
|
||||
)
|
||||
|
||||
MODELS = [
|
||||
LLAMA_CONFIG, QWEN2_CONFIG, PHI3_CONFIG, GPT2_CONFIG, STABLELM_CONFIG,
|
||||
DOLPHIN_CONFIG
|
||||
LLAMA_CONFIG,
|
||||
QWEN2_CONFIG,
|
||||
PHI3_CONFIG,
|
||||
GPT2_CONFIG,
|
||||
# STABLELM_CONFIG, # enable this when v1 support head_size=80
|
||||
DOLPHIN_CONFIG,
|
||||
# STARCODER_CONFIG, # broken
|
||||
]
|
||||
|
||||
|
||||
Reference in New Issue
Block a user