[CPU] Support FP8 KV cache (#14741)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
@@ -189,7 +189,7 @@ vLLM CPU backend supports the following vLLM features:
 - Model Quantization (`INT8 W8A8, AWQ, GPTQ`)
 - Chunked-prefill
 - Prefix-caching
-- FP8-E5M2 KV-Caching (TODO)
+- FP8-E5M2 KV cache
 
 ## Related runtime environment variables
 
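The FP8-E5M2 KV cache entry refers to storing keys and values in an 8-bit float format with 1 sign bit, 5 exponent bits, and 2 mantissa bits. E5M2 uses the same exponent width and bias (15) as FP16, so one simple encoding is to round an FP16 value to its upper byte. The NumPy sketch below illustrates that format. It is an assumption-labelled illustration, not vLLM's CPU kernel. The helper names `fp32_to_e5m2` and `e5m2_to_fp32` are hypothetical, and going through float16 first can double-round in rare tie cases.

```python
import numpy as np


def fp32_to_e5m2(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: encode float values as FP8-E5M2 bytes (uint8)."""
    # E5M2 shares FP16's 5-bit exponent and bias, so cast to float16 first
    # (this is the step that may double-round in rare tie cases).
    h = np.asarray(x, dtype=np.float32).astype(np.float16)
    bits = h.view(np.uint16).astype(np.uint32)
    # Round to nearest even on the 8 low mantissa bits that E5M2 drops.
    lsb = (bits >> 8) & 1
    rounded = bits + 0x7F + lsb
    enc = (rounded >> 8).astype(np.uint8)
    # NaN payloads can carry out of the byte while rounding, so map any NaN
    # to a canonical quiet NaN (0x7E).
    return np.where(np.isnan(h), np.uint8(0x7E), enc)


def e5m2_to_fp32(b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: decode FP8-E5M2 bytes back to float32."""
    # Put the byte in the high half of an FP16 word; the low mantissa bits are zero.
    h_bits = np.asarray(b, dtype=np.uint8).astype(np.uint16) << np.uint16(8)
    return h_bits.view(np.float16).astype(np.float32)


kv = np.array([1.0, 1.2, -2.0, 60000.0], dtype=np.float32)
packed = fp32_to_e5m2(kv)      # 1 byte per element instead of 2 (FP16)
print(e5m2_to_fp32(packed))    # values snap to the E5M2 grid
```

The two mantissa bits keep precision coarse. In the example, 1.2 decodes to 1.25 and 60000 decodes to 57344, the largest finite E5M2 value. In exchange, each cached element takes half the memory of FP16.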