Add Automatic Prefix Caching (#2762)

Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Sage Moore
2024-03-02 03:50:01 -05:00
committed by GitHub
parent baee28c46c
commit ce4f5a29fb
18 changed files with 615 additions and 289 deletions


@@ -81,6 +81,10 @@ Below, you can find an explanation of every engine argument for vLLM:
    Token block size for contiguous chunks of tokens.

.. option:: --enable-prefix-caching

    Enables automatic prefix caching.

.. option:: --seed <seed>

    Random seed for operations.
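
As a rough illustration of the idea behind the ``--enable-prefix-caching`` option above, the sketch below shows how token blocks that share a prefix could be reused across requests. It is a toy model and not vLLM's implementation: ``ToyPrefixCache``, ``block_keys``, and ``BLOCK_SIZE`` are hypothetical names, and keying each full block by the whole token prefix up to its end is an assumption made for clarity.

```python
from typing import Dict, List, Tuple

# Hypothetical block size; plays the role of the token block size option.
BLOCK_SIZE = 4


def block_keys(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[Tuple[int, ...]]:
    """Return one key per full block.

    Each key is the entire token prefix up to the end of that block, so two
    requests produce the same key only if they agree on every earlier token.
    Trailing tokens that do not fill a block get no key.
    """
    return [
        tuple(tokens[:end])
        for end in range(block_size, len(tokens) + 1, block_size)
    ]


class ToyPrefixCache:
    """Maps prefix keys to block ids (toy stand-in for cached KV blocks)."""

    def __init__(self) -> None:
        self.blocks: Dict[Tuple[int, ...], int] = {}

    def allocate(self, tokens: List[int]) -> Tuple[int, int]:
        """Return (reused, created) counts of full blocks for a request."""
        reused = created = 0
        for key in block_keys(tokens):
            if key in self.blocks:
                reused += 1  # an earlier request already filled this block
            else:
                self.blocks[key] = len(self.blocks)  # assign a new block id
                created += 1
        return reused, created


cache = ToyPrefixCache()
shared = [1, 2, 3, 4, 5, 6, 7, 8]            # e.g. a common system prompt
first = cache.allocate(shared + [9, 10])      # two full blocks, both new
second = cache.allocate(shared + [11, 12, 13, 14])  # shared blocks are reused
print(first, second)
```

In this sketch, the second request reuses the two blocks covering the shared prefix and creates only the block that holds its own tokens. A request that differs in its first token would share nothing, because every key depends on the full prefix.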