Add Automatic Prefix Caching (#2762)

Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 03:50:01 -05:00
parent baee28c46c
commit ce4f5a29fb
18 changed files with 615 additions and 289 deletions
--- a/docs/source/models/engine_args.rst
+++ b/docs/source/models/engine_args.rst
@@ -81,6 +81,10 @@ Below, you can find an explanation of every engine argument for vLLM:

    Token block size for contiguous chunks of tokens.

+.. option:: --enable-prefix-caching
+
+    Enables automatic prefix caching.
+
 .. option:: --seed <seed>

    Random seed for operations.
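
Note on the new option: the patch documents ``--enable-prefix-caching`` only as "Enables automatic prefix caching". As a rough illustration of the general idea (not vLLM's actual implementation), the toy sketch below splits a token sequence into fixed-size blocks and identifies each full block by the whole prefix that ends at it, so two requests sharing a leading prompt reuse the same cached blocks. ``ToyPrefixCache``, ``BLOCK_SIZE`` and ``allocate`` are hypothetical names invented for this example; ``BLOCK_SIZE`` only loosely mirrors the token block size option described above.

```python
from typing import Dict, List, Tuple

# Hypothetical block size for this toy example (assumption, not a vLLM default).
BLOCK_SIZE = 4


class ToyPrefixCache:
    """Toy model of prefix caching: one cache entry per full block of tokens."""

    def __init__(self) -> None:
        # Maps the complete token prefix ending at a block to that block's id.
        self._blocks: Dict[Tuple[int, ...], int] = {}
        self.hits = 0
        self.misses = 0

    def allocate(self, tokens: List[int]) -> List[int]:
        """Return block ids for every full block, reusing cached blocks."""
        block_ids: List[int] = []
        # Ignore a trailing partial block; only full blocks are cached here.
        full = len(tokens) - len(tokens) % BLOCK_SIZE
        for end in range(BLOCK_SIZE, full + 1, BLOCK_SIZE):
            # A block is identified by the entire prefix up to its end, so
            # identical blocks after different prefixes are never confused.
            key = tuple(tokens[:end])
            if key in self._blocks:
                self.hits += 1
            else:
                self.misses += 1
                self._blocks[key] = len(self._blocks)
            block_ids.append(self._blocks[key])
        return block_ids


cache = ToyPrefixCache()
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]  # shared leading tokens (two blocks)

first = cache.allocate(system_prompt + [9, 10, 11, 12])
second = cache.allocate(system_prompt + [20, 21, 22, 23])

print(first)   # [0, 1, 2]
print(second)  # [0, 1, 3]: the two shared-prefix blocks are reused
print(cache.hits, cache.misses)  # 2 4
```

In this sketch the second request reuses the two blocks that cover the shared prompt and allocates a new block only for its differing suffix. A real engine would cache computed key/value tensors rather than integer ids and would need an eviction policy, both of which are omitted here.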