[Doc] Minor documentation fixes (#11580)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung
2024-12-28 21:53:59 +08:00
committed by GitHub
parent 42bb201fd6
commit d427e5cfda
13 changed files with 27 additions and 25 deletions

@@ -41,7 +41,7 @@ For reading from S3, it will be the number of client instances the host is openi
$ vllm serve /home/meta-llama/Llama-3.2-3B-Instruct --load-format runai_streamer --model-loader-extra-config '{"concurrency":16}'
```
-You can controls the size of the CPU Memory buffer to which tensors are read from the file, and limit this size.
+You can control the size of the CPU Memory buffer to which tensors are read from the file, and limit this size.
You can read further about CPU buffer memory limiting [here](https://github.com/run-ai/runai-model-streamer/blob/master/docs/src/env-vars.md#runai_streamer_memory_limit).
```console