[Doc] Update more docs with respect to V1 (#29188)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Cyrus Leung
2025-11-23 10:58:48 +08:00
committed by GitHub
parent 3ed767ec06
commit 389aa1b2eb
6 changed files with 89 additions and 100 deletions

@@ -49,9 +49,6 @@ llm = LLM(model="adept/fuyu-8b", max_model_len=2048, max_num_seqs=2)
By default, we optimize model inference using CUDA graphs which take up extra memory in the GPU.
-!!! warning
-    CUDA graph capture takes up more memory in V1 than in V0.
You can adjust `compilation_config` to achieve a better balance between inference speed and memory usage:
??? code