[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc (#37914)

Signed-off-by: Baorun Mu <bmu@nvidia.com>
Baorun (Lauren) Mu
2026-03-24 22:53:24 -04:00
committed by GitHub
parent a93a53f8a1
commit 9d0351c91d
2 changed files with 170 additions and 0 deletions

@@ -12,6 +12,7 @@ In this document we will discuss the:
* [CUDA Graphs modes](#cudagraphmodes)
* [Detailed design](#detailed-design)
* [Example usage of the different CUDA Graphs modes](#usage-guide)
* [Vision Encoder (ViT) CUDA Graphs](cuda_graphs_multimodal.md)

!!! note
    In this document, we refer to pure decode (`max_query_len=1`) or speculative decode (`max_query_len=1+num_spec_tokens`) as **uniform decode** batches, and the opposite would be **non-uniform** batches (i.e., prefill or mixed prefill-decode batches).