[Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc (#37914)

Signed-off-by: Baorun Mu <bmu@nvidia.com>
2026-03-24 22:53:24 -04:00
parent a93a53f8a1
commit 9d0351c91d
2 changed files with 170 additions and 0 deletions
--- a/docs/design/cuda_graphs.md
+++ b/docs/design/cuda_graphs.md
@@ -12,6 +12,7 @@ In this document we will discuss the:
 * [CUDA Graphs modes](#cudagraphmodes)
 * [Detailed design](#detailed-design)
 * [Example usage of the different CUDA Graphs modes](#usage-guide)
+* [Vision Encoder (ViT) CUDA Graphs](cuda_graphs_multimodal.md)

 !!! note
    In this document, we refer to pure decode (`max_query_len=1`) or speculative decode (`max_query_len =1+num_spec_tokens`) as **uniform decode** batches, and the opposite would be **non-uniform** batches (i.e., prefill or mixed prefill-decode batches).