From e86221deb6859c28325097f4568e6d553ae92e8d Mon Sep 17 00:00:00 2001
From: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com>
Date: Wed, 4 Mar 2026 18:03:14 +0100
Subject: [PATCH] [Doc] Fix GPU Worker count in Process Count Summary (#36000)

Signed-off-by: simone-dotolo
Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 docs/design/arch_overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/arch_overview.md b/docs/design/arch_overview.md
index 9c25368e5..143cffc26 100644
--- a/docs/design/arch_overview.md
+++ b/docs/design/arch_overview.md
@@ -122,7 +122,7 @@ For a deployment with `N` GPUs, `TP` tensor parallel size, `DP` data parallel si
 |---|---|---|
 | API Server | `A` (default `DP`) | Handles HTTP requests and input processing |
 | Engine Core | `DP` (default 1) | Scheduler and KV cache management |
-| GPU Worker | `N` (= `DP x TP`) | One per GPU, executes model forward passes |
+| GPU Worker | `N` (= `DP x PP x TP`) | One per GPU, executes model forward passes |
 | DP Coordinator | 1 if `DP > 1`, else 0 | Load balancing across DP ranks |
 | **Total** | **`A + DP + N` (+ 1 if DP > 1)** | |
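
Reviewer note: the arithmetic in the corrected table can be checked with a minimal sketch. The function name `process_counts` and its parameter names are hypothetical, chosen only for illustration; they are not part of the project's API. The sketch assumes `PP` is the pipeline parallel size and that every count follows the table rows exactly as written.

```python
from typing import Dict, Optional


def process_counts(tp: int, pp: int = 1, dp: int = 1,
                   api_servers: Optional[int] = None) -> Dict[str, int]:
    """Tally processes per the Process Count Summary table (illustrative only)."""
    if min(tp, pp, dp) < 1:
        raise ValueError("tp, pp and dp must all be >= 1")

    # API Server: `A`, defaulting to `DP` when not given.
    a = dp if api_servers is None else api_servers
    # GPU Worker: one per GPU, N = DP x PP x TP (the row this patch corrects).
    n = dp * pp * tp
    # DP Coordinator: 1 if DP > 1, else 0.
    coordinator = 1 if dp > 1 else 0

    return {
        "api_server": a,
        "engine_core": dp,
        "gpu_worker": n,
        "dp_coordinator": coordinator,
        "total": a + dp + n + coordinator,
    }


if __name__ == "__main__":
    # TP=2, PP=2, DP=2: 8 GPU workers; the old `DP x TP` formula would give 4.
    print(process_counts(tp=2, pp=2, dp=2))
```

With `TP = PP = DP = 2` the sketch yields 2 API servers, 2 engine cores, 8 GPU workers, and 1 coordinator, for a total of 13. The pre-patch formula would have reported only 4 GPU workers for the same deployment.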