[Docs] Data Parallel deployment documentation (#20768)

Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-11 17:42:10 +01:00
parent d47661f0cd
commit 9907fc4494
6 changed files with 118 additions and 2 deletions
--- a/docs/README.md
+++ b/docs/README.md
@@ -36,7 +36,7 @@ vLLM is flexible and easy to use with:

 - Seamless integration with popular HuggingFace models
 - High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
-- Tensor parallelism and pipeline parallelism support for distributed inference
+- Tensor, pipeline, data and expert parallelism support for distributed inference
 - Streaming outputs
 - OpenAI-compatible API server
 - Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, Gaudi® accelerators and GPUs, IBM Power CPUs, TPU, and AWS Trainium and Inferentia Accelerators.