[docs] Improve wide-EP performance + benchmarking documentation (#27933)

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-12-10 17:15:54 -05:00
parent fcb894222f
commit b9e0951f96
3 changed files with 42 additions and 4 deletions
--- a/tools/ep_kernels/README.md
+++ b/tools/ep_kernels/README.md
@@ -7,7 +7,7 @@ Here we break down the requirements in 2 steps:
 1. Build and install the Python libraries (both [pplx-kernels](https://github.com/ppl-ai/pplx-kernels) and [DeepEP](https://github.com/deepseek-ai/DeepEP)), including necessary dependencies like NVSHMEM. This step does not require any privileged access. Any user can do this.
 2. Configure NVIDIA driver to enable IBGDA. This step requires root access, and must be done on the host machine.

-2 is necessary for multi-node deployment.
+Step 2 is necessary for multi-node deployment.

 All scripts accept a positional argument as workspace path for staging the build, defaulting to `$(pwd)/ep_kernels_workspace`.

@@ -23,6 +23,6 @@ TORCH_CUDA_ARCH_LIST="10.0" bash install_python_libraries.sh
 Additional step for multi-node deployment:

 ```bash
-sudo bash configure_system_drivers.sh
+sudo bash configure_system_drivers.sh # update-initramfs can take several minutes
 sudo reboot # Reboot is required to load the new driver
 ```
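
For reference, the README's statement that all scripts accept a positional workspace path defaulting to `$(pwd)/ep_kernels_workspace` can be illustrated with a minimal sketch. The `resolve_workspace` helper and the `/tmp/ep_build` path below are hypothetical and are not taken from the repository's scripts; the sketch assumes only the default-handling behavior described in the README, implemented here with standard shell parameter expansion.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): resolve the workspace path
# from an optional first positional argument.
resolve_workspace() {
    # Use $1 when it is set and non-empty; otherwise fall back to the
    # default documented in the README.
    local workspace="${1:-$(pwd)/ep_kernels_workspace}"
    printf '%s\n' "$workspace"
}

resolve_workspace                 # prints <current directory>/ep_kernels_workspace
resolve_workspace /tmp/ep_build   # prints /tmp/ep_build
```

Under this assumption, running an install script without arguments stages the build under the current directory, so it may be worth passing an explicit path when the current directory is on a small or shared filesystem.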