tools/ep_kernels/README.md

# Expert parallel kernels

Large-scale cluster-level expert parallel, as described in the [DeepSeek-V3 Technical Report](http://arxiv.org/abs/2412.19437), is an efficient way to deploy sparse MoE models with many experts. However, such deployment requires many components beyond a normal Python package, including system package support and system driver support. It is impossible to bundle all these components into a Python package.

Here we break down the requirements in 2 steps:

1. Build and install the Python libraries ([DeepEP](https://github.com/deepseek-ai/DeepEP)), including necessary dependencies like NVSHMEM. This step does not require any privileged access. Any user can do this.
2. Configure NVIDIA driver to enable IBGDA. This step requires root access, and must be done on the host machine.

Step 2 is necessary for multi-node deployment.

All scripts accept a positional argument as workspace path for staging the build, defaulting to `$(pwd)/ep_kernels_workspace`.

## Usage

```bash
# for hopper
TORCH_CUDA_ARCH_LIST="9.0" bash install_python_libraries.sh
# for blackwell
TORCH_CUDA_ARCH_LIST="10.0" bash install_python_libraries.sh
```

Additional step for multi-node deployment:

```bash
sudo bash configure_system_drivers.sh # update-initramfs can take several minutes
sudo reboot # Reboot is required to load the new driver
```
[Docs] Switch to better markdown linting pre-commit hook (#21851) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> 2025-07-30 03:45:08 +01:00			`# Expert parallel kernels`

[misc] add instructions on how to install nvshmem/pplx/deepep (#17964) Signed-off-by: youkaichao <youkaichao@gmail.com> 2025-05-12 09:02:39 +08:00			`Large-scale cluster-level expert parallel, as described in the [DeepSeek-V3 Technical Report](http://arxiv.org/abs/2412.19437), is an efficient way to deploy sparse MoE models with many experts. However, such deployment requires many components beyond a normal Python package, including system package support and system driver support. It is impossible to bundle all these components into a Python package.`

Simplify ep kernels installation (#19412) Signed-off-by: youkaichao <youkaichao@gmail.com> 2025-06-10 20:06:08 +08:00			`Here we break down the requirements in 2 steps:`
[Docs] Switch to better markdown linting pre-commit hook (#21851) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> 2025-07-30 03:45:08 +01:00
[WideEP] Remove pplx all2all backend (#33724) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> 2026-02-26 17:30:10 -05:00			`1. Build and install the Python libraries ([DeepEP](https://github.com/deepseek-ai/DeepEP)), including necessary dependencies like NVSHMEM. This step does not require any privileged access. Any user can do this.`
Simplify ep kernels installation (#19412) Signed-off-by: youkaichao <youkaichao@gmail.com> 2025-06-10 20:06:08 +08:00			`2. Configure NVIDIA driver to enable IBGDA. This step requires root access, and must be done on the host machine.`
[misc] add instructions on how to install nvshmem/pplx/deepep (#17964) Signed-off-by: youkaichao <youkaichao@gmail.com> 2025-05-12 09:02:39 +08:00
[docs] Improve wide-EP performance + benchmarking documentation (#27933) Signed-off-by: Seiji Eicher <seiji@anyscale.com> 2025-12-10 17:15:54 -05:00			`Step 2 is necessary for multi-node deployment.`
[misc] add instructions on how to install nvshmem/pplx/deepep (#17964) Signed-off-by: youkaichao <youkaichao@gmail.com> 2025-05-12 09:02:39 +08:00
			All scripts accept a positional argument as workspace path for staging the build, defaulting to `$(pwd)/ep_kernels_workspace`.

[Docs] Switch to better markdown linting pre-commit hook (#21851) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> 2025-07-30 03:45:08 +01:00			`## Usage`
[misc] add instructions on how to install nvshmem/pplx/deepep (#17964) Signed-off-by: youkaichao <youkaichao@gmail.com> 2025-05-12 09:02:39 +08:00
			```bash
[bugfix] fix blackwell deepep installation (#22255) 2025-08-06 01:26:09 +08:00			`# for hopper`
			`TORCH_CUDA_ARCH_LIST="9.0" bash install_python_libraries.sh`
			`# for blackwell`
			`TORCH_CUDA_ARCH_LIST="10.0" bash install_python_libraries.sh`
[misc] add instructions on how to install nvshmem/pplx/deepep (#17964) Signed-off-by: youkaichao <youkaichao@gmail.com> 2025-05-12 09:02:39 +08:00			```

[bugfix] fix blackwell deepep installation (#22255) 2025-08-06 01:26:09 +08:00			`Additional step for multi-node deployment:`
[misc] add instructions on how to install nvshmem/pplx/deepep (#17964) Signed-off-by: youkaichao <youkaichao@gmail.com> 2025-05-12 09:02:39 +08:00
			```bash
[docs] Improve wide-EP performance + benchmarking documentation (#27933) Signed-off-by: Seiji Eicher <seiji@anyscale.com> 2025-12-10 17:15:54 -05:00			`sudo bash configure_system_drivers.sh # update-initramfs can take several minutes`
[misc] add instructions on how to install nvshmem/pplx/deepep (#17964) Signed-off-by: youkaichao <youkaichao@gmail.com> 2025-05-12 09:02:39 +08:00			`sudo reboot # Reboot is required to load the new driver`
			```