[MoE Refactor] Rename "naive" all2all backend (#36294)

Signed-off-by: Bill Nell <bnell@redhat.com>
2026-03-19 15:50:34 -04:00
parent 4ee847e400
commit 91be5f9be3
6 changed files with 15 additions and 16 deletions
--- a/docs/design/moe_kernel_features.md
+++ b/docs/design/moe_kernel_features.md
@@ -103,7 +103,7 @@ To be used with a particular `FusedMoEPrepareAndFinalizeModular` subclass, MoE k

 ## Modular Kernel "families"

-The following table shows "families" of modular kernels that are intended to work together. There are some combinations which may work but have not yet been tested, e.g. flashinfer with other fp8 experts. Note that the "naive" backend will work with any non-modular experts.
+The following table shows "families" of modular kernels that are intended to work together. There are some combinations which may work but have not yet been tested, e.g. flashinfer with other fp8 experts.

 | backend | `FusedMoEPrepareAndFinalizeModular` subclasses | `FusedMoEExpertsModular` subclasses |
 | ------- | ---------------------------------------------- | ----------------------------------- |
--- a/docs/serving/expert_parallel_deployment.md
+++ b/docs/serving/expert_parallel_deployment.md
@@ -23,7 +23,6 @@ vLLM provides multiple communication backends for EP. Use `--all2all-backend` to
 | `deepep_low_latency` | Multi-node decode | CUDA graph support, masked layout, optimized for decode | Decode-dominated workloads, low-latency scenarios |
 | `flashinfer_nvlink_one_sided` | MNNVL systems | FlashInfer's one-sided A2A strategy for multi-node NVLink | High-throughput workloads |
 | `flashinfer_nvlink_two_sided` | MNNVL systems | FlashInfer's two-sided A2A strategy for multi-node NVLink | Systems with NVLink across nodes |
-| `naive` | Testing/debugging | Simple broadcast-based implementation | Debugging, not recommended for production |

 ## Single Node Deployment