Allow markdownlint to run locally (#36398)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-09 03:05:24 +00:00
parent fde4771bbd
commit a0f44bb616
47 changed files with 394 additions and 392 deletions
--- a/docs/serving/expert_parallel_deployment.md
+++ b/docs/serving/expert_parallel_deployment.md
@@ -17,7 +17,7 @@ Before using EP, you need to install the necessary dependencies. We are actively
 vLLM provides multiple communication backends for EP. Use `--all2all-backend` to select one:

 | Backend | Use Case | Features | Best For |
-|---------|----------|----------|----------|
+| ------- | -------- | -------- | -------- |
 | `allgather_reducescatter` | Default backend | Standard all2all using allgather/reducescatter primitives | General purpose, works with any EP+DP configuration |
 | `deepep_high_throughput` | Multi-node prefill | Grouped GEMM with continuous layout, optimized for prefill | Prefill-dominated workloads, high-throughput scenarios |
 | `deepep_low_latency` | Multi-node decode | CUDA graph support, masked layout, optimized for decode | Decode-dominated workloads, low-latency scenarios |
@@ -48,7 +48,7 @@ Where:
 When EP is enabled, different layers in MoE models behave differently:

 | Layer Type | Behavior | Parallelism Used |
-|------------|----------|------------------|
+| ---------- | -------- | ---------------- |
 | **Expert (MoE) Layers** | Sharded across all EP ranks | Expert Parallel (EP) of size `TP × DP` |
 | **Attention Layers** | Behavior depends on TP size | See below |

@@ -146,9 +146,9 @@ When enabled, vLLM collects load statistics with every forward pass and periodic
 Configure EPLB with the `--eplb-config` argument, which accepts a JSON string. The available keys and their descriptions are:

 | Parameter | Description | Default |
-|-----------|-------------|---------|
-| `window_size`| Number of engine steps to track for rebalancing decisions | 1000 |
-| `step_interval`| Frequency of rebalancing (every N engine steps) | 3000 |
+| --------- | ----------- | ------- |
+| `window_size` | Number of engine steps to track for rebalancing decisions | 1000 |
+| `step_interval` | Frequency of rebalancing (every N engine steps) | 3000 |
 | `log_balancedness` | Log balancedness metrics (avg tokens per expert ÷ max tokens per expert) | `false` |
 | `num_redundant_experts` | Additional global experts per EP rank beyond equal distribution | `0` |
 | `use_async` | Use non-blocking EPLB for reduced latency overhead | `false` |
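
Reviewer note, not part of the patch: the table in the hunk above says `--eplb-config` takes a JSON string with these keys. The sketch below shows one way to assemble that string and to compute the balancedness ratio as the table defines it (average tokens per expert divided by maximum tokens per expert). The helper names `build_eplb_config` and `balancedness` are hypothetical and are not part of vLLM. The defaults are copied from the table, and how vLLM handles omitted keys is not shown in this diff.

```python
import json

# Defaults copied from the parameter table above.
# The helpers below are illustrative only and are not vLLM APIs.
EPLB_DEFAULTS = {
    "window_size": 1000,
    "step_interval": 3000,
    "log_balancedness": False,
    "num_redundant_experts": 0,
    "use_async": False,
}


def build_eplb_config(**overrides):
    """Return a JSON string suitable for passing as the --eplb-config value."""
    unknown = set(overrides) - set(EPLB_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown EPLB keys: {sorted(unknown)}")
    return json.dumps({**EPLB_DEFAULTS, **overrides})


def balancedness(tokens_per_expert):
    """Average tokens per expert divided by the maximum tokens per expert."""
    if not tokens_per_expert or max(tokens_per_expert) == 0:
        raise ValueError("need at least one expert with a nonzero token count")
    avg = sum(tokens_per_expert) / len(tokens_per_expert)
    return avg / max(tokens_per_expert)


if __name__ == "__main__":
    print(build_eplb_config(window_size=500, log_balancedness=True))
    print(balancedness([10, 10, 10]))
```

A ratio of 1.0 means every expert received the same number of tokens. Lower values mean at least one expert is carrying more than the average load.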