Allow markdownlint to run locally (#36398)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This commit is contained in:
Harry Mellor
2026-03-09 03:05:24 +00:00
committed by GitHub
parent fde4771bbd
commit a0f44bb616
47 changed files with 394 additions and 392 deletions

View File

@@ -17,7 +17,7 @@ Before using EP, you need to install the necessary dependencies. We are actively
vLLM provides multiple communication backends for EP. Use `--all2all-backend` to select one:
| Backend | Use Case | Features | Best For |
|---------|----------|----------|----------|
| ------- | -------- | -------- | -------- |
| `allgather_reducescatter` | Default backend | Standard all2all using allgather/reducescatter primitives | General purpose, works with any EP+DP configuration |
| `deepep_high_throughput` | Multi-node prefill | Grouped GEMM with continuous layout, optimized for prefill | Prefill-dominated workloads, high-throughput scenarios |
| `deepep_low_latency` | Multi-node decode | CUDA graph support, masked layout, optimized for decode | Decode-dominated workloads, low-latency scenarios |
@@ -48,7 +48,7 @@ Where:
When EP is enabled, different layers in MoE models behave differently:
| Layer Type | Behavior | Parallelism Used |
|------------|----------|------------------|
| ---------- | -------- | ---------------- |
| **Expert (MoE) Layers** | Sharded across all EP ranks | Expert Parallel (EP) of size `TP × DP` |
| **Attention Layers** | Behavior depends on TP size | See below |
@@ -146,9 +146,9 @@ When enabled, vLLM collects load statistics with every forward pass and periodic
Configure EPLB with the `--eplb-config` argument, which accepts a JSON string. The available keys and their descriptions are:
| Parameter | Description | Default |
|-----------|-------------|---------|
| `window_size`| Number of engine steps to track for rebalancing decisions | 1000 |
| `step_interval`| Frequency of rebalancing (every N engine steps) | 3000 |
| --------- | ----------- | ------- |
| `window_size` | Number of engine steps to track for rebalancing decisions | 1000 |
| `step_interval` | Frequency of rebalancing (every N engine steps) | 3000 |
| `log_balancedness` | Log balancedness metrics (avg tokens per expert ÷ max tokens per expert) | `false` |
| `num_redundant_experts` | Additional global experts per EP rank beyond equal distribution | `0` |
| `use_async` | Use non-blocking EPLB for reduced latency overhead | `false` |