[doc] fix broken links (#32158)

Signed-off-by: minimAluminiumalism <caixuesen@outlook.com>
XlKsyt
2026-01-12 18:18:38 +08:00
committed by GitHub
parent d7b2e57097
commit 899541bdb1

@@ -139,18 +139,14 @@ token data.
 const scalar_t* q_ptr = q + seq_idx * q_stride + head_idx * HEAD_SIZE;
 ```
-<p align="center">
-<img src="../assets/design/paged_attention/query.png" alt="query" width="70%" />
-</p>
+![query](../assets/design/paged_attention/query.png)
 Each thread defines its own `q_ptr` which points to the assigned
 query token data on global memory. For example, if `VEC_SIZE` is 4
 and `HEAD_SIZE` is 128, the `q_ptr` points to data that contains
 total of 128 elements divided into 128 / 4 = 32 vecs.
-<p align="center">
-<img src="../assets/design/paged_attention/q_vecs.png" alt="q_vecs" width="70%" />
-</p>
+![q_vecs](../assets/design/paged_attention/q_vecs.png)
 ```cpp
 __shared__ Q_vec q_vecs[THREAD_GROUP_SIZE][NUM_VECS_PER_THREAD];
@@ -187,9 +183,7 @@ key token at different iterations. As shown above, that `k_ptr`
 points to key token data based on `k_cache` at assigned block,
 assigned head and assigned token.
-<p align="center">
-<img src="../assets/design/paged_attention/key.png" alt="key" width="70%" />
-</p>
+![key](../assets/design/paged_attention/key.png)
 The diagram above illustrates the memory layout for key data. It
 assumes that the `BLOCK_SIZE` is 16, `HEAD_SIZE` is 128, `x` is
@@ -202,9 +196,7 @@ iterations. Inside each rectangle, there are a total 32 vecs (128
 elements for one token) that will be processed by 2 threads (one
 thread group) separately.
-<p align="center">
-<img src="../assets/design/paged_attention/k_vecs.png" alt="k_vecs" width="70%" />
-</p>
+![k_vecs](../assets/design/paged_attention/k_vecs.png)
 ```cpp
 K_vec k_vecs[NUM_VECS_PER_THREAD]
@@ -361,17 +353,11 @@ later steps. Now, it should store the normalized softmax result of
 ## Value
-<p align="center">
-<img src="../assets/design/paged_attention/value.png" alt="value" width="70%" />
-</p>
+![value](../assets/design/paged_attention/value.png)
-<p align="center">
-<img src="../assets/design/paged_attention/logits_vec.png" alt="logits_vec" width="50%" />
-</p>
+![logits_vec](../assets/design/paged_attention/logits_vec.png)
-<p align="center">
-<img src="../assets/design/paged_attention/v_vec.png" alt="v_vec" width="70%" />
-</p>
+![v_vec](../assets/design/paged_attention/v_vec.png)
 Now we need to retrieve the value data and perform dot multiplication
 with `logits`. Unlike query and key, there is no thread group