Fix various typos found in docs (#32212)

Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
2026-01-12 21:41:47 -06:00
parent 60b77e1463
commit f243abc92d
21 changed files with 26 additions and 26 deletions
--- a/docs/features/disagg_encoder.md
+++ b/docs/features/disagg_encoder.md
@@ -68,7 +68,7 @@ Here is a figure illustrating disaggregate encoder flow:

 ![Disaggregated Encoder Flow](../assets/features/disagg_encoder/disagg_encoder_flow.png)

-For the PD disaggregation part, the Prefill instance receive cache exactly the same as the disaggregate encoder flow above. Prefill instance executes 1 step (prefill -> 1 token output) and then transfer KV cache to the Decode instance for the remaining execution. The KV transfer part purely happens after the execute of the PDinstance.
+For the PD disaggregation part, the Prefill instance receives cache exactly the same as the disaggregated encoder flow above. Prefill instance executes 1 step (prefill -> 1 token output) and then transfers KV cache to the Decode instance for the remaining execution. The KV transfer part purely happens after the execution of the PD instance.

 `docs/features/disagg_prefill.md` shows the brief idea about the disaggregated prefill (v0)

--- a/docs/features/disagg_prefill.md
+++ b/docs/features/disagg_prefill.md
@@ -1,6 +1,6 @@
 # Disaggregated Prefilling (experimental)

-This page introduces you the disaggregated prefilling feature in vLLM.
+This page introduces you to the disaggregated prefilling feature in vLLM.

 !!! note
    This feature is experimental and subject to change.
--- a/docs/features/quantization/inc.md
+++ b/docs/features/quantization/inc.md
@@ -19,7 +19,7 @@ Once you've completed the model calibration process and collected the measuremen

 ```bash
 export QUANT_CONFIG=/path/to/quant/config/inc/meta-llama-3.1-405b-instruct/maxabs_measure_g3.json
-vllm serve meta-llama/Llama-3.1-405B-Instruct --quantization inc --kv-cache-dtype fp8_inc --tensor_paralel_size 8
+vllm serve meta-llama/Llama-3.1-405B-Instruct --quantization inc --kv-cache-dtype fp8_inc --tensor-parallel-size 8
 ```

 !!! tip
--- a/docs/features/spec_decode.md
+++ b/docs/features/spec_decode.md
@@ -173,7 +173,7 @@ Suffix Decoding can achieve better performance for tasks with high repetition, s
 ## Speculating using MLP speculators

 The following code configures vLLM to use speculative decoding where proposals are generated by
-draft models that conditioning draft predictions on both context vectors and sampled tokens.
+draft models that condition draft predictions on both context vectors and sampled tokens.
 For more information see [this blog](https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/) or
 [this technical report](https://arxiv.org/abs/2404.19124).

--- a/docs/features/structured_outputs.md
+++ b/docs/features/structured_outputs.md
@@ -39,7 +39,7 @@ request. You may also choose a specific backend, along with
 some options. A full set of options is available in the `vllm serve --help`
 text.

-Now let´s see an example for each of the cases, starting with the `choice`, as it´s the easiest one:
+Now let's see an example for each of the cases, starting with the `choice`, as it's the easiest one:

 ??? code

@@ -126,12 +126,12 @@ The next example shows how to use the `response_format` parameter with a Pydanti
    ```

 !!! tip
-    While not strictly necessary, normally it´s better to indicate in the prompt the
+    While not strictly necessary, normally it's better to indicate in the prompt the
    JSON schema and how the fields should be populated. This can improve the
    results notably in most cases.

 Finally we have the `grammar` option, which is probably the most
-difficult to use, but it´s really powerful. It allows us to define complete
+difficult to use, but it's really powerful. It allows us to define complete
 languages like SQL queries. It works by using a context free EBNF grammar.
 As an example, we can use to define a specific format of simplified SQL queries:

@@ -303,7 +303,7 @@ An example of using `structural_tag` can be found here: [examples/online_serving
 ## Offline Inference

 Offline inference allows for the same types of structured outputs.
-To use it, we´ll need to configure the structured outputs using the class `StructuredOutputsParams` inside `SamplingParams`.
+To use it, we'll need to configure the structured outputs using the class `StructuredOutputsParams` inside `SamplingParams`.
 The main available options inside `StructuredOutputsParams` are:

 - `json`