diff --git a/docs/contributing/model/basic.md b/docs/contributing/model/basic.md
index 624f13bf7..ba1f5e43d 100644
--- a/docs/contributing/model/basic.md
+++ b/docs/contributing/model/basic.md
@@ -138,7 +138,7 @@ These models should follow the same instructions as case (1), but they should in
 
 For case (3), we recommend looking at the implementation of [`MiniMaxText01ForCausalLM`](../../../vllm/model_executor/models/minimax_text_01.py) or [`Lfm2ForCausalLM`](../../../vllm/model_executor/models/lfm2.py) as a reference, which use custom "mamba-like" layers `MiniMaxText01LinearAttention` and `ShortConv` respectively.
 Please follow the same guidelines as case (2) for implementing these models.
-We use "mamba-like" to refer to layers that posses a state that is updated in-place, rather than being appended-to (like KV cache for attention).
+We use "mamba-like" to refer to layers that possess a state that is updated in-place, rather than being appended-to (like KV cache for attention).
 For implementing new custom mamba-like layers, one should inherit from `MambaBase` and implement the methods `get_state_dtype`, `get_state_shape` to calculate the data types and state shapes at runtime, as well as `mamba_type` and `get_attn_backend`.
 It is also necessary to implement the "attention meta-data" class which handles the meta-data that is common across all layers. Please see [`LinearAttentionMetadata`](../../../vllm/v1/attention/backends/linear_attn.py) or [`ShortConvAttentionMetadata`](../../../vllm/v1/attention/backends/short_conv_attn.py) for examples of this.
 
diff --git a/docs/contributing/model/multimodal.md b/docs/contributing/model/multimodal.md
index c876cc47c..e123e0dcd 100644
--- a/docs/contributing/model/multimodal.md
+++ b/docs/contributing/model/multimodal.md
@@ -739,7 +739,7 @@ Each [PromptUpdate][vllm.multimodal.processing.PromptUpdate] instance specifies
 
         ```
 
     However, this is not entirely correct. After `FuyuImageProcessor.preprocess_with_tokenizer_info` is called,
-    a BOS token (`<s>`) is also added to the promopt:
+    a BOS token (`<s>`) is also added to the prompt:
 
     ??? code
diff --git a/docs/getting_started/quickstart.md b/docs/getting_started/quickstart.md
index d5c68172d..40b6dab06 100644
--- a/docs/getting_started/quickstart.md
+++ b/docs/getting_started/quickstart.md
@@ -57,7 +57,7 @@ This guide will help you quickly get started with vLLM to perform:
     It currently supports Python 3.12, ROCm 7.0 and `glibc >= 2.35`.
 
     !!! note
-        Note that, previously, docker images were published using AMD's docker release pipeline and were located `rocm/vlm-dev`. This is being deprecated by using vLLM's docker release pipeline.
+        Note that, previously, docker images were published using AMD's docker release pipeline and were located `rocm/vllm-dev`. This is being deprecated by using vLLM's docker release pipeline.
 
 === "Google TPU"
 