Fix various typos found in docs (#32212)
Signed-off-by: Andrew Bennett <potatosaladx@meta.com>
This commit is contained in:
@@ -338,7 +338,7 @@ If you use triton kernels with cuda 13, you might see an error like `ptxas fatal
|
||||
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
|
||||
```
|
||||
|
||||
It means that the ptxas in triton bundle not compatible with your device. You need to set `TRITON_PTXAS_PATH` environment variable to use cuda toolkit's ptxas manually instead:
|
||||
It means that the ptxas in the triton bundle is not compatible with your device. You need to set `TRITON_PTXAS_PATH` environment variable to use cuda toolkit's ptxas manually instead:
|
||||
|
||||
```shell
|
||||
export CUDA_HOME=/usr/local/cuda
|
||||
|
||||
@@ -123,7 +123,7 @@ We are working on enabling prefix caching and chunked prefill for more categorie
|
||||
#### Mamba Models
|
||||
|
||||
Models using selective state-space mechanisms instead of standard transformer attention are supported.
|
||||
Models that use Mamba-2 and Mamba-1 layers (e.g., `Mamba2ForCausalLM`, `MambaForCausalLM`,`FalconMambaForCausalLM`) are supported.
|
||||
Models that use Mamba-2 and Mamba-1 layers (e.g., `Mamba2ForCausalLM`, `MambaForCausalLM`, `FalconMambaForCausalLM`) are supported.
|
||||
|
||||
Hybrid models that combine Mamba-2 and Mamba-1 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
|
||||
`Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`, `JambaForCausalLM`, `Plamo2ForCausalLM`).
|
||||
|
||||
Reference in New Issue
Block a user