[Doc] Convert list tables to MyST (#11594)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@@ -197,4 +197,4 @@ if __name__ == '__main__':
 ## Known Issues
 
 - In `v0.5.2`, `v0.5.3`, and `v0.5.3.post1`, there is a bug caused by [zmq](https://github.com/zeromq/pyzmq/issues/2000) , which can occasionally cause vLLM to hang depending on the machine configuration. The solution is to upgrade to the latest version of `vllm` to include the [fix](gh-pr:6759).
-- To circumvent a NCCL [bug](https://github.com/NVIDIA/nccl/issues/1234) , all vLLM processes will set an environment variable ``NCCL_CUMEM_ENABLE=0`` to disable NCCL's ``cuMem`` allocator. It does not affect performance but only gives memory benefits. When external processes want to set up a NCCL connection with vLLM's processes, they should also set this environment variable, otherwise, inconsistent environment setup will cause NCCL to hang or crash, as observed in the [RLHF integration](https://github.com/OpenRLHF/OpenRLHF/pull/604) and the [discussion](gh-issue:5723#issuecomment-2554389656) .
+- To circumvent a NCCL [bug](https://github.com/NVIDIA/nccl/issues/1234) , all vLLM processes will set an environment variable `NCCL_CUMEM_ENABLE=0` to disable NCCL's `cuMem` allocator. It does not affect performance but only gives memory benefits. When external processes want to set up a NCCL connection with vLLM's processes, they should also set this environment variable, otherwise, inconsistent environment setup will cause NCCL to hang or crash, as observed in the [RLHF integration](https://github.com/OpenRLHF/OpenRLHF/pull/604) and the [discussion](gh-issue:5723#issuecomment-2554389656) .
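The NCCL note in the hunk above asks external processes to use the same `NCCL_CUMEM_ENABLE=0` setting as vLLM. A minimal sketch of how such a process might do that is shown here. The helper name `env_matching_vllm_nccl` is hypothetical and not part of vLLM; the sketch also assumes the variable has to be in place before the process creates its NCCL communicator.

```python
import os


def env_matching_vllm_nccl(base_env=None):
    """Return a copy of an environment with NCCL_CUMEM_ENABLE=0 added.

    Hypothetical helper (not a vLLM API): it mirrors the value that the
    note says vLLM processes set for themselves.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_CUMEM_ENABLE"] = "0"
    return env


# For the current process: set the variable early, before any code that
# initializes NCCL runs (assumption: NCCL reads it at initialization).
os.environ["NCCL_CUMEM_ENABLE"] = "0"

# For a child process: pass the adjusted environment explicitly, for
# example subprocess.Popen(cmd, env=env_matching_vllm_nccl()).
child_env = env_matching_vllm_nccl({"PATH": "/usr/bin"})
```

Returning a copy keeps the caller's environment mapping untouched, which is convenient when only a spawned worker should receive the setting.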
@@ -141,26 +141,25 @@ Gaudi2 devices. Configurations that are not listed may or may not work.
 
 Currently in vLLM for HPU we support four execution modes, depending on selected HPU PyTorch Bridge backend (via `PT_HPU_LAZY_MODE` environment variable), and `--enforce-eager` flag.
 
-```{eval-rst}
-.. list-table:: vLLM execution modes
-:widths: 25 25 50
-:header-rows: 1
+```{list-table} vLLM execution modes
+:widths: 25 25 50
+:header-rows: 1
 
-* - ``PT_HPU_LAZY_MODE``
-  - ``enforce_eager``
-  - execution mode
-* - 0
-  - 0
-  - torch.compile
-* - 0
-  - 1
-  - PyTorch eager mode
-* - 1
-  - 0
-  - HPU Graphs
-* - 1
-  - 1
-  - PyTorch lazy mode
+* - `PT_HPU_LAZY_MODE`
+  - `enforce_eager`
+  - execution mode
+* - 0
+  - 0
+  - torch.compile
+* - 0
+  - 1
+  - PyTorch eager mode
+* - 1
+  - 0
+  - HPU Graphs
+* - 1
+  - 1
+  - PyTorch lazy mode
 ```
 
 ```{warning}
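The execution-mode table in the hunk above is a two-input lookup. As an illustration only, it can be written as a small Python mapping. The function name `hpu_execution_mode` is hypothetical; this is a restatement of the table, not how vLLM selects the mode internally.

```python
def hpu_execution_mode(pt_hpu_lazy_mode: int, enforce_eager: bool) -> str:
    """Look up the execution mode named in the table.

    pt_hpu_lazy_mode: value of the PT_HPU_LAZY_MODE environment variable (0 or 1).
    enforce_eager: whether the --enforce-eager flag is set.
    """
    modes = {
        (0, False): "torch.compile",
        (0, True): "PyTorch eager mode",
        (1, False): "HPU Graphs",
        (1, True): "PyTorch lazy mode",
    }
    key = (int(pt_hpu_lazy_mode), bool(enforce_eager))
    if key not in modes:
        raise ValueError(f"PT_HPU_LAZY_MODE must be 0 or 1, got {pt_hpu_lazy_mode!r}")
    return modes[key]


mode = hpu_execution_mode(1, enforce_eager=False)  # "HPU Graphs"
```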
@@ -68,33 +68,32 @@ gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
 --service-account SERVICE_ACCOUNT
 ```
 
-```{eval-rst}
-.. list-table:: Parameter descriptions
-:header-rows: 1
+```{list-table} Parameter descriptions
+:header-rows: 1
 
-* - Parameter name
-  - Description
-* - QUEUED_RESOURCE_ID
-  - The user-assigned ID of the queued resource request.
-* - TPU_NAME
-  - The user-assigned name of the TPU which is created when the queued
-    resource request is allocated.
-* - PROJECT_ID
-  - Your Google Cloud project
-* - ZONE
-  - The GCP zone where you want to create your Cloud TPU. The value you use
-    depends on the version of TPUs you are using. For more information, see
-    `TPU regions and zones <https://cloud.google.com/tpu/docs/regions-zones>`_
-* - ACCELERATOR_TYPE
-  - The TPU version you want to use. Specify the TPU version, for example
-    `v5litepod-4` specifies a v5e TPU with 4 cores. For more information,
-    see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
-* - RUNTIME_VERSION
-  - The TPU VM runtime version to use. For more information see `TPU VM images <https://cloud.google.com/tpu/docs/runtimes>`_.
-* - SERVICE_ACCOUNT
-  - The email address for your service account. You can find it in the IAM
-    Cloud Console under *Service Accounts*. For example:
-    `tpu-service-account@<your_project_ID>.iam.gserviceaccount.com`
+* - Parameter name
+  - Description
+* - QUEUED_RESOURCE_ID
+  - The user-assigned ID of the queued resource request.
+* - TPU_NAME
+  - The user-assigned name of the TPU which is created when the queued
+    resource request is allocated.
+* - PROJECT_ID
+  - Your Google Cloud project
+* - ZONE
+  - The GCP zone where you want to create your Cloud TPU. The value you use
+    depends on the version of TPUs you are using. For more information, see
+    `TPU regions and zones <https://cloud.google.com/tpu/docs/regions-zones>`_
+* - ACCELERATOR_TYPE
+  - The TPU version you want to use. Specify the TPU version, for example
+    `v5litepod-4` specifies a v5e TPU with 4 cores. For more information,
+    see `TPU versions <https://cloud.devsite.corp.google.com/tpu/docs/system-architecture-tpu-vm#versions>`_.
+* - RUNTIME_VERSION
+  - The TPU VM runtime version to use. For more information see `TPU VM images <https://cloud.google.com/tpu/docs/runtimes>`_.
+* - SERVICE_ACCOUNT
+  - The email address for your service account. You can find it in the IAM
+    Cloud Console under *Service Accounts*. For example:
+    `tpu-service-account@<your_project_ID>.iam.gserviceaccount.com`
 ```
 
 Connect to your TPU using SSH:
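The parameter table in the hunk above lists upper-case placeholders that must be replaced before the `gcloud` command is run. A minimal sketch of that substitution follows. The helper `fill_placeholders` and the sample values are hypothetical, and the template uses only the two command fragments visible in this diff; the elided flags are left out rather than guessed.

```python
import re

# Placeholder names taken from the "Parameter descriptions" table.
PLACEHOLDERS = (
    "QUEUED_RESOURCE_ID", "TPU_NAME", "PROJECT_ID", "ZONE",
    "ACCELERATOR_TYPE", "RUNTIME_VERSION", "SERVICE_ACCOUNT",
)
_PATTERN = re.compile(r"\b(" + "|".join(PLACEHOLDERS) + r")\b")


def fill_placeholders(template: str, values: dict) -> str:
    """Replace each placeholder in the template, failing loudly if one is unset."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for {name}")
        return values[name]
    return _PATTERN.sub(substitute, template)


# Hypothetical sample values, for illustration only.
fragment = "queued-resources create QUEUED_RESOURCE_ID --service-account SERVICE_ACCOUNT"
filled = fill_placeholders(fragment, {
    "QUEUED_RESOURCE_ID": "my-queued-resource",
    "SERVICE_ACCOUNT": "tpu-service-account@my-project.iam.gserviceaccount.com",
})
```

Raising on a missing value, instead of leaving the placeholder in place, avoids sending a literal `ZONE` or `TPU_NAME` to the CLI by accident.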