[doc] Fold long code blocks to improve readability (#19926)

Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-23 13:24:23 +08:00
parent 493c275352
commit f17aec0d63
50 changed files with 3455 additions and 3180 deletions
--- a/docs/models/supported_models.md
+++ b/docs/models/supported_models.md
@@ -70,7 +70,10 @@ To make your model compatible with the Transformers backend, it needs:
 2. `MyAttention` must use `ALL_ATTENTION_FUNCTIONS` to call attention.
 3. `MyModel` must contain `_supports_attention_backend = True`.

-```python title="modeling_my_model.py"
+<details>
+<summary>modeling_my_model.py</summary>
+
+```python

 from transformers import PreTrainedModel
 from torch import nn
@@ -93,6 +96,8 @@ class MyModel(PreTrainedModel):
    _supports_attention_backend = True
 ```

+</details>
+
 Here is what happens in the background when this model is loaded:

 1. The config is loaded.
@@ -103,7 +108,10 @@ That's it!

 For your model to be compatible with vLLM's tensor parallel and/or pipeline parallel features, you must add `base_model_tp_plan` and/or `base_model_pp_plan` to your model's config class:

-```python title="configuration_my_model.py"
+<details>
+<summary>configuration_my_model.py</summary>
+
+```python

 from transformers import PretrainedConfig

@@ -123,6 +131,8 @@ class MyConfig(PretrainedConfig):
    }
 ```

+</details>
+
 - `base_model_tp_plan` is a `dict` that maps fully qualified layer name patterns to tensor parallel styles (currently only `"colwise"` and `"rowwise"` are supported).
 - `base_model_pp_plan` is a `dict` that maps direct child layer names to `tuple`s of `list`s of `str`s:
    * You only need to do this for layers which are not present on all pipeline stages
@@ -198,6 +208,9 @@ huggingface-cli scan-cache --dir ~/.cache/huggingface/hub

 Use the Hugging Face CLI to interactively [delete downloaded model](https://huggingface.co/docs/huggingface_hub/guides/manage-cache#clean-your-cache) from the cache:

+<details>
+<summary>Commands</summary>
+
 ```console
 # The `delete-cache` command requires extra dependencies to work with the TUI.
 # Please run `pip install huggingface_hub[cli]` to install them.
@@ -224,6 +237,8 @@ Start deletion.
 Done. Deleted 1 repo(s) and 0 revision(s) for a total of 438.9M.
 ```

+</details>
+
 #### Using a proxy

 Here are some tips for loading/downloading models from Hugging Face using a proxy:
@@ -601,27 +616,29 @@ Specified using `--task generate`.

    For the best results, we recommend using the following dependency versions (tested on A10 and L40):

-    ```text
-    # Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40)
-    torch==2.5.1
-    torchvision==0.20.1
-    transformers==4.48.1
-    tokenizers==0.21.0
-    tiktoken==0.7.0
-    vllm==0.7.0
+    ??? Dependency versions

-    # Optional but recommended for improved performance and stability
-    triton==3.1.0
-    xformers==0.0.28.post3
-    uvloop==0.21.0
-    protobuf==5.29.3
-    openai==1.60.2
-    opencv-python-headless==4.11.0.86
-    pillow==10.4.0
+        ```text
+        # Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40)
+        torch==2.5.1
+        torchvision==0.20.1
+        transformers==4.48.1
+        tokenizers==0.21.0
+        tiktoken==0.7.0
+        vllm==0.7.0

-    # Installed FlashAttention (for float16 only)
-    flash-attn>=2.5.6  # Not used in float32, but should be documented
-    ```
+        # Optional but recommended for improved performance and stability
+        triton==3.1.0
+        xformers==0.0.28.post3
+        uvloop==0.21.0
+        protobuf==5.29.3
+        openai==1.60.2
+        opencv-python-headless==4.11.0.86
+        pillow==10.4.0
+
+        # Installed FlashAttention (for float16 only)
+        flash-attn>=2.5.6  # Not used in float32, but should be documented
+        ```

    **Note:** Make sure you understand the security implications of using outdated packages.