[Docs] Use relative md links instead of absolute html links for cross referencing (#31494)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -84,10 +84,10 @@ DOCKER_BUILDKIT=1 docker build . \
 If you have not changed any C++ or CUDA kernel code, you can use precompiled wheels to significantly reduce Docker build time.

 * **Enable the feature** by adding the build argument: `--build-arg VLLM_USE_PRECOMPILED="1"`.
-* **How it works**: By default, vLLM automatically finds the correct wheels from our [Nightly Builds](https://docs.vllm.ai/en/latest/contributing/ci/nightly_builds/) by using the merge-base commit with the upstream `main` branch.
+* **How it works**: By default, vLLM automatically finds the correct wheels from our [Nightly Builds](../contributing/ci/nightly_builds.md) by using the merge-base commit with the upstream `main` branch.
 * **Override commit**: To use wheels from a specific commit, provide the `--build-arg VLLM_PRECOMPILED_WHEEL_COMMIT=<commit_hash>` argument.

-For a detailed explanation, refer to the documentation on 'Set up using Python-only build (without compilation)' part in [Build wheel from source](https://docs.vllm.ai/en/latest/contributing/ci/nightly_builds.html#precompiled-wheels-usage), these args are similar.
+For a detailed explanation, refer to the documentation on 'Set up using Python-only build (without compilation)' part in [Build wheel from source](../contributing/ci/nightly_builds.md#precompiled-wheels-usage), these args are similar.

 ## Building for Arm64/aarch64

@@ -33,7 +33,7 @@ goals while minimizing impact to performance and also helps us (vLLM) when you o
 For more details on the design, please see the following resources:

 - [Introduction to vLLM-torch.compile blogpost](https://blog.vllm.ai/2025/08/20/torch-compile.html)
-- [vLLM-torch.compile integration design](https://docs.vllm.ai/en/latest/design/torch_compile.html)
+- [vLLM-torch.compile integration design](./torch_compile.md)
 - [vLLM Office Hours #26](https://www.youtube.com/live/xLyxc7hxCJc?si=Xulo9pe53C6ywf0V&t=561)
 - [Talk at PyTorch Conference 2025](https://youtu.be/1wV1ESbGrVQ?si=s1GqymUfwiwOrDTg&t=725)

@@ -180,7 +180,7 @@ The `DummyLogitsProcessor.update_state()` implementation maintains a "sparse" re

 ### Wrapping an Existing Request-Level Logits Processor

-Although the vLLM engine applies logits processors at batch granularity, some users may want to use vLLM with a "request-level" logits processor implementation - an implementation which operates on individual requests. This will be especially true if your logits processor was developed for vLLM version 0, which required it to be a `Callable` (as described [here](https://docs.vllm.ai/en/v0.10.1.1/api/vllm/logits_process.html)) conforming to the following type annotation:
+Although the vLLM engine applies logits processors at batch granularity, some users may want to use vLLM with a "request-level" logits processor implementation - an implementation which operates on individual requests. This will be especially true if your logits processor was developed for vLLM version 0, which required it to be a `Callable` (as described [here][vllm.logits_process]) conforming to the following type annotation:

 ``` python
 RequestLogitsProcessor = Union[
@@ -172,13 +172,13 @@ Note, it is recommended to manually reserve 1 CPU for vLLM front-end process whe

 ### What are supported models on CPU?

-For the full and up-to-date list of models validated on CPU platforms, please see the official documentation: [Supported Models on CPU](https://docs.vllm.ai/en/latest/models/hardware_supported_models/cpu)
+For the full and up-to-date list of models validated on CPU platforms, please see the official documentation: [Supported Models on CPU](../../models/hardware_supported_models/cpu.md)

 ### How to find benchmark configuration examples for supported CPU models?

-For any model listed under [Supported Models on CPU](https://docs.vllm.ai/en/latest/models/hardware_supported_models/cpu), optimized runtime configurations are provided in the vLLM Benchmark Suite’s CPU test cases, defined in [cpu test cases](https://github.com/vllm-project/vllm/blob/main/.buildkite/performance-benchmarks/tests/serving-tests-cpu.json)
-For details on how these optimized configurations are determined, see: [performance-benchmark-details](https://github.com/vllm-project/vllm/tree/main/.buildkite/performance-benchmarks#performance-benchmark-details).
-To benchmark the supported models using these optimized settings, follow the steps in [running vLLM Benchmark Suite manually](https://docs.vllm.ai/en/latest/contributing/benchmarks/#manually-trigger-the-benchmark) and run the Benchmark Suite on a CPU environment.
+For any model listed under [Supported Models on CPU](../../models/hardware_supported_models/cpu.md), optimized runtime configurations are provided in the vLLM Benchmark Suite’s CPU test cases, defined in [cpu test cases](../../../.buildkite/performance-benchmarks/tests/serving-tests-cpu.json)
+For details on how these optimized configurations are determined, see: [performance-benchmark-details](../../../.buildkite/performance-benchmarks/README.md#performance-benchmark-details).
+To benchmark the supported models using these optimized settings, follow the steps in [running vLLM Benchmark Suite manually](../../benchmarking/dashboard.md#manually-trigger-the-benchmark) and run the Benchmark Suite on a CPU environment.

 Below is an example command to benchmark all CPU-supported models using optimized configurations.

@@ -34,9 +34,10 @@ TITLE = r"(?P<title>[^\[\]<>]+?)"
 REPO = r"(?P<repo>.+?/.+?)"
 TYPE = r"(?P<type>issues|pull|projects)"
 NUMBER = r"(?P<number>\d+)"
+PATH = r"(?P<path>[^\s]+?)"
 FRAGMENT = r"(?P<fragment>#[^\s]+)?"
 URL = f"https://github.com/{REPO}/{TYPE}/{NUMBER}{FRAGMENT}"
-RELATIVE = r"(?!(https?|ftp)://|#)(?P<path>[^\s]+?)"
+RELATIVE = rf"(?!(https?|ftp)://|#){PATH}{FRAGMENT}"

 # Common titles to use for GitHub links when none is provided in the link.
 TITLES = {"issues": "Issue ", "pull": "Pull Request ", "projects": "Project "}
@@ -55,6 +56,7 @@ def on_page_markdown(
         title = match.group("title")
         path = match.group("path")
         path = (Path(page.file.abs_src_path).parent / path).resolve()
+        fragment = match.group("fragment") or ""

         # Check if the path exists and is outside the docs dir
         if not path.exists() or path.is_relative_to(DOC_DIR):
@@ -64,7 +66,7 @@ def on_page_markdown(
         slug = "tree/main" if path.is_dir() else "blob/main"

         path = path.relative_to(ROOT_DIR)
-        url = f"https://github.com/vllm-project/vllm/{slug}/{path}"
+        url = f"https://github.com/vllm-project/vllm/{slug}/{path}{fragment}"
         return f"[{gh_icon} {title}]({url})"

     def replace_github_link(match: re.Match) -> str:
@@ -88,8 +90,4 @@ def on_page_markdown(

     markdown = relative_link.sub(replace_relative_link, markdown)
     markdown = github_link.sub(replace_github_link, markdown)
-
-    if "interface" in str(page.file.abs_src_path):
-        print(markdown)
-
     return markdown
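Reviewer note: the sketch below illustrates what the regex change in the hook appears to buy. With `PATH` and `FRAGMENT` split into separate named groups, a relative link such as `README.md#performance-benchmark-details` yields the path and the anchor separately, so the anchor can be re-appended to the generated GitHub URL. The four pattern constants are copied from the diff; how they are combined into `relative_link`, and the helpers `split_link` and `github_url`, are assumptions made for illustration and are not taken from the commit. The filesystem checks (`path.exists()`, `path.is_dir()`) are replaced by an explicit `is_dir` flag so the example runs anywhere.

```python
import re

# Pattern pieces as they appear in the diff.
TITLE = r"(?P<title>[^\[\]<>]+?)"
PATH = r"(?P<path>[^\s]+?)"
FRAGMENT = r"(?P<fragment>#[^\s]+)?"
RELATIVE = rf"(?!(https?|ftp)://|#){PATH}{FRAGMENT}"

# Assumption: the hook matches Markdown links of the form [title](target).
# This composition is illustrative, not copied from the commit.
relative_link = re.compile(rf"\[{TITLE}\]\({RELATIVE}\)")


def split_link(markdown: str):
    """Hypothetical helper: return (title, path, fragment) for the first
    relative link in `markdown`, or None if there is none."""
    match = relative_link.search(markdown)
    if match is None:
        return None
    # The fragment group is optional, so fall back to "" as the diff does.
    return match.group("title"), match.group("path"), match.group("fragment") or ""


def github_url(repo_path: str, fragment: str, is_dir: bool = False) -> str:
    """Hypothetical helper mirroring the URL construction in the diff.
    `repo_path` is assumed to be already relative to the repository root."""
    slug = "tree/main" if is_dir else "blob/main"
    return f"https://github.com/vllm-project/vllm/{slug}/{repo_path}{fragment}"


if __name__ == "__main__":
    link = "[details](../README.md#performance-benchmark-details)"
    print(split_link(link))
    print(github_url(".buildkite/performance-benchmarks/README.md",
                     "#performance-benchmark-details"))
```

Absolute URLs and pure in-page anchors such as `[x](#section)` are rejected by the negative lookahead, so under these assumptions only relative targets reach the URL rewrite.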