diff --git a/docs/deployment/docker.md b/docs/deployment/docker.md
index 9df829453..ae7cea436 100644
--- a/docs/deployment/docker.md
+++ b/docs/deployment/docker.md
@@ -84,10 +84,10 @@ DOCKER_BUILDKIT=1 docker build . \
     If you have not changed any C++ or CUDA kernel code, you can use precompiled wheels to significantly reduce Docker build time.

     * **Enable the feature** by adding the build argument: `--build-arg VLLM_USE_PRECOMPILED="1"`.
-    * **How it works**: By default, vLLM automatically finds the correct wheels from our [Nightly Builds](https://docs.vllm.ai/en/latest/contributing/ci/nightly_builds/) by using the merge-base commit with the upstream `main` branch.
+    * **How it works**: By default, vLLM automatically finds the correct wheels from our [Nightly Builds](../contributing/ci/nightly_builds.md) by using the merge-base commit with the upstream `main` branch.
     * **Override commit**: To use wheels from a specific commit, provide the `--build-arg VLLM_PRECOMPILED_WHEEL_COMMIT=` argument.

-    For a detailed explanation, refer to the documentation on 'Set up using Python-only build (without compilation)' part in [Build wheel from source](https://docs.vllm.ai/en/latest/contributing/ci/nightly_builds.html#precompiled-wheels-usage), these args are similar.
+    For a detailed explanation, refer to the documentation on 'Set up using Python-only build (without compilation)' part in [Build wheel from source](../contributing/ci/nightly_builds.md#precompiled-wheels-usage), these args are similar.
 ## Building for Arm64/aarch64

diff --git a/docs/design/debug_vllm_compile.md b/docs/design/debug_vllm_compile.md
index 731e542a0..328df5816 100644
--- a/docs/design/debug_vllm_compile.md
+++ b/docs/design/debug_vllm_compile.md
@@ -33,7 +33,7 @@ goals while minimizing impact to performance and also helps us (vLLM) when you o
 For more details on the design, please see the following resources:

 - [Introduction to vLLM-torch.compile blogpost](https://blog.vllm.ai/2025/08/20/torch-compile.html)
-- [vLLM-torch.compile integration design](https://docs.vllm.ai/en/latest/design/torch_compile.html)
+- [vLLM-torch.compile integration design](./torch_compile.md)
 - [vLLM Office Hours #26](https://www.youtube.com/live/xLyxc7hxCJc?si=Xulo9pe53C6ywf0V&t=561)
 - [Talk at PyTorch Conference 2025](https://youtu.be/1wV1ESbGrVQ?si=s1GqymUfwiwOrDTg&t=725)

diff --git a/docs/features/custom_logitsprocs.md b/docs/features/custom_logitsprocs.md
index 5ddef9db1..232f4363e 100644
--- a/docs/features/custom_logitsprocs.md
+++ b/docs/features/custom_logitsprocs.md
@@ -180,7 +180,7 @@ The `DummyLogitsProcessor.update_state()` implementation maintains a "sparse" re

 ### Wrapping an Existing Request-Level Logits Processor

-Although the vLLM engine applies logits processors at batch granularity, some users may want to use vLLM with a "request-level" logits processor implementation - an implementation which operates on individual requests. This will be especially true if your logits processor was developed for vLLM version 0, which required it to be a `Callable` (as described [here](https://docs.vllm.ai/en/v0.10.1.1/api/vllm/logits_process.html)) conforming to the following type annotation:
+Although the vLLM engine applies logits processors at batch granularity, some users may want to use vLLM with a "request-level" logits processor implementation - an implementation which operates on individual requests. This will be especially true if your logits processor was developed for vLLM version 0, which required it to be a `Callable` (as described [here][vllm.logits_process]) conforming to the following type annotation:

 ``` python
 RequestLogitsProcessor = Union[
diff --git a/docs/getting_started/installation/cpu.md b/docs/getting_started/installation/cpu.md
index affb94593..d3e23c359 100644
--- a/docs/getting_started/installation/cpu.md
+++ b/docs/getting_started/installation/cpu.md
@@ -172,13 +172,13 @@ Note, it is recommended to manually reserve 1 CPU for vLLM front-end process whe

 ### What are supported models on CPU?

-For the full and up-to-date list of models validated on CPU platforms, please see the official documentation: [Supported Models on CPU](https://docs.vllm.ai/en/latest/models/hardware_supported_models/cpu)
+For the full and up-to-date list of models validated on CPU platforms, please see the official documentation: [Supported Models on CPU](../../models/hardware_supported_models/cpu.md)

 ### How to find benchmark configuration examples for supported CPU models?

-For any model listed under [Supported Models on CPU](https://docs.vllm.ai/en/latest/models/hardware_supported_models/cpu), optimized runtime configurations are provided in the vLLM Benchmark Suite’s CPU test cases, defined in [cpu test cases](https://github.com/vllm-project/vllm/blob/main/.buildkite/performance-benchmarks/tests/serving-tests-cpu.json)
-For details on how these optimized configurations are determined, see: [performance-benchmark-details](https://github.com/vllm-project/vllm/tree/main/.buildkite/performance-benchmarks#performance-benchmark-details).
-To benchmark the supported models using these optimized settings, follow the steps in [running vLLM Benchmark Suite manually](https://docs.vllm.ai/en/latest/contributing/benchmarks/#manually-trigger-the-benchmark) and run the Benchmark Suite on a CPU environment.
+For any model listed under [Supported Models on CPU](../../models/hardware_supported_models/cpu.md), optimized runtime configurations are provided in the vLLM Benchmark Suite’s CPU test cases, defined in [cpu test cases](../../../.buildkite/performance-benchmarks/tests/serving-tests-cpu.json)
+For details on how these optimized configurations are determined, see: [performance-benchmark-details](../../../.buildkite/performance-benchmarks/README.md#performance-benchmark-details).
+To benchmark the supported models using these optimized settings, follow the steps in [running vLLM Benchmark Suite manually](../../benchmarking/dashboard.md#manually-trigger-the-benchmark) and run the Benchmark Suite on a CPU environment.

 Below is an example command to benchmark all CPU-supported models using optimized configurations.

diff --git a/docs/mkdocs/hooks/url_schemes.py b/docs/mkdocs/hooks/url_schemes.py
index f36a64ed7..66fa25d2a 100644
--- a/docs/mkdocs/hooks/url_schemes.py
+++ b/docs/mkdocs/hooks/url_schemes.py
@@ -34,9 +34,10 @@ TITLE = r"(?P<title>[^\[\]<>]+?)"
 REPO = r"(?P<repo>.+?/.+?)"
 TYPE = r"(?P<type>issues|pull|projects)"
 NUMBER = r"(?P<number>\d+)"
+PATH = r"(?P<path>[^\s]+?)"
 FRAGMENT = r"(?P<fragment>#[^\s]+)?"
 URL = f"https://github.com/{REPO}/{TYPE}/{NUMBER}{FRAGMENT}"
-RELATIVE = r"(?!(https?|ftp)://|#)(?P<path>[^\s]+?)"
+RELATIVE = rf"(?!(https?|ftp)://|#){PATH}{FRAGMENT}"

 # Common titles to use for GitHub links when none is provided in the link.
 TITLES = {"issues": "Issue ", "pull": "Pull Request ", "projects": "Project "}
@@ -55,6 +56,7 @@ def on_page_markdown(
         title = match.group("title")
         path = match.group("path")
         path = (Path(page.file.abs_src_path).parent / path).resolve()
+        fragment = match.group("fragment") or ""

         # Check if the path exists and is outside the docs dir
         if not path.exists() or path.is_relative_to(DOC_DIR):
@@ -64,7 +66,7 @@ def on_page_markdown(
         slug = "tree/main" if path.is_dir() else "blob/main"

         path = path.relative_to(ROOT_DIR)
-        url = f"https://github.com/vllm-project/vllm/{slug}/{path}"
+        url = f"https://github.com/vllm-project/vllm/{slug}/{path}{fragment}"
         return f"[{gh_icon} {title}]({url})"

     def replace_github_link(match: re.Match) -> str:
@@ -88,8 +90,4 @@ def on_page_markdown(

     markdown = relative_link.sub(replace_relative_link, markdown)
     markdown = github_link.sub(replace_github_link, markdown)
-
-    if "interface" in str(page.file.abs_src_path):
-        print(markdown)
-
     return markdown
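
The effect of the `url_schemes.py` change can be checked in isolation. The following is a minimal sketch, not the hook itself: `PATH`, `FRAGMENT`, and `RELATIVE` are copied from the diff above, while `LINK`, `to_github_url`, the simplified link pattern, and the hard-coded `blob/main` slug are assumptions made for illustration. The real hook resolves each path on disk, skips paths inside the docs directory, and uses `tree/main` for directories.

```python
import re

# Patterns copied from the diff above.
PATH = r"(?P<path>[^\s]+?)"
FRAGMENT = r"(?P<fragment>#[^\s]+)?"
RELATIVE = rf"(?!(https?|ftp)://|#){PATH}{FRAGMENT}"

# Assumption: a simplified Markdown link pattern of the form [title](target).
TITLE = r"(?P<title>[^\[\]<>]+?)"
LINK = re.compile(rf"\[{TITLE}\]\({RELATIVE}\)")


def to_github_url(markdown: str) -> str:
    """Rewrite relative links to GitHub URLs, keeping any #fragment.

    Hypothetical helper: it treats every path as repo-relative and as a
    file, which the real hook does not assume.
    """

    def replace(match: re.Match) -> str:
        # The fragment group is optional, so it is None when absent.
        fragment = match.group("fragment") or ""
        path = match.group("path")
        url = f"https://github.com/vllm-project/vllm/blob/main/{path}{fragment}"
        return f"[{match.group('title')}]({url})"

    return LINK.sub(replace, markdown)
```

Because `PATH` is lazy and `FRAGMENT` is optional, a target such as `README.md#performance-benchmark-details` splits into a path and a fragment, and the fragment is appended to the generated URL. Absolute URLs and pure `#anchor` links are rejected by the negative lookahead and pass through unchanged.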