Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -1,7 +0,0 @@
-# AsyncLLMEngine
-
-```{eval-rst}
-.. autoclass:: vllm.AsyncLLMEngine
-    :members:
-    :show-inheritance:
-```
@@ -1,17 +0,0 @@
-# vLLM Engine
-
-```{eval-rst}
-.. automodule:: vllm.engine
-```
-
-```{eval-rst}
-.. currentmodule:: vllm.engine
-```
-
-:::{toctree}
-:caption: Engines
-:maxdepth: 2
-
-llm_engine
-async_llm_engine
-:::
@@ -1,7 +0,0 @@
-# LLMEngine
-
-```{eval-rst}
-.. autoclass:: vllm.LLMEngine
-    :members:
-    :show-inheritance:
-```
@@ -1,21 +0,0 @@
-# Inference Parameters
-
-Inference parameters for vLLM APIs.
-
-(sampling-params)=
-
-## Sampling Parameters
-
-```{eval-rst}
-.. autoclass:: vllm.SamplingParams
-    :members:
-```
-
-(pooling-params)=
-
-## Pooling Parameters
-
-```{eval-rst}
-.. autoclass:: vllm.PoolingParams
-    :members:
-```
@@ -1,9 +0,0 @@
-# Model Adapters
-
-## Module Contents
-
-```{eval-rst}
-.. automodule:: vllm.model_executor.models.adapters
-    :members:
-    :member-order: bysource
-```
@@ -1,11 +0,0 @@
-# Model Development
-
-## Submodules
-
-:::{toctree}
-:maxdepth: 1
-
-interfaces_base
-interfaces
-adapters
-:::
@@ -1,9 +0,0 @@
-# Optional Interfaces
-
-## Module Contents
-
-```{eval-rst}
-.. automodule:: vllm.model_executor.models.interfaces
-    :members:
-    :member-order: bysource
-```
@@ -1,9 +0,0 @@
-# Base Model Interfaces
-
-## Module Contents
-
-```{eval-rst}
-.. automodule:: vllm.model_executor.models.interfaces_base
-    :members:
-    :member-order: bysource
-```
@@ -1,28 +0,0 @@
-(multi-modality)=
-
-# Multi-Modality
-
-vLLM provides experimental support for multi-modal models through the {mod}`vllm.multimodal` package.
-
-Multi-modal inputs can be passed alongside text and token prompts to [supported models](#supported-mm-models)
-via the `multi_modal_data` field in {class}`vllm.inputs.PromptType`.
-
-Looking to add your own multi-modal model? Please follow the instructions listed [here](#supports-multimodal).
-
-## Module Contents
-
-```{eval-rst}
-.. autodata:: vllm.multimodal.MULTIMODAL_REGISTRY
-```
-
-## Submodules
-
-:::{toctree}
-:maxdepth: 1
-
-inputs
-parse
-processing
-profiling
-registry
-:::
@@ -1,49 +0,0 @@
-# Input Definitions
-
-## User-facing inputs
-
-```{eval-rst}
-.. autodata:: vllm.multimodal.inputs.MultiModalDataDict
-```
-
-## Internal data structures
-
-```{eval-rst}
-.. autoclass:: vllm.multimodal.inputs.PlaceholderRange
-    :members:
-    :show-inheritance:
-```
-
-```{eval-rst}
-.. autodata:: vllm.multimodal.inputs.NestedTensors
-```
-
-```{eval-rst}
-.. autoclass:: vllm.multimodal.inputs.MultiModalFieldElem
-    :members:
-    :show-inheritance:
-```
-
-```{eval-rst}
-.. autoclass:: vllm.multimodal.inputs.MultiModalFieldConfig
-    :members:
-    :show-inheritance:
-```
-
-```{eval-rst}
-.. autoclass:: vllm.multimodal.inputs.MultiModalKwargsItem
-    :members:
-    :show-inheritance:
-```
-
-```{eval-rst}
-.. autoclass:: vllm.multimodal.inputs.MultiModalKwargs
-    :members:
-    :show-inheritance:
-```
-
-```{eval-rst}
-.. autoclass:: vllm.multimodal.inputs.MultiModalInputs
-    :members:
-    :show-inheritance:
-```
@@ -1,9 +0,0 @@
-# Data Parsing
-
-## Module Contents
-
-```{eval-rst}
-.. automodule:: vllm.multimodal.parse
-    :members:
-    :member-order: bysource
-```
@@ -1,9 +0,0 @@
-# Data Processing
-
-## Module Contents
-
-```{eval-rst}
-.. automodule:: vllm.multimodal.processing
-    :members:
-    :member-order: bysource
-```
@@ -1,9 +0,0 @@
-# Memory Profiling
-
-## Module Contents
-
-```{eval-rst}
-.. automodule:: vllm.multimodal.profiling
-    :members:
-    :member-order: bysource
-```
@@ -1,9 +0,0 @@
-# Registry
-
-## Module Contents
-
-```{eval-rst}
-.. automodule:: vllm.multimodal.registry
-    :members:
-    :member-order: bysource
-```
@@ -1,9 +0,0 @@
-# Offline Inference
-
-:::{toctree}
-:caption: Contents
-:maxdepth: 1
-
-llm
-llm_inputs
-:::
@@ -1,7 +0,0 @@
-# LLM Class
-
-```{eval-rst}
-.. autoclass:: vllm.LLM
-    :members:
-    :show-inheritance:
-```
@@ -1,19 +0,0 @@
-# LLM Inputs
-
-```{eval-rst}
-.. autodata:: vllm.inputs.PromptType
-```
-
-```{eval-rst}
-.. autoclass:: vllm.inputs.TextPrompt
-    :show-inheritance:
-    :members:
-    :member-order: bysource
-```
-
-```{eval-rst}
-.. autoclass:: vllm.inputs.TokensPrompt
-    :show-inheritance:
-    :members:
-    :member-order: bysource
-```
133
docs/source/api/summary.md
Normal file
@@ -0,0 +1,133 @@
+# Summary
+
+(configuration)=
+
+## Configuration
+
+API documentation for vLLM's configuration classes.
+
+```{autodoc2-summary}
+vllm.config.ModelConfig
+vllm.config.CacheConfig
+vllm.config.TokenizerPoolConfig
+vllm.config.LoadConfig
+vllm.config.ParallelConfig
+vllm.config.SchedulerConfig
+vllm.config.DeviceConfig
+vllm.config.SpeculativeConfig
+vllm.config.LoRAConfig
+vllm.config.PromptAdapterConfig
+vllm.config.MultiModalConfig
+vllm.config.PoolerConfig
+vllm.config.DecodingConfig
+vllm.config.ObservabilityConfig
+vllm.config.KVTransferConfig
+vllm.config.CompilationConfig
+vllm.config.VllmConfig
+```
+
+(offline-inference-api)=
+
+## Offline Inference
+
+LLM Class.
+
+```{autodoc2-summary}
+vllm.LLM
+```
+
+LLM Inputs.
+
+```{autodoc2-summary}
+vllm.inputs.PromptType
+vllm.inputs.TextPrompt
+vllm.inputs.TokensPrompt
+```
+
+## vLLM Engines
+
+Engine classes for offline and online inference.
+
+```{autodoc2-summary}
+vllm.LLMEngine
+vllm.AsyncLLMEngine
+```
+
+## Inference Parameters
+
+Inference parameters for vLLM APIs.
+
+(sampling-params)=
+(pooling-params)=
+
+```{autodoc2-summary}
+vllm.SamplingParams
+vllm.PoolingParams
+```
+
+(multi-modality)=
+
+## Multi-Modality
+
+vLLM provides experimental support for multi-modal models through the {mod}`vllm.multimodal` package.
+
+Multi-modal inputs can be passed alongside text and token prompts to [supported models](#supported-mm-models)
+via the `multi_modal_data` field in {class}`vllm.inputs.PromptType`.
+
+Looking to add your own multi-modal model? Please follow the instructions listed [here](#supports-multimodal).
+
+```{autodoc2-summary}
+vllm.multimodal.MULTIMODAL_REGISTRY
+```
+
+### Inputs
+
+User-facing inputs.
+
+```{autodoc2-summary}
+vllm.multimodal.inputs.MultiModalDataDict
+```
+
+Internal data structures.
+
+```{autodoc2-summary}
+vllm.multimodal.inputs.PlaceholderRange
+vllm.multimodal.inputs.NestedTensors
+vllm.multimodal.inputs.MultiModalFieldElem
+vllm.multimodal.inputs.MultiModalFieldConfig
+vllm.multimodal.inputs.MultiModalKwargsItem
+vllm.multimodal.inputs.MultiModalKwargs
+vllm.multimodal.inputs.MultiModalInputs
+```
+
+### Data Parsing
+
+```{autodoc2-summary}
+vllm.multimodal.parse
+```
+
+### Data Processing
+
+```{autodoc2-summary}
+vllm.multimodal.processing
+```
+
+### Memory Profiling
+
+```{autodoc2-summary}
+vllm.multimodal.profiling
+```
+
+### Registry
+
+```{autodoc2-summary}
+vllm.multimodal.registry
+```
+
+## Model Development
+
+```{autodoc2-summary}
+vllm.model_executor.models.interfaces_base
+vllm.model_executor.models.interfaces
+vllm.model_executor.models.adapters
+```
21
docs/source/autodoc2_docstring_parser.py
Normal file
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: Apache-2.0
+from docutils import nodes
+from myst_parser.parsers.sphinx_ import MystParser
+from sphinx.ext.napoleon import docstring
+
+
+class NapoleonParser(MystParser):
+
+    def parse(self, input_string: str, document: nodes.document) -> None:
+        # Get the Sphinx configuration
+        config = document.settings.env.config
+
+        parsed_content = str(
+            docstring.GoogleDocstring(
+                str(docstring.NumpyDocstring(input_string, config)),
+                config,
+            ))
+        return super().parse(parsed_content, document)
+
+
+Parser = NapoleonParser
@@ -13,16 +13,17 @@
 # documentation root, use os.path.abspath to make it absolute, like shown here.

 import datetime
-import inspect
 import logging
 import os
+import re
 import sys
+from pathlib import Path

 import requests
-from sphinx.ext import autodoc

 logger = logging.getLogger(__name__)
-sys.path.append(os.path.abspath("../.."))
+REPO_ROOT = Path(__file__).resolve().parent.parent.parent
+sys.path.append(os.path.abspath(REPO_ROOT))

 # -- Project information -----------------------------------------------------

@@ -40,8 +41,7 @@ extensions = [
     "sphinx.ext.linkcode",
     "sphinx.ext.intersphinx",
     "sphinx_copybutton",
-    "sphinx.ext.autodoc",
-    "sphinx.ext.autosummary",
+    "autodoc2",
     "myst_parser",
     "sphinxarg.ext",
     "sphinx_design",
@@ -49,7 +49,22 @@ extensions = [
 ]
 myst_enable_extensions = [
     "colon_fence",
+    "fieldlist",
 ]
+autodoc2_packages = [
+    {
+        "path": "../../vllm",
+        "exclude_dirs": ["__pycache__", "third_party"],
+    },
+]
+autodoc2_output_dir = "api"
+autodoc2_render_plugin = "myst"
+autodoc2_hidden_objects = ["dunder", "private", "inherited"]
+autodoc2_docstring_parser_regexes = [
+    (".*", "docs.source.autodoc2_docstring_parser"),
+]
+autodoc2_sort_names = True
+autodoc2_index_template = None

 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
@@ -77,6 +92,11 @@ html_theme_options = {
     'repository_url': 'https://github.com/vllm-project/vllm',
     'use_repository_button': True,
     'use_edit_page_button': True,
+    # Prevents the full API being added to the left sidebar of every page.
+    # Reduces build time by 2.5x and reduces build size from ~225MB to ~95MB.
+    'collapse_navbar': True,
+    # Makes API visible in the right sidebar on API reference pages.
+    'show_toc_level': 3,
 }
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
@@ -164,73 +184,64 @@ def linkcode_resolve(domain, info):
         return None
     if not info['module']:
         return None
-    filename = info['module'].replace('.', '/')
-    module = info['module']

-    # try to determine the correct file and line number to link to
-    obj = sys.modules[module]
+    # Get path from module name
+    file = Path(f"{info['module'].replace('.', '/')}.py")
+    path = REPO_ROOT / file
+    if not path.exists():
+        path = REPO_ROOT / file.with_suffix("") / "__init__.py"
+    if not path.exists():
+        return None

-    # get as specific as we can
-    lineno: int = 0
-    filename: str = ""
-    try:
-        for part in info['fullname'].split('.'):
-            obj = getattr(obj, part)
+    # Get the line number of the object
+    with open(path) as f:
+        lines = f.readlines()
+    name = info['fullname'].split(".")[-1]
+    pattern = fr"^( {{4}})*((def|class) )?{name}\b.*"
+    for lineno, line in enumerate(lines, 1):
+        if not line or line.startswith("#"):
+            continue
+        if re.match(pattern, line):
+            break

-        # Skip decorator wrappers by checking if the object is a function
-        # and has a __wrapped__ attribute (which decorators typically set)
-        while hasattr(obj, '__wrapped__'):
-            obj = obj.__wrapped__
+    # If the line number is not found, return None
+    if lineno == len(lines):
+        return None

-        if not (inspect.isclass(obj) or inspect.isfunction(obj)
-                or inspect.ismethod(obj)):
-            obj = obj.__class__  # Get the class of the instance
-
-        lineno = inspect.getsourcelines(obj)[1]
-        filename = (inspect.getsourcefile(obj)
-                    or f"{filename}.py").split("vllm/", 1)[1]
-    except Exception:
-        # For some things, like a class member, won't work, so
-        # we'll use the line number of the parent (the class)
-        pass
-
-    if filename.startswith("checkouts/"):
+    # If the line number is found, create the URL
+    filename = path.relative_to(REPO_ROOT)
+    if "checkouts" in path.parts:
         # a PR build on readthedocs
-        pr_number = filename.split("/")[1]
-        filename = filename.split("/", 2)[2]
+        pr_number = REPO_ROOT.name
         base, branch = get_repo_base_and_branch(pr_number)
         if base and branch:
             return f"https://github.com/{base}/blob/{branch}/{filename}#L{lineno}"

     # Otherwise, link to the source file on the main branch
     return f"https://github.com/vllm-project/vllm/blob/main/{filename}#L{lineno}"
|
||||
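The rewritten `linkcode_resolve` body in this hunk replaces the `inspect`-based lookup with two steps: map the module name to a file under the repository root, then scan that file with a regex for the object's name. The following is a minimal standalone sketch of those two steps, not the code from `conf.py`. The helper names `module_to_path` and `find_definition_line` are hypothetical, `re.escape` is an addition, and the sketch returns `None` explicitly when nothing matches rather than comparing `lineno` with `len(lines)`.

```python
import re
from pathlib import Path
from typing import List, Optional


def module_to_path(repo_root: Path, module: str) -> Optional[Path]:
    """Map a dotted module name to a file under repo_root, or None."""
    file = Path(f"{module.replace('.', '/')}.py")
    path = repo_root / file
    if not path.exists():
        # Packages live in <name>/__init__.py rather than <name>.py
        path = repo_root / file.with_suffix("") / "__init__.py"
    return path if path.exists() else None


def find_definition_line(lines: List[str], fullname: str) -> Optional[int]:
    """Return the 1-based line of the first plausible definition, or None."""
    # Only the last component of the dotted name is searched for.
    # re.escape is an addition in this sketch, not part of the diff.
    name = re.escape(fullname.split(".")[-1])
    # Zero or more 4-space indents, an optional "def " or "class ", the name.
    pattern = re.compile(rf"^( {{4}})*((def|class) )?{name}\b")
    for lineno, line in enumerate(lines, 1):
        if pattern.match(line):
            return lineno
    return None
```

Because the scan stops at the first line that looks like a definition or assignment of the bare name, two classes in one file that share a method name would both resolve to the first occurrence.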
|
||||
|
||||
# Mock out external dependencies here, otherwise the autodoc pages may be blank.
|
||||
# Mock out external dependencies here, otherwise sphinx-argparse won't work.
|
||||
autodoc_mock_imports = [
|
||||
"huggingface_hub",
|
||||
"pydantic",
|
||||
"zmq",
|
||||
"cloudpickle",
|
||||
"aiohttp",
|
||||
"starlette",
|
||||
"blake3",
|
||||
"compressed_tensors",
|
||||
"cpuinfo",
|
||||
"cv2",
|
||||
"torch",
|
||||
"transformers",
|
||||
"psutil",
|
||||
"prometheus_client",
|
||||
"sentencepiece",
|
||||
"vllm._C",
|
||||
"PIL",
|
||||
"numpy",
|
||||
'triton',
|
||||
"tqdm",
|
||||
"tensorizer",
|
||||
"pynvml",
|
||||
"outlines",
|
||||
"xgrammar",
|
||||
"librosa",
|
||||
"soundfile",
|
||||
"gguf",
|
||||
"lark",
|
||||
"decord",
|
||||
# The mocks below are required by
|
||||
# docs/source/serving/openai_compatible_server.md's
|
||||
# vllm.entrypoints.openai.cli_args
|
||||
"openai",
|
||||
"fastapi",
|
||||
"partial_json_parser",
|
||||
]
|
||||
|
||||
for mock_target in autodoc_mock_imports:
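The body of this loop is elided between hunks. Judging by the log message in the next hunk, it reports mock targets that were already imported when the Sphinx build started. A rough sketch of that kind of check, using only the standard library, could look like the following. The helper name `already_loaded` and its `loaded` parameter are assumptions made for illustration and do not come from `conf.py`.

```python
import sys
from typing import Iterable, List, Mapping, Optional


def already_loaded(mock_targets: Iterable[str],
                   loaded: Optional[Mapping[str, object]] = None) -> List[str]:
    """Return the mock targets that are already present in ``loaded``.

    ``loaded`` defaults to ``sys.modules``. A module that is already imported
    cannot be replaced by a mock afterwards, so these names deserve a warning.
    """
    modules = sys.modules if loaded is None else loaded
    return [name for name in mock_targets if name in modules]
```

Passing the module table in as a parameter keeps the check easy to exercise without importing any of the heavy dependencies in the mock list.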
|
||||
@@ -241,18 +252,6 @@ for mock_target in autodoc_mock_imports:
             "been loaded into sys.modules when the sphinx build starts.",
             mock_target)

-
-class MockedClassDocumenter(autodoc.ClassDocumenter):
-    """Remove note about base class when a class is derived from object."""
-
-    def add_line(self, line: str, source: str, *lineno: int) -> None:
-        if line == " Bases: :py:class:`object`":
-            return
-        super().add_line(line, source, *lineno)
-
-
-autodoc.ClassDocumenter = MockedClassDocumenter
-
 intersphinx_mapping = {
     "python": ("https://docs.python.org/3", None),
     "typing_extensions":
@@ -264,7 +263,4 @@ intersphinx_mapping = {
     "psutil": ("https://psutil.readthedocs.io/en/stable", None),
 }

-autodoc_preserve_defaults = True
-autodoc_warningiserror = True
-
 navigation_with_keys = False
@@ -52,8 +52,8 @@ for output in outputs:
     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 ```

-More API details can be found in the {doc}`Offline Inference
-</api/offline_inference/index>` section of the API docs.
+More API details can be found in the [Offline Inference]
+(#offline-inference-api) section of the API docs.

 The code for the `LLM` class can be found in <gh-file:vllm/entrypoints/llm.py>.

@@ -42,7 +42,7 @@ Check the ❌ or 🟠 with links to see tracking issue for unsupported feature/h
 * [APC](#automatic-prefix-caching)
 * [LoRA](#lora-adapter)
 * <abbr title="Prompt Adapter">prmpt adptr</abbr>
-* [SD](#spec_decode)
+* [SD](#spec-decode)
 * CUDA graph
 * <abbr title="Pooling Models">pooling</abbr>
 * <abbr title="Encoder-Decoder Models">enc-dec</abbr>
@@ -122,7 +122,7 @@ Check the ❌ or 🟠 with links to see tracking issue for unsupported feature/h
 *
 *
 *
-- * [SD](#spec_decode)
+- * [SD](#spec-decode)
 * ✅
 * ✅
 * ❌
@@ -377,7 +377,7 @@ Check the ❌ or 🟠 with links to see tracking issue for unsupported feature/h
 * ✅
 * [❌](gh-issue:8475)
 * ✅
-- * [SD](#spec_decode)
+- * [SD](#spec-decode)
 * ✅
 * ✅
 * ✅
@@ -194,11 +194,8 @@ contributing/vulnerability_management
 :caption: API Reference
 :maxdepth: 2

-api/offline_inference/index
-api/engine/index
-api/inference_params
-api/multimodal/index
-api/model/index
+api/summary
+api/vllm/vllm
 :::

 % Latest news and acknowledgements
@@ -14,7 +14,7 @@ Usually, this is automatically inferred so you don't have to specify it.
 ## Offline Inference

 The {class}`~vllm.LLM` class provides various methods for offline inference.
-See [Engine Arguments](#engine-args) for a list of options when initializing the model.
+See <project:#configuration> for a list of options when initializing the model.

 ### `LLM.generate`

@@ -60,7 +60,7 @@ which takes priority over both the model's and Sentence Transformers's defaults.
 ## Offline Inference

 The {class}`~vllm.LLM` class provides various methods for offline inference.
-See [Engine Arguments](#engine-args) for a list of options when initializing the model.
+See <project:#configuration> for a list of options when initializing the model.

 ### `LLM.encode`

@@ -25,7 +25,7 @@ The available APIs depend on the type of model that is being run:
 Please refer to the above pages for more details about each API.

 :::{seealso}
-[API Reference](/api/offline_inference/index)
+[API Reference](#offline-inference-api)
 :::

 (configuration-options)=
@@ -33,7 +33,7 @@ Please refer to the above pages for more details about each API.
 ## Configuration Options

 This section lists the most common options for running the vLLM engine.
-For a full list, refer to the [Engine Arguments](#engine-args) page.
+For a full list, refer to the <project:#configuration> page.

 (model-resolution)=
