[Docs] Reduce custom syntax used in docs (#27009)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -49,7 +49,7 @@ Here is a sample of `LLM` class usage:
 
 More API details can be found in the [Offline Inference](#offline-inference-api) section of the API docs.
 
-The code for the `LLM` class can be found in <gh-file:vllm/entrypoints/llm.py>.
+The code for the `LLM` class can be found in [vllm/entrypoints/llm.py](../../vllm/entrypoints/llm.py).
 
 ### OpenAI-Compatible API Server
 
@@ -60,7 +60,7 @@ This server can be started using the `vllm serve` command.
 vllm serve <model>
 ```
 
-The code for the `vllm` CLI can be found in <gh-file:vllm/entrypoints/cli/main.py>.
+The code for the `vllm` CLI can be found in [vllm/entrypoints/cli/main.py](../../vllm/entrypoints/cli/main.py).
 
 Sometimes you may see the API server entrypoint used directly instead of via the
 `vllm` CLI command. For example:
@@ -74,7 +74,7 @@ python -m vllm.entrypoints.openai.api_server --model <model>
 `python -m vllm.entrypoints.openai.api_server` is deprecated
 and may become unsupported in a future release.
 
-That code can be found in <gh-file:vllm/entrypoints/openai/api_server.py>.
+That code can be found in [vllm/entrypoints/openai/api_server.py](../../vllm/entrypoints/openai/api_server.py).
 
 More details on the API server can be found in the [OpenAI-Compatible Server](../serving/openai_compatible_server.md) document.
 
@@ -101,7 +101,7 @@ processing.
 - **Output Processing**: Processes the outputs generated by the model, decoding the
 token IDs from a language model into human-readable text.
 
-The code for `LLMEngine` can be found in <gh-file:vllm/engine/llm_engine.py>.
+The code for `LLMEngine` can be found in [vllm/engine/llm_engine.py](../../vllm/engine/llm_engine.py).
 
 ### AsyncLLMEngine
 
@@ -111,9 +111,9 @@ incoming requests. The `AsyncLLMEngine` is designed for online serving, where it
 can handle multiple concurrent requests and stream outputs to clients.
 
 The OpenAI-compatible API server uses the `AsyncLLMEngine`. There is also a demo
-API server that serves as a simpler example in <gh-file:vllm/entrypoints/api_server.py>.
+API server that serves as a simpler example in [vllm/entrypoints/api_server.py](../../vllm/entrypoints/api_server.py).
 
-The code for `AsyncLLMEngine` can be found in <gh-file:vllm/engine/async_llm_engine.py>.
+The code for `AsyncLLMEngine` can be found in [vllm/engine/async_llm_engine.py](../../vllm/engine/async_llm_engine.py).
 
 ## Worker
 
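
Every change in this diff follows one pattern: a `<gh-file:PATH>` reference becomes a standard Markdown link `[PATH](../../PATH)`. The sketch below shows that rewrite as a regular-expression substitution. It is only an illustration. The helper name `replace_gh_file` and the default `../../` prefix are assumptions taken from the links shown in this diff, and the commit does not say whether the edits were made by script or by hand.

```python
import re

# Matches the custom <gh-file:PATH> syntax; PATH may not contain '>' or whitespace.
GH_FILE = re.compile(r"<gh-file:([^>\s]+)>")


def replace_gh_file(text: str, prefix: str = "../../") -> str:
    """Rewrite each <gh-file:PATH> as the Markdown link [PATH](prefix + PATH).

    Hypothetical helper: the prefix depends on where the Markdown file sits
    relative to the repository root, so "../../" is an assumption.
    """
    return GH_FILE.sub(
        lambda m: f"[{m.group(1)}]({prefix}{m.group(1)})",
        text,
    )


if __name__ == "__main__":
    line = "The code for the `LLM` class can be found in <gh-file:vllm/entrypoints/llm.py>."
    print(replace_gh_file(line))
```

Other angle-bracket text such as `<model>` is left untouched because the pattern requires the `gh-file:` prefix.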