[Doc] Improve GitHub links (#11491)

Author: Cyrus Leung
Date: 2024-12-26 06:49:26 +08:00
Committed by: GitHub
Commit: 6ad909fdda (parent: b689ada91e)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

31 changed files with 147 additions and 136 deletions
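The shorthand link targets introduced throughout this diff (`gh-issue:`, `gh-pr:`, `gh-file:`, `gh-dir:`) expand to full `vllm-project/vllm` GitHub URLs when the docs are built. The configuration that registers them is not part of the hunks shown below, so the following `conf.py` sketch only illustrates how such schemes are typically wired up with MyST-Parser's `myst_url_schemes` option; the exact names and URL templates are assumptions.

```python
# docs/source/conf.py -- illustrative sketch only; not taken from this diff.
# Assumes the docs are built with Sphinx + MyST-Parser, which supports
# custom URL schemes via the `myst_url_schemes` option.
myst_url_schemes = {
    # Keep ordinary web links working as-is.
    "http": None,
    "https": None,
    # gh-issue:1234  ->  https://github.com/vllm-project/vllm/issues/1234
    "gh-issue": {
        "url": "https://github.com/vllm-project/vllm/issues/{{path}}",
        "title": "Issue #{{path}}",
    },
    # gh-pr:1234  ->  https://github.com/vllm-project/vllm/pull/1234
    "gh-pr": {
        "url": "https://github.com/vllm-project/vllm/pull/{{path}}",
        "title": "Pull Request #{{path}}",
    },
    # gh-file:path/to/file  ->  blob view on the main branch
    "gh-file": {
        "url": "https://github.com/vllm-project/vllm/blob/main/{{path}}",
        "title": "{{path}}",
    },
    # gh-dir:path/to/dir  ->  tree view on the main branch
    "gh-dir": {
        "url": "https://github.com/vllm-project/vllm/tree/main/{{path}}",
        "title": "{{path}}",
    },
}
```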

View File

@@ -82,7 +82,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
-
-
* - [LoRA](#lora-adapter)
- [✗](https://github.com/vllm-project/vllm/pull/9057)
- [✗](gh-pr:9057)
- ✅
-
-
@@ -168,10 +168,10 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
-
* - <abbr title="Encoder-Decoder Models">enc-dec</abbr>
- ✗
- [✗](https://github.com/vllm-project/vllm/issues/7366)
- [✗](gh-issue:7366)
- ✗
- ✗
- [✗](https://github.com/vllm-project/vllm/issues/7366)
- [✗](gh-issue:7366)
- ✅
- ✅
-
@@ -205,7 +205,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/pull/8199)
- [✗](gh-pr:8199)
- ✅
- ✗
- ✅
@@ -244,7 +244,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✗
- ✗
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/8198)
- [✗](gh-issue:8198)
- ✅
-
-
@@ -253,8 +253,8 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
-
* - <abbr title="Multimodal Inputs">mm</abbr>
- ✅
- [✗](https://github.com/vllm-project/vllm/pull/8348)
- [✗](https://github.com/vllm-project/vllm/pull/7199)
- [✗](gh-pr:8348)
- [✗](gh-pr:7199)
- ?
- ?
- ✅
@@ -273,14 +273,14 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/6137)
- [✗](gh-issue:6137)
- ✅
- ✗
- ✅
- ✅
- ✅
- ?
- [✗](https://github.com/vllm-project/vllm/issues/7968)
- [✗](gh-issue:7968)
- ✅
-
-
@@ -290,14 +290,14 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/6137)
- [✗](gh-issue:6137)
- ✅
- ✗
- ✅
- ✅
- ✅
- ?
- [✗](https://github.com/vllm-project/vllm/issues/7968>)
- [✗](gh-issue:7968)
- ?
- ✅
-
@@ -314,7 +314,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/9893)
- [✗](gh-issue:9893)
- ?
- ✅
- ✅
@@ -338,7 +338,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- CPU
- AMD
* - [CP](#chunked-prefill)
- [✗](https://github.com/vllm-project/vllm/issues/2729)
- [✗](gh-issue:2729)
- ✅
- ✅
- ✅
@@ -346,7 +346,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
* - [APC](#apc)
- [✗](https://github.com/vllm-project/vllm/issues/3687)
- [✗](gh-issue:3687)
- ✅
- ✅
- ✅
@@ -359,7 +359,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/pull/4830)
- [✗](gh-pr:4830)
- ✅
* - <abbr title="Prompt Adapter">prmpt adptr</abbr>
- ✅
@@ -367,7 +367,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/8475)
- [✗](gh-issue:8475)
- ✅
* - [SD](#spec_decode)
- ✅
@@ -439,7 +439,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/8477)
- [✗](gh-issue:8477)
- ✅
* - best-of
- ✅

View File

@@ -47,8 +47,7 @@ outputs = llm.generate(
)
```
Check out [examples/multilora_inference.py](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)
for an example of how to use LoRA adapters with the async engine and how to use more advanced configuration options.
Check out <gh-file:examples/multilora_inference.py> for an example of how to use LoRA adapters with the async engine and how to use more advanced configuration options.
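For readers following that link, here is a minimal offline sketch of attaching a LoRA adapter per request. It uses the synchronous `LLM` class rather than the async engine, and the base model and adapter names are placeholders, not values taken from this diff.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder base model and adapter; substitute your own.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Write a SQL query listing all active users."],
    sampling_params,
    # LoRARequest(name, unique integer id, path or HF repo of the adapter)
    lora_request=LoRARequest("sql-lora", 1, "yard1/llama-2-7b-sql-lora-test"),
)
print(outputs[0].outputs[0].text)
```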
## Serving LoRA Adapters

View File

@@ -5,7 +5,7 @@
This page teaches you how to pass multi-modal inputs to [multi-modal models](#supported-mm-models) in vLLM.
```{note}
We are actively iterating on multi-modal support. See [this RFC](https://github.com/vllm-project/vllm/issues/4194) for upcoming changes,
We are actively iterating on multi-modal support. See [this RFC](gh-issue:4194) for upcoming changes,
and [open an issue on GitHub](https://github.com/vllm-project/vllm/issues/new/choose) if you have any feedback or feature requests.
```
@@ -60,7 +60,7 @@ for o in outputs:
print(generated_text)
```
A code example can be found in [examples/offline_inference_vision_language.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language.py).
Full example: <gh-file:examples/offline_inference_vision_language.py>
To substitute multiple images inside the same text prompt, you can pass in a list of images instead:
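The code that normally follows this sentence is elided from the hunk. As a rough sketch (the model name, placeholder tokens, and image files are illustrative assumptions, not taken from the diff):

```python
from PIL import Image
from vllm import LLM

# Illustrative model; any multi-image-capable model works similarly.
llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",
    trust_remote_code=True,
    limit_mm_per_prompt={"image": 2},  # raise the per-prompt image limit
)

outputs = llm.generate({
    # Placeholder tokens follow this model's convention (<|image_1|>, <|image_2|>).
    "prompt": (
        "<|user|>\n<|image_1|>\n<|image_2|>\n"
        "What do these images have in common?<|end|>\n<|assistant|>\n"
    ),
    "multi_modal_data": {"image": [Image.open("a.jpg"), Image.open("b.jpg")]},
})
print(outputs[0].outputs[0].text)
```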
@@ -91,7 +91,7 @@ for o in outputs:
print(generated_text)
```
A code example can be found in [examples/offline_inference_vision_language_multi_image.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py).
Full example: <gh-file:examples/offline_inference_vision_language_multi_image.py>
Multi-image input can be extended to perform video captioning. We show this with [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) as it supports videos:
@@ -125,13 +125,13 @@ for o in outputs:
You can pass a list of NumPy arrays directly to the {code}`'video'` field of the multi-modal dictionary
instead of using multi-image input.
Please refer to [examples/offline_inference_vision_language.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language.py) for more details.
Full example: <gh-file:examples/offline_inference_vision_language.py>
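A rough sketch of the NumPy-array route is shown below; the frame shape and prompt format are assumptions, so consult the linked example for the exact Qwen2-VL prompt.

```python
import numpy as np
from vllm import LLM

llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct")

# Assumed per-frame layout: (height, width, 3) uint8 RGB; 16 frames here.
frames = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(16)]

outputs = llm.generate({
    # Placeholder prompt using Qwen2-VL's video tokens (assumption).
    "prompt": (
        "<|im_start|>user\n<|vision_start|><|video_pad|><|vision_end|>"
        "Describe this video.<|im_end|>\n<|im_start|>assistant\n"
    ),
    "multi_modal_data": {"video": frames},
})
print(outputs[0].outputs[0].text)
```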
### Audio
You can pass a tuple {code}`(array, sampling_rate)` to the {code}`'audio'` field of the multi-modal dictionary.
Please refer to [examples/offline_inference_audio_language.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_audio_language.py) for more details.
Full example: <gh-file:examples/offline_inference_audio_language.py>
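A rough sketch of the audio tuple, assuming an Ultravox-style audio model and `librosa` for decoding the file (both are illustrative choices; the real example builds the prompt from the model's chat template):

```python
import librosa
from vllm import LLM

llm = LLM(model="fixie-ai/ultravox-v0_3")  # placeholder audio model

# Decode to a 1-D float array plus its sampling rate.
audio, sampling_rate = librosa.load("question.wav", sr=None)

outputs = llm.generate({
    # Placeholder prompt; in practice, render it via the model's chat template.
    "prompt": "<|user|>\n<|audio|>\nWhat is being asked?<|end|>\n<|assistant|>\n",
    "multi_modal_data": {"audio": (audio, sampling_rate)},
})
print(outputs[0].outputs[0].text)
```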
### Embedding
@@ -208,7 +208,7 @@ A chat template is **required** to use Chat Completions API.
Although most models come with a chat template, for others you have to define one yourself.
The chat template can be inferred based on the documentation on the model's HuggingFace repo.
For example, LLaVA-1.5 (`llava-hf/llava-1.5-7b-hf`) requires a chat template that can be found [here](https://github.com/vllm-project/vllm/blob/main/examples/template_llava.jinja).
For example, LLaVA-1.5 (`llava-hf/llava-1.5-7b-hf`) requires a chat template that can be found here: <gh-file:examples/template_llava.jinja>
```
### Image
@@ -271,7 +271,7 @@ chat_response = client.chat.completions.create(
print("Chat completion output:", chat_response.choices[0].message.content)
```
A full code example can be found in [examples/openai_chat_completion_client_for_multimodal.py](https://github.com/vllm-project/vllm/blob/main/examples/openai_chat_completion_client_for_multimodal.py).
Full example: <gh-file:examples/openai_chat_completion_client_for_multimodal.py>
```{tip}
Loading from local file paths is also supported on vLLM: You can specify the allowed local media path via `--allowed-local-media-path` when launching the API server/engine,
@@ -296,7 +296,7 @@ $ export VLLM_IMAGE_FETCH_TIMEOUT=<timeout>
Instead of {code}`image_url`, you can pass a video file via {code}`video_url`.
You can use [these tests](https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_video.py) as reference.
You can use [these tests](gh-file:entrypoints/openai/test_video.py) as reference.
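A rough sketch of a `video_url` request against a running server (the model name and URL are placeholders; the content-part shape mirrors the `image_url` form shown elsewhere on this page):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-2B-Instruct",  # placeholder; use the served model's name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this video."},
            {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}},
        ],
    }],
)
print("Chat completion output:", chat_response.choices[0].message.content)
```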
````{note}
By default, the timeout for fetching videos through HTTP URL is `30` seconds.
@@ -399,7 +399,7 @@ result = chat_completion_from_url.choices[0].message.content
print("Chat completion output from audio url:", result)
```
A full code example can be found in [examples/openai_chat_completion_client_for_multimodal.py](https://github.com/vllm-project/vllm/blob/main/examples/openai_chat_completion_client_for_multimodal.py).
Full example: <gh-file:examples/openai_chat_completion_client_for_multimodal.py>
````{note}
By default, the timeout for fetching audios through HTTP URL is `10` seconds.
@@ -435,7 +435,7 @@ Since VLM2Vec has the same model architecture as Phi-3.5-Vision, we have to expl
to run this model in embedding mode instead of text generation mode.
The custom chat template is completely different from the original one for this model,
and can be found [here](https://github.com/vllm-project/vllm/blob/main/examples/template_vlm2vec.jinja).
and can be found here: <gh-file:examples/template_vlm2vec.jinja>
```
Since the request schema is not defined by OpenAI client, we post a request to the server using the lower-level `requests` library:
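The request itself is elided from this hunk; a rough sketch follows, with the endpoint path, model name, and payload fields mirroring the linked example but best treated as assumptions here.

```python
import requests

response = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "TIGER-Lab/VLM2Vec-Full",  # the served VLM2Vec model (assumption)
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "Represent the given image."},
            ],
        }],
        "encoding_format": "float",
    },
)
response.raise_for_status()
# Print the first few dimensions of the returned embedding vector.
print("Embedding output:", response.json()["data"][0]["embedding"][:8])
```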
@@ -475,7 +475,7 @@ vllm serve MrLight/dse-qwen2-2b-mrl-v1 --task embed \
Like with VLM2Vec, we have to explicitly pass `--task embed`.
Additionally, `MrLight/dse-qwen2-2b-mrl-v1` requires an EOS token for embeddings, which is handled
by [this custom chat template](https://github.com/vllm-project/vllm/blob/main/examples/template_dse_qwen2_vl.jinja).
by a custom chat template: <gh-file:examples/template_dse_qwen2_vl.jinja>
```
```{important}
@@ -483,4 +483,4 @@ Also important, `MrLight/dse-qwen2-2b-mrl-v1` requires a placeholder image of th
example below for details.
```
A full code example can be found in [examples/openai_chat_embedding_client_for_multimodal.py](https://github.com/vllm-project/vllm/blob/main/examples/openai_chat_embedding_client_for_multimodal.py).
Full example: <gh-file:examples/openai_chat_embedding_client_for_multimodal.py>

View File

@@ -4,8 +4,8 @@
```{warning}
Please note that speculative decoding in vLLM is not yet optimized and does
not usually yield inter-token latency reductions for all prompt datasets or sampling parameters. The work
to optimize it is ongoing and can be followed in [this issue.](https://github.com/vllm-project/vllm/issues/4630)
not usually yield inter-token latency reductions for all prompt datasets or sampling parameters.
The work to optimize it is ongoing and can be followed here: <gh-issue:4630>
```
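For context, a minimal sketch of enabling speculative decoding offline; the draft model and argument names reflect the documented usage around this release and may differ in later versions.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-6.7b",
    speculative_model="facebook/opt-125m",  # small draft model proposing tokens
    num_speculative_tokens=5,               # tokens proposed per step
)
outputs = llm.generate("The future of AI is", SamplingParams(temperature=0.8))
print(outputs[0].outputs[0].text)
```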
```{warning}
@@ -176,7 +176,7 @@ speculative decoding, breaking down the guarantees into three key areas:
> distribution. [View Test Code](https://github.com/vllm-project/vllm/blob/47b65a550866c7ffbd076ecb74106714838ce7da/tests/samplers/test_rejection_sampler.py#L252)
> - **Greedy Sampling Equality**: Confirms that greedy sampling with speculative decoding matches greedy sampling
> without it. This verifies that vLLM's speculative decoding framework, when integrated with the vLLM forward pass and the vLLM rejection sampler,
> provides a lossless guarantee. Almost all of the tests in [this directory](https://github.com/vllm-project/vllm/tree/b67ae00cdbbe1a58ffc8ff170f0c8d79044a684a/tests/spec_decode/e2e)
> provides a lossless guarantee. Almost all of the tests in <gh-dir:tests/spec_decode/e2e>
> verify this property using [this assertion implementation](https://github.com/vllm-project/vllm/blob/b67ae00cdbbe1a58ffc8ff170f0c8d79044a684a/tests/spec_decode/e2e/conftest.py#L291)
3. **vLLM Logprob Stability**
@@ -202,4 +202,4 @@ For mitigation strategies, please refer to the FAQ entry *Can the output of a pr
- [A Hacker's Guide to Speculative Decoding in vLLM](https://www.youtube.com/watch?v=9wNAgpX6z_4)
- [What is Lookahead Scheduling in vLLM?](https://docs.google.com/document/d/1Z9TvqzzBPnh5WHcRwjvK2UEeFeq5zMZb5mFE8jR0HCs/edit#heading=h.1fjfb0donq5a)
- [Information on batch expansion](https://docs.google.com/document/d/1T-JaS2T1NRfdP51qzqpyakoCXxSXTtORppiwaj5asxA/edit#heading=h.kk7dq05lc6q8)
- [Dynamic speculative decoding](https://github.com/vllm-project/vllm/issues/4565)
- [Dynamic speculative decoding](gh-issue:4565)

View File

@@ -131,7 +131,7 @@ completion = client.chat.completions.create(
print(completion.choices[0].message.content)
```
The complete code of the examples can be found on [examples/openai_chat_completion_structured_outputs.py](https://github.com/vllm-project/vllm/blob/main/examples/openai_chat_completion_structured_outputs.py).
Full example: <gh-file:examples/openai_chat_completion_structured_outputs.py>
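One of the options covered there, shown as a brief sketch (the served model name is a placeholder; `guided_choice` is passed through the client's `extra_body` mechanism):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",  # placeholder; use the served model's name
    messages=[{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}],
    extra_body={"guided_choice": ["positive", "negative"]},
)
print(completion.choices[0].message.content)
```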
## Experimental Automatic Parsing (OpenAI API)
@@ -257,4 +257,4 @@ outputs = llm.generate(
print(outputs[0].outputs[0].text)
```
A complete example with all options can be found in [examples/offline_inference_structured_outputs.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_structured_outputs.py).
Full example: <gh-file:examples/offline_inference_structured_outputs.py>
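And the offline counterpart, sketched with `GuidedDecodingParams`; the model name and choice values are illustrative.

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

guided = GuidedDecodingParams(choice=["Positive", "Negative"])
llm = LLM(model="Qwen/Qwen2.5-3B-Instruct")  # placeholder model

outputs = llm.generate(
    "Classify this sentiment: vLLM is wonderful!",
    sampling_params=SamplingParams(guided_decoding=guided),
)
print(outputs[0].outputs[0].text)
```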

View File

@@ -4,7 +4,7 @@ vLLM collects anonymous usage data by default to help the engineering team bette
## What data is collected?
You can see the up to date list of data collected by vLLM in the [usage_lib.py](https://github.com/vllm-project/vllm/blob/main/vllm/usage/usage_lib.py).
The list of data collected by the latest version of vLLM can be found here: <gh-file:vllm/usage/usage_lib.py>
Here is an example as of v0.4.0: