[Doc] Improve GitHub links (#11491)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@@ -82,7 +82,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
-
-
* - [LoRA](#lora-adapter)
- [✗](https://github.com/vllm-project/vllm/pull/9057)
- [✗](gh-pr:9057)
- ✅
-
-
@@ -168,10 +168,10 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
-
* - <abbr title="Encoder-Decoder Models">enc-dec</abbr>
- ✗
- [✗](https://github.com/vllm-project/vllm/issues/7366)
- [✗](gh-issue:7366)
- ✗
- ✗
- [✗](https://github.com/vllm-project/vllm/issues/7366)
- [✗](gh-issue:7366)
- ✅
- ✅
-
@@ -205,7 +205,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/pull/8199)
- [✗](gh-pr:8199)
- ✅
- ✗
- ✅
@@ -244,7 +244,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✗
- ✗
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/8198)
- [✗](gh-issue:8198)
- ✅
-
-
@@ -253,8 +253,8 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
-
* - <abbr title="Multimodal Inputs">mm</abbr>
- ✅
- [✗](https://github.com/vllm-project/vllm/pull/8348)
- [✗](https://github.com/vllm-project/vllm/pull/7199)
- [✗](gh-pr:8348)
- [✗](gh-pr:7199)
- ?
- ?
- ✅
@@ -273,14 +273,14 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/6137)
- [✗](gh-issue:6137)
- ✅
- ✗
- ✅
- ✅
- ✅
- ?
- [✗](https://github.com/vllm-project/vllm/issues/7968)
- [✗](gh-issue:7968)
- ✅
-
-
@@ -290,14 +290,14 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/6137)
- [✗](gh-issue:6137)
- ✅
- ✗
- ✅
- ✅
- ✅
- ?
- [✗](https://github.com/vllm-project/vllm/issues/7968>)
- [✗](gh-issue:7968)
- ?
- ✅
-
@@ -314,7 +314,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/9893)
- [✗](gh-issue:9893)
- ?
- ✅
- ✅
@@ -338,7 +338,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- CPU
- AMD
* - [CP](#chunked-prefill)
- [✗](https://github.com/vllm-project/vllm/issues/2729)
- [✗](gh-issue:2729)
- ✅
- ✅
- ✅
@@ -346,7 +346,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
* - [APC](#apc)
- [✗](https://github.com/vllm-project/vllm/issues/3687)
- [✗](gh-issue:3687)
- ✅
- ✅
- ✅
@@ -359,7 +359,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/pull/4830)
- [✗](gh-pr:4830)
- ✅
* - <abbr title="Prompt Adapter">prmpt adptr</abbr>
- ✅
@@ -367,7 +367,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/8475)
- [✗](gh-issue:8475)
- ✅
* - [SD](#spec_decode)
- ✅
@@ -439,7 +439,7 @@ Check the '✗' with links to see tracking issue for unsupported feature/hardwar
- ✅
- ✅
- ✅
- [✗](https://github.com/vllm-project/vllm/issues/8477)
- [✗](gh-issue:8477)
- ✅
* - best-of
- ✅
@@ -47,8 +47,7 @@ outputs = llm.generate(
)
```

Check out [examples/multilora_inference.py](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)
for an example of how to use LoRA adapters with the async engine and how to use more advanced configuration options.
Check out <gh-file:examples/multilora_inference.py> for an example of how to use LoRA adapters with the async engine and how to use more advanced configuration options.
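
As a quick reference, here is a minimal sketch of per-request LoRA usage with the offline `LLM` API; the base model name and adapter path are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora must be set when the engine is created.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

sampling_params = SamplingParams(temperature=0, max_tokens=64)

# Each request may reference a different adapter via LoRARequest
# (adapter name, integer id, path to the adapter weights).
outputs = llm.generate(
    ["Write a SQL query that lists all users."],
    sampling_params,
    lora_request=LoRARequest("sql_adapter", 1, "/path/to/sql_lora_adapter"),
)
print(outputs[0].outputs[0].text)
```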

## Serving LoRA Adapters

@@ -5,7 +5,7 @@
This page teaches you how to pass multi-modal inputs to [multi-modal models](#supported-mm-models) in vLLM.

```{note}
We are actively iterating on multi-modal support. See [this RFC](https://github.com/vllm-project/vllm/issues/4194) for upcoming changes,
We are actively iterating on multi-modal support. See [this RFC](gh-issue:4194) for upcoming changes,
and [open an issue on GitHub](https://github.com/vllm-project/vllm/issues/new/choose) if you have any feedback or feature requests.
```
@@ -60,7 +60,7 @@ for o in outputs:
print(generated_text)
```

A code example can be found in [examples/offline_inference_vision_language.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language.py).
Full example: <gh-file:examples/offline_inference_vision_language.py>
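
A minimal single-image sketch, assuming a local image file and the LLaVA-1.5 prompt format:

```python
from PIL import Image
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")

# Placeholder path; any RGB image works here.
image = Image.open("cherry_blossom.jpg")

outputs = llm.generate({
    # LLaVA-1.5 expects a single <image> placeholder in the prompt.
    "prompt": "USER: <image>\nWhat is the content of this image?\nASSISTANT:",
    "multi_modal_data": {"image": image},
})

for o in outputs:
    print(o.outputs[0].text)
```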

To substitute multiple images inside the same text prompt, you can pass in a list of images instead:
@@ -91,7 +91,7 @@ for o in outputs:
print(generated_text)
```

A code example can be found in [examples/offline_inference_vision_language_multi_image.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py).
Full example: <gh-file:examples/offline_inference_vision_language_multi_image.py>

Multi-image input can be extended to perform video captioning. We show this with [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) as it supports videos:
@@ -125,13 +125,13 @@ for o in outputs:
You can pass a list of NumPy arrays directly to the {code}`'video'` field of the multi-modal dictionary
instead of using multi-image input.

Please refer to [examples/offline_inference_vision_language.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language.py) for more details.
Full example: <gh-file:examples/offline_inference_vision_language.py>
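
A rough sketch of passing video frames as a NumPy array, assuming a video-capable model such as Qwen2-VL; the prompt placeholder tokens depend on the model's chat template:

```python
import numpy as np
from vllm import LLM

# Placeholder model; any vLLM model with video support should work similarly.
llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct", max_model_len=4096)

# A video as an array of RGB frames: (num_frames, height, width, 3).
video = np.zeros((16, 224, 224, 3), dtype=np.uint8)  # placeholder frames

outputs = llm.generate({
    # The video placeholder tokens depend on the model's chat template.
    "prompt": ("<|im_start|>user\n<|vision_start|><|video_pad|><|vision_end|>"
               "Describe this video.<|im_end|>\n<|im_start|>assistant\n"),
    "multi_modal_data": {"video": video},
})
print(outputs[0].outputs[0].text)
```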

### Audio

You can pass a tuple {code}`(array, sampling_rate)` to the {code}`'audio'` field of the multi-modal dictionary.

Please refer to [examples/offline_inference_audio_language.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_audio_language.py) for more details.
Full example: <gh-file:examples/offline_inference_audio_language.py>
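
A rough sketch of passing an `(array, sampling_rate)` tuple, assuming an audio-capable model such as Qwen2-Audio and `librosa` for loading the file:

```python
import librosa  # assumed available for loading the audio file
from vllm import LLM

# Placeholder model; any vLLM model with audio support should work similarly.
llm = LLM(model="Qwen/Qwen2-Audio-7B-Instruct", max_model_len=4096)

# The 'audio' field takes an (array, sampling_rate) tuple.
audio, sampling_rate = librosa.load("mary_had_lamb.ogg", sr=None)  # placeholder file

outputs = llm.generate({
    # The audio placeholder tokens depend on the model's chat template.
    "prompt": ("<|im_start|>user\nAudio 1: <|audio_bos|><|AUDIO|><|audio_eos|>\n"
               "What is recited in the audio?<|im_end|>\n<|im_start|>assistant\n"),
    "multi_modal_data": {"audio": (audio, sampling_rate)},
})
print(outputs[0].outputs[0].text)
```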

### Embedding
@@ -208,7 +208,7 @@ A chat template is **required** to use Chat Completions API.

Although most models come with a chat template, for others you have to define one yourself.
The chat template can be inferred based on the documentation on the model's HuggingFace repo.
For example, LLaVA-1.5 (`llava-hf/llava-1.5-7b-hf`) requires a chat template that can be found [here](https://github.com/vllm-project/vllm/blob/main/examples/template_llava.jinja).
For example, LLaVA-1.5 (`llava-hf/llava-1.5-7b-hf`) requires a chat template that can be found here: <gh-file:examples/template_llava.jinja>
```

### Image
@@ -271,7 +271,7 @@ chat_response = client.chat.completions.create(
print("Chat completion output:", chat_response.choices[0].message.content)
```

A full code example can be found in [examples/openai_chat_completion_client_for_multimodal.py](https://github.com/vllm-project/vllm/blob/main/examples/openai_chat_completion_client_for_multimodal.py).
Full example: <gh-file:examples/openai_chat_completion_client_for_multimodal.py>

```{tip}
Loading from local file paths is also supported on vLLM: You can specify the allowed local media path via `--allowed-local-media-path` when launching the API server/engine,
@@ -296,7 +296,7 @@ $ export VLLM_IMAGE_FETCH_TIMEOUT=<timeout>

Instead of {code}`image_url`, you can pass a video file via {code}`video_url`.

You can use [these tests](https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_video.py) as reference.
You can use [these tests](gh-file:tests/entrypoints/openai/test_video.py) as reference.
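
A minimal sketch of a `video_url` request through the OpenAI client, assuming a running vLLM server with a video-capable model; the model name and URL are placeholders:

```python
from openai import OpenAI

# Assumes a vLLM server with a video-capable model is already running.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-2B-Instruct",  # placeholder model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this video."},
            # video_url mirrors the image_url content type.
            {"type": "video_url", "video_url": {"url": "https://example.com/sample.mp4"}},
        ],
    }],
)
print("Chat completion output:", chat_response.choices[0].message.content)
```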

````{note}
By default, the timeout for fetching videos through HTTP URL is `30` seconds.
@@ -399,7 +399,7 @@ result = chat_completion_from_url.choices[0].message.content
print("Chat completion output from audio url:", result)
```

A full code example can be found in [examples/openai_chat_completion_client_for_multimodal.py](https://github.com/vllm-project/vllm/blob/main/examples/openai_chat_completion_client_for_multimodal.py).
Full example: <gh-file:examples/openai_chat_completion_client_for_multimodal.py>

````{note}
By default, the timeout for fetching audios through HTTP URL is `10` seconds.
@@ -435,7 +435,7 @@ Since VLM2Vec has the same model architecture as Phi-3.5-Vision, we have to expl
to run this model in embedding mode instead of text generation mode.

The custom chat template is completely different from the original one for this model,
and can be found [here](https://github.com/vllm-project/vllm/blob/main/examples/template_vlm2vec.jinja).
and can be found here: <gh-file:examples/template_vlm2vec.jinja>
```

Since the request schema is not defined by OpenAI client, we post a request to the server using the lower-level `requests` library:
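
A rough sketch of such a request, assuming a VLM2Vec server launched with `--task embed` and the chat template above; the image URL is a placeholder:

```python
import requests

# Assumes a vLLM server is serving TIGER-Lab/VLM2Vec-Full with --task embed
# and the custom chat template; the image URL below is a placeholder.
response = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "TIGER-Lab/VLM2Vec-Full",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "Represent the given image."},
            ],
        }],
        "encoding_format": "float",
    },
)
response.raise_for_status()
print("Embedding (first 8 dims):", response.json()["data"][0]["embedding"][:8])
```
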
@@ -475,7 +475,7 @@ vllm serve MrLight/dse-qwen2-2b-mrl-v1 --task embed \
Like with VLM2Vec, we have to explicitly pass `--task embed`.

Additionally, `MrLight/dse-qwen2-2b-mrl-v1` requires an EOS token for embeddings, which is handled
by [this custom chat template](https://github.com/vllm-project/vllm/blob/main/examples/template_dse_qwen2_vl.jinja).
by a custom chat template: <gh-file:examples/template_dse_qwen2_vl.jinja>
```

```{important}
@@ -483,4 +483,4 @@ Also important, `MrLight/dse-qwen2-2b-mrl-v1` requires a placeholder image of th
example below for details.
```

A full code example can be found in [examples/openai_chat_embedding_client_for_multimodal.py](https://github.com/vllm-project/vllm/blob/main/examples/openai_chat_embedding_client_for_multimodal.py).
Full example: <gh-file:examples/openai_chat_embedding_client_for_multimodal.py>
@@ -4,8 +4,8 @@

```{warning}
Please note that speculative decoding in vLLM is not yet optimized and does
not usually yield inter-token latency reductions for all prompt datasets or sampling parameters. The work
to optimize it is ongoing and can be followed in [this issue.](https://github.com/vllm-project/vllm/issues/4630)
not usually yield inter-token latency reductions for all prompt datasets or sampling parameters.
The work to optimize it is ongoing and can be followed here: <gh-issue:4630>
```

```{warning}
@@ -176,7 +176,7 @@ speculative decoding, breaking down the guarantees into three key areas:
> distribution. [View Test Code](https://github.com/vllm-project/vllm/blob/47b65a550866c7ffbd076ecb74106714838ce7da/tests/samplers/test_rejection_sampler.py#L252)
> - **Greedy Sampling Equality**: Confirms that greedy sampling with speculative decoding matches greedy sampling
> without it. This verifies that vLLM's speculative decoding framework, when integrated with the vLLM forward pass and the vLLM rejection sampler,
> provides a lossless guarantee. Almost all of the tests in [this directory](https://github.com/vllm-project/vllm/tree/b67ae00cdbbe1a58ffc8ff170f0c8d79044a684a/tests/spec_decode/e2e)
> provides a lossless guarantee. Almost all of the tests in <gh-dir:tests/spec_decode/e2e>
> verify this property using [this assertion implementation](https://github.com/vllm-project/vllm/blob/b67ae00cdbbe1a58ffc8ff170f0c8d79044a684a/tests/spec_decode/e2e/conftest.py#L291)

3. **vLLM Logprob Stability**
@@ -202,4 +202,4 @@ For mitigation strategies, please refer to the FAQ entry *Can the output of a pr
- [A Hacker's Guide to Speculative Decoding in vLLM](https://www.youtube.com/watch?v=9wNAgpX6z_4)
- [What is Lookahead Scheduling in vLLM?](https://docs.google.com/document/d/1Z9TvqzzBPnh5WHcRwjvK2UEeFeq5zMZb5mFE8jR0HCs/edit#heading=h.1fjfb0donq5a)
- [Information on batch expansion](https://docs.google.com/document/d/1T-JaS2T1NRfdP51qzqpyakoCXxSXTtORppiwaj5asxA/edit#heading=h.kk7dq05lc6q8)
- [Dynamic speculative decoding](https://github.com/vllm-project/vllm/issues/4565)
- [Dynamic speculative decoding](gh-issue:4565)
@@ -131,7 +131,7 @@ completion = client.chat.completions.create(
print(completion.choices[0].message.content)
```

The complete code of the examples can be found on [examples/openai_chat_completion_structured_outputs.py](https://github.com/vllm-project/vllm/blob/main/examples/openai_chat_completion_structured_outputs.py).
Full example: <gh-file:examples/openai_chat_completion_structured_outputs.py>
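
For instance, a minimal sketch using `guided_choice` via `extra_body`, assuming a running vLLM server; the model name is a placeholder:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}],
    # vLLM-specific structured output options are passed via extra_body.
    extra_body={"guided_choice": ["positive", "negative"]},
)
print(completion.choices[0].message.content)
```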

## Experimental Automatic Parsing (OpenAI API)
@@ -257,4 +257,4 @@ outputs = llm.generate(
print(outputs[0].outputs[0].text)
```

A complete example with all options can be found in [examples/offline_inference_structured_outputs.py](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_structured_outputs.py).
Full example: <gh-file:examples/offline_inference_structured_outputs.py>
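
A minimal offline sketch using `GuidedDecodingParams`; the model name is a placeholder:

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="HuggingFaceTB/SmolLM2-1.7B-Instruct")  # placeholder model name

# Constrain generation to one of the listed choices.
guided = GuidedDecodingParams(choice=["Positive", "Negative"])
sampling_params = SamplingParams(guided_decoding=guided)

outputs = llm.generate(
    "Classify this sentiment: vLLM is wonderful!",
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```
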
@@ -4,7 +4,7 @@ vLLM collects anonymous usage data by default to help the engineering team bette

## What data is collected?

You can see the up to date list of data collected by vLLM in the [usage_lib.py](https://github.com/vllm-project/vllm/blob/main/vllm/usage/usage_lib.py).
The list of data collected by the latest version of vLLM can be found here: <gh-file:vllm/usage/usage_lib.py>

Here is an example as of v0.4.0: