feat(multimodal): Add customizable background color for RGBA to RGB conversion (#22052)

Signed-off-by: Jinheng Li <ahengljh@gmail.com>
Co-authored-by: Jinheng Li <ahengljh@gmail.com>
Gamhang
2025-08-01 21:07:33 +08:00
committed by GitHub
parent f81c1bb055
commit 0a6d305e0f
3 changed files with 190 additions and 6 deletions


@@ -172,6 +172,36 @@ Multi-image input can be extended to perform video captioning. We show this with
print(generated_text)
```
#### Custom RGBA Background Color
When loading RGBA images (images with an alpha channel for transparency), vLLM converts them to RGB. By default, transparent pixels are replaced with a white background. You can change this background color with the `rgba_background_color` parameter in `media_io_kwargs`.
??? code
```python
from vllm import LLM

# Default white background (no configuration needed)
llm = LLM(model="llava-hf/llava-1.5-7b-hf")

# Custom black background for dark theme
llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",
    media_io_kwargs={"image": {"rgba_background_color": [0, 0, 0]}},
)

# Custom brand color background (e.g., blue)
llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",
    media_io_kwargs={"image": {"rgba_background_color": [0, 0, 255]}},
)
```
!!! note
- `rgba_background_color` accepts an RGB color as a list `[R, G, B]` or tuple `(R, G, B)`, where each value is in the range 0-255
- This setting only affects RGBA images with transparency; RGB images are unchanged
- If it is not specified, the default white background `(255, 255, 255)` is used for backward compatibility
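
To make the effect of the background color concrete, here is a minimal sketch of standard "over" alpha compositing for a single pixel in plain Python. The helper name `composite_pixel` is hypothetical, and this is not taken from vLLM's implementation, which may differ in rounding or other details. It only illustrates why fully transparent pixels end up as the background color while fully opaque pixels are unchanged.

```python
def composite_pixel(rgba, background=(255, 255, 255)):
    """Blend one RGBA pixel over an opaque background color.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    r, g, b, a = rgba
    alpha = a / 255.0  # 0.0 = fully transparent, 1.0 = fully opaque
    return tuple(
        round(fg * alpha + bg * (1.0 - alpha))
        for fg, bg in zip((r, g, b), background)
    )


transparent_red = (255, 0, 0, 0)
opaque_red = (255, 0, 0, 255)

print(composite_pixel(transparent_red))             # (255, 255, 255)
print(composite_pixel(transparent_red, [0, 0, 0]))  # (0, 0, 0)
print(composite_pixel(opaque_red, [0, 0, 0]))       # (255, 0, 0)
```

Under this model, the choice of background matters most for images with large transparent or semi-transparent regions, such as logos or icons.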
### Video Inputs
You can pass a list of NumPy arrays directly to the `'video'` field of the multi-modal dictionary
@@ -478,6 +508,20 @@ Full example: <gh-file:examples/online_serving/openai_chat_completion_client_for
export VLLM_VIDEO_FETCH_TIMEOUT=<timeout>
```
#### Custom RGBA Background Color
To use a custom background color for RGBA images, pass the `rgba_background_color` parameter via `--media-io-kwargs`:
```bash
# Example: Black background for dark theme
vllm serve llava-hf/llava-1.5-7b-hf \
    --media-io-kwargs '{"image": {"rgba_background_color": [0, 0, 0]}}'

# Example: Custom gray background
vllm serve llava-hf/llava-1.5-7b-hf \
    --media-io-kwargs '{"image": {"rgba_background_color": [128, 128, 128]}}'
```
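
If the server command is assembled from a script, quoting the JSON by hand is easy to get wrong. The sketch below builds the shell-quoted value for `--media-io-kwargs` using only the Python standard library. The helper name `media_io_kwargs_arg` and its range check are assumptions made for illustration, based on the 0-255 range described above; they are not part of vLLM.

```python
import json
import shlex


def media_io_kwargs_arg(color):
    """Return a shell-quoted JSON value for --media-io-kwargs.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    if len(color) != 3 or not all(
        isinstance(c, int) and 0 <= c <= 255 for c in color
    ):
        raise ValueError("expected three integers in the range 0-255")
    payload = {"image": {"rgba_background_color": list(color)}}
    # shlex.quote wraps the JSON in single quotes so the shell passes it intact.
    return shlex.quote(json.dumps(payload))


print(
    "vllm serve llava-hf/llava-1.5-7b-hf --media-io-kwargs",
    media_io_kwargs_arg((128, 128, 128)),
)
```

The printed command matches the gray-background example above and can be pasted into a shell as-is.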
### Audio Inputs
Audio input is supported according to [OpenAI Audio API](https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-in).