diff --git a/docs/usage/security.md b/docs/usage/security.md
index bb920ff43..9efb8b022 100644
--- a/docs/usage/security.md
+++ b/docs/usage/security.md
@@ -219,6 +219,47 @@ The most effective approach is to deploy vLLM behind a reverse proxy (such as ng
 - Blocks all other endpoints, including the unauthenticated inference and operational control endpoints
 - Implements additional authentication, rate limiting, and logging at the proxy layer
 
+## Tool Server and MCP Security
+
+vLLM supports connecting to external tool servers via the `--tool-server` argument. This enables models to call tools through the Responses API (`/v1/responses`). Tool server support works with all models; it is not limited to specific model architectures.
+
+**Important:** No tool servers are enabled by default. They must be explicitly opted into via configuration.
+
+### Built-in Demo Tools (GPT-OSS)
+
+Passing `--tool-server demo` enables built-in demo tools that work with any model that supports tool calling. The tool implementations are not part of vLLM; they are provided by the separately installed [`gpt-oss`](https://github.com/openai/gpt-oss) package, for which vLLM provides thin delegating wrappers.
+
+- **Code interpreter** (`python`): Python execution via Docker (via `gpt_oss.tools.python_docker`)
+- **Web browser** (`browser`): Search via the Exa API; requires `EXA_API_KEY` (via `gpt_oss.tools.simple_browser`)
+
+#### Code Interpreter (Python Tool) Security Risks
+
+The code interpreter executes model-generated code inside a Docker container. However, the container is **not configured with network isolation by default**. It inherits the host's Docker networking configuration (e.g., the default bridge network or `--network=host`), which means:
+
+- The container may be able to access the host network and LAN.
+- Internal services reachable from the container may be exploited via SSRF (Server-Side Request Forgery).
+- Cloud metadata services (e.g., `169.254.169.254`) may be accessible.
+- Vulnerable internal services (such as `torch.distributed` endpoints) that are reachable from the container could be attacked.
+
+This is particularly concerning because the executed code is generated by the model, which may be influenced by adversarial inputs (prompt injection).
+
+#### Controlling Built-in Tool Availability
+
+Built-in demo tools are controlled by two settings:
+
+1. **`--tool-server demo`**: Enables the built-in demo tools (browser and Python code interpreter).
+
+2. **`VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS`**: When built-in tools are requested via the `mcp` tool type in the Responses API, this comma-separated allowlist controls which tool labels are permitted. Valid values are:
+    - `container` - Container tool
+    - `code_interpreter` - Python code execution tool
+    - `web_search_preview` - Web search/browser tool
+
+    If this variable is not set or is empty, no built-in tools requested via the `mcp` tool type will be enabled.
+
+To disable the Python code interpreter specifically, omit `code_interpreter` from `VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS`.
+
+**Consider a custom implementation**: The GPT-OSS Python tool is a reference implementation. For production deployments, consider implementing a custom code execution sandbox with stricter isolation guarantees. See the [GPT-OSS documentation](https://github.com/openai/gpt-oss?tab=readme-ov-file#python) for guidance.
+
 ## Reporting Security Vulnerabilities
 
 If you believe you have found a security vulnerability in vLLM, please report it following the project's security policy. For more information on how to report security issues and the project's security policy, please see the [vLLM Security Policy](https://github.com/vllm-project/vllm/blob/main/SECURITY.md).
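The opt-in controls described in the docs hunk above can be combined into a hardened launch command. The following is a minimal sketch, not an endorsed production setup: the model name is a placeholder, and it assumes the `vllm serve` entrypoint, the `--tool-server demo` flag, and the `VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS` variable behave exactly as documented above.

```shell
# Allow only the demo browser tool; `code_interpreter` is deliberately
# omitted from the allowlist, so the Python tool stays disabled for
# requests using the `mcp` tool type.
export VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS="web_search_preview"
export EXA_API_KEY="..."  # required by the demo browser tool

# Placeholder model name; substitute any tool-calling model.
vllm serve openai/gpt-oss-20b --tool-server demo
```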
diff --git a/vllm/entrypoints/openai/cli_args.py b/vllm/entrypoints/openai/cli_args.py
index d3a66c183..fa95e8984 100644
--- a/vllm/entrypoints/openai/cli_args.py
+++ b/vllm/entrypoints/openai/cli_args.py
@@ -125,8 +125,11 @@ class BaseFrontendArgs:
     `--tool-call-parser`."""
     tool_server: str | None = None
     """Comma-separated list of host:port pairs (IPv4, IPv6, or hostname).
-    Examples: 127.0.0.1:8000, [::1]:8000, localhost:1234. Or `demo` for demo
-    purpose."""
+    Examples: 127.0.0.1:8000, [::1]:8000, localhost:1234. Or `demo` for
+    built-in demo tools (browser and Python code interpreter). WARNING:
+    The `demo` Python tool executes model-generated code in Docker without
+    network isolation by default. See the security guide for more
+    information."""
    log_config_file: str | None = envs.VLLM_LOGGING_CONFIG_PATH
    """Path to logging config JSON file for both vllm and uvicorn"""
    max_log_len: int | None = None