[Perf][Frontend] eliminate api_key and x_request_id headers middleware overhead (#19946)

Signed-off-by: Yazan-Sharaya <yazan.sharaya.yes@gmail.com>
2025-06-27 07:44:14 +03:00
parent cd4cfee689
commit 6e244ae091
4 changed files with 190 additions and 33 deletions
--- a/docs/serving/openai_compatible_server.md
+++ b/docs/serving/openai_compatible_server.md
@@ -146,11 +146,6 @@ completion = client.chat.completions.create(
 Only `X-Request-Id` HTTP request header is supported for now. It can be enabled
 with `--enable-request-id-headers`.

-> Note that enablement of the headers can impact performance significantly at high QPS
-> rates. We recommend implementing HTTP headers at the router level (e.g. via Istio),
-> rather than within the vLLM layer for this reason.
-> See [this PR](https://github.com/vllm-project/vllm/pull/11529) for more details.
-
 ??? Code

    ```python