[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Nick Hill
2026-03-11 18:20:34 -07:00
committed by GitHub
parent c34ba6b961
commit 262b76a09f
3 changed files with 56 additions and 0 deletions

@@ -60,6 +60,9 @@ The environment variables:
!!! tip
You can add these environment variables to your shell profile (e.g., `.bashrc`, `.zshrc`), Claude Code configuration file (`~/.claude/settings.json`), or create a wrapper script for convenience.
!!! warning
Claude Code recently started injecting a per-request hash into the system prompt. Because the prompt then changes on every request, this can defeat [prefix caching](../../design/prefix_caching.md) and greatly reduce performance. vLLM versions later than 0.17.1 handle this automatically; for older versions, add `"CLAUDE_CODE_ATTRIBUTION_HEADER": "0"` to the `"env"` section of `~/.claude/settings.json` (see this [blog post](https://unsloth.ai/docs/basics/claude-code#fixing-90-slower-inference-in-claude-code) from Unsloth).
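
For vLLM 0.17.1 and older, the workaround described in the warning above would look roughly like this in `~/.claude/settings.json`. This is a minimal sketch that assumes the file contains no other settings; if it already has an `"env"` object, add the key to that object instead of replacing the file.

```json
{
  "env": {
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
  }
}
```

Restart Claude Code after editing the file so the new environment value is picked up.
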
## Testing the Setup
Once Claude Code launches, try a simple prompt to verify the connection: