[Frontend] Exclude anthropic billing header to avoid prefix cache miss (#36829)
Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
This commit is contained in:
@@ -60,6 +60,9 @@ The environment variables:
|
||||
!!! tip
|
||||
You can add these environment variables to your shell profile (e.g., `.bashrc`, `.zshrc`), Claude Code configuration file (`~/.claude/settings.json`), or create a wrapper script for convenience.
|
||||
|
||||
!!! warning
|
||||
Claude Code recently started injecting a per-request hash in the system prompt, which can defeat [prefix caching](../../design/prefix_caching.md) because the prompt changes on every request, causing greatly reduced performance. This is addressed automatically in vLLM versions > 0.17.1 but for older versions `"CLAUDE_CODE_ATTRIBUTION_HEADER": "0"` should be added to the `"env"` section of `~/.claude/settings.json` (see this [blog post](https://unsloth.ai/docs/basics/claude-code#fixing-90-slower-inference-in-claude-code) from Unsloth).
|
||||
|
||||
## Testing the Setup
|
||||
|
||||
Once Claude Code launches, try a simple prompt to verify the connection:
|
||||
|
||||
Reference in New Issue
Block a user