Update bnb.md with example for OpenAI (#11718)

2025-01-04 00:29:02 -06:00
parent 9c93636d84
commit d1d49397e7
1 changed file with 7 additions and 0 deletions
--- a/docs/source/quantization/bnb.md
+++ b/docs/source/quantization/bnb.md
@@ -37,3 +37,10 @@ model_id = "huggyllama/llama-7b"
 llm = LLM(model=model_id, dtype=torch.bfloat16, trust_remote_code=True, \
 quantization="bitsandbytes", load_format="bitsandbytes")
 ```
+## OpenAI Compatible Server
+
+When serving a 4-bit model, append the following to the server's launch arguments:
+
+```
+--quantization bitsandbytes --load-format bitsandbytes
+```
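
Note: the flags above are only a fragment, so a rough sketch of how they might be attached to a full launch command may help. The helper `build_serve_command` is hypothetical, and the `vllm serve <model>` entry point is an assumption not stated in this change; the model id is simply reused from the earlier example in the same file.

```python
import shlex

# Flags copied verbatim from the documentation change above.
BNB_FLAGS = ["--quantization", "bitsandbytes", "--load-format", "bitsandbytes"]


def build_serve_command(model_id, extra_args=None):
    """Return an argv list for launching the server (hypothetical helper)."""
    # Assumption: the server is started with `vllm serve <model>`.
    base = ["vllm", "serve", model_id]
    # Any existing model arguments come first; the bitsandbytes flags are appended.
    return base + list(extra_args or []) + BNB_FLAGS


if __name__ == "__main__":
    cmd = build_serve_command("huggyllama/llama-7b")
    # Print a shell-safe command line that can be copied into a terminal.
    print(shlex.join(cmd))
```

Building the command as a list keeps the existing arguments untouched and makes it clear that the two bitsandbytes flags are simply appended at the end.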