# BitBLAS
vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more efficient and flexible model inference. Compared to other quantization frameworks, BitBLAS provides more precision combinations.

!!! note
    Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
    Most recent NVIDIA GPUs support `float16`, while `bfloat16` generally requires newer architectures such as Ampere or Hopper.
    For details see [supported hardware](README.md#supported-hardware).
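The dtype choice described in the note can be sketched as a small helper. This is a minimal illustration, not part of vLLM or BitBLAS: the function name `pick_dtype_name` is hypothetical, and it assumes that native `bfloat16` support starts at CUDA compute capability 8.0 (Ampere). In a real setup the capability tuple would typically come from `torch.cuda.get_device_capability()`, and `torch.cuda.is_bf16_supported()` offers a more direct check.

```python
def pick_dtype_name(compute_capability):
    """Suggest a dtype name from a (major, minor) CUDA compute capability.

    Assumption: GPUs with compute capability 8.0 or higher (Ampere, Hopper)
    support bfloat16 natively; older GPUs fall back to float16.
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


# Hypothetical inputs: (8, 0) is an Ampere-class GPU, (7, 5) a Turing-class GPU.
print(pick_dtype_name((8, 0)))  # bfloat16
print(pick_dtype_name((7, 5)))  # float16
```

The returned name maps onto `torch.bfloat16` or `torch.float16` when passed as the `dtype` argument.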
Below are the steps to utilize BitBLAS with vLLM.
```bash
# Quote the requirement so the shell does not treat ">=" as an output redirection.
pip install "bitblas>=0.1.0"
```
vLLM reads the model's config file and supports pre-quantized checkpoints.
You can find pre-quantized models on:

- [Hugging Face (BitBLAS)](https://huggingface.co/models?search=bitblas)
- [Hugging Face (GPTQ)](https://huggingface.co/models?search=gptq)
Usually, these repositories have a `quantize_config.json` file that includes a `quantization_config` section.
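To see what such a file contains before loading a checkpoint, a downloaded copy can be inspected with the standard library. The sketch below is illustrative only: the helper name `read_quantization_settings` is hypothetical, the fallback to the top-level object when no `quantization_config` section exists is an assumption, and the sample keys (`bits`, `group_size`) are placeholders rather than a documented schema.

```python
import json
import tempfile
from pathlib import Path


def read_quantization_settings(path):
    """Load a quantize_config.json file and return its quantization settings.

    Returns the "quantization_config" section when present; otherwise
    (an assumption for this sketch) returns the whole top-level object.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return config.get("quantization_config", config)


# Demo with a stand-in file; the keys and values here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "quantize_config.json"
    sample.write_text(
        json.dumps({"quantization_config": {"bits": 4, "group_size": -1}}),
        encoding="utf-8",
    )
    settings = read_quantization_settings(sample)

print(settings)
```

For a real checkpoint, point the helper at the `quantize_config.json` file in the downloaded repository instead of the temporary sample.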
## Read BitBLAS format checkpoint
```python
from vllm import LLM
import torch

# "hxbgsyxh/llama-13b-4bit-g-1-bitblas" is a pre-quantized checkpoint.
model_id = "hxbgsyxh/llama-13b-4bit-g-1-bitblas"
llm = LLM(
    model=model_id,
    dtype=torch.bfloat16,  # requires a GPU with bfloat16 support
    trust_remote_code=True,
    quantization="bitblas",  # load the weights with the BitBLAS backend
)
```
## Read GPTQ format checkpoint
??? code

    ```python
    from vllm import LLM
    import torch

    # "hxbgsyxh/llama-13b-4bit-g-1" is a pre-quantized checkpoint.
    model_id = "hxbgsyxh/llama-13b-4bit-g-1"
    llm = LLM(
        model=model_id,
        dtype=torch.float16,
        trust_remote_code=True,
        quantization="bitblas",  # run the GPTQ checkpoint through the BitBLAS backend
        max_model_len=1024,  # cap the context length
    )
    ```