[Frontend] [Core] feat: Add model loading using tensorizer (#3476)

Sanger Steel
2024-04-13 20:13:01 -04:00
committed by GitHub
parent 989ae2538d
commit 711a000255
20 changed files with 1351 additions and 51 deletions

@@ -36,7 +36,7 @@ Below, you can find an explanation of every engine argument for vLLM:
Directory to download and load the weights, default to the default cache dir of huggingface.
-.. option:: --load-format {auto,pt,safetensors,npcache,dummy}
+.. option:: --load-format {auto,pt,safetensors,npcache,dummy,tensorizer}
The format of the model weights to load.
@@ -45,6 +45,7 @@ Below, you can find an explanation of every engine argument for vLLM:
* "safetensors" will load the weights in the safetensors format.
* "npcache" will load the weights in pytorch format and store a numpy cache to speed up the loading.
* "dummy" will initialize the weights with random values, mainly for profiling.
+* "tensorizer" will load serialized weights using `CoreWeave's Tensorizer model deserializer <https://github.com/coreweave/tensorizer>`_. See ``tensorized_vllm_model.py`` in the examples folder for how to serialize a vLLM model and for more information. Tensorizer support for vLLM can be installed with ``pip install vllm[tensorizer]``.
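As a rough illustration of how the extended choice list behaves, here is a minimal sketch that assumes an ``argparse``-style parser. The names ``LOAD_FORMATS``, ``build_parser`` and ``parse_load_format`` are hypothetical, and the ``"auto"`` default is an assumption rather than something this diff shows; this is not vLLM's actual argument-parsing code.

```python
import argparse

# Choices copied from the updated --load-format option above.
LOAD_FORMATS = ["auto", "pt", "safetensors", "npcache", "dummy", "tensorizer"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for an engine argument parser (not vLLM's own code)."""
    parser = argparse.ArgumentParser(prog="load-format-sketch")
    parser.add_argument(
        "--load-format",
        type=str,
        default="auto",  # assumption: "auto" is the default value
        choices=LOAD_FORMATS,
        help="The format of the model weights to load.",
    )
    return parser


def parse_load_format(argv: list[str]) -> str:
    """Return the validated load format for a list of CLI tokens."""
    return build_parser().parse_args(argv).load_format


if __name__ == "__main__":
    # Select the newly added format explicitly.
    print(parse_load_format(["--load-format", "tensorizer"]))
```

With a parser of this shape, a value outside the listed choices is rejected at parse time, so adding ``tensorizer`` to the choice list is what makes the new format selectable from the command line.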
.. option:: --dtype {auto,half,float16,bfloat16,float,float32}