[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208)

Sanger Steel
2024-05-13 17:57:07 -04:00
committed by GitHub
parent 0fca3cdcf2
commit 8bc68e198c
10 changed files with 259 additions and 523 deletions

@@ -167,8 +167,8 @@ class EngineArgs:
'* "dummy" will initialize the weights with random values, '
'which is mainly for profiling.\n'
'* "tensorizer" will load the weights using tensorizer from '
-'CoreWeave which assumes tensorizer_uri is set to the location of '
-'the serialized weights.')
+'CoreWeave. See the Tensorize vLLM Model script in the Examples'
+'section for more information.\n')
parser.add_argument(
'--dtype',
type=str,