# vllm/__init__.py

"""vLLM: a high-throughput and memory-efficient inference engine for LLMs"""

from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.engine.llm_engine import LLMEngine
from vllm.entrypoints.llm import LLM
from vllm.executor.ray_utils import initialize_ray_cluster
from vllm.model_executor.models import ModelRegistry
from vllm.outputs import CompletionOutput, RequestOutput
from vllm.sampling_params import SamplingParams

__version__ = "0.4.1"

__all__ = [
    "LLM",
    "ModelRegistry",
    "SamplingParams",
    "RequestOutput",
    "CompletionOutput",
    "LLMEngine",
    "EngineArgs",
    "AsyncLLMEngine",
    "AsyncEngineArgs",
    "initialize_ray_cluster",
]