Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Using vLLM
First, install vLLM for your chosen device in either a Python or Docker environment.
Once installed, vLLM supports the following usage patterns:
- Inference and Serving: Run a single instance of a model.
- Deployment: Scale up model instances for production.
- Training: Train or fine-tune a model.
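
As a rough illustration of the first pattern (running a single model instance), here is a minimal sketch of offline inference through vLLM's Python API. It assumes vLLM is already installed for your device. The model name `facebook/opt-125m`, the sampling values, and the helper names `build_prompts` and `run_offline_inference` are illustrative choices, not something this page prescribes.

```python
from typing import List


def build_prompts(questions: List[str]) -> List[str]:
    """Wrap each question in a simple prompt template (hypothetical helper)."""
    return [f"Question: {q}\nAnswer:" for q in questions]


def run_offline_inference(
    prompts: List[str], model: str = "facebook/opt-125m"
) -> List[str]:
    """Generate one completion per prompt using a single in-process model."""
    # Imported lazily so this module still loads where vLLM is not installed.
    from vllm import LLM, SamplingParams

    # Sampling values are arbitrary examples, not recommended defaults.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Loads the model weights; this is the "single instance" of the model.
    llm = LLM(model=model)

    # generate() returns one RequestOutput per prompt, in input order.
    outputs = llm.generate(prompts, sampling_params)
    return [output.outputs[0].text for output in outputs]
```

Calling `run_offline_inference(build_prompts(["What is the capital of France?"]))` would load the model and return a list with one generated string. The Deployment and Training patterns involve additional tooling and are not covered by this sketch.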