diff --git a/docs/deployment/k8s.md b/docs/deployment/k8s.md
index 77a159009..3d613d00b 100644
--- a/docs/deployment/k8s.md
+++ b/docs/deployment/k8s.md
@@ -59,11 +59,15 @@ First, create a Kubernetes PVC and Secret for downloading and storing Hugging Fa
 
 Here, the `token` field stores your **Hugging Face access token**. For details on how to generate a token, see the [Hugging Face documentation](https://huggingface.co/docs/hub/en/security-tokens).
 
-Next, start the vLLM server as a Kubernetes Deployment and Service:
+Next, start the vLLM server as a Kubernetes Deployment and Service.
+
+Note that you will want to configure your vLLM image based on your processor arch:
 
 ??? console "Config"
 
     ```bash
+    VLLM_IMAGE=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest # use this for x86_64
+    VLLM_IMAGE=public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:latest # use this for arm64
     cat <" \
+    public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:
+```
+
 You can also access the latest code with Docker images. These are not intended for production use and are meant for CI and testing only. They will expire after several days. The latest code can contain bugs and may not be stable. Please use it with caution.
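
The `docs/deployment/k8s.md` hunk asks the reader to pick one of two `VLLM_IMAGE` values by hand. A minimal sketch of how that choice could be scripted follows. The function name `select_vllm_image` is hypothetical and not part of the PR, and the default assumes the machine running the script has the same architecture as the cluster nodes, which may not hold when `kubectl` is run from a workstation.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): map an architecture string to one
# of the two CPU images named in the diff. Defaults to the local `uname -m`.
select_vllm_image() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)
      echo "public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest" ;;
    aarch64|arm64)
      echo "public.ecr.aws/q9t5s3a7/vllm-arm64-cpu-release-repo:latest" ;;
    *)
      # Only the two architectures covered by the diff are handled here.
      echo "unsupported architecture: $arch" >&2
      return 1 ;;
  esac
}
```

Under these assumptions, `VLLM_IMAGE="$(select_vllm_image)"` could replace the two manual assignments, and an explicit argument such as `select_vllm_image arm64` covers the case where the nodes differ from the local machine.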
diff --git a/docs/getting_started/installation/cpu.x86.inc.md b/docs/getting_started/installation/cpu.x86.inc.md
index 5887b779a..f31ae8e0e 100644
--- a/docs/getting_started/installation/cpu.x86.inc.md
+++ b/docs/getting_started/installation/cpu.x86.inc.md
@@ -161,7 +161,23 @@ uv pip install dist/*.whl
 # --8<-- [end:build-wheel-from-source]
 # --8<-- [start:pre-built-images]
 
-[https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo](https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo)
+You can pull the latest available CPU image here via:
+
+```bash
+docker pull public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest
+```
+
+If you want a more specific build you can find all published CPU based images here: [https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo](https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo)
+
+You can run these images via:
+
+```bash
+docker run \
+    -v ~/.cache/huggingface:/root/.cache/huggingface \
+    -p 8000:8000 \
+    --env "HF_TOKEN=" \
+    public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:
+```
 
 !!! warning
     If deploying the pre-built images on machines without `avx512f`, `avx512_bf16`, or `avx512_vnni` support, an `Illegal instruction` error may be raised. See the build-image-from-source section below for build arguments to match your target CPU capabilities.
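
The warning in this hunk names three CPU flags whose absence may cause an `Illegal instruction` error with the pre-built x86 image. A rough sketch of a pre-flight check is shown below. The function name `missing_avx512_flags` is hypothetical, and the default assumes a Linux host that exposes a `flags` line in `/proc/cpuinfo`. It only reports which flags are absent; the diff does not say whether a single missing flag is enough to trigger the error.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print the flags from the warning
# that are absent from a space-separated CPU flag list. With no argument it
# reads the first "flags" line of /proc/cpuinfo (Linux only, an assumption).
missing_avx512_flags() {
  local flags="${1-$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)}"
  local missing="" f
  for f in avx512f avx512_bf16 avx512_vnni; do
    # Pad with spaces so only whole flag names match.
    case " $flags " in
      *" $f "*) ;;
      *) missing="$missing $f" ;;
    esac
  done
  # Drop the leading separator; empty output means all three flags are present.
  echo "${missing# }"
}
```

If `missing_avx512_flags` prints anything on the target machine, building the image from source with matching build arguments, as the warning suggests, is likely the safer path.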