In this tutorial, you will learn to deploy Large Language Models (LLMs) on Amazon Elastic Kubernetes Service (Amazon EKS) using vLLM Deep Learning Containers (DLCs).
Organizations today face significant challenges when deploying LLMs efficiently at scale. These challenges include optimizing GPU resource utilization, managing network infrastructure, and providing efficient access to model weights. This tutorial addresses these challenges by leveraging AWS DLCs for vLLM, which provide pre-configured, optimized Docker environments that eliminate the complexity of building inference environments from scratch.
You will build a scalable, high-performance inference system for serving models such as Qwen 2.5 0.5B Instruct using AWS-optimized containers and modern cloud-native technologies.
git clone https://github.com/aws-samples/sample-vllm-on-eks-with-dlc
cd sample-vllm-on-eks-with-dlc/bash
chmod +x config.sh && ./config.sh
Detailed IAM setup instructions:
Navigate to IAM in the AWS console. In the left navigation panel, select Users and Create user.
Name the user eks-admin-cli (or any name you prefer), click Next twice, and then click Create user.
In the left panel, select Policies and then Create policy. Click on the JSON tab.
Paste the following JSON policy, and then click Next.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EKSAndInfra",
      "Effect": "Allow",
      "Action": [
        "eks:*",
        "ec2:*",
        "elasticloadbalancing:*",
        "fsx:*",
        "cloudformation:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "IAMManagement",
      "Effect": "Allow",
      "Action": [
        "iam:*"
      ],
      "Resource": "*"
    }
  ]
}
You can name it EKS-Infra-Admin-Policy or something similar. Click on Create policy.
Go back to Users and click on eks-admin-cli. In the Permissions tab, look for Add permissions, and then Attach policies directly. Search for EKS-Infra-Admin-Policy. Click on Next, and then on Add permissions.
Go back to the User, and navigate to the Security credentials tab. Click on Create access key.
Click on Command Line Interface (CLI), confirm that you want to use CLI access, and click on Next.
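If you prefer the AWS CLI to the console, the steps above can be approximated with the sketch below. It is not part of the repository. It assumes you already have an authenticated session that is allowed to manage IAM, and that the JSON policy above is saved as policy.json (a hypothetical file name). The helper names policy_arn and create_admin_user are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: CLI equivalents of the console steps above.
# Assumptions: policy.json holds the JSON policy shown earlier, and the
# current credentials are allowed to create IAM users and policies.
USER_NAME="eks-admin-cli"
POLICY_NAME="EKS-Infra-Admin-Policy"

# policy_arn (hypothetical helper) builds the ARN of a customer managed
# policy from a 12-digit account ID.
policy_arn() {
  printf 'arn:aws:iam::%s:policy/%s\n' "$1" "$POLICY_NAME"
}

# create_admin_user (hypothetical helper) creates the user and the policy,
# attaches the policy, and prints a new access key.
create_admin_user() {
  account_id="$(aws sts get-caller-identity --query Account --output text)" || return 1
  aws iam create-user --user-name "$USER_NAME" &&
    aws iam create-policy --policy-name "$POLICY_NAME" \
      --policy-document file://policy.json &&
    aws iam attach-user-policy --user-name "$USER_NAME" \
      --policy-arn "$(policy_arn "$account_id")" &&
    aws iam create-access-key --user-name "$USER_NAME"
}

# Nothing runs until you call the function explicitly:
#   create_admin_user
```

The final command prints the access key ID and secret access key once, so store them before closing the terminal.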
On your local machine or EC2 instance, run the following command and enter the access key you just created when prompted:
aws configure --profile vllm-profile
Warning: This workshop was designed to run in us-west-2. Set that AWS Region as the default region for your profile.
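To catch a region mismatch before creating any resources, a small check along the following lines may help. This is a sketch, not part of the repository. check_region is a hypothetical helper that takes the region string as an argument so it can be tried without AWS credentials.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a profile's region matches the one this workshop expects.
EXPECTED_REGION="us-west-2"   # taken from the warning above

# check_region (hypothetical helper) succeeds only when its argument
# equals EXPECTED_REGION; otherwise it reports the mismatch on stderr.
check_region() {
  if [ "$1" = "$EXPECTED_REGION" ]; then
    echo "ok: region is $1"
  else
    echo "mismatch: expected $EXPECTED_REGION, got '${1:-unset}'" >&2
    return 1
  fi
}

# Typical use (requires the AWS CLI and the vllm-profile profile):
#   check_region "$(aws configure get region --profile vllm-profile)"
```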
# Deploy EKS cluster
chmod +x create_cluster.sh && ./create_cluster.sh
# Deploy GPU node group
chmod +x create_node_group.sh && ./create_node_group.sh
# Setup high-performance storage
chmod +x storage.sh && ./storage.sh
chmod +x controllers.sh && ./controllers.sh
chmod +x application.sh && ./application.sh
Replace <YOUR_ALB_ENDPOINT> with the endpoint from step 5:
curl -X POST http://<YOUR_ALB_ENDPOINT>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100
  }'
- EKS Cluster: Kubernetes foundation with GPU-optimized configuration
- G5 Node Group: GPU instances for model serving
- FSx for Lustre: High-performance storage for model weights
- Application Load Balancer: External access with health checks
- vLLM DLC: Pre-optimized container for efficient inference
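A newly created load balancer and model server can take a few minutes to start answering, so the test request earlier in this README may fail at first. The sketch below polls the server until it responds and then prints the chat completions URL. It assumes the load balancer forwards the /v1/models path of vLLM's OpenAI-compatible server, which depends on how the ingress in this repository is configured. chat_url and wait_for_endpoint are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: wait until the vLLM server behind the load balancer answers,
# then build the URL to use for chat completions.

# chat_url (hypothetical helper) builds the chat completions URL from an
# ALB hostname, dropping one trailing slash if present.
chat_url() {
  printf 'http://%s/v1/chat/completions\n' "${1%/}"
}

# wait_for_endpoint (hypothetical helper) polls /v1/models until curl gets
# a successful HTTP response or the attempts run out.
# Arguments: hostname, max attempts (default 30), delay in seconds (default 10).
wait_for_endpoint() {
  host="${1%/}"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors such as 503.
    if curl -fsS -o /dev/null --max-time 5 "http://${host}/v1/models"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Typical use, with your own endpoint substituted:
#   wait_for_endpoint "<YOUR_ALB_ENDPOINT>" && chat_url "<YOUR_ALB_ENDPOINT>"
```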
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.