Deploy Large Language Models on Amazon EKS using vLLM Deep Learning Containers

In this tutorial, you will learn to deploy Large Language Models (LLMs) on Amazon Elastic Kubernetes Service (Amazon EKS) using vLLM Deep Learning Containers (DLCs).

Organizations today face significant challenges when deploying LLMs efficiently at scale. These challenges include optimizing GPU resource utilization, managing network infrastructure, and providing efficient access to model weights. This tutorial addresses these challenges by leveraging AWS DLCs for vLLM, which provide pre-configured, optimized Docker environments that eliminate the complexity of building inference environments from scratch.

You will build a scalable, high-performance inference system that serves models such as Qwen 2.5 0.5B Instruct using AWS-optimized containers and modern cloud-native technologies.

Quick Start

1. Setup (5 mins)

git clone https://github.com/aws-samples/sample-vllm-on-eks-with-dlc
cd sample-vllm-on-eks-with-dlc/bash
chmod +x config.sh && ./config.sh
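Before moving on, it can help to confirm that the command-line tools the later steps rely on are on your PATH. The sketch below is only an illustration: `missing_tools` is a hypothetical helper, and the tool list (`git`, `aws`, `kubectl`) is an assumption rather than a statement of what `config.sh` installs.

```shell
# Print every command name from the arguments that is not found on PATH.
# missing_tools is a hypothetical helper, not part of the repository.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed tool list; adjust it to whatever config.sh actually installs.
missing="$(missing_tools git aws kubectl)"
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All listed tools are available."
fi
```

If the check reports a missing tool, rerun `config.sh` or install the tool manually before continuing.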

2. Configure AWS Profile

Detailed IAM setup instructions

Navigate to IAM in the AWS console. In the left navigation panel, select Users and Create user.

Name the user eks-admin-cli (or any name you prefer), click Next twice, and then click Create user.

In the left panel, select Policies and then Create policy. Click on the JSON tab.

Paste the following JSON policy, and then click Next.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EKSAndInfra",
      "Effect": "Allow",
      "Action": [
        "eks:*",
        "ec2:*",
        "elasticloadbalancing:*",
        "fsx:*",
        "cloudformation:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "IAMManagement",
      "Effect": "Allow",
      "Action": [
        "iam:*"
      ],
      "Resource": "*"
    }
  ]
}

You can name it EKS-Infra-Admin-Policy or something similar. Click on Create policy.

Go back to Users and click on eks-admin-cli. In the Permissions tab, look for Add permissions, and then Attach policies directly. Search for EKS-Infra-Admin-Policy. Click on Next, and then on Add permissions.

Go back to the User, and navigate to the Security credentials tab. Click on Create access key.

Click on Command Line Interface (CLI), confirm that you want to use CLI access, and click on Next.

On your local machine or EC2 instance, run the following command. When prompted, enter the access key ID and secret access key you just created:

aws configure --profile vllm-profile

Warning
This workshop was designed to run in us-west-2. Please define your profile in that AWS region.
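To catch a mismatched region early, you could compare the region stored in the profile against us-west-2. This is a minimal sketch: `check_region` is a hypothetical helper, and the AWS CLI lookup only runs if `aws` is installed.

```shell
# Hypothetical helper: succeeds only when the given region is us-west-2.
check_region() {
  local region="$1"
  if [ "$region" = "us-west-2" ]; then
    echo "OK: profile region is us-west-2"
  else
    echo "WARNING: profile region is '${region:-unset}', expected us-west-2" >&2
    return 1
  fi
}

# Only query the AWS CLI when it is installed; an unset or missing
# profile yields an empty string and therefore a warning.
if command -v aws >/dev/null 2>&1; then
  check_region "$(aws configure get region --profile vllm-profile 2>/dev/null)" || true
fi
```

If the warning appears, rerun `aws configure --profile vllm-profile` and enter us-west-2 as the default region.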

3. Deploy Infrastructure (15-20 mins)

# Deploy EKS cluster
chmod +x create_cluster.sh && ./create_cluster.sh

# Deploy GPU node group  
chmod +x create_node_group.sh && ./create_node_group.sh

# Setup high-performance storage
chmod +x storage.sh && ./storage.sh
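Because each script depends on the one before it, you may prefer to chain them so that a failure stops the sequence. The following is a sketch under the assumption that the scripts are plain bash scripts that can be invoked with `bash`; `run_steps` is a hypothetical helper, not part of the repository.

```shell
# Run each script in order and stop at the first one that is missing or fails.
# run_steps is a hypothetical helper; it assumes the scripts run under bash.
run_steps() {
  local script
  for script in "$@"; do
    if [ ! -f "$script" ]; then
      echo "missing: $script" >&2
      return 1
    fi
    echo "==> $script"
    bash "$script" || { echo "failed: $script" >&2; return 1; }
  done
}

# Example (not run here), from the bash/ directory of the repository:
# run_steps create_cluster.sh create_node_group.sh storage.sh
```

Invoking the scripts through `bash` also makes the separate `chmod +x` step unnecessary.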

4. Install Controllers (5 mins)

chmod +x controllers.sh && ./controllers.sh

5. Deploy vLLM Application (10-15 mins)

chmod +x application.sh && ./application.sh

Test Your Deployment

Replace <YOUR_ALB_ENDPOINT> with the endpoint from step 5:

curl -X POST http://<YOUR_ALB_ENDPOINT>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
      "model": "Qwen/Qwen2.5-0.5B-Instruct",
      "messages": [{"role": "user", "content": "Hello, how are you?"}],
      "max_tokens": 100
  }'
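A freshly created load balancer can take a few minutes to start routing traffic, so the first request may fail. One way to handle this is a small retry loop; the sketch below assumes the ALB forwards the `/v1/models` path of vLLM's OpenAI-compatible API, and `wait_for` is a hypothetical helper.

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Retries COMMAND until it succeeds or ATTEMPTS is exhausted.
# wait_for is a hypothetical helper, not part of the repository.
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not run here): poll the model list for up to about 5 minutes.
# wait_for 30 10 curl -fsS "http://<YOUR_ALB_ENDPOINT>/v1/models"
```

Once `wait_for` returns successfully, the chat completion request above should reach the model.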

Architecture Overview

EKS Cluster: Kubernetes foundation with GPU-optimized configuration
G5 Node Group: GPU instances for model serving
FSx for Lustre: High-performance storage for model weights
Application Load Balancer: External access with health checks
vLLM DLC: Pre-optimized container for efficient inference
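To see how these components appear inside the cluster, you could list the matching Kubernetes resources. This sketch assumes the FSx for Lustre file system is exposed through a PersistentVolumeClaim and the load balancer through an Ingress; it queries all namespaces because the namespace the scripts use is not stated here. `inspect_stack` is a hypothetical helper.

```shell
# List the Kubernetes resources that correspond to each component.
# Set KUBECTL="echo kubectl" to preview the commands without a cluster.
inspect_stack() {
  local k="${KUBECTL:-kubectl}"
  # G5 node group: nodes with their EC2 instance type label.
  $k get nodes -L node.kubernetes.io/instance-type
  # FSx for Lustre: persistent volume claims (assumed to back model weights).
  $k get pvc --all-namespaces
  # vLLM DLC: pods and the node each one is scheduled on.
  $k get pods --all-namespaces -o wide
  # Application Load Balancer: ingress resources and their addresses.
  $k get ingress --all-namespaces
}

# Example (not run here): inspect_stack
```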

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
