Deploy Large Language Models on Amazon EKS using vLLM Deep Learning Containers

In this tutorial, you will learn to deploy Large Language Models (LLMs) on Amazon Elastic Kubernetes Service (Amazon EKS) using vLLM Deep Learning Containers (DLCs)! 🎉🤗🚀✨

Organizations today face significant challenges when deploying LLMs efficiently at scale. These challenges include optimizing GPU resource utilization, managing network infrastructure, and providing efficient access to model weights. This tutorial addresses these challenges by leveraging AWS DLCs for vLLM, which provide pre-configured, optimized Docker environments that eliminate the complexity of building inference environments from scratch.

In this tutorial, you will build a scalable, high-performance inference system for serving models such as Qwen 2.5 0.5B Instruct using AWS-optimized containers and modern cloud-native technologies.

Quick Start

1. Setup (5 mins)

git clone https://github.com/aws-samples/sample-vllm-on-eks-with-dlc
cd sample-vllm-on-eks-with-dlc/bash
chmod +x config.sh && ./config.sh

2. Configure AWS Profile

Detailed IAM setup instructions

Navigate to IAM in the AWS console. In the left navigation panel, select Users, then click Create user.

Name the user eks-admin-cli (or any name you prefer), click Next twice, and then click Create user.

In the left panel, select Policies and then Create policy. Click on the JSON tab.

Paste the following JSON policy, and then click Next.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EKSAndInfra",
      "Effect": "Allow",
      "Action": [
        "eks:*",
        "ec2:*",
        "elasticloadbalancing:*",
        "fsx:*",
        "cloudformation:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "IAMManagement",
      "Effect": "Allow",
      "Action": [
        "iam:*"
      ],
      "Resource": "*"
    }
  ]
}

Name the policy EKS-Infra-Admin-Policy (or something similar), and click Create policy.

Go back to Users and click eks-admin-cli. In the Permissions tab, choose Add permissions, then Attach policies directly. Search for EKS-Infra-Admin-Policy, click Next, and then click Add permissions.

Still on the user page, open the Security credentials tab and click Create access key.

Select Command Line Interface (CLI), confirm that you want to use CLI access, and click Next. Finish the wizard and save the access key ID and secret access key, because the next command asks for them.

On your local machine or EC2 instance, run the following command and enter the access key ID, secret access key, and default region when prompted:

aws configure --profile vllm-profile

Warning
This workshop was designed to run in us-west-2. Please define your profile in that AWS region.
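
To confirm the profile points at the expected region, you can read the stored value back with `aws configure get`. The sketch below assumes the profile name `vllm-profile` from the command above; the helper name `check_profile_region` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of the repository): confirm that an AWS CLI
# profile is set to the region this workshop expects.
EXPECTED_REGION="us-west-2"

check_profile_region() {
  local profile="${1:-vllm-profile}"
  local region
  # `aws configure get region` prints the region stored for the profile,
  # or nothing if no region has been set.
  region="$(aws configure get region --profile "$profile" 2>/dev/null || true)"
  if [ "$region" = "$EXPECTED_REGION" ]; then
    echo "OK: profile '$profile' uses $region"
  else
    echo "WARNING: profile '$profile' uses '${region:-<unset>}', expected $EXPECTED_REGION" >&2
    return 1
  fi
}
```

Call it as `check_profile_region vllm-profile` before running the infrastructure scripts. A non-zero exit status means the profile should be reconfigured.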

3. Deploy Infrastructure (15-20 mins)

# Deploy EKS cluster
chmod +x create_cluster.sh && ./create_cluster.sh

# Deploy GPU node group
chmod +x create_node_group.sh && ./create_node_group.sh

# Setup high-performance storage
chmod +x storage.sh && ./storage.sh
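
Once these scripts finish, a quick sanity check is to count the nodes that belong to the G5 family. The sketch below assumes `kubectl` is already pointed at the new cluster and relies on the standard `node.kubernetes.io/instance-type` node label. The helper name `count_g5_nodes` is hypothetical, and the exact instance size the node group uses is not stated here, so the check matches any `g5.*` type.

```shell
# Hypothetical helper: count cluster nodes whose instance type starts with "g5.".
count_g5_nodes() {
  # -L adds the instance-type label as the last output column;
  # --no-headers drops the header row so awk sees only node rows.
  kubectl get nodes --no-headers -L node.kubernetes.io/instance-type \
    | awk '$NF ~ /^g5\./ { n++ } END { print n + 0 }'
}
```

If `count_g5_nodes` prints `0`, the GPU node group has probably not joined the cluster yet, and `kubectl get nodes` is a reasonable next thing to inspect.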

4. Install Controllers (5 mins)

chmod +x controllers.sh && ./controllers.sh

5. Deploy vLLM Application (10-15 mins)

chmod +x application.sh && ./application.sh
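
If you lose track of the endpoint, it can usually be read back from the cluster. The sketch below assumes the application is exposed through a Kubernetes Ingress that the load balancer controller backs with an ALB. That is an assumption about what `application.sh` creates, not something stated above, and the helper name `get_alb_endpoint` is hypothetical.

```shell
# Hypothetical helper: print the load balancer hostname(s) reported by
# every Ingress in the cluster, separated by spaces.
get_alb_endpoint() {
  kubectl get ingress --all-namespaces \
    -o jsonpath='{.items[*].status.loadBalancer.ingress[*].hostname}'
}
```

An empty result typically means the load balancer is still being provisioned, so it is worth retrying after a minute or two.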

Test Your Deployment

Replace <YOUR_ALB_ENDPOINT> with the endpoint from step 5:

curl -X POST http://<YOUR_ALB_ENDPOINT>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
      "model": "Qwen/Qwen2.5-0.5B-Instruct",
      "messages": [{"role": "user", "content": "Hello, how are you?"}],
      "max_tokens": 100
  }'
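
A freshly created load balancer can take a few minutes to start routing traffic, so the first request may fail even when the deployment is healthy. The sketch below polls the endpoint until it answers with HTTP 200. It assumes the server exposes the OpenAI-compatible `/v1/models` route alongside `/v1/chat/completions`; the helper name `wait_for_endpoint` and its default retry values are hypothetical.

```shell
# Hypothetical helper: poll the endpoint until it returns HTTP 200.
# Usage: wait_for_endpoint <endpoint> [attempts] [delay_seconds]
wait_for_endpoint() {
  local endpoint="$1"
  local attempts="${2:-30}"
  local delay="${3:-10}"
  local i=1
  local code
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output, -o discards the body,
    # -w prints only the HTTP status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      "http://${endpoint}/v1/models" || true)"
    if [ "$code" = "200" ]; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "endpoint not ready after ${attempts} attempts" >&2
  return 1
}
```

Run `wait_for_endpoint <YOUR_ALB_ENDPOINT>` first, then send the chat completion request once it reports that the endpoint is ready.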

Architecture Overview

  • EKS Cluster: Kubernetes foundation with GPU-optimized configuration
  • G5 Node Group: GPU instances for model serving
  • FSx for Lustre: High-performance storage for model weights
  • Application Load Balancer: External access with health checks
  • vLLM DLC: Pre-optimized container for efficient inference

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
