Benchmark_Model_inference

Image Classification Models Supported by TensorRT Server (Triton Server)

The purpose of this project is to use the Triton server to be able to host model inferencing to test the run time of multiple batches and models.

This repo is to be run on AWS services: EC2 and S3 but can be modified to include Lambda and EFS

##Creating the Model Repository The first thing is create the model repository that stores all the models that we will use. Here the mode_repository directory will be the models that we will use. model_repository_main holds possible models that can be used but for this project we won't be using all of them. The structure of the model_repository directory is important because that is how the Triton Server will be able to read the models. The structure of the model repository can be read here.

###Models Supported by Triton

resnet10/resnet18/resnet34/resnet50/resnet101
vgg16/vgg19
googlenet
mobilenet_v1/mobilenet_v2
squeezenet
darknet19/darknet53
efficientnet_b0
cspdarknet19/cspdarknet53

Note that some of the models may require a GPU and preset the max batch size.

We will be hosting our models in an S3 bucket. This is because the models may be large and it also isolates them from the rest of the code. Here we will be hosting our models on s3://modelsyppatel

##Running the Triton Server Due to the lack of resources our Triton Server will need to be hosted without a GPU.

We first need to create an EC2 instance that will host the Triton Server. Ideally, we use the EC2 marketplace and get the NVIDIA Triton Server AMI which contains a p3.2large ec2 instance with the following specifications:

One NVIDIA Tesla V100 GPU
High frequency Intel Xeon E5-2686 v4 processor
61 GiB memory
16 GiB GPU memory

However, we can also host it without a GPU with an EC2 instance with these specifications

t2.2xlarge instance type
32 GiB Memory
IAM policy with S3 Full Access
Key-Value Pair <-- necessary for file transfer

Now once we have our EC2 instance up and running we will proceed to pull Triton Inference server container in our instance using docker with the following command.

###Install docker

sudo yum install docker

sudo systemctl start docker

###Pull the Triton Server Docker Image

sudo docker pull nvcr.io/nvidia/tritonserver:22.03-py3

###Run the Docker Image without a GPU

sudo docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 nvcr.io/nvidia/tritonserver:22.03-py3 tritonserver --model-repository=s3://modelsyppatel/model_repository

The Triton Server should be running at this point but you can check on running this command for status. Run this line on the same EC2 instance, may need a nsew session.

curl -v localhost:8000/v2/health/ready

##Running the Triton Client

Now we need to run a client image to do the inferencing. We need this since it has the libraries that is needed for our script.

###Get Docker image

docker pull nvcr.io/nvidia/tritonserver:22.03-py3-sdk

###Run the docker

docker pull nvcr.io/nvidia/tritonserver:22.03-py3-sdk

Test to see if the inferencing works

/workspace/install/bin/image_client -m inception_graphdef -c 3 -s INCEPTION /workspace/images/mug.jpg

###Install some python libraries

pip3 install boto3
pip3 install pandas

###Moving the benchmarking file Copy our benchmark.py file to the EC2 instance. We used Filezilla to do so.

Copy our benchmark.py file to /workspace/install/bin. Can be done by

sudo docker container ls

Get the container ID. Note that the ID will only show if the container is running

sudo docker cp benchmark.py <container-id>:/workspace/install/bin

###Running the benchmark file

python3 benchmark.py -m <model_name> -c 3 -s INCEPTION -b <batch_size> image-for-benchmark

-m is the model name that is being hosted by the Triton Server -c number of outputs of a single image request -s Type of scaling to apply to image pixels. -b number of images in one batch. Default is 1. image-for-benchmark File name that will be used for the S3 bucket where we host the images

##Output After running benchmark.py. Two csv files will be created with the file name being <model_name>_<batch_size>Throughput.csv and <model_name><batch_size>_Time.csv The Time csv has the elapsed time after every batch has completed. The rows are the number attempts or number of times we went through the entire dataset. The Throughout csv is the elapsed time per batch / batch size. It is the rate of time it takes to inference one image. The files would be moved to the Results directory

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
Benchmark		Benchmark
Graph Results		Graph Results
Results		Results
model_repository		model_repository
model_repository_main		model_repository_main
README.md		README.md
fetch_models.sh		fetch_models.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Benchmark_Model_inference

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Benchmark_Model_inference

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages