CS516: Parallelization of Programs
Introduction to GPUs and CUDA Programming
Vishwesh Jatala
Assistant Professor
Department of CSE
Indian Institute of Technology Bhilai
[email protected]
2023-24 W
Course Outline
■ Introduction
■ Overview of Parallel Architectures
■ Performance
■ Parallel Programming
❑ GPUs and CUDA programming
❑ CUDA thread organization
❑ Instruction execution
❑ GPU memories
❑ Synchronization
❑ Unified memory
■ Case studies
■ Extracting Parallelism from Sequential Programs Automatically
Outline
■ GPUs and CUDA Programming Demos
Motivation
■ For many decades, single-core processors dominated, with performance gains from
❑ Instruction-level parallelism
❑ Rising core clock frequencies
❑ Moore's law
■ Mid-2000s: the power wall
❑ Power constraints
❑ Heat dissipation
■ The industry shifted to multicore processors and accelerators, such as GPUs
Why GPUs?
■ Multicore processors
❑ Designed for task-level parallelism
❑ Not efficient for graphics applications
■ Graphics rendering is computationally expensive and highly data-parallel
Graphics Processing Units
■ The early GPU designs (e.g., NVIDIA GeForce 256)
❑ Specialized for graphics processing only
❑ Exhibit SIMD execution
❑ Less programmable
■ In 2007, fully programmable GPUs arrived
❑ CUDA released
GPU Architecture
[Figure: GPU architecture]
Parallelizing Programs on GPUs
Programming Models
■ CUDA (Compute Unified Device Architecture)
❑ Supports NVIDIA GPUs
❑ Extension of C programming language
❑ Popular in academia
■ OpenCL (Open Computing Language)
❑ Open source
❑ Supports various GPU devices
Introduction to CUDA Programming
[Figure: GPU (device) with streaming multiprocessors (SMs) and device memory; CPU (host) with its own memory. Execution flow: (1) data transfer from CPU to GPU, (2) kernel execution on the SMs, (3) data transfer from GPU back to CPU.]
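A minimal sketch of this three-step flow in code; the kernel name process, the size N, and the doubling are illustrative assumptions, not taken from the slides:

#include <stdio.h>
#include <cuda.h>
#define N 4
__global__ void process(int *a) {
    a[threadIdx.x] *= 2;                       // hypothetical per-thread work
}
int main() {
    int host[N] = {1, 2, 3, 4}, *dev;
    cudaMalloc(&dev, N * sizeof(int));
    cudaMemcpy(dev, host, N * sizeof(int), cudaMemcpyHostToDevice); // (1) CPU to GPU
    process<<<1, N>>>(dev);                                         // (2) kernel on the SMs
    cudaMemcpy(host, dev, N * sizeof(int), cudaMemcpyDeviceToHost); // (3) GPU to CPU
    for (int i = 0; i < N; ++i)
        printf("%d\n", host[i]);               // prints 2 4 6 8
    cudaFree(dev);
    return 0;
}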
Hello World
#include <stdio.h>
int main() {
    printf("Hello World.\n");
    return 0;
}
Compile: gcc hello.c
Run: ./a.out
Output: Hello World.
Hello World in GPU
#include <stdio.h>
#include <cuda.h>
__global__ void dkernel() {
    printf("Hello World.\n");
}
int main() {
    dkernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
Compile: nvcc hello.cu
Run: ./a.out
Output: Hello World.
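In the launch dkernel<<<B, T>>>(), B is the number of thread blocks and T is the number of threads per block; the kernel body runs once per thread. For example:

dkernel<<<1, 1>>>();   // 1 block of 1 thread: one line of output
dkernel<<<2, 4>>>();   // 2 blocks of 4 threads each: 8 lines of output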
Hello World in GPU
#include <stdio.h>
#include <cuda.h>
__global__ void dkernel() {
    printf("Hello World.\n");
}
int main() {
    dkernel<<<1, 1>>>();
    return 0;
}
Compile: nvcc hello.cu
Run: ./a.out
Output: (no output)
GPU kernel launch is asynchronous! The host reaches return 0 and exits before the kernel runs; adding cudaDeviceSynchronize() after the launch (as on the previous slide) makes the host wait for the kernel to finish and restores the output.
Hello World in Parallel in GPU
#include <stdio.h>
#include <cuda.h>
__global__ void dkernel() {
    printf("Hello World.\n");
}
int main() {
    dkernel<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
Compile: nvcc hello.cu
Run: ./a.out
Output: Hello World. (printed 32 times, once per thread)
Example-1
#include <stdio.h>
#define N 100
int main() {
    int i;
    for (i = 0; i < N; ++i)
        printf("%d\n", i * i);
    return 0;
}
Example-1
The GPU version of Example-1: the loop disappears, and each of the N threads computes one square from its thread ID (the CPU version is on the previous slide).
#include <stdio.h>
#include <cuda.h>
#define N 100
__global__ void fun() {
    printf("%d\n", threadIdx.x * threadIdx.x);
}
int main() {
    fun<<<1, N>>>();
    cudaDeviceSynchronize();
    return 0;
}
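A thread block can contain at most 1024 threads on current GPUs, so a single-block launch like fun<<<1, N>>> stops working for large N. A hedged sketch of the usual fix, splitting the work across blocks (256 threads per block is a common choice; thread organization is covered in detail later in the course):

#include <stdio.h>
#include <cuda.h>
#define N 5000
__global__ void fun(int n) {
    unsigned id = blockIdx.x * blockDim.x + threadIdx.x; // global thread ID
    if (id < n)                     // guard: the last block may be partially full
        printf("%u\n", id * id);
}
int main() {
    fun<<<(N + 255) / 256, 256>>>(N);  // ceil(N/256) blocks of 256 threads
    cudaDeviceSynchronize();
    return 0;
}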
GPU Hello World with a Global
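The demo code for this slide is not reproduced in the text; below is a minimal sketch of what a hello-world with a global (GPU-resident) variable might look like. The variable name dvar and the increment are illustrative assumptions, not taken from the slides.

#include <stdio.h>
#include <cuda.h>
__device__ int dvar = 0;           // global variable; lives in GPU memory
__global__ void dkernel() {
    ++dvar;                        // kernels can read and write it directly
    printf("Hello World. dvar = %d\n", dvar);
}
int main() {
    dkernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    int hvar;
    cudaMemcpyFromSymbol(&hvar, dvar, sizeof(int)); // copy the global back to the host
    printf("dvar on host = %d\n", hvar);            // prints 1
    return 0;
}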
Separate Memories
[Figure: CPU and GPU each have their own DRAM, connected by a PCI Express bus.]
■ The CPU and its associated (discrete) GPU have separate physical memories (RAM).
■ A variable in CPU memory cannot be accessed directly in a GPU kernel.
■ The programmer needs to maintain two copies of such a variable, one on the CPU and one on the GPU.
■ It is the programmer's responsibility to keep them in sync.
CUDA Programs with Data Transfers
[Figure: the same CPU/GPU diagram as before: (1) CPU-to-GPU data transfer, (2) kernel execution on the SMs, (3) GPU-to-CPU data transfer.]
Data Transfer
■ Copy data from CPU to GPU:
cudaMemcpy(gpulocation, cpulocation, size, cudaMemcpyHostToDevice);
■ Copy data from GPU to CPU:
cudaMemcpy(cpulocation, gpulocation, size, cudaMemcpyDeviceToHost);
This means we need two copies of the same variable, one on the CPU and another on the GPU, e.g., int *cpuarr, *gpuarr; (a full round trip is sketched below).
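The slide shows only the copies; a minimal sketch of the full round trip around them, including allocation and cleanup (the array size and contents here are illustrative):

#include <stdio.h>
#include <cuda.h>
#define N 8
int main() {
    int cpuarr[N], *gpuarr;
    for (int i = 0; i < N; ++i)
        cpuarr[i] = i;
    cudaMalloc(&gpuarr, N * sizeof(int));                                 // allocate the GPU copy
    cudaMemcpy(gpuarr, cpuarr, N * sizeof(int), cudaMemcpyHostToDevice);  // CPU to GPU
    /* ... launch kernels that operate on gpuarr ... */
    cudaMemcpy(cpuarr, gpuarr, N * sizeof(int), cudaMemcpyDeviceToHost);  // GPU to CPU
    cudaFree(gpuarr);                                                     // release the GPU copy
    return 0;
}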
CPU-GPU Communication
#include <stdio.h>
#include <string.h>   // for strlen
#include <cuda.h>
__global__ void dkernel(char *arr, int arrlen) {
    unsigned id = threadIdx.x;
    if (id < arrlen) {        // guard: only the first arrlen of the 32 threads do work
        ++arr[id];
    }
}
int main() {
    char cpuarr[] = "CS516", *gpuarr;
    cudaMalloc(&gpuarr, sizeof(char) * (1 + strlen(cpuarr)));
    cudaMemcpy(gpuarr, cpuarr, sizeof(char) * (1 + strlen(cpuarr)), cudaMemcpyHostToDevice);
    dkernel<<<1, 32>>>(gpuarr, strlen(cpuarr));
    cudaDeviceSynchronize();  // unnecessary: the cudaMemcpy below synchronizes
    cudaMemcpy(cpuarr, gpuarr, sizeof(char) * (1 + strlen(cpuarr)), cudaMemcpyDeviceToHost);
    printf("%s\n", cpuarr);   // prints "DT627" (each character incremented)
    return 0;
}
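These demos omit error handling; every CUDA runtime call returns a cudaError_t that real code should check. A minimal sketch, written against the first cudaMemcpy of the program above:

cudaError_t err = cudaMemcpy(gpuarr, cpuarr, sizeof(char) * (1 + strlen(cpuarr)), cudaMemcpyHostToDevice);
if (err != cudaSuccess)
    fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));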
Example
CPU version:
#include <stdio.h>
#define N 100
int main() {
    int a[N], i;
    for (i = 0; i < N; ++i)
        a[i] = i * i;
    return 0;
}

GPU version:
#include <stdio.h>
#include <cuda.h>
#define N 100
__global__ void fun(int *a) {
    a[threadIdx.x] = threadIdx.x * threadIdx.x;
}
int main() {
    int a[N], *da;
    int i;
    cudaMalloc(&da, N * sizeof(int));
    fun<<<1, N>>>(da);
    cudaMemcpy(a, da, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (i = 0; i < N; ++i)
        printf("%d\n", a[i]);
    return 0;
}

Takeaway: each GPU thread writes one array element in device memory, and the results must be copied back before the CPU can use them; the cudaMemcpy also synchronizes, so no explicit cudaDeviceSynchronize is needed here.
References
■ CS6023 GPU Programming
❑ https://www.cse.iitm.ac.in/~rupesh/teaching/gpu/jan20/
■ Miscellaneous resources from the internet
■ https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/