Vector Processing & GPU Basics
MODULE III
Contents/Syllabus
Vector processing and array processing
CPU v/s GPU
GPU Architecture
Introduction to GPU programming – CUDA
Memory Hierarchy Design
Vector Processor
A vector processor is a central processing
unit that can operate on an entire vector of
input data with a single instruction.
Its hardware applies one instruction to a
sequential set of similar data items in memory.
Architecture & Working
Vectorized Code
Scalar Processing v/s Vector Processing

Scalar processing (loop of 10 iterations):
1. Read the ith instruction and decode
2. Fetch the A[i] element
3. Fetch the B[i] element
4. Add A[i] + B[i]
5. Store the result in C[i]
6. Increment i until i = 10

Vector processing (single instruction):
1. Read instruction and decode
2. Fetch all 10 elements of A[]
3. Fetch all 10 elements of B[]
4. Add A[] + B[]
5. Store the result in C[]
Classification of Vector Processors

Vector processor architectures fall into two classes:
1. Register-to-Register architecture
2. Memory-to-Memory architecture
Register to Register
Architecture
Widely used in vector computers.
Operands and previous results are fetched from
main memory indirectly, through vector registers.
The several vector pipelines in the vector
computer retrieve data from these registers and
store results back into the desired register.
These vector registers are programmable through
user instructions.
Data is fetched from, and results are stored
into, the register whose address appears in the
instruction.
Memory to Memory Architecture
Operands and results are fetched from and stored
to main memory directly, instead of through
registers.
The address of the desired data must be present
in the vector instruction.
This architecture can fetch data in blocks as
large as 512 bits from memory into the pipeline.
Because memory access time is high, the pipelines
of the vector computer require a longer startup
time to initiate a vector instruction.
Graphics Processing Unit
Highlights
What is a GPU?
What is the Difference between a CPU and a GPU?
Why should you use a GPU?
GPU - Introduction
A GPU accelerates applications running on the CPU
by offloading some of the compute-intensive,
time-consuming portions of the code.
The rest of the application still runs on the CPU.
This is known as "heterogeneous" or "hybrid"
computing.
Uses massive Parallel Processing Power
GPU - Introduction
A CPU consists of a few cores (typically two to
eight), while a GPU consists of hundreds of
smaller cores.
Together, they operate to crunch through the data
in the application.
This massively parallel architecture is what gives
the GPU its high compute performance.
CPU V/S GPU
CPU:
- Central Processing Unit
- Several cores
- Low latency
- Good for serial processing
- Can do a handful of operations at once

GPU:
- Graphics Processing Unit
- Many cores
- High throughput
- Good for parallel processing
- Can do thousands of operations at once
Best GPU Manufacturers
CUDA Architecture
CUDA (an acronym for Compute Unified Device
Architecture) is a parallel computing platform and
application programming interface (API) model
created by Nvidia.
It allows software developers and software
engineers to use a CUDA-enabled graphics
processing unit (GPU) for general-purpose
processing, an approach known as GPGPU.
The CUDA platform is designed to work with
programming languages such as C, C++, and Fortran.
This accessibility makes it easier for specialists
in parallel programming to use GPU resources.
CUDA - GPU PROCESS
1. Copy data from main memory to GPU memory
2. CPU initiates the GPU compute kernel
3. GPU's CUDA cores execute the kernel in parallel
4. Copy the resulting data from GPU memory back to main memory
CUDA ARCHITECTURE
The CUDA architecture consists of several
components:
1. Parallel compute engines inside NVIDIA GPUs
2. OS kernel-level support for hardware
initialization, configuration, etc.
3. User-mode driver, which provides a device-level
API for developers
4. PTX instruction set architecture (ISA) for
parallel computing kernels and functions