CSC 391/691: GPU Programming Fall 2011
Introduction to GPUs for HPC
Copyright © 2011 Samuel S. Cho
High Performance Computing
• Speed: Many problems that are interesting to scientists and engineers would take a long time to execute on a PC or laptop: months, years, “never”.
• Size: Many problems that are interesting to scientists and engineers can’t fit on a PC or laptop with a few GB of RAM or a few hundred GB of disk space.
• Supercomputers or clusters of computers can make these problems practical to solve numerically.
Scientific and Engineering Problems
• Simulations of physical phenomena such as:
• Weather forecasting
• Earthquake forecasting
• Galaxy formation
• Oil reservoir management
• Molecular dynamics
• Data Mining: Finding needles of critical information in a haystack of data, such as:
• Bioinformatics
• Signal processing
• Detecting storms that might turn into hurricanes
• Visualization: turning a vast sea of data into pictures that scientists can understand.
• At its most basic level, all of these problems involve many, many floating point operations.
Hardware Accelerators
• In HPC, an accelerator is a hardware component whose role is to speed up some aspect of the computing workload.
• In the olden days (1980s), supercomputers sometimes had array processors, which did vector operations on arrays.
• PCs sometimes had floating point accelerators: little chips that did the floating point calculations in hardware rather than software.
[Slide images: an array processor and a floating point coprocessor, each captioned “Not an accelerator*” -- *Okay, I lied.]
To Accelerate Or Not To Accelerate
• Pro:
• They make your code run faster.
• Cons:
• They’re expensive.
• They’re hard to program.
• Your code may not be cross-platform.
Why GPU for HPC?
• Graphics Processing Units (GPUs) were originally designed to accelerate graphics tasks like image rendering.
• They became very, very popular with video gamers, because they’ve produced better and better images, lightning fast.
• And, prices have been extremely good, ranging from three figures at the low end to four figures at the high end.
• Chips are expensive to design (hundreds of millions of $$$) and expensive to build factories for (billions of $$$), but cheap to produce.
• For example, in 2006–2007, GPUs sold at a rate of about 80 million cards per year, generating about $20 billion per year in revenue.
• This means that the GPU companies have been able to recoup their huge fixed costs.
• Remember: GPUs mostly do stuff like rendering images. This is done mostly through floating point arithmetic -- the same stuff people use supercomputing for!
What are GPUs?
• GPUs have developed from graphics cards into a platform for high performance computing (HPC) -- perhaps the most important development in HPC for many years.
• Co-processors -- a very old idea that appeared in the 1970s and 1980s with floating point co-processors attached to microprocessors that did not then have floating point capability.
• These coprocessors simply executed floating point instructions that were fetched from memory.
• Around the same time, there was interest in providing hardware support for displays, especially with the increasing use of graphics and PC games.
• This led to graphics processing units (GPUs) attached to the CPU to create the video display.
[Figure: early design -- CPU and memory, with a graphics card driving the display]
Modern GPU Design
• By the late 1990s, graphics chips needed to support 3-D graphics, especially for games and graphics APIs such as DirectX and OpenGL.
• Graphics chips generally had a pipeline structure, with individual stages performing specialized operations, finally leading to loading the frame buffer for display.
• Individual stages may have access to graphics memory for storing intermediate computed data.
[Figure: graphics pipeline -- input stage, vertex shader stage, geometry shader stage, rasterizer stage, pixel shading stage; stages access graphics memory, and output goes to the frame buffer]
General Purpose GPU (GPGPU) Designs
• High performance pipelines call for high-speed (IEEE) floating point operations.
• Known as GPGPU (General-Purpose computing on Graphics Processing Units) -- difficult to do with specialized graphics pipelines, but possible.
• By the mid-2000s, it was recognized that individual stages of the graphics pipeline could be implemented by a more general purpose processor core (although with a data-parallel paradigm).
• 2006 -- first GPU for general high performance computing as well as graphics processing: the NVIDIA G80 chip/GeForce 8800 card.
• Unified processors that could perform vertex, geometry, pixel, and general computing operations.
• Could now write programs in C rather than graphics APIs.
• Single-instruction multiple-thread (SIMT) programming model (sketched below).
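As a rough illustration (not from the original slides), here is a minimal CUDA C vector-addition sketch of the SIMT model. The kernel name vecAdd, the array size, and the block size of 256 are arbitrary choices for the example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// SIMT: every thread executes the same kernel code; each thread
// computes its own global index and handles one array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard: the grid may have extra threads
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;          // 1M elements (arbitrary for the example)
    size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);

    // Inputs cross the PCIe bus -- the ~8 GB/sec link from the spec slide.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // threads per block
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[12345] = %f\n", h_c[12345]);     // expect 3 * 12345 = 37035

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Every thread runs the same instruction stream; only blockIdx and threadIdx differ, so each thread selects a different element -- that is the SIMT idea in miniature.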
NVIDIA Tesla Platform
• The NVIDIA Tesla series was their first platform for the high performance computing market.
• Named for Nikola Tesla, a pioneering mechanical and electrical engineer and inventor.
[Images: GeForce GTX 480 and Tesla C2070 cards]
NVIDIA GTX 480 Specs
• 3 billion transistors
• 480 compute cores
• 1.401 GHz clock speed
• Single precision floating point performance: 1.35 TFLOPs (2 single precision flops per clock per core)
• Double precision floating point performance: 168 GFLOPs (GeForce cards are capped at 1/8 of the single precision rate)
• Internal RAM: 1.5 GB GDDR5 VRAM
• Internal RAM speed: 177.4 GB/sec (compared to 21-25 GB/sec for regular RAM)
• PCIe slot (at most 8 GB/sec per GPU card)
• 250 W thermal design power
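A quick sanity check on those peak figures (this breakdown is mine, not on the original slide; the 1/8 double precision rate is the GeForce-line cap):

\[
480~\text{cores} \times 2~\text{flops/clock} \times 1.401~\text{GHz} \approx 1345~\text{GFLOPs} \approx 1.35~\text{TFLOPs (single precision)}
\]
\[
1345~\text{GFLOPs} \times \tfrac{1}{8} \approx 168~\text{GFLOPs (double precision)}
\]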
Coming: Kepler and Maxwell
• NVIDIA’s 20-series is also known by the codename “Fermi.” It runs at about 0.5 TFLOPs double precision per GPU card (peak).
• The next generation, to be released in 2011, is codenamed “Kepler” and will be capable of something like 1.4 TFLOPs double precision per GPU card.
• After “Kepler” will come “Maxwell” in 2013, capable of something like 4 TFLOPs double precision per GPU card.
• So, the increase in performance is likely to be roughly 2.5x – 3x per generation, roughly every two years.
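Checking the projected generation-to-generation ratios against the slide’s numbers (my arithmetic, not on the original slide):

\[
\frac{1.4~\text{TFLOPs (Kepler)}}{0.5~\text{TFLOPs (Fermi)}} = 2.8,
\qquad
\frac{4~\text{TFLOPs (Maxwell)}}{1.4~\text{TFLOPs (Kepler)}} \approx 2.9
\]

both consistent with the quoted 2.5x – 3x per generation.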
Maryland CPU/GPU Cluster Infrastructure
http://www.umiacs.umd.edu/research/GPU/facilities.html
Intel’s Response to NVIDIA GPUs
Does it work?
Example Applications             | URL                                            | Speedup
Seismic Database                 | http://www.headwave.com                        | 66x – 100x
Mobile Phone Antenna Simulation  | http://www.accelware.com                       | 45x
Molecular Dynamics               | http://www.ks.uiuc.edu/Research/vmd            | 21x – 100x
Neuron Simulation                | http://www.evolvedmachines.com                 | 100x
MRI Processing                   | http://bic-test.beckman.uiuc.edu               | 245x – 415x
Atmospheric Cloud Simulation     | http://www.cs.clemson.edu/~jesteel/clouds.html | 50x
• These look like remarkable speedups compared to traditional CPU-based HPC approaches.