High Performance Scientific Computing
Lecture 4
S. Gopalakrishnan
Memory Issues
Memory Hierarchy

A typical hierarchy runs from fastest and costliest per byte to slowest and cheapest:

  registers -> cache -> main memory (DRAM) -> disk (virtual memory)

Memory Latency Problem: the Processor-DRAM Performance Gap
(Motivation for the memory hierarchy)

[Figure: processor vs. DRAM performance, 1980-2000, log scale. µProc performance grows ~60%/yr (2x every 1.5 years); DRAM performance grows only ~5%/yr (2x every 15 years). The processor-memory performance gap therefore grows ~50% per year.]

! Notice that the data width changes between levels: 8 B (CPU-cache), 32 B (cache-main memory), 4 KB (main memory-disk pages)
! Bandwidth: transfer rate between the various levels
  • CPU-Cache: 24 GB/s
  • Cache-Main memory: 0.5-6.4 GB/s
  • Main memory-Disk: 187 MB/s (Serial ATA/1500)

Source: ECE232, UMass Amherst. Adapted from Computer Organization and Design, Patterson & Hennessy, UCB; Kundu, Koren, UMass.
Virtual Memory and Paging

[Figure: each process has its own virtual memory space, mapped page by page onto physical memory (RAM); pages belonging to another process's memory, or not currently resident, live on disk. Source: wikipedia.org]
Memory Hierarchy: Terminology

! Hit: the data appears in the upper level, in block X
! Hit Rate: the fraction of memory accesses found in the upper level
! Miss: the data must be retrieved from a block in the lower level (Block Y)
! Miss Rate = 1 - (Hit Rate)
! Hit Time: time to access the upper level, which consists of the time to determine hit/miss plus the upper-level access time
! Miss Penalty: time to replace a block in the upper level plus the time to deliver the block to the processor
! Note: Hit Time << Miss Penalty

[Figure: the processor exchanges Block X with the upper level; on a miss, Block Y is brought from the lower level into the upper level.]
Current Memory Hierarchy

Processor (control, datapath, registers) -> L1 cache (on chip) -> L2 cache -> main memory -> secondary storage (disk)

                 Regs      L1 Cache  L2 Cache  Main Memory  Disk
Speed (ns):      1         2         6         100          10,000,000
Size (MB):       0.0005    0.1       1-4       1000-6000    500,000
Cost ($/MB):     --        $10       $3        $0.01        $0.002
Technology:      Regs      SRAM      SRAM      DRAM         Disk

• Cache - main memory: bridges the speed gap
• Main memory - disk (virtual memory): provides capacity
Introduction to Parallel Programming

Shared-Memory Processing

Each processor can access the entire data space.

– Pros
  • Easier to program
  • Amenable to automatic parallelism
  • Can be used to run large-memory serial programs
– Cons
  • Expensive
  • Difficult to implement at the hardware level
  • Processor count limited by contention/coherency (currently around 512)
  • Watch out for the "NU" part of "NUMA"
Distributed-Memory Machines

! Each node in the computer has a locally addressable memory space
! The computers are connected together via some high-speed network
  – Infiniband, Myrinet, Giganet, etc.

• Pros
  – Really large machines: size limited only by gross physical considerations:
    • Room size
    • Cable lengths (tens of meters)
    • Power/cooling capacity
    • Money!
  – Cheaper to build and run
• Cons
  – Harder to program: data locality must be managed explicitly
MPPs (Massively Parallel Processors)

Distributed memory at the largest scale; often shared memory at the lower levels of the hierarchy.

• IBM BlueGene/L (LLNL)
  – 131,072 700 MHz processors
  – 256 MB of RAM per processor
  – Balanced compute speed with interconnect
• Red Storm (Sandia National Labs)
  – 12,960 dual-core 2.4 GHz Opterons
  – 4 GB of RAM per processor
  – Proprietary SeaStar interconnect
Comparison of CPU vs. GPU Architecture

The two reflect fundamentally different design philosophies: the CPU devotes much of its chip area to control logic and cache serving a few powerful ALUs, while the GPU devotes most of its area to a large grid of simple ALUs, with comparatively little control logic and cache.

[Figure: CPU (large Control and Cache blocks, few ALUs, DRAM) vs. GPU (many ALUs, DRAM). Source: Prof. Wen-mei W. Hwu, UIUC; © Wen-mei W. Hwu and David Kirk/NVIDIA.]
Parallelization: a GPU vs. CPU Analogy

It is more effective to deliver pizzas with many light-duty scooters than with one big truck; similarly, it is effective to use many lightweight GPU processors for parallel tasks rather than a few heavyweight CPU cores.
Performance Advantage of GPUs

• An enlarging peak performance advantage:
  – Calculation: ~1 TFLOPS (GPU) vs. ~100 GFLOPS (CPU); roughly 1 TFLOP on a desktop
  – Memory bandwidth: 100-150 GB/s (GPU) vs. 32-64 GB/s (CPU)
• GPU in every PC and workstation: massive volume and potential

Courtesy: John Owens. Source: top500.org
Compute Unified Device Architecture (CUDA)

• CUDA is a set of APIs (application program interfaces) for using GPUs for general-purpose computing
• Developed and released by NVIDIA; works only on NVIDIA GPU hardware
• Works on commercial GPUs as well as on GPUs specialized for scientific computing (Tesla)
• The CUDA compiler supports the C programming language; extensions to Fortran are possible
• An open-source alternative is OpenCL
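To give a flavor of CUDA's C extensions, here is a minimal vector-add sketch (a standard introductory example, not from the lecture; the array size and launch configuration are arbitrary choices, and it requires an NVIDIA GPU and the nvcc compiler):

```cuda
#include <stdio.h>

/* __global__ marks a kernel: code compiled for the GPU and launched from
   the host. Each of the many lightweight GPU threads handles one element. */
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;

    /* Unified (managed) memory is accessible from both CPU and GPU. */
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Launch enough 256-thread blocks to cover all n elements. */
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();  /* wait for the GPU before reading results */

    printf("c[0] = %.1f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is the main C extension: it maps the loop over elements onto the GPU's many ALUs, matching the scooters-vs-truck analogy above.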