Computer Architecture and Operating Systems
Lecture 8: Memory and Caches
Andrei Tatarnikov
[email protected]
@andrewt0301
Processor-Memory Performance Gap
Computer performance depends on:
Processor performance
Memory performance
Memory Challenge
Make memory appear as fast as the processor
Ideal memory:
Fast
Cheap (inexpensive)
Large (capacity)
But can only choose two!
Memory Technology
Static RAM (SRAM)
0.5 – 2.5 ns, $500 – $1000 per GB
Dynamic RAM (DRAM)
50 – 70 ns, $10 – $20 per GB
Flash Memory
5,000 – 50,000 ns, $0.75 – $1.00 per GB
Magnetic Disk
5,000,000 – 20,000,000 ns, $0.05 – $0.10 per GB
Ideal Memory
Access time of SRAM
Capacity and cost/GB of disk
Locality
A large memory does not all have to be fast for accesses to appear fast
Just exploit locality
Temporal Locality:
Locality in time
If data used recently, likely to use it again soon
How to exploit: keep recently accessed data in higher levels of
memory hierarchy
Spatial Locality:
Locality in space
If data used recently, likely to use nearby data soon
How to exploit: when accessing data, bring nearby data into higher levels of the memory hierarchy too (see the sketch below)
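A minimal C sketch (my own illustration, not from the slides) of how spatial locality shows up in practice: with a row-major 2-D array, row-by-row traversal touches consecutive addresses and uses every word of each fetched block, while column-by-column traversal strides across memory and wastes most of each block.

#include <stdio.h>

#define N 1024

static double a[N][N];

/* Row-major traversal: consecutive iterations touch adjacent addresses,
 * so each cache block brought in is fully used (spatial locality). */
static double sum_by_rows(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal: consecutive iterations are N * sizeof(double)
 * bytes apart, so most of each fetched block goes unused. */
static double sum_by_cols(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    printf("%f %f\n", sum_by_rows(), sum_by_cols());
    return 0;
}

Repeatedly calling either function on the same array would, in turn, benefit from temporal locality: recently touched blocks are still in the cache.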
Taking Advantage of Locality
Memory hierarchy
Store everything on disk
Copy recently accessed (and nearby) items from disk
to smaller DRAM memory
Main memory
Copy more recently accessed (and nearby) items from
DRAM to smaller SRAM memory
Cache memory attached to CPU
Memory Hierarchy
(Figure: typical memory hierarchies for a personal mobile device, a laptop or desktop, and a server)
How Does It Work?
Block (aka line): unit of copying
May be multiple words
If accessed data is present in the upper level
Hit: access satisfied by the upper level
Hit ratio: hits/accesses
If accessed data is absent
Miss: block copied from the lower level
Time taken: miss penalty
Miss ratio: misses/accesses = 1 – hit ratio
Then accessed data is supplied from the upper level
(Figure: Processor, L1, L2, and main memory as levels of the hierarchy)
Hits and Misses
On cache hit, CPU proceeds normally
On cache miss
Stall the CPU pipeline
Fetch block from next level of hierarchy
Instruction cache miss
Restart instruction fetch
Data cache miss
Complete data access
Miss Types
Compulsory: the first time data is accessed
Capacity: the cache is too small to hold all the data of interest
Conflict: the data of interest maps to a cache location already occupied by different data
Memory Performance
Hit: data found in that level of memory hierarchy
Miss: data not found (must go to next level)
Hit Rate = # hits / # memory accesses = 1 – Miss Rate
Miss Rate = # misses / # memory accesses = 1 – Hit Rate
Average memory access time (AMAT): average time for
processor to access data
AMAT = t_cache + MR_cache × (t_MM + MR_MM × t_VM)
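Here t_cache, t_MM, and t_VM are the access times of the cache, main memory, and virtual memory (disk), and MR_cache, MR_MM are the corresponding miss rates. A minimal C sketch of the formula, using illustrative latencies and miss rates (my assumptions, not figures from the lecture):

#include <stdio.h>

/* AMAT for a cache -> main memory -> disk hierarchy:
 * AMAT = t_cache + MR_cache * (t_MM + MR_MM * t_VM)
 * All numbers below are illustrative assumptions. */
int main(void) {
    double t_cache  = 1.0;      /* cache hit time, ns     */
    double t_mm     = 100.0;    /* main-memory access, ns */
    double t_vm     = 10.0e6;   /* disk access, ns        */
    double mr_cache = 0.05;     /* cache miss rate        */
    double mr_mm    = 0.0001;   /* main-memory miss rate  */

    double amat = t_cache + mr_cache * (t_mm + mr_mm * t_vm);
    printf("AMAT = %.2f ns\n", amat);   /* 1 + 0.05 * (100 + 1000) = 56 ns */
    return 0;
}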
Cache Memory
Cache memory
The level of the memory hierarchy closest to the CPU
Given accesses X1, …, Xn–1, Xn
How do we know if the
data is present?
Where do we look?
Direct Mapped Cache
Location determined by address
Direct mapped: only one choice
(Block address) modulo (#Blocks in cache)
#Blocks is a power of 2
Use low-order address bits
Tags and Valid Bits
How do we know which particular block is stored in a
cache location?
Store block address as well as the data
Actually, only need the high-order bits
Called the tag
What if there is no data in a location?
Valid bit: 1 = present, 0 = not present
Initially 0
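As a small C sketch (not from the slides; the block size and block count are illustrative assumptions), this is how a direct-mapped cache splits a byte address into block offset, index, and tag:

#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE   16u   /* bytes per block -> 4 offset bits (assumption) */
#define NUM_BLOCKS  256u   /* blocks in cache -> 8 index bits  (assumption) */

int main(void) {
    uint32_t addr = 0x12345678;

    uint32_t offset = addr % BLOCK_SIZE;                 /* lowest-order bits      */
    uint32_t index  = (addr / BLOCK_SIZE) % NUM_BLOCKS;  /* selects the cache line */
    uint32_t tag    = addr / (BLOCK_SIZE * NUM_BLOCKS);  /* remaining high bits    */

    printf("offset = %u, index = %u, tag = 0x%x\n", offset, index, tag);
    return 0;
}

The tag is stored alongside the data and compared on every access; the valid bit distinguishes a real tag match from an empty entry.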
Direct Mapped Cache Example
8-blocks, 1 word/block, direct mapped
Initial state
Index V Tag Data
000 N
001 N
010 N
011 N
100 N
101 N
110 N
111 N
Direct Mapped Cache Example
Word addr Binary addr Hit/miss Cache block
22 10 110 Miss 110
Index V Tag Data
000 N
001 N
010 N
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N
Direct Mapped Cache Example
Word addr Binary addr Hit/miss Cache block
26 11 010 Miss 010
Index V Tag Data
000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N
Direct Mapped Cache Example
Word addr Binary addr Hit/miss Cache block
22 10 110 Hit 110
26 11 010 Hit 010
Index V Tag Data
000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N
Direct Mapped Cache Example
Word addr Binary addr Hit/miss Cache block
16 10 000 Miss 000
3 00 011 Miss 011
16 10 000 Hit 000
Index V Tag Data
000 Y 10 Mem[10000]
001 N
010 Y 11 Mem[11010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N
Direct Mapped Cache Example
Word addr Binary addr Hit/miss Cache block
18 10 010 Miss 010
Index V Tag Data
000 Y 10 Mem[10000]
001 N
010 Y 10 Mem[10010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N
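A small C sketch (my own illustration) that simulates this 8-block, 1-word-per-block direct-mapped cache and replays the access sequence 22, 26, 22, 26, 16, 3, 16, 18 from the example:

#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS 8   /* 8 blocks, 1 word per block, direct mapped */

int main(void) {
    bool     valid[NUM_BLOCKS] = { false };
    unsigned tag[NUM_BLOCKS]   = { 0 };

    /* 5-bit word addresses from the example */
    unsigned accesses[] = { 22, 26, 22, 26, 16, 3, 16, 18 };

    for (size_t i = 0; i < sizeof accesses / sizeof accesses[0]; i++) {
        unsigned addr  = accesses[i];
        unsigned index = addr % NUM_BLOCKS;   /* low-order 3 bits  */
        unsigned t     = addr / NUM_BLOCKS;   /* high-order 2 bits */

        bool hit = valid[index] && tag[index] == t;
        if (!hit) {                           /* miss: fetch the block, update the tag */
            valid[index] = true;
            tag[index]   = t;
        }
        printf("addr %2u -> index %u: %s\n", addr, index, hit ? "hit" : "miss");
    }
    return 0;
}

Its output (miss, miss, hit, hit, miss, miss, hit, miss) matches the hit/miss columns in the tables above.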
Associative Caches
Fully associative
Allow a given block to go in any cache entry
Requires all entries to be searched at once
Comparator per entry (expensive)
n-way set associative
Each set contains n entries
Block number determines which set
(Block number) modulo (#Sets in cache)
Search all entries in a given set at once
n comparators (less expensive)
Associative Cache Examples
Spectrum of Associativity
For a cache with 8 entries
Associativity Example
Compare 4-block caches
Direct mapped, 2-way set associative, fully associative
Block access sequence: 0, 8, 0, 6, 8
Direct mapped
Block address | Cache index | Hit/miss | Block 0 | Block 1 | Block 2 | Block 3
0 | 0 | miss | Mem[0] | – | – | –
8 | 0 | miss | Mem[8] | – | – | –
0 | 0 | miss | Mem[0] | – | – | –
6 | 2 | miss | Mem[0] | – | Mem[6] | –
8 | 0 | miss | Mem[8] | – | Mem[6] | –
Associativity Example
2-way set associative
Block address | Cache index | Hit/miss | Set 0 | Set 1
0 | 0 | miss | Mem[0] | –
8 | 0 | miss | Mem[0], Mem[8] | –
0 | 0 | hit | Mem[0], Mem[8] | –
6 | 0 | miss | Mem[0], Mem[6] | –
8 | 0 | miss | Mem[8], Mem[6] | –
Fully associative
Block address | Hit/miss | Cache content after access
0 | miss | Mem[0]
8 | miss | Mem[0], Mem[8]
0 | hit | Mem[0], Mem[8]
6 | miss | Mem[0], Mem[8], Mem[6]
8 | hit | Mem[0], Mem[8], Mem[6]
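A small C sketch (my own illustration) of the 2-way set-associative case with LRU replacement, replaying the block-address sequence 0, 8, 0, 6, 8:

#include <stdbool.h>
#include <stdio.h>

#define NUM_SETS 2   /* 4 blocks, 2-way set associative -> 2 sets */
#define WAYS     2

int main(void) {
    bool     valid[NUM_SETS][WAYS] = { { false } };
    unsigned tag[NUM_SETS][WAYS]   = { { 0 } };
    unsigned lru[NUM_SETS]         = { 0 };   /* way to replace next in each set */

    unsigned accesses[] = { 0, 8, 0, 6, 8 };  /* block addresses from the example */

    for (size_t i = 0; i < sizeof accesses / sizeof accesses[0]; i++) {
        unsigned block = accesses[i];
        unsigned set   = block % NUM_SETS;
        unsigned t     = block / NUM_SETS;

        bool     hit = false;
        unsigned way = lru[set];                   /* default victim: the LRU way */
        for (unsigned w = 0; w < WAYS; w++)
            if (valid[set][w] && tag[set][w] == t) { hit = true; way = w; }

        if (!hit) {                                /* miss: fill or replace the LRU way */
            valid[set][way] = true;
            tag[set][way]   = t;
        }
        lru[set] = 1u - way;                       /* the other way is now LRU (2-way only) */

        printf("block %u -> set %u: %s\n", block, set, hit ? "hit" : "miss");
    }
    return 0;
}

It reports miss, miss, hit, miss, miss, matching the table above.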
How Much Associativity
Increased associativity decreases miss rate
But with diminishing returns
Simulation of a system with 64KB
D-cache, 16-word blocks, SPEC2000
1-way: 10.3%
2-way: 8.6%
4-way: 8.3%
8-way: 8.1%
Replacement Policy
Direct mapped
No choice
Set associative
Prefer non-valid entry, if there is one
Otherwise, choose among entries in the set
Least-recently used (LRU)
Choose the one unused for the longest time
Simple for 2-way, manageable for 4-way, too hard beyond that
Random
Gives approximately the same performance as LRU for high associativity
Write-Through
On data-write hit, could just update the block in cache
But then cache and memory would be inconsistent
Write through: also update memory
But makes writes take longer
e.g., if base CPI = 1, 10% of instructions are stores, write
to memory takes 100 cycles
Effective CPI = 1 + 0.1×100 = 11
Solution: write buffer
Holds data waiting to be written to memory
CPU continues immediately
Only stalls on write if write buffer is already full
Write-Back
Alternative: On data-write hit, just update the block in
cache
Keep track of whether each block is dirty
When a dirty block is replaced
Write it back to memory
Can use a write buffer to allow replacing block to be read first
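A minimal C sketch (my own illustration, with one-word blocks for simplicity) contrasting the two write-hit policies:

#include <stdbool.h>
#include <stdio.h>

/* Toy cache with one-word blocks over a tiny "memory" (illustrative only). */
static int  memory[8];
static int  cache[8];   /* cached copies of the same words           */
static bool dirty[8];   /* one dirty bit per block (write-back only) */

/* Write-through: update the cache and memory on every write hit. */
static void write_through(int addr, int value) {
    cache[addr]  = value;
    memory[addr] = value;          /* memory always stays consistent */
}

/* Write-back: update only the cache and mark the block dirty;
 * memory is updated later, when the dirty block is evicted. */
static void write_back(int addr, int value) {
    cache[addr] = value;
    dirty[addr] = true;
}

static void evict(int addr) {
    if (dirty[addr]) {             /* one memory write per evicted dirty block */
        memory[addr] = cache[addr];
        dirty[addr]  = false;
    }
}

int main(void) {
    write_through(0, 42);
    write_back(1, 7);              /* memory[1] is still stale here */
    evict(1);                      /* now memory[1] == 7            */
    printf("%d %d\n", memory[0], memory[1]);
    return 0;
}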
Write Allocation
What should happen on a write miss?
Alternatives for write-through
Allocate on miss: fetch the block
Write around: don’t fetch the block
Since programs often write a whole block before reading it
(e.g., initialization)
For write-back
Usually fetch the block
Multilevel Caches
Primary cache attached to CPU
Small, but fast
Level-2 cache services misses from primary cache
Larger, slower, but still faster than main memory
Main memory services L-2 cache misses
Some high-end systems include an L-3 cache
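A minimal C sketch (all parameters are illustrative assumptions, not measurements from the lecture) of how an L-2 cache reduces the effective miss penalty seen by the CPU:

#include <stdio.h>

int main(void) {
    double base_cpi     = 1.0;
    double l1_miss_rate = 0.02;    /* L1 misses per instruction     */
    double mem_penalty  = 400.0;   /* cycles to reach main memory   */
    double l2_penalty   = 20.0;    /* cycles to reach L2            */
    double l2_miss_rate = 0.005;   /* global misses per instruction */

    double cpi_l1_only = base_cpi + l1_miss_rate * mem_penalty;
    double cpi_with_l2 = base_cpi + l1_miss_rate * l2_penalty
                                  + l2_miss_rate * mem_penalty;

    printf("CPI without L2: %.2f\n", cpi_l1_only);  /* 1 + 0.02*400 = 9.0 */
    printf("CPI with L2:    %.2f\n", cpi_with_l2);  /* 1 + 0.4 + 2  = 3.4 */
    return 0;
}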
Measuring Cache Performance
Components of CPU time
Program execution cycles
Includes cache hit time
Memory stall cycles
Mainly from cache misses
With simplifying assumptions:
Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
Cache Performance Example
Given
I-cache miss rate = 2%
D-cache miss rate = 4%
Miss penalty = 100 cycles
Base CPI (ideal cache) = 2
Load & stores are 36% of instructions
Miss cycles per instruction
I-cache: 0.02 × 100 = 2
D-cache: 0.36 × 0.04 × 100 = 1.44
Actual CPI = 2 + 2 + 1.44 = 5.44
The CPU with an ideal cache would be 5.44/2 = 2.72 times faster
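The same arithmetic as a small C sketch, using the numbers given above:

#include <stdio.h>

int main(void) {
    double base_cpi     = 2.0;
    double icache_mr    = 0.02;    /* I-cache miss rate              */
    double dcache_mr    = 0.04;    /* D-cache miss rate              */
    double ls_fraction  = 0.36;    /* loads & stores per instruction */
    double miss_penalty = 100.0;   /* cycles                         */

    double i_stalls = icache_mr * miss_penalty;                /* 2.00 */
    double d_stalls = ls_fraction * dcache_mr * miss_penalty;  /* 1.44 */
    double cpi      = base_cpi + i_stalls + d_stalls;          /* 5.44 */

    printf("Actual CPI = %.2f, ideal-cache CPU is %.2fx faster\n",
           cpi, cpi / base_cpi);
    return 0;
}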
Average Access Time
Hit time is also important for performance
Average memory access time (AMAT)
AMAT = Hit time + Miss rate × Miss penalty
Example
CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20
cycles, I-cache miss rate = 5%
AMAT = 1 + 0.05 × 20 = 2 ns
That is, 2 cycles per instruction fetch on average
Overall Performance Summary
As CPU performance increases
The miss penalty becomes more significant
Decreasing the base CPI
A greater proportion of time is spent on memory stalls
Increasing the clock rate
Memory stalls account for more CPU cycles
Can’t neglect cache behavior when evaluating system
performance
Example: How Caches Affect Performance
Matrix Multiplication
Loop order: i, j, k
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}
Running time: 13.714264 sec. Performance: ~153 MFLOPS

Loop order: i, k, j
for (int i = 0; i < n; i++) {
    for (int k = 0; k < n; k++) {
        for (int j = 0; j < n; j++) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}
Running time: 2.739385 sec. Performance: ~795 MFLOPS

Loop order: j, k, i
for (int j = 0; j < n; j++) {
    for (int k = 0; k < n; k++) {
        for (int i = 0; i < n; i++) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}
Running time: 19.074106 sec. Performance: ~113 MFLOPS
Memory Access Patterns
(Figure: access patterns of matrices A, B, and C for loop orders i, j, k; i, k, j; and j, k, i)
Any Questions?
.text
# Computes GCD(0x18, 0x21) = GCD(24, 33) by repeated subtraction;
# the result (3) is left in t3.
__start: addi t1, zero, 0x18     # t1 = 24
         addi t2, zero, 0x21     # t2 = 33
cycle:   beq  t1, t2, done       # loop while t1 != t2
         slt  t0, t1, t2         # t0 = (t1 < t2) ? 1 : 0
         bne  t0, zero, if_less
         nop
         sub  t1, t1, t2         # t1 >= t2: t1 -= t2
         j    cycle
         nop
if_less: sub  t2, t2, t1         # t1 <  t2: t2 -= t1
         j    cycle
done:    add  t3, t1, zero       # t3 = gcd(24, 33) = 3