Average access time (AAT)
• Recall:
Memory stall cycles = Number of misses × Miss penalty (in cycles)
• Equivalent measure in units of time:
Memory stall time = Number of misses × Miss penalty (in seconds)
• Total time spent on memory references, including both hits and misses:
Total access time = (Number of references) × (Hit time) + (Number of misses) × (Miss penalty)
Note: in the above expression, (1) “Hit time” is the cache access time in seconds, and (2) “Miss penalty” is in seconds.
• Average access time (AAT) for a single memory reference:
AAT = Total access time / Number of references
AAT = (Hit time) + (Number of misses / Number of references) × (Miss penalty)
AAT = (Hit time) + (Miss rate) × (Miss penalty)
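As a quick check of the formula, a minimal C sketch of the AAT computation (the numbers are the ones from the worked example on the next slide):

    #include <stdio.h>

    /* AAT = hit time + miss rate × miss penalty (all times in one unit) */
    double aat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }

    int main(void) {
        /* Numbers from the worked example on the next slide (ns). */
        printf("AAT = %.2f ns\n", aat(1.0, 0.01, 108.0));   /* 2.08 ns */
        return 0;
    }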
Measuring cache performance
• Run a program and collect a trace of accesses
• Simulate the “tag store” part of the caches under consideration
• Measure the miss rate
  - Can use it to estimate average access time (see the simulator sketch at the end of this slide):
Average access time = Hit time + Miss rate × Miss penalty
Miss penalty = Memory access latency + Block size (bytes) / Memory bandwidth (bytes/sec)
Example:
    Hit time = 1 ns
    Miss rate = 0.01
    Memory access latency = 100 ns
    Memory bandwidth = 8 GB/s (= 8 B/ns)
    Block size = 64 B

    Miss penalty = 100 ns + (64 B / 8 B/ns) = 108 ns
    Average access time = 1 + 0.01 × 108 = 2.08 ns
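Below is a minimal sketch of this methodology in C, assuming a direct-mapped cache, 32-bit addresses, and a trace supplied as an array (all illustrative choices, not specified in the slides). Only the tag store is modeled: tags and valid bits suffice to measure miss rate, so no data is stored.

    #include <stdio.h>
    #include <stdint.h>

    #define BLOCK_SIZE 64   /* bytes */
    #define NUM_SETS   512  /* direct-mapped: one block per set */

    static uint32_t tags[NUM_SETS];
    static int      valid[NUM_SETS];

    /* Returns 1 on a miss, 0 on a hit; installs the block on a miss. */
    int access_cache(uint32_t addr) {
        uint32_t block = addr / BLOCK_SIZE;
        uint32_t set   = block % NUM_SETS;
        uint32_t tag   = block / NUM_SETS;
        if (valid[set] && tags[set] == tag)
            return 0;            /* hit */
        valid[set] = 1;          /* miss: install the new tag */
        tags[set]  = tag;
        return 1;
    }

    int main(void) {
        /* Hypothetical trace, for illustration only. */
        uint32_t trace[] = { 0x1000, 0x1004, 0x2000, 0x1008, 0x41000 };
        int n = sizeof(trace) / sizeof(trace[0]), misses = 0;
        for (int i = 0; i < n; i++)
            misses += access_cache(trace[i]);
        printf("miss rate = %.2f\n", (double)misses / n);
        return 0;
    }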
Improving cache performance
• Reduce miss rate
  - Block size, cache size, associativity
  - Prefetching: hardware, software
  - Layout of instructions and data
• Reduce miss penalty
  - Write buffers
  - L2 caches
  - Victim cache
  - Subblocking
  - Early restart
  - Critical word first
• Reduce hit time
  - Avoid address translation (TLB accesses in parallel)
  - Simple caches, small caches
  - Pipeline writes
Categories of misses (3C’s model)
• Compulsory misses
  - To have something in the cache, it must first be fetched
  - The initial fetch of anything is a miss
  - Also called unique references or first-time references
• Capacity misses
  - A miss that occurs due to the limited capacity of the cache
  - The block was replaced before it was re-referenced
  - Also called dimensional misses
• Conflict misses
  - For set-associative or direct-mapped caches only
  - The difference between capacity and conflict misses: in the latter, the sets have limited capacity, even if the cache does not
  - For example...
    ◊ Suppose a 2-way set-assoc. cache has capacity for 256 blocks
    ◊ Suppose a program accesses only 4 blocks, all of which map to the same set
    ◊ Then the 4 blocks keep evicting each other from the 2-entry set even though the cache as a whole is nearly empty; these are conflict misses (see the sketch below)
  - Also called mapping misses
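To make the example concrete, a minimal C sketch (illustrative parameters, not from the slides): with 256 blocks arranged as 128 sets × 2 ways, any addresses spaced by NUM_SETS × BLOCK_SIZE bytes land in the same set.

    #include <stdio.h>
    #include <stdint.h>

    #define BLOCK_SIZE 64
    #define NUM_SETS   128   /* 256 blocks / 2 ways */

    int main(void) {
        /* Addresses spaced by NUM_SETS * BLOCK_SIZE bytes share a set. */
        for (int i = 0; i < 4; i++) {
            uint32_t addr = (uint32_t)(i * NUM_SETS * BLOCK_SIZE);
            uint32_t set  = (addr / BLOCK_SIZE) % NUM_SETS;
            printf("addr 0x%05x -> set %u\n", (unsigned)addr, (unsigned)set);
        }
        /* Cycling through these 4 blocks misses on every access after the
         * first pass: with LRU, each access evicts one of the other three
         * from the 2-entry set, while 254 other blocks sit unused. */
        return 0;
    }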
Reduce miss rate: Block size
• Increase block size
  - Idea: exploit spatial locality
  - Problems:
    ◊ Don’t overdo it: cache pollution from useless data
    ◊ Also increases miss penalty (have to bring more in)
[Figure: miss rate vs. block size. Miss rate first falls as larger blocks exploit spatial locality, then rises again once “cache pollution” sets in.]
Reduce miss rate: Cache size
• Advantages:
  - Larger caches hold more
• Disadvantages:
  - Increases hit time: larger caches are slower to access
  - Yields diminishing returns: double size != double performance
  - Steals resources from other units (esp. for on-chip caches)
[Figure: cache structure, tag store and data (block) store. The tag comparison (“=?”) selects the matching block and “word select” picks the word out of it; the larger the physical distance across the data store, the longer it takes to drive and latch the contents of a block.]

[Figure: miss rate vs. log(cache size). Miss rate falls as cache size grows, with “diminishing returns”.]
Reduce miss rate: Inc. assoc.
• Increase associativity
  - Advantages:
    ◊ For the same total cache size, fully-associative has a lower miss rate than direct-mapped
  - Disadvantages:
    ◊ Increases hit time: slower (searching sets), for the same total cache size
    ◊ Diminishing returns
      · 4-way set-associative is almost equivalent to fully-associative in many cases
[Figure: miss rate vs. log(associativity). Miss rate falls as associativity increases, with diminishing returns.]
Reduce miss rate: Prefetch
• Idea: get it before you need it
• Prefetching can be implemented in hardware, software (e.g., compiler), or both
Hardware prefetching
• General idea
  - Autonomous hardware prefetcher sits alongside the cache
  - Predicts which blocks may be accessed in the future
  - Prefetches these predicted blocks
• Simplest hardware prefetchers: stride prefetchers
  - +1 prefetch (stride = 1): fetch the missing block, and the next sequential block (sketched at the end of this slide)
    ◊ Works great for streams with high sequential locality, e.g., instruction caches
    ◊ Uses unused memory bandwidth between misses
      · Can “hurt” if there isn’t enough leftover bandwidth
  - +n prefetch (stride = n): observe that memory is being accessed every n blocks, so prefetch block +n
    · Example of code with this behavior (each block holds 4 elements of b, so the loop below touches every other block, i.e., n = 2):

      for (i = 1; i < MAX; i += 8)
          a[i] = b[i];

      block X  :  b[0]  b[1]  b[2]  b[3]
      block X+1:  b[4]  b[5]  b[6]  b[7]
      block X+2:  b[8]  b[9]  b[10] b[11]
      block X+3:  b[12] b[13] b[14] b[15]
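A minimal C sketch of the +1 policy (illustrative, not from the slides), layered on the same kind of direct-mapped tag store as earlier; a real prefetcher would fetch block B+1 asynchronously using idle bandwidth rather than instantly:

    #include <stdint.h>

    #define BLOCK_SIZE 64
    #define NUM_SETS   512

    static uint32_t tags[NUM_SETS];
    static int      valid[NUM_SETS];

    static int lookup_or_install(uint32_t block) {
        uint32_t set = block % NUM_SETS, tag = block / NUM_SETS;
        if (valid[set] && tags[set] == tag)
            return 0;                 /* hit */
        valid[set] = 1;               /* install */
        tags[set]  = tag;
        return 1;                     /* miss */
    }

    /* Returns 1 if the demand access missed. */
    int access_with_plus1_prefetch(uint32_t addr) {
        uint32_t block = addr / BLOCK_SIZE;
        int miss = lookup_or_install(block);
        if (miss)
            (void)lookup_or_install(block + 1);  /* the +1 prefetch */
        return miss;
    }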
Compiler directed prefetching
• Need a “nonbinding prefetch” instruction
  - Doesn’t cause a page fault
  - Doesn’t change the processor’s state
  - Doesn’t delay the processor on a miss
• Compiler estimates which accesses will miss
• Compiler inserts prefetch instructions far enough ahead to prevent the disaster of a cache miss
• Reduces compulsory misses for the original instructions (the compulsory misses don’t disappear, they simply move to the prefetch instructions, which still take the misses)
Original loop:

    for (j = 0; j < 100; j++)
        for (i = 0; i < 100; i++)
            x[i][j] = c * x[i][j];

With prefetching:

    for (j = 0; j < 100; j++)
        for (i = 0; i < 100; i++) {
            prefetch(x[i+k][j]);
            x[i][j] = c * x[i][j];
        }

where k depends on (1) the miss penalty and (2) the time it takes to execute an iteration assuming hits.
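For reference, a compilable version of the transformed loop: a sketch assuming GCC or Clang, whose __builtin_prefetch intrinsic is one real example of a nonbinding prefetch. The value k = 11 is taken from the worked example on the next slide.

    /* Sketch assuming GCC/Clang; __builtin_prefetch is nonbinding. */
    #define N 100
    #define K 11   /* prefetch distance from the next slide's example */

    void scale(double x[N][N], double c) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++) {
                if (i + K < N)                      /* stay in bounds */
                    __builtin_prefetch(&x[i + K][j]);
                x[i][j] = c * x[i][j];
            }
    }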
Compiler directed prefetching (cont.)
    for (j = 0; j < 100; j++)
        for (i = 0; i < 100; i++) {
            prefetch(x[i+k][j]);
            x[i][j] = c * x[i][j];
        }

where k depends on (1) the miss penalty and (2) the time it takes to execute an iteration assuming hits:

    k = miss penalty / (time for 1 iteration, assuming hits)

In the example below, k = 11.

[Figure: timeline. The CPU is currently in iteration i and issues the prefetch of x[i+k][j] now; the data arrives just as iteration i+k needs it, because the prefetch distance k covers the miss penalty (the time to service a miss) measured in units of the execution time of one inner-loop iteration, assuming cache hits.]
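The slides don’t give the numbers behind k = 11, but any combination with roughly an 11:1 ratio works; for instance, assuming a 110 ns miss penalty and 10 ns per iteration:

    /* Prefetch distance: ceiling of miss penalty over per-iteration time,
     * so the data arrives no later than the iteration that needs it.
     * The inputs (110 ns, 10 ns) are assumed values, not from the slides. */
    int prefetch_distance(int miss_penalty_ns, int iter_time_ns) {
        return (miss_penalty_ns + iter_time_ns - 1) / iter_time_ns;
    }
    /* prefetch_distance(110, 10) == 11 */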
Potential issues with prefetching
• Cache pollution
  - Inaccurate prefetches bring in useless blocks, displacing useful ones
  - Must be careful not to increase the miss rate
  - Solution: prefetch the block into a “stream buffer” or “candidate cache”, and transfer it to the main cache only when the block is actually referenced by the program (see the sketch at the end of this slide)
• Bandwidth hog
  - Inaccurate prefetches waste bandwidth throughout the memory hierarchy
  - Must be careful that prefetch misses (prefetch traffic) do not delay demand misses (legitimate traffic)
  - Solutions:
    ◊ Strike a reasonable balance between prefetch coverage and prefetch accuracy
    ◊ Request queues throughout the memory hierarchy should prioritize demand misses over prefetch misses
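A minimal C sketch of the stream-buffer idea (illustrative structure and parameters, not from the slides): prefetches fill a small side buffer, and a block is promoted into the main cache only on a real program reference, so inaccurate prefetches never displace useful blocks.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_SETS   512                 /* direct-mapped main cache */
    #define SB_ENTRIES 4                   /* small stream buffer */

    static uint32_t tags[NUM_SETS];
    static bool     valid[NUM_SETS];
    static uint32_t sb_block[SB_ENTRIES];
    static bool     sb_valid[SB_ENTRIES];

    static bool cache_lookup(uint32_t block) {
        uint32_t set = block % NUM_SETS;
        return valid[set] && tags[set] == block / NUM_SETS;
    }

    static void cache_install(uint32_t block) {
        uint32_t set = block % NUM_SETS;
        valid[set] = true;
        tags[set]  = block / NUM_SETS;
    }

    /* Prefetches fill only the stream buffer, never the main cache, so
     * an inaccurate prefetch cannot displace a useful block. */
    void prefetch(uint32_t block, int slot) {
        sb_block[slot % SB_ENTRIES] = block;
        sb_valid[slot % SB_ENTRIES] = true;
    }

    /* Demand reference: a stream-buffer hit promotes the block into the
     * main cache. Returns true on a hit in either structure. */
    bool reference(uint32_t block) {
        if (cache_lookup(block))
            return true;                   /* main-cache hit */
        for (int i = 0; i < SB_ENTRIES; i++)
            if (sb_valid[i] && sb_block[i] == block) {
                cache_install(block);      /* promote on real use */
                sb_valid[i] = false;
                return true;
            }
        cache_install(block);              /* ordinary demand miss */
        return false;
    }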