Cache Performance Optimization Guide

The document discusses techniques for measuring and improving cache performance. It defines average access time as the hit time plus the miss rate multiplied by the miss penalty. It describes ways to reduce the miss rate, such as increasing the block size, cache size, and associativity. Prefetching, in hardware or software, can also reduce miss rates by fetching data before it is needed. Reducing the miss penalty involves techniques like larger L2 caches and write buffers. Reducing the hit time focuses on avoiding address translation overhead and using simple, small caches.

Average access time (AAT)

• Recall:
  Memory stall cycles = Number of misses × Miss penalty (in cycles)

• Equivalent measure in units of time:
  Memory stall time = Number of misses × Miss penalty (in seconds)

• Total time spent on memory references, including both hits and misses:
  Total access time = (Number of references) × (Hit time) + (Number of misses) × (Miss penalty)

  Note that in the above expression: (1) "Hit time" is the cache access time in seconds, and (2) "Miss penalty" is in seconds.

• Average access time (AAT) for a single memory reference:
  AAT = Total access time / Number of references
      = (Hit time) + (Number of misses / Number of references) × (Miss penalty)
      = (Hit time) + (Miss rate) × (Miss penalty)
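As a quick illustration, here is a minimal C helper that evaluates the AAT formula above; the function name and parameter choices are ours, not part of the original notes:

    #include <stdio.h>

    /* AAT = hit_time + miss_rate * miss_penalty (times in consistent units, e.g., ns) */
    double aat(double hit_time, double miss_rate, double miss_penalty)
    {
        return hit_time + miss_rate * miss_penalty;
    }

    int main(void)
    {
        /* Values from the worked example later in these notes */
        printf("AAT = %.2f ns\n", aat(1.0, 0.01, 108.0));  /* prints AAT = 2.08 ns */
        return 0;
    }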

ECE 463/521, Profs. Conte/Rotenberg/Sair, Dept. of ECE, NC State University

Measuring cache performance

• Run a program and collect a trace of accesses
• Simulate the "tag store" part of caches under consideration
• Measure the miss rate
  - Can use it to estimate average access time:

  Average access time = Hit time + Miss rate × Miss penalty
  Miss penalty = Memory access latency + block size (bytes) / memory bandwidth (bytes/sec)

Example:
  Hit time = 1 ns
  Miss rate = 0.01
  Memory access latency = 100 ns
  Memory bandwidth = 8 GB/s (= 8 B/ns)
  Block size = 64 B

  Miss penalty = 100 ns + (64 B) / (8 B/ns) = 108 ns
  Average access time = 1 + 0.01 × 108 = 2.08 ns
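A minimal sketch of the trace-driven methodology above: simulate only the tag store of a direct-mapped cache over an address trace and count misses. The cache geometry, trace format, and helper names are illustrative assumptions, not part of the original notes.

    #include <stdbool.h>
    #include <stdio.h>

    #define BLOCK_SIZE 64ull    /* bytes per block (assumed) */
    #define NUM_SETS   1024ull  /* direct-mapped: one block per set (assumed) */

    /* Tag store only: no data is kept, just tags and valid bits */
    static unsigned long long tags[NUM_SETS];
    static bool valid[NUM_SETS];
    static unsigned long long misses, references;

    static void access_cache(unsigned long long addr)
    {
        unsigned long long block = addr / BLOCK_SIZE;
        unsigned long long set   = block % NUM_SETS;
        unsigned long long tag   = block / NUM_SETS;

        references++;
        if (!valid[set] || tags[set] != tag) {   /* miss: allocate the block */
            misses++;
            valid[set] = true;
            tags[set]  = tag;
        }
    }

    int main(void)
    {
        unsigned long long addr;
        while (scanf("%llx", &addr) == 1)        /* trace: one hex address per line */
            access_cache(addr);

        if (references > 0) {
            double miss_rate = (double)misses / (double)references;
            /* AAT estimate using hit time = 1 ns, miss penalty = 108 ns (example above) */
            printf("miss rate = %.4f, estimated AAT = %.2f ns\n",
                   miss_rate, 1.0 + miss_rate * 108.0);
        }
        return 0;
    }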

Improving cache performance

• Reduce miss rate
  - Block size, cache size, associativity
  - Prefetching: hardware, software
  - Layout of instructions and data
• Reduce miss penalty
  - Write buffers
  - L2 caches
  - Victim cache
  - Subblocking
  - Early restart
  - Critical word first
• Reduce hit time
  - Avoid address translation overhead (access the TLB in parallel with the cache)
  - Simple caches, small caches
  - Pipeline writes


Categories of misses (3C's model)

• Compulsory misses
  - To have something in the cache, it must first be fetched
  - The initial fetch of anything is a miss
  - Also called unique references or first-time references
• Capacity misses
  - A miss that occurs due to the limited capacity of the cache
  - The block was replaced before it was re-referenced
  - Also called dimensional misses
• Conflict misses
  - For set-associative or direct-mapped caches only
  - The difference between capacity and conflict misses: in the latter, the sets have limited capacity, even if the cache as a whole does not
  - For example (see the sketch after this list)...
    ◊ Suppose a 2-way set-associative cache has capacity for 256 blocks
    ◊ Suppose a program accesses only 4 blocks, all of which map to the same set: the cache has ample capacity, yet the set can hold only 2 of the 4 blocks at a time, so they evict each other
  - Also called mapping misses
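A small C sketch of the conflict-miss example above: it computes the set index the way a set-associative cache would and shows four addresses landing in the same set. The specific geometry (64 B blocks, 128 sets, 2 ways, so 256 blocks total) is an assumption for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE 64u    /* assumed geometry: 2-way, 128 sets = 256 blocks */
    #define NUM_SETS   128u

    int main(void)
    {
        /* Four addresses exactly NUM_SETS * BLOCK_SIZE = 0x2000 bytes apart:
           same set index, different tags */
        uint64_t addrs[4] = { 0x0000, 0x2000, 0x4000, 0x6000 };

        for (int i = 0; i < 4; i++) {
            uint64_t set = (addrs[i] / BLOCK_SIZE) % NUM_SETS;
            printf("addr 0x%04llx -> set %llu\n",
                   (unsigned long long)addrs[i], (unsigned long long)set);
        }
        /* All four map to set 0; a 2-way set holds only 2 of them at a time,
           so they keep evicting each other: conflict misses, not capacity misses */
        return 0;
    }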

Reduce miss rate: Block size

• Increase block size
  - Idea: exploit spatial locality
  - Problems:
    ◊ Don't overdo it: cache pollution from useless data
    ◊ Also increases miss penalty (have to bring more in)

[Figure: miss rate vs. block size. Miss rate first falls as blocks grow, then rises again once "cache pollution" sets in.]


Reduce miss rate: Cache size

• Advantages:
  - Larger caches hold more blocks, so fewer capacity misses
• Disadvantages:
  - Increases hit time: larger caches are slower to access
  - Yields diminishing returns: doubling the size does not double performance
  - Steals resources from other units (esp. for on-chip caches)

[Figure: miss rate vs. log(cache size), flattening out ("diminishing returns"). Inset: cache organization with a tag store and a data (block) store, a "=?" tag comparator, and word select; the larger the distance across the data store, the longer it takes to drive and latch the contents of a block.]

Reduce miss rate: Increase associativity

• Increase associativity
  - Advantages:
    ◊ For the same total cache size, fully-associative has a lower miss rate than direct-mapped
  - Disadvantages:
    ◊ Increases hit time: slower (searching sets), for the same total cache size
    ◊ Diminishing returns
      · 4-way set-associative is almost equivalent to fully-associative in many cases

[Figure: miss rate vs. log(associativity), flattening out: diminishing returns.]


Reduce miss rate: Prefetch

• Idea: get it before you need it
• Prefetching can be implemented in hardware, software (e.g., compiler), or both

Hardware prefetching

• General idea
  - An autonomous hardware prefetcher sits alongside the cache
  - Predicts which blocks may be accessed in the future
  - Prefetches these predicted blocks
• Simplest hardware prefetchers: stride prefetchers
  - +1 prefetch (stride = 1): fetch the missing block, and the next sequential block
    ◊ Works great for streams with high sequential locality, e.g., instruction caches
    ◊ Uses unused memory bandwidth between misses
      · Can "hurt" if there isn't enough leftover bandwidth
  - +n prefetch (stride = n): observe that memory is being accessed every n blocks, so prefetch block +n
    · Example of code with this behavior (with 4 array elements per block as laid out below, the loop touches every 2nd block, so stride = 2); a hardware sketch follows this slide:

      for (i = 1; i < MAX; i += 8)
          a[i] = b[i];

      Layout of b[] in memory:
      block X:   b[0]  b[1]  b[2]  b[3]
      block X+1: b[4]  b[5]  b[6]  b[7]
      block X+2: b[8]  b[9]  b[10] b[11]
      block X+3: b[12] b[13] b[14] b[15]
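A minimal software model of the stride prefetcher described above, written as a sketch rather than actual hardware: it watches successive miss addresses, detects a constant stride (in blocks), and issues a prefetch for the next predicted block. The issue_prefetch stand-in and the single-stream design are illustrative assumptions.

    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE 64u

    /* Stand-in for the action of fetching a block into the cache early */
    static void issue_prefetch(uint64_t block)
    {
        printf("prefetch block %llu\n", (unsigned long long)block);
    }

    /* Single-stream stride detector: called on every cache miss */
    void on_miss(uint64_t addr)
    {
        static uint64_t last_block;
        static int64_t  last_stride;
        static int      have_last;

        uint64_t block  = addr / BLOCK_SIZE;
        int64_t  stride = (int64_t)block - (int64_t)last_block;

        /* Two consecutive misses with the same stride => predict a third */
        if (have_last && stride != 0 && stride == last_stride)
            issue_prefetch(block + stride);

        last_stride = stride;
        last_block  = block;
        have_last   = 1;
    }

    int main(void)
    {
        /* Misses every 2nd block, like the b[i] example above (stride = 2) */
        on_miss(0x0040); on_miss(0x00C0); on_miss(0x0140); on_miss(0x01C0);
        return 0;   /* prints prefetches for blocks 7 and 9 */
    }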


Compiler directed prefetching

• Need a "nonbinding prefetch" instruction
  - Doesn't cause a page fault
  - Doesn't change the processor's state
  - Doesn't delay the processor on a miss
• Compiler estimates which accesses will miss
• Inserts prefetch instructions far enough ahead to prevent the disaster of a cache miss
• Reduces compulsory misses for the original instructions (the compulsory misses simply move around, since the prefetch instructions still generate the misses)

Original loop:

    for (j = 0; j < 100; j++)
        for (i = 0; i < 100; i++)
            x[i][j] = c * x[i][j];

With prefetching inserted:

    for (j = 0; j < 100; j++)
        for (i = 0; i < 100; i++) {
            prefetch(x[i+k][j]);
            x[i][j] = c * x[i][j];
        }

where k depends on (1) the miss penalty and (2) the time it takes to execute one iteration assuming hits.
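In real C code, the nonbinding prefetch above maps onto a compiler intrinsic. A sketch using GCC/Clang's __builtin_prefetch; the value k = 16 is an arbitrary assumption standing in for the computed lookahead:

    /* Compile with GCC or Clang; __builtin_prefetch is nonbinding:
       it never faults and never stalls the processor */
    #define N 100

    void scale(double x[N][N], double c)
    {
        const int k = 16;   /* assumed lookahead distance */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++) {
                if (i + k < N)   /* don't prefetch past the array */
                    __builtin_prefetch(&x[i + k][j], 1, 3);  /* rw=1 (write), high locality */
                x[i][j] = c * x[i][j];
            }
    }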
Compiler directed prefetching (cont.)

    for (j = 0; j < 100; j++)
        for (i = 0; i < 100; i++) {
            prefetch(x[i+k][j]);
            x[i][j] = c * x[i][j];
        }

where k depends on (1) the miss penalty and (2) the time it takes to execute one iteration assuming hits:

    k = miss penalty / (time for 1 iteration assuming hits)

In the example shown on the slide, k = 11.

[Figure: timeline of inner-loop iterations ... i ... i+k .... While the CPU is in iteration i, it issues prefetch(x[i+k][j]); the miss penalty (the time to service a miss) elapses over the next k iterations, so the block arrives just in time for iteration i+k. Each tick on the timeline is the execution time for one iteration of the inner loop, assuming cache hits.]

Potential issues with prefetching

• Cache pollution
  - Inaccurate prefetches bring in useless blocks, displacing useful ones
  - Must be careful not to increase the miss rate
  - Solution: prefetch blocks into a "stream buffer" or "candidate cache", and transfer a block into the main cache only when it is actually referenced by the program (see the sketch after this list)
• Bandwidth hog
  - Inaccurate prefetches waste bandwidth throughout the memory hierarchy
  - Must be careful that prefetch misses (prefetch traffic) do not delay demand misses (legitimate traffic)
  - Solutions:
    ◊ Strike a reasonable balance between prefetch coverage and prefetch accuracy
    ◊ Request queues throughout the memory hierarchy should prioritize demand misses over prefetch misses
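A minimal sketch of the stream-buffer idea above: prefetched blocks wait in a small FIFO beside the cache and are promoted into the cache only on an actual reference, so inaccurate prefetches never pollute the cache. The 4-entry depth and helper names are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    #define SB_ENTRIES 4            /* assumed stream buffer depth */

    static uint64_t sb_blocks[SB_ENTRIES];
    static bool     sb_valid[SB_ENTRIES];

    /* Prefetch lands in the stream buffer, NOT the cache (no pollution) */
    void prefetch_into_stream_buffer(uint64_t block)
    {
        static int next;                 /* simple FIFO replacement */
        sb_blocks[next] = block;
        sb_valid[next]  = true;
        next = (next + 1) % SB_ENTRIES;
    }

    /* On a cache miss, check the stream buffer before going to memory.
       Returns true if the block was found and promoted into the cache. */
    bool stream_buffer_lookup(uint64_t block)
    {
        for (int i = 0; i < SB_ENTRIES; i++) {
            if (sb_valid[i] && sb_blocks[i] == block) {
                sb_valid[i] = false;     /* promote to main cache... */
                /* install_in_cache(block);  ...hypothetical cache hook */
                return true;
            }
        }
        return false;                    /* go to the next memory level */
    }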

