COMP 206: Computer Architecture and Implementation
Montek Singh
Mon, Oct 31, 2005 Topic: Memory Hierarchy Design (HP3 Ch. 5)
(Caches, Main Memory and Virtual Memory)
Outline
  Motivation for caches
    Principle of locality
  Levels of the memory hierarchy
  Cache organization
  Cache read/write policies
    Block replacement policies
    Write-back vs. write-through caches
    Write buffers
Reading: HP3 Sections 5.1-5.2
The Big Picture: Where Are We Now?
The five classic components of a computer:
  Processor (Control + Datapath), Memory, Input, Output
This lecture (and the next few): the Memory System
The Motivation for Caches
Motivation:
  Large (cheap) memories (DRAM) are slow
  Small (costly) memories (SRAM) are fast
Make the average access time small:
  service most accesses from a small, fast memory
  reduce the bandwidth required of the large memory
[Diagram: memory system -- Processor <-> Cache <-> DRAM]
The Principle of Locality
[Figure: frequency of reference vs. address (0 to 2^n) -- accesses cluster in a few hot regions of the address space]
A program accesses a relatively small portion of the address space at any instant of time
  Example: 90% of the time in 10% of the code
Two different types of locality:
  Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon
  Spatial locality (locality in space): if an item is referenced, items close by tend to be referenced soon
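Spatial locality is what block-sized fetches exploit. The toy model below (parameters assumed for illustration, not from the slides) counts how often a sequential scan of 4-byte words lands in a 64-byte block that has already been fetched:

```python
# Toy illustration of spatial locality (hypothetical parameters): a
# sequential scan of 4-byte words touches each 64-byte block 16 times,
# so with block-sized fetches only the first access to a block misses.

BLOCK_SIZE = 64   # bytes per cache block (assumed)
WORD_SIZE = 4     # bytes per access (assumed)

def sequential_scan_hit_rate(num_words):
    """Fraction of accesses served from an already-fetched block."""
    seen_blocks = set()
    hits = 0
    for i in range(num_words):
        addr = i * WORD_SIZE
        block = addr // BLOCK_SIZE
        if block in seen_blocks:
            hits += 1               # spatial locality: block already present
        else:
            seen_blocks.add(block)  # first touch of this block -> miss
    return hits / num_words

print(sequential_scan_hit_rate(1024))  # 64 misses out of 1024 -> 0.9375
```

With 16 words per block, 15 of every 16 accesses hit, which is why fetching whole blocks pays off for scans.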
Levels of the Memory Hierarchy

  Level           Capacity       Access time  Cost/bit             Transfer unit      Managed by
  CPU registers   500 bytes      0.25 ns      ~$.01                words (1-8 B)      programmer/compiler
  Cache (L1, L2)  16K-1M bytes   1 ns         ~$.0001              blocks (8-128 B)   cache controller
  Main memory     64M-2G bytes   100 ns       ~$.0000001           pages (4-64 KB)    OS
  Disk            100 G bytes    5 ms         10^-5 - 10^-7 cents  files (Mbytes)     user/operator
  Tape/Network    infinite       seconds      10^-8 cents

Upper levels are smaller and faster; lower levels are larger and slower. Each level stages data for the level above it, moving data in the transfer unit shown.
Memory Hierarchy: Principles of Operation
At any given time, data is copied between only 2 adjacent levels
  Upper level (cache): the one closer to the processor
    Smaller, faster, and uses more expensive technology
  Lower level (memory): the one further away from the processor
    Bigger, slower, and uses less expensive technology
Block: the smallest unit of information that can either be present or not present in the two-level hierarchy
[Diagram: the processor reads/writes the upper level (cache, holding Blk X); blocks such as Blk Y are transferred between the upper level and the lower level (memory)]
Memory Hierarchy: Terminology
Hit: data appears in some block in the upper level (e.g., Blk X in the previous slide)
  Hit rate = fraction of memory accesses found in the upper level
  Hit time = time to access the upper level
    = memory access time + time to determine hit/miss
Miss: data needs to be retrieved from a block in the lower level (e.g., Blk Y in the previous slide)
  Miss rate = 1 - (hit rate)
  Miss penalty: includes the time to fetch a new block from the lower level
    = time to replace a block in the upper level + time to deliver the block to the processor
Hit time is significantly less than the miss penalty
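These quantities combine into the standard average memory access time (AMAT) formula from this chapter: AMAT = hit time + miss rate x miss penalty. The numbers in the sketch below are illustrative, not from the slides:

```python
# Average memory access time for a two-level hierarchy:
#   AMAT = hit_time + miss_rate * miss_penalty
# All times in the same units (ns here); parameters are illustrative.

def amat(hit_time, miss_rate, miss_penalty):
    """Average access time of a two-level memory hierarchy."""
    return hit_time + miss_rate * miss_penalty

# 1 ns cache hit, 5% miss rate, 100 ns to fetch a block from main memory:
print(amat(1.0, 0.05, 100.0))  # 6.0 ns on average
```

Note how a small miss rate still dominates the average when the miss penalty is two orders of magnitude larger than the hit time.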
Cache Addressing
[Diagram: sets 0..j-1; each set holds blocks 0..k-1 plus replacement info; each block holds a tag and sectors 0..m-1; each sector holds bytes 0..n-1 plus Valid, Dirty, and Shared bits]
Block/line is the unit of allocation
Sector/sub-block is the unit of transfer and coherence
Cache parameters j, k, m, n are integers, and generally powers of 2
Cache Shapes (16 blocks total; A = associativity, S = number of sets)
  Direct-mapped (A = 1, S = 16)
  2-way set-associative (A = 2, S = 8)
  4-way set-associative (A = 4, S = 4)
  8-way set-associative (A = 8, S = 2)
  Fully associative (A = 16, S = 1)
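For a fixed capacity, changing A only changes how an address is split into tag, set index, and block offset. A small sketch, assuming 16 blocks of 32 bytes (the parameters above; the block size is an assumption for illustration):

```python
# How a byte address splits into (tag, set index, block offset) as the
# associativity A varies with total capacity fixed at 16 blocks.
# Block size of 32 bytes is assumed for illustration.

NUM_BLOCKS = 16
BLOCK_SIZE = 32  # bytes

def decompose(addr, assoc):
    """Return (tag, set_index, offset) for an A-way cache."""
    num_sets = NUM_BLOCKS // assoc          # S = 16 / A, as in the shapes above
    offset = addr % BLOCK_SIZE              # byte within the block
    set_index = (addr // BLOCK_SIZE) % num_sets
    tag = addr // (BLOCK_SIZE * num_sets)   # remaining high-order bits
    return tag, set_index, offset

addr = 0x1234
for assoc in (1, 2, 4, 8, 16):
    print(assoc, decompose(addr, assoc))
```

At A = 16 (fully associative) the set index disappears entirely and the whole block number becomes the tag.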
Examples of Cache Configurations

  # Sets  # Blocks  # Sectors  # Bytes  Name
  1       k         m          n        Fully associative
  j       1         m          n        Direct mapped
  j       k         1          n        A cache that is not sectored
  j       4         m          n        4-way set-associative cache
  64      8         2          32       PowerPC 601
Storage Overhead of Cache

  Total number of bits = j x [repl + k x (tag + m x (3 + 8n))]
  Number of data bits  = 8jkmn
  Storage overhead     = (repl + k x tag + 3km) / (8kmn)

  System               # Address bits  (j,k,m,n)     Cache size  Storage overhead
  IBM 360/85           24              (1,16,16,64)  16 KB       0.85%
  IBM 3033             32              (64,16,1,64)  64 KB       5.95%
  Motorola 68030       32              (24,4,2,2)    256 B       28.10%
  Intel i486           32              (128,4,1,16)  8 KB        19.90%
  DEC Alpha AXP 21064  34              (256,1,1,32)  8 KB        9.37%
  IBM PowerPC 601      32              (64,8,2,32)   32 KB       5.76%
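A small calculator for the overhead from the (j,k,m,n) parameters. It assumes 3 status bits per sector and takes the per-set replacement-bit count `repl` as a parameter; with repl = 0 it reproduces the DEC Alpha AXP 21064 entry (9.37%):

```python
import math

# Storage overhead of a cache described by (j sets, k blocks/set,
# m sectors/block, n bytes/sector) and a given address width.
# Assumes 3 status bits per sector and 'repl' replacement bits per set.

def storage_overhead(j, k, m, n, addr_bits, repl=0):
    offset_bits = int(math.log2(m * n))   # byte within a block
    index_bits = int(math.log2(j))        # which set
    tag_bits = addr_bits - index_bits - offset_bits
    overhead_bits = j * (repl + k * (tag_bits + 3 * m))
    data_bits = 8 * j * k * m * n
    return overhead_bits / data_bits

# DEC Alpha AXP 21064 from the table: (256,1,1,32), 34 address bits,
# direct mapped so no replacement bits are needed.
print(100 * storage_overhead(256, 1, 1, 32, 34))  # 9.375, the 9.37% entry
```

Entries with set-associative organizations (i486, PowerPC 601) need a nonzero `repl` to match the table exactly.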
Cache Organization
Direct-mapped cache:
  Each memory location can be mapped to only 1 cache location
  No need to make any decision :-)
    The current item replaces the previous item in that cache location
N-way set-associative cache:
  Each memory location has a choice of N cache locations
Fully associative cache:
  Each memory location can be placed in ANY cache location
Cache miss in an N-way set-associative or fully associative cache:
  Bring in the new block from memory
  Throw out a cache block to make room for the new block
  Need to decide which block to throw out!
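The direct-mapped case above needs no replacement decision at all, which a minimal tags-only model (sizes assumed for illustration) makes concrete:

```python
# Minimal direct-mapped cache model (tags only, no data): each memory
# block maps to exactly one slot, and a new block simply replaces
# whatever was there. Sizes are illustrative.

NUM_SLOTS = 8
BLOCK_SIZE = 16  # bytes

class DirectMapped:
    def __init__(self):
        self.tags = [None] * NUM_SLOTS    # one tag per slot; None = invalid

    def access(self, addr):
        """Return True on a hit; on a miss, install the new block."""
        block = addr // BLOCK_SIZE
        slot = block % NUM_SLOTS          # the only slot this block can use
        tag = block // NUM_SLOTS
        if self.tags[slot] == tag:
            return True
        self.tags[slot] = tag             # no decision needed: just replace
        return False

c = DirectMapped()
print(c.access(0x00))   # False: cold miss
print(c.access(0x04))   # True: same 16-byte block
print(c.access(0x80))   # False: block 8 maps to slot 0, evicting block 0
print(c.access(0x00))   # False: conflict miss -- block 0 was evicted
```

The last two accesses show the downside: two blocks that share a slot keep evicting each other even when the rest of the cache is empty.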
Write Allocate versus Write No-Allocate
Assume that a write to a memory location causes a cache miss
Do we read in the block?
  Yes: Write Allocate
  No: Write No-Allocate
Basics of Cache Operation: Overview

              Write Through                          Write Back
  READ hit    CPU reads from cache                   CPU reads from cache
  READ miss   Allocate and load block from MM,       Allocate and load block from MM,
              then CPU reads from it                 then CPU reads from it
  WRITE hit   Write into cache plus write            Write into cache only and set the
              through into MM                        dirty bit
  WRITE miss  Write through into MM, with or         Write allocate, then write into cache
              without write allocate                 and set the dirty bit (so that on
                                                     replacement, a block is written back
                                                     to MM only if modified)
Details of Simple Blocking Cache

Write Through:
  READ hit: CPU reads cache
  READ miss: CPU detects miss, stalls; cache selects replacement block; new block loaded from MM; requested word sent to CPU; CPU resumes operation
  WRITE hit: CPU writes cache; CPU writes MM and stalls until the write completes
  WRITE miss: CPU detects miss; CPU writes MM (cache also, if write allocate); stalls until the write completes

Write Back:
  READ hit: CPU reads cache
  READ miss: CPU detects miss, stalls; cache selects replacement block (written back to MM if dirty); new block loaded from MM; word sent to CPU; CPU resumes operation
  WRITE hit: CPU writes cache
  WRITE miss: CPU detects miss, stalls; cache selects replacement block; old block evicted from cache (written back to MM if dirty); new block loaded from MM (write allocate); CPU resumes operation
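The practical difference between the two policies shows up in main-memory write traffic. A sketch for the extreme case of repeated stores to one cached block (counts only, no timing):

```python
# Main-memory write traffic for repeated stores to a single cached
# block: write-through sends every store to MM; write-back only writes
# the block once, when it is eventually replaced.

def mm_writes(policy, num_stores):
    """Count MM writes for 'num_stores' stores to one resident block."""
    writes = 0
    dirty = False
    for _ in range(num_stores):
        if policy == "write-through":
            writes += 1          # every store goes through to MM
        else:                    # write-back: just set the dirty bit
            dirty = True
    if policy == "write-back" and dirty:
        writes += 1              # single write-back on replacement
    return writes

print(mm_writes("write-through", 100))  # 100 MM writes
print(mm_writes("write-back", 100))     # 1 MM write
```

This is the "greatly reduces the memory bandwidth requirement" advantage of write back, at the cost of the dirty bit and more complex control.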
A-way Set-Associative Cache
A-way set associative: A entries for each cache index
  = A direct-mapped caches operating in parallel
Example: two-way set-associative cache
  Cache Index selects a set from the cache
  The two tags in the set are compared in parallel
  Data is selected based on the tag comparison result
[Diagram: the Cache Index selects one set; each of the two ways holds a valid bit, cache tag, and cache block; two comparators match the address tag against both ways; their outputs are ORed to form Hit and drive a mux (SEL0/SEL1) that selects the hitting way's cache block]
Fully Associative Cache
Push the set-associative idea to its limit!
  Forget about the Cache Index
  Compare the cache tags of all cache entries in parallel
Example: with a 32 B block size and 32-bit addresses, the tag is 27 bits (bits 31-5; bits 4-0 are the byte select), so we need N 27-bit comparators
[Diagram: every entry's 27-bit cache tag is compared against the address tag at once, qualified by the valid bit; the byte select (e.g., 0x01) picks a byte out of the 32-byte cache block]
Cache Block Replacement Policies
Random replacement:
  Hardware randomly selects a cache block and throws it out
Least Recently Used (LRU):
  Hardware keeps track of the access history
  Replace the entry that has not been used for the longest time
  For a 2-way set-associative cache, one bit suffices for LRU replacement
Example of a simple pseudo-LRU implementation:
  Assume 64 fully associative entries
  A hardware replacement pointer points to one cache entry
  Whenever an access is made to the entry the pointer points to: move the pointer to the next entry
  Otherwise: do not move the pointer
[Diagram: replacement pointer cycling over Entry 0 .. Entry 63]
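The pointer scheme above can be sketched in a few lines (4 entries instead of 64, for brevity). The pointer is only nudged off an entry when that entry is touched, so the victim it identifies is an entry that has not been accessed recently:

```python
# Sketch of the pseudo-LRU replacement pointer described above,
# scaled down to a 4-entry fully associative cache for brevity.

NUM_ENTRIES = 4

class PseudoLRU:
    def __init__(self):
        self.pointer = 0

    def on_access(self, entry):
        # The pointer advances only when the accessed entry is the one
        # it currently points to; accesses elsewhere leave it alone.
        if entry == self.pointer:
            self.pointer = (self.pointer + 1) % NUM_ENTRIES

    def victim(self):
        return self.pointer          # entry to replace on a miss

p = PseudoLRU()
p.on_access(0)      # pointer moves off the just-used entry 0 -> now 1
p.on_access(2)      # not the pointed-to entry: pointer stays at 1
print(p.victim())   # 1
```

This is far cheaper than true LRU (a single log2(64)-bit pointer instead of full access-order bookkeeping), which is the point of the approximation.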
Cache Write Policy
Cache reads are much easier to handle than cache writes
  An instruction cache is much easier to design than a data cache
Cache writes: how do we keep the data in the cache and memory consistent?
Two options (decision time again :-)
  Write Back: write to cache only; write the cache block to memory when that cache block is being replaced on a cache miss
    Need a dirty bit for each cache block
    Greatly reduces the memory bandwidth requirement
    Control can be complex
  Write Through: write to cache and memory at the same time
    What!!! How can this be? Isn't memory too slow for this?
Write Buffer for Write Through
[Diagram: Processor -> Cache -> DRAM, with a Write Buffer between cache and DRAM]
A write buffer is needed between the cache and main memory
  Processor: writes data into the cache and the write buffer
  Memory controller: writes the contents of the buffer to memory
The write buffer is just a FIFO
  Typical number of entries: 4
  Works fine if store frequency (w.r.t. time) << 1 / DRAM write cycle
The memory system designer's nightmare:
  Store frequency (w.r.t. time) > 1 / DRAM write cycle
  Write buffer saturation
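The FIFO behavior and the saturation condition can be sketched with a toy cycle-by-cycle model (buffer depth from the slide; the 10-cycle DRAM write time and one-store-per-cycle issue rate are assumptions for illustration):

```python
from collections import deque

# Toy model of a 4-entry write buffer: the CPU may issue one store
# every 'store_every' cycles; the memory controller retires one
# buffered write every DRAM_CYCLES cycles. The CPU stalls (the store
# waits) whenever the FIFO is full.

BUFFER_ENTRIES = 4
DRAM_CYCLES = 10   # cycles per DRAM write (assumed)

def run(stores, store_every):
    """Return CPU stall cycles for 'stores' stores at the given rate."""
    buf = deque()
    stalls = 0
    issued = 0
    cycle = 0
    while issued < stores or buf:
        if cycle % DRAM_CYCLES == 0 and buf:
            buf.popleft()                    # one write retired to DRAM
        if issued < stores and cycle % store_every == 0:
            if len(buf) < BUFFER_ENTRIES:
                buf.append(issued)           # store absorbed by the FIFO
                issued += 1
            else:
                stalls += 1                  # buffer full: CPU must wait
        cycle += 1
    return stalls

print(run(20, 20))      # store rate well below 1/DRAM cycle: no stalls
print(run(20, 1) > 0)   # a store every cycle: the buffer saturates
```

Once the steady-state store rate exceeds the DRAM retire rate, no buffer depth helps; the FIFO only smooths short bursts.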
Write Buffer Saturation
[Diagram: Processor -> Cache -> Write Buffer -> DRAM]
Store frequency (w.r.t. time) > 1 / DRAM write cycle
  If this condition exists for a long period of time (CPU cycle time too short and/or too many store instructions in a row):
    the store buffer will overflow no matter how big you make it, because CPU cycle time << DRAM write cycle time
Solutions for write buffer saturation:
  Use a write-back cache
  Install a second-level (L2) cache
[Diagram: Processor -> Cache -> Write Buffer -> L2 Cache -> DRAM]