Computer Organization and Architecture
Designing for Performance
11th Edition
Chapter 5
Cache Memory
Copyright © 2019, 2016, 2013 Pearson Education, Inc. All Rights Reserved
Figure 5.1: Cache and Main Memory
Cache Memory Principles
• Block
– The minimum unit of transfer between cache and main memory
• Frame
– To distinguish between the data transferred and the chunk of
physical memory, the term frame, or block frame, is sometimes used
with reference to caches
• Line
– A portion of cache memory capable of holding one block, so-called
because it is usually drawn as a horizontal object
• Tag
– A portion of a cache line that is used for addressing purposes
• Line size
– The number of data bytes, or block size, contained in a line
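As a concrete illustration of how these fields are sized (my sketch, not from the text), the following assumes a direct-mapped cache with byte-addressable memory and power-of-two sizes:

```python
# A sketch (illustrative assumption) of how the address fields follow
# from the cache geometry for a direct-mapped cache.
import math

def address_fields(addr_bits: int, cache_bytes: int, line_bytes: int):
    """Return (tag_bits, line_bits, offset_bits)."""
    offset_bits = int(math.log2(line_bytes))               # byte within a block
    line_bits = int(math.log2(cache_bytes // line_bytes))  # which cache line
    tag_bits = addr_bits - line_bits - offset_bits         # identifies the block
    return tag_bits, line_bits, offset_bits

# 32-bit addresses, 16 KiB cache, 64-byte lines -> (18, 8, 6)
print(address_fields(32, 16 * 1024, 64))
```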
Figure 5.3: Cache Read Operation
The processor generates the read address (RA) of a word to be read. If the word is contained in the cache (cache hit), it is delivered to the processor.
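A minimal sketch of this read flow, with the block-load-on-miss behavior made explicit (the dict-based cache and fixed block size are illustrative assumptions, not the book's code):

```python
# Sketch of the Figure 5.3 read flow: check the cache first; on a miss,
# fetch the containing block from main memory, then deliver the word.
def read_word(cache: dict, main_memory: list, ra: int, block_size: int = 8):
    block_no = ra // block_size
    if block_no in cache:                      # cache hit
        block = cache[block_no]
    else:                                      # cache miss: fetch whole block
        start = block_no * block_size
        block = main_memory[start:start + block_size]
        cache[block_no] = block                # replacement policy omitted
    return block[ra % block_size]              # deliver the requested word
```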
Figure 5.4: Typical Cache Organization
Table 5.1: Elements of Cache Design
Cache Addresses
– Logical
– Physical
Cache Size
Mapping Function
– Direct
– Associative
– Set associative
Replacement Algorithm
– Least recently used (LRU)
– First in first out (FIFO)
– Least frequently used (LFU)
– Random
Write Policy
– Write through
– Write back
Line Size
Number of Caches
– Single or two level
– Unified or split
Figure 5.5: Logical and Physical Caches
Cache Memory
• Locality of Reference
– Memory references at any given time interval tend to be confined to localized areas
– Temporal locality -- information that has been used recently is likely to be used again soon (e.g., reuse of instructions and data in loops)
– Spatial locality -- if a word is accessed, adjacent (nearby) words are likely to be accessed soon (e.g., related data items such as arrays are usually stored together; instructions are executed sequentially)
• Cache
– The property of locality of reference is what makes cache memory systems work
– A cache is a fast, small-capacity memory that should hold the information most likely to be accessed
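Both kinds of locality can be seen in a single loop (an illustrative example, not from the slides):

```python
# a[i] sits next to a[i-1] in memory (spatial locality), and the
# accumulator total is touched on every iteration (temporal locality).
a = list(range(1024))
total = 0
for i in range(len(a)):   # sequential array accesses -> spatial locality
    total += a[i]         # repeated use of the accumulator -> temporal locality
```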
Performance of Cache
• Memory Access
– All memory accesses are directed first to the cache
– If the word is in the cache (hit), it is read from the cache and provided to the CPU
– If the word is not in the cache (miss), the block containing that word is brought in, replacing a block currently in the cache
• Main issues
– How do we know whether the required word is in the cache?
– If a new block must replace one of the old blocks, which one should we choose?
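These hit/miss cases are often summarized by the average access time: hit time plus miss ratio times miss penalty (a standard figure of merit, added here as illustration; the timing values below are assumed):

```python
# Average access time = hit time + miss ratio * miss penalty.
def avg_access_time(t_hit: float, miss_ratio: float, t_penalty: float) -> float:
    return t_hit + miss_ratio * t_penalty

# 1 ns hit time, 5% miss ratio, 60 ns miss penalty -> 4.0 ns on average
print(avg_access_time(1.0, 0.05, 60.0))
```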
Table 5.3: Cache Access Methods
Direct Mapping
– Organization: sequence of m lines
– Mapping of main memory blocks to cache: each block of main memory maps to one unique line of cache
– Access using main memory address: Line portion of address used to access cache line; Tag portion used to check for hit on that line
Associative Mapping
– Organization: sequence of m lines
– Mapping of main memory blocks to cache: each block of main memory can map to any line of cache
– Access using main memory address: Tag portion of address used to check every line for hit on that line
Set-Associative Mapping
– Organization: sequence of m lines organized as v sets of k lines each (m = v × k)
– Mapping of main memory blocks to cache: each block of main memory maps to one unique cache set
– Access using main memory address: Line portion of address used to access cache set; Tag portion used to check every line in that set for hit on that line
Memory and Cache Mapping
• Mapping Function : Specification of correspondence between
main memory blocks and cache blocks
–Associative mapping
–Direct mapping
–Set-associative mapping
• To help discuss the three mapping techniques, we will use a common example, developed in the following slides.
Memory and Cache Mapping:
Associative Mapping
•Associative Mapping
–Any block of memory can be stored in any block location of the cache
▪→ Most flexible
–The mapping table is implemented in an associative memory (CAM)
▪→ Fast, but very expensive
–Mapping table: stores both the address and the content of the memory word
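A rough software analogy (an assumption for illustration, since the real mechanism is a hardware CAM): a Python dict keyed by the full block address behaves like an associative lookup in which any entry can hold any block:

```python
# The full block address is the key, so any entry can hold any block;
# hardware would compare all stored tags in parallel.
cam = {}                        # tag (block address) -> data block

def assoc_lookup(block_addr: int):
    return cam.get(block_addr)
```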
Memory and Cache Mapping:
Direct Mapping
• Direct Mapping
– Each memory block can be loaded into only one place in the cache.
– The mapping table is made of RAM instead of CAM.
– An n-bit memory address consists of two parts: a k-bit Index field and an (n − k)-bit Tag field.
– The full n-bit address is used to access main memory; the k-bit Index alone is used to access the cache.
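A minimal sketch of this address split (the parameters n = 15 and k = 9, matching a 32K-word main memory and a 512-word cache, are assumed here for illustration):

```python
# Split an n-bit address into the k-bit Index and (n - k)-bit Tag.
def split_address(addr: int, n: int, k: int):
    index = addr & ((1 << k) - 1)             # low k bits: cache line
    tag = (addr >> k) & ((1 << (n - k)) - 1)  # high n - k bits: block id
    return tag, index

tag, index = split_address(0o02777, 15, 9)
print(oct(tag), oct(index))                   # -> 0o2 0o777
```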
Memory and Cache Mapping:
Direct Mapping – contd.
• Addressing relationships between main and cache memories (addresses shown in octal).
Memory and Cache Mapping:
Direct Mapping – contd.
•Cache organization
Memory and Cache Mapping:
Direct Mapping – contd.
•Operation
–CPU generates a memory request with address (TAG; INDEX)
–Access the cache using INDEX, obtaining the stored (tag; data)
▪Compare TAG with tag
–If they match: Hit
▪Provide Cache[INDEX](data) to the CPU
–If they do not match: Miss
▪M[tag; INDEX] ← Cache[INDEX](data)
▪Cache[INDEX] ← (TAG; M[TAG; INDEX])
▪CPU ← Cache[INDEX](data)
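A Python rendering of this operation (assuming write-back on replacement and one word per line, as the steps above imply):

```python
def dm_access(cache, memory, TAG, INDEX, k):
    tag, data = cache[INDEX]
    if tag == TAG:                           # hit
        return data
    # miss: write the displaced word back, then load the requested one
    memory[(tag << k) | INDEX] = data        # M[tag; INDEX] <- cache data
    new_data = memory[(TAG << k) | INDEX]    # fetch M[TAG; INDEX]
    cache[INDEX] = (TAG, new_data)           # Cache[INDEX] <- (TAG, new data)
    return new_data

# Example: k = 3 (8-line cache), memory initialized to its own addresses
memory = list(range(64))
cache = [(0, memory[i]) for i in range(8)]
print(dm_access(cache, memory, TAG=5, INDEX=2, k=3))   # miss -> 42
```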
Memory and Cache Mapping:
Direct Mapping – contd.
• Direct mapping with a block size of 8 words
– 64 blocks (64 × 8 = 512 words in the cache)
• Each time a miss occurs, an entire block of 8 words must be transferred from main memory to cache memory.
Memory and Cache Mapping:
Set Associative Mapping
• Set Associative Mapping:
– Each memory block maps to one set of locations in the cache and can be loaded into any line of that set
• A set-associative cache with set size of two (a two-way set-associative mapping cache)
Memory and Cache Mapping:
Set Associative Mapping
•Operation
– CPU generates a memory request with address (TAG; INDEX)
– Access the cache with INDEX; the cache word holds two entries: (tag 0, data 0) and (tag 1, data 1)
– Compare TAG with tag 0 and then with tag 1
– If tag i = TAG for some i: Hit
▪ CPU ← data i
– If neither tag matches: Miss
▪Replace either (tag 0, data 0) or (tag 1, data 1); assume (tag 0, data 0) is selected for replacement.
▪M[tag 0, INDEX] ← Cache[INDEX](data 0)
▪Cache[INDEX](tag 0, data 0) ← (TAG, M[TAG, INDEX])
▪CPU ← Cache[INDEX](data 0)
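The same operation in Python (assumptions: a two-way set, write-back, and way 0 as the victim, as on the slide; a real cache would choose the victim with a replacement policy):

```python
def sa_access(cache, memory, TAG, INDEX, k):
    ways = cache[INDEX]                      # [(tag0, data0), (tag1, data1)]
    for tag, data in ways:
        if tag == TAG:                       # hit
            return data
    tag0, data0 = ways[0]                    # miss: replace way 0
    memory[(tag0 << k) | INDEX] = data0      # write back the victim
    new_data = memory[(TAG << k) | INDEX]
    ways[0] = (TAG, new_data)
    return new_data
```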
Block Replacement Policy
•Many different block replacement policies are
available:
– Random
▪Chooses one tag-data item for replacement at random
– FIFO (First In First Out)
▪Replaces the item that has been in the set the longest
– LRU (Least Recently Used)
▪Replaces the item that has been least recently used by
the CPU.
Block Replacement Policy (LRU) algorithm
• The easiest policy to implement: LRU (Least Recently Used)
• Implementation of LRU in set-associative mapping with set size = 2
– Cache word = (tag 0, data 0, U0); (tag 1, data 1, U1), where Ui = 0 or 1 (binary)
• Modifications
– Initially U0 = U1 = 1
– On a hit to (tag 0, data 0, U0): U0 ← 0, U1 ← 1 (way 1 becomes the least recently used)
– On a hit to (tag 1, data 1, U1): U1 ← 0, U0 ← 1 (way 0 becomes the least recently used)
– On a miss, find the least recently used way (Ui = 1)
▪ If U0 = 1 and U1 = 0, replace (tag 0, data 0):
– M[tag 0, INDEX] ← Cache[INDEX](data 0)
– Cache[INDEX](tag 0, data 0, U0) ← (TAG, M[TAG, INDEX], 0); U1 ← 1
▪ If U0 = 0 and U1 = 1, replace (tag 1, data 1): similar to above; U0 ← 1
▪ U0 = U1 = 0 cannot occur
▪ If U0 = U1 = 1, both are candidates: select one at random
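The U-bit scheme can be rendered compactly (my sketch of the rules above): U[i] = 1 marks way i as a replacement candidate.

```python
import random

def touch(U, i):
    """Hit on way i: it becomes most recently used."""
    U[i] = 0
    U[1 - i] = 1

def victim(U):
    """Pick the way to replace on a miss."""
    if U[0] != U[1]:
        return 0 if U[0] == 1 else 1
    return random.randrange(2)   # U0 = U1 = 1: both are candidates

U = [1, 1]        # initial state
touch(U, 0)       # hit on way 0 -> U = [0, 1]
print(victim(U))  # -> 1
```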
The most common replacement algorithms, in more detail:
• Least recently used (LRU)
– Most effective
– Replace that block in the set that has been in the cache longest with
no reference to it
– Because of its simplicity of implementation, LRU is the most popular
replacement algorithm
• First-in-first-out (FIFO)
– Replace that block in the set that has been in the cache longest
– Easily implemented as a round-robin or circular buffer technique
• Least frequently used (LFU)
– Replace that block in the set that has experienced the fewest
references
– Could be implemented by associating a counter with each line
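A minimal sketch of that counter-based LFU bookkeeping (illustrative; a Python dict stands in for a hardware counter per line):

```python
counts = {}                              # line -> reference count

def reference(line):
    counts[line] = counts.get(line, 0) + 1

def lfu_victim():
    return min(counts, key=counts.get)   # line with the fewest references
```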
Cache Write
• Write Through
– When writing into memory:
▪ If hit, both cache and memory are written in parallel
▪ If miss, only memory is written
(For a read miss, the missing block may be loaded into a cache line)
– (+) Memory is always up to date
▪ → Important when the CPU and DMA I/O are executing concurrently
– (−) Slow, due to the memory access time on every write
• Write-Back (Copy-Back)
– When writing into memory:
▪ If hit, only the cache is written
▪ If miss, the missing block is brought into the cache and the write is performed in the cache
(For a read miss, the candidate block must be written back to memory before being replaced)
– (−) Memory is not always up to date, i.e., the same item in cache and memory may have different values.
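The contrast between the two policies, reduced to a sketch (assuming one word per line and ignoring allocation details):

```python
def write_through(cache, memory, addr, value):
    if addr in cache:
        cache[addr] = value     # hit: update the cache ...
    memory[addr] = value        # ... and always update memory

dirty = set()                   # lines that memory does not yet reflect

def write_back(cache, memory, addr, value):
    cache[addr] = value         # write only to the cache
    dirty.add(addr)             # written back when the line is evicted
```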
Cache Timing Model
• Direct-mapped cache access
– The first operation is checking the Tag field of the address against the tag value in the line designated by the Line field
– If there is not a match (miss), the operation is complete
– If there is a match (hit), the cache hardware reads the data block from the line in the cache and then fetches the byte or word indicated by the Offset field of the address
– An advantage is that it allows simple and fast speculation
• Fully associative cache
– The line number is not known until the tag comparison is completed
– The hit time is the same as for a direct-mapped cache
– Because this is a content-addressable memory, the miss time is simply the tag comparison time
• Set-associative cache
– It is not possible to transmit bytes and compare tags in parallel, as can be done with direct-mapped with speculative access
– However, the circuitry can be designed so that the data block from each line in a set is loaded and then transmitted once the tag check is made
Table 5.6: Cache Timing Equations
Method                                  Time for hit                      Time for miss
Direct-Mapped                           thit = trl + txb + tct            tmiss = trl + tct
Direct-Mapped with Speculation          thit = trl + txb                  tmiss = trl + tct
Fully Associative                       thit = trl + txb + tct            tmiss = tct
Set-Associative                         thit = trl + txb + tct            tmiss = trl + tct
Set-Associative with Way Prediction     thit = trl + txb + (1 − Fp)tct    tmiss = trl + tct
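A worked illustration of these equations (all timing values, and the reading of trl as line read time, txb as byte/word transmit time, tct as tag compare time, and Fp as the fraction of correct way predictions, are my assumptions):

```python
# Assumed timings in ns.
t_rl, t_xb, t_ct = 2.0, 1.0, 1.0
Fp = 0.9

print(t_rl + t_xb + t_ct)             # direct-mapped hit:   4.0
print(t_rl + t_xb)                    # with speculation:    3.0
print(t_rl + t_xb + (1 - Fp) * t_ct)  # with way prediction: 3.1
```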
Copyright
This work is protected by United States copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work (including on the World Wide Web) will destroy the integrity of the work and is not permitted. The work and materials from it should never be made available to students except by instructors using the accompanying text in their classes. All recipients of this work are expected to abide by these restrictions and to honor the intended pedagogical purposes and the needs of other instructors who rely on these materials.