Cache Memory
Basic Philosophy
  Temporal Locality
  Spatial Locality
Basic Terms
  Cache Block
  Miss/Hit
  Miss Rate/Hit Rate
  Miss Penalty
  Hit Time
3-Cs of caches
  Conflict
  Compulsory
  Capacity
Direct Mapped Cache
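Assume a 5-bit address bus (D4-D0) and a cache with 8 entries
Address from processor: TAG = D4-D3 (2 bits), Index = D2-D0 (3 bits)
Each cache entry holds: Valid (1 bit), TAG (2 bits), DATA
Index values: 000 001 010 011 100 101 110 111
Outputs: Data Bus, HIT (asserted when the indexed entry is valid and its stored TAG equals the address TAG)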
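The address split described on this slide can be sketched in a few lines of Python. The names `split_address`, `INDEX_BITS`, and `TAG_BITS` are illustrative assumptions, not part of the slides; the bit widths follow the 5-bit address and 8-entry cache assumed above.

```python
INDEX_BITS = 3  # D2-D0 select one of the 8 entries
TAG_BITS = 2    # D4-D3 are stored in the entry and compared on each access


def split_address(addr):
    """Return (tag, index) for a 5-bit address (hypothetical helper)."""
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = (addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1)
    return tag, index


tag, index = split_address(0b01010)
print(format(tag, "02b"), format(index, "03b"))  # prints: 01 010
```

Addresses 01010 and 11010 share index 010 but differ in tag, which is why they compete for the same entry in the load examples.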
Direct Mapped Cache
First Load
LD R1, (01010) ; remember the 5-bit address bus; assume data is 8-bit and AA (hex) is stored at this location
Address split: TAG = D4-D3 = 01, Index = D2-D0 = 010
Cache state before the load (all entries invalid):
Index  Valid  TAG  DATA
000    0      -    -
001    0      -    -
010    0      -    -
011    0      -    -
100    0      -    -
101    0      -    -
110    0      -    -
111    0      -    -
Result: the first access causes a MISS. The data is loaded from memory and the Valid bit of entry 010 is set to 1.
Direct Mapped Cache
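After First Load
LD R1, (01010) ; AA (hex) is stored at this location
Address split: TAG = 01, Index = 010
Cache state:
Index  Valid  TAG  DATA
000    0      -    -
001    0      -    -
010    1      01   AA
011    0      -    -
100    0      -    -
101    0      -    -
110    0      -    -
111    0      -    -
Entry 010 now holds TAG 01 and DATA AA with its Valid bit set to 1, so a repeated load from 01010 is a HIT.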
Direct Mapped Cache
Second Load
LD R1, (11010) ; assume 99 is stored at address 11010
Address split: TAG = 11, Index = 010
Cache state before the load:
Index  Valid  TAG  DATA
000    0      -    -
001    0      -    -
010    1      01   AA
011    0      -    -
100    0      -    -
101    0      -    -
110    0      -    -
111    0      -    -
Result: MISS. The index is the same but the TAG differs (stored 01, requested 11), so the data is loaded from memory.
Direct Mapped Cache
After Second Load
LD R1, (11010) ; 99 is stored at address 11010
Address split: TAG = 11, Index = 010
Cache state:
Index  Valid  TAG  DATA
000    0      -    -
001    0      -    -
010    1      11   99
011    0      -    -
100    0      -    -
101    0      -    -
110    0      -    -
111    0      -    -
Entry 010 now holds TAG 11 and DATA 99; the previous contents (TAG 01, DATA AA) have been replaced.
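The load sequence walked through on these slides can be replayed with a minimal simulator. This is a sketch under stated assumptions: the class name `DirectMappedCache`, the `memory` dictionary, and the extra final load of 01010 are illustrative and not taken from the slides.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: one valid bit, tag, and data value per entry."""

    def __init__(self, index_bits=3):
        self.index_bits = index_bits
        entries = 1 << index_bits
        self.valid = [False] * entries
        self.tag = [0] * entries
        self.data = [0] * entries

    def load(self, addr, memory):
        index = addr & ((1 << self.index_bits) - 1)
        tag = addr >> self.index_bits
        if self.valid[index] and self.tag[index] == tag:
            return "HIT", self.data[index]
        # Miss: fetch from memory and overwrite whatever the entry held.
        self.valid[index] = True
        self.tag[index] = tag
        self.data[index] = memory[addr]
        return "MISS", self.data[index]


memory = {0b01010: 0xAA, 0b11010: 0x99}  # assumed memory contents
cache = DirectMappedCache()
trace = [0b01010, 0b01010, 0b11010, 0b01010]
results = [cache.load(addr, memory)[0] for addr in trace]
print(results)  # ['MISS', 'HIT', 'MISS', 'MISS']
```

The last access misses again because 11010 displaced 01010 from entry 010; both addresses map to the same index.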
Cache Size Example Direct Mapped
Processor address bus A = 32 bits, data width = 32 bits
Cache data storage = 128KB = 32K words = 2^N entries, with N = 15
Byte offset = 2 bits (A1-A0)
Index = 15 bits (A16-A2), selects one of 32K entries
Tag Size = A - N - 2 (byte offset) = 32 - 15 - 2 = 15 bits (A31-A17)
Cache array: 32K X 48-bit memory = Valid (1 bit) + TAG (15 bit) + DATA (32 bit)
HIT when the entry is valid and its stored TAG equals A31-A17; Data Out is the 32-bit DATA field
Cache Size = 128KB (data) + 32K X 15-bit (tag) + 32K X 1-bit (valid bit) = 128KB + 60KB + 4KB = 192KB
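The arithmetic on this slide can be checked with a short calculation. The function name `direct_mapped_size` and its parameters are assumptions made for illustration; the sketch assumes one 32-bit word per entry and power-of-two sizes, as in the example.

```python
def direct_mapped_size(addr_bits, data_bytes, word_bytes=4):
    """Return (tag_bits, bits_per_entry, total_kb) for a direct-mapped cache.

    Assumes one word per entry and power-of-two sizes (hypothetical helper).
    """
    entries = data_bytes // word_bytes
    index_bits = entries.bit_length() - 1      # log2(entries)
    offset_bits = word_bytes.bit_length() - 1  # log2(word_bytes)
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_entry = 1 + tag_bits + 8 * word_bytes  # valid + tag + data
    total_kb = entries * bits_per_entry / 8 / 1024
    return tag_bits, bits_per_entry, total_kb


print(direct_mapped_size(32, 128 * 1024))  # (15, 48, 192.0)
```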
Cache Size Example (1) Two-Way Set Associative
Assume the same processor (A = 32, D = 32)
Assume the same total data storage = 128KB
Two-way means two direct-mapped arrays (labelled SET 1 and SET 2 here; more commonly called ways) of 64KB (128/2) each
64KB = 16K words
Addressing a 16K X 32-bit memory needs a 14-bit index
Hence Tag Size = 32 - 14 - 2 = 16 bits
Cache Size Example (1) Two-Way Set Associative
SET 1: 16K X 49-bit memory = Valid (1 bit) + TAG (16 bit) + DATA (32 bit)
SET 2: 16K X 49-bit memory = Valid (1 bit) + TAG (16 bit) + DATA (32 bit)
Index = 14 bits (A15-A2), applied to both memories (16K entries each)
Tag = 16 bits (A31-A16), compared with the stored 16-bit TAG of both memories
Data Out of each memory feeds a 2:1 MUX; the memory whose TAG matches drives Data Out
Size = 2 (Sets) X 16K X (32-bit + 16-bit + 1-bit) = 196KB
Cache Size Example (1) 4-Way Set Associative
Assume the same processor (A = 32, D = 32)
Assume the same total data storage = 128KB
Four-way means four direct-mapped arrays (labelled SET 1 to SET 4 here; more commonly called ways) of 32KB (128/4) each
32KB = 8K words
Addressing an 8K X 32-bit memory needs a 13-bit index
Hence Tag Size = 32 - 13 - 2 = 17 bits
Cache Size Example (1) 4-Way Set Associative
SET 1 to SET 4: four 8K X 50-bit memories, each = V (1 bit) + TAG (17 bit) + DATA (32 bit)
Index = 13 bits (A14-A2), applied to all four memories (8K entries each)
Tag = 17 bits (A31-A15), compared with the stored 17-bit TAG of all four memories
Data Out of each memory feeds a 4:1 MUX; the memory whose TAG matches drives Data Out to the processor
Size = 4 (Sets) X 8K X (32-bit + 17-bit + 1-bit) = 200KB
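The three size examples differ only in associativity, so the trend can be reproduced with one function. The name `set_assoc_size_kb` and its default arguments are illustrative assumptions; the sketch keeps the data storage fixed at 128KB with one 32-bit word per entry.

```python
def set_assoc_size_kb(ways, addr_bits=32, data_kb=128, word_bytes=4):
    """Return (tag_bits, total_kb) for an n-way cache with fixed data storage.

    Assumes one word per entry and power-of-two sizes (hypothetical helper).
    """
    entries_per_way = data_kb * 1024 // word_bytes // ways
    index_bits = entries_per_way.bit_length() - 1  # log2(entries per way)
    offset_bits = word_bytes.bit_length() - 1      # log2(word_bytes)
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_entry = 1 + tag_bits + 8 * word_bytes  # valid + tag + data
    total_kb = ways * entries_per_way * bits_per_entry / 8 / 1024
    return tag_bits, total_kb


for ways in (1, 2, 4):
    print(ways, set_assoc_size_kb(ways))
# 1 (15, 192.0)
# 2 (16, 196.0)
# 4 (17, 200.0)
```

Each doubling of associativity halves the entries per array, removes one index bit, and adds one tag bit to every entry, which is where the extra 4KB comes from.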
Alpha 21264 Processor: 44-Bit Virtual Address
Organization of the Alpha 21264 data cache
Two sets (SET 1, SET 2), 512 entries each
Each entry: Valid (1 bit) + TAG (29 bit) + DATA (64 bit read out per access)
Byte Offset = 6 bits (A5-A0), so Block Size = 64 bytes
Index = 9 bits (A14-A6), selects one of 512 entries in both memories
Tag = 44 - 9 - 6 = 29 bits (A43-A15), compared with the stored 29-bit TAG of both memories
Data Out of each memory feeds a 2:1 MUX; the memory whose TAG matches drives Data Out
Data storage = 2 (Sets) X 512 X 64 bytes = 64KB
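The field widths on this slide follow from the 44-bit address, the 512 entries per set, and the 6-bit byte offset. A minimal sketch of that split, where `split_21264` and the constants are illustrative names and the sample address is constructed only for the round-trip check:

```python
ADDR_BITS = 44
OFFSET_BITS = 6  # A5-A0
INDEX_BITS = 9   # A14-A6, 512 entries
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS  # 29


def split_21264(addr):
    """Return (tag, index, offset) for a 44-bit address (hypothetical helper)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Build an address from known fields, then recover them.
addr = (0x1ABCDEF << 15) | (300 << 6) | 17
print(TAG_BITS, split_21264(addr) == (0x1ABCDEF, 300, 17))  # 29 True
```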
Four Memory Hierarchy Questions
Where can a block be placed?
  Direct Mapped to Fully Associative
How is a block found?
  Tag Comparison
Which block should be replaced on a cache miss? (applies only to set-associative and fully associative caches)
  LRU, Random, FIFO
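As one illustration of the replacement question, LRU within a single set can be sketched with an ordered dictionary. The class name `LRUSet` and the tag sequence are assumptions made for this example; real hardware usually tracks recency with a few status bits rather than an ordered structure.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set holding up to `ways` tags; evicts the least recently used."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()  # least recently used tag comes first

    def access(self, tag):
        """Return ("HIT", None) or ("MISS", evicted_tag_or_None)."""
        if tag in self.tags:
            self.tags.move_to_end(tag)  # mark as most recently used
            return "HIT", None
        victim = None
        if len(self.tags) == self.ways:
            victim, _ = self.tags.popitem(last=False)  # drop the LRU tag
        self.tags[tag] = True
        return "MISS", victim


lru_set = LRUSet(ways=2)
log = [lru_set.access(tag) for tag in (1, 2, 1, 3, 2)]
print(log)
# [('MISS', None), ('MISS', None), ('HIT', None), ('MISS', 2), ('MISS', 1)]
```

Under FIFO the fourth access would evict tag 1 instead of tag 2, because the hit on tag 1 would not refresh its position.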
4 Qs (Contd..)
What Happens on a Write?
Write Back: main memory is updated only when the modified block is replaced from the cache.
Write Through: the information is updated in the upper level (cache) as well as the lower level (memory).
Write Allocate: allocate the block in the cache on a write miss.
Write No-Allocate: on a write miss, only write to the next level.
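The difference between the write policies shows up in how many writes reach main memory. The sketch below is a toy model under stated assumptions: the class `WriteCache`, its word-granularity lines, and the explicit `evict` call are illustrative, and a real write-allocate cache would also fetch the rest of the block on a write miss.

```python
class WriteCache:
    """Toy word-granularity cache that counts writes reaching main memory."""

    def __init__(self, write_back, write_allocate):
        self.write_back = write_back
        self.write_allocate = write_allocate
        self.lines = {}   # addr -> [value, dirty]
        self.memory = {}
        self.memory_writes = 0

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def write(self, addr, value):
        if addr not in self.lines and not self.write_allocate:
            self._write_memory(addr, value)  # write miss, no-allocate: bypass the cache
            return
        if self.write_back:
            self.lines[addr] = [value, True]   # mark dirty, defer the memory update
        else:
            self.lines[addr] = [value, False]
            self._write_memory(addr, value)    # write through to memory immediately

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:
            self._write_memory(addr, value)    # write back only on replacement


wb = WriteCache(write_back=True, write_allocate=True)
wt = WriteCache(write_back=False, write_allocate=True)
for value in range(10):
    wb.write(0x10, value)
    wt.write(0x10, value)
wb.evict(0x10)
wt.evict(0x10)
print(wb.memory_writes, wt.memory_writes)  # 1 10
```

Both caches leave the same final value in memory; write back reaches it with a single memory write after ten stores to the same address.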
Classifying Misses: 3 Cs
3 Cs of Caches
Compulsory: The first access to a block is not in the cache, so the block must be brought into the cache. Also called cold start misses or first reference misses. (Misses even in an infinite cache)
Capacity: If the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved. (Misses in a fully associative cache of the same size)
Conflict: If the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block can be discarded and later retrieved if too many blocks map to its set. Also called collision misses or interference misses.
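The three definitions suggest a way to label each miss of a direct-mapped cache: a first reference is compulsory, a miss that a same-size fully associative LRU cache would also take is capacity, and the rest are conflict. A minimal sketch, where `classify_misses` and the block-address traces are illustrative assumptions:

```python
from collections import OrderedDict


def classify_misses(blocks, num_blocks):
    """Classify accesses of a direct-mapped cache with `num_blocks` entries.

    `blocks` is a trace of block addresses (hypothetical input format).
    """
    seen = set()                 # models an infinite cache
    fully_assoc = OrderedDict()  # same-size fully associative cache, LRU
    direct = [None] * num_blocks
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for block in blocks:
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) == num_blocks:
                fully_assoc.popitem(last=False)  # evict the LRU block
            fully_assoc[block] = True
        index = block % num_blocks
        dm_hit = direct[index] == block
        direct[index] = block
        if dm_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        seen.add(block)
    return counts


print(classify_misses([0, 4, 0, 4], 4))
# {'hit': 0, 'compulsory': 2, 'capacity': 0, 'conflict': 2}
print(classify_misses([0, 1, 2, 3, 4, 0], 4))
# {'hit': 0, 'compulsory': 5, 'capacity': 1, 'conflict': 0}
```

In the first trace, blocks 0 and 4 fit easily in four entries but map to the same index, so the repeat accesses are conflict misses. In the second, five distinct blocks exceed the four-entry capacity, so the return to block 0 is a capacity miss.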