
Unit-4

S Raghavendra Kumar
SSNCE
Memory Technology

Memory Technology    Typical Access Time          $ per GiB in 2012
SRAM                 0.5–2.5 ns                   $500–$1000
DRAM                 50–70 ns                     $10–$20
Flash                5,000–50,000 ns              $0.75–$1.00
Magnetic Disk        5,000,000–20,000,000 ns      $0.05–$0.10

Ideal memory
• Access time of SRAM
• Capacity and cost/GB of disk
Strategy: arrange memory in a hierarchy
• Smaller and faster memory for data currently being accessed
• Larger and slower memory for data not currently being accessed

Memory Hierarchy

Memory hierarchy
▪ Store everything on flash/disk
▪ Copy recently accessed (and nearby) items from disk to smaller DRAM memory
▪ This is the main memory
▪ Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
▪ This is the cache memory, attached to the CPU
▪ "Recently accessed" is a good predictor of "currently needed" because of the principle of locality.

Principle of Locality
Programs access a small proportion of their address space at any time.
• Temporal locality (locality in time): items accessed recently are likely to be accessed again soon, e.g., instructions in a loop, induction variables
– Keep the most recently accessed items in the cache
• Spatial locality (locality in space): items near those accessed recently are likely to be accessed soon, e.g., sequential instruction access, array data
– Move blocks consisting of contiguous words closer to the processor

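As a small illustration (not from the slides), the loop below exhibits both kinds of locality:

    # The loop instructions and `total` are reused on every iteration
    # (temporal locality); data[i] touches consecutive elements (spatial).
    data = list(range(1024))
    total = 0
    for i in range(len(data)):
        total += data[i]
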
Memory Hierarchy Levels
• Block (aka cache line): the unit of copying; may be multiple words
• If the accessed data is present in the upper level
– Hit: access satisfied by the upper level
• Hit ratio: hits/accesses
– Hit time: time to access the block + time to determine hit/miss
• If the accessed data is absent
– Miss: data not in the upper level
• Miss ratio: misses/accesses = 1 – hit ratio
– Miss penalty: time to access the block in the lower level + time to transmit that block to the level that experienced the miss + time to insert the block in that level + time to pass the block to the requestor

Average Memory Access Time (AMAT)

T_avg = H × Hit_time + (1 − H) × Miss_time

Example: Suppose the CPU references memory 100 times and 20% of the accesses miss. The memory access time is 10 ns on a hit and 100 ns on a miss. Find the AMAT.

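A minimal sketch of the formula in code (the function name and the print are illustrative, not from the slides); it also answers the exercise above:

    def amat(hit_ratio, hit_time_ns, miss_time_ns):
        # T_avg = H * Hit_time + (1 - H) * Miss_time
        return hit_ratio * hit_time_ns + (1 - hit_ratio) * miss_time_ns

    print(amat(0.8, 10, 100))  # 0.8*10 + 0.2*100 = 28.0 ns
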
Cache Memory

• Cache memory: the level of the memory hierarchy closest to the CPU
• Given accesses X1, …, Xn–1, Xn
• How do we know if the data is present?
• Where do we look?

Direct Mapped Cache
• Location determined by address
• Direct mapped: only one choice
– (Block address) modulo (#Blocks in cache)
• #Blocks is a power of 2
• Use low-order address bits

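A one-line sketch of the placement rule (helper name is illustrative):

    def dm_index(block_address, num_blocks):
        # With num_blocks = 2^k, this keeps the low-order k bits.
        return block_address % num_blocks

    print(dm_index(22, 8))  # 22 mod 8 = 6 (binary 110), as in the example below
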
Tags and Valid Bits

• How do we know which particular block is stored in a cache location?
– Store the block address as well as the data
– Actually, only the high-order bits are needed
– These are called the tag
• What if there is no data in a location?
– Valid bit: 1 = valid, 0 = not valid
– Initially 0

Cache Example

◼ 8 blocks, 1 word/block, direct mapped
◼ Initial state

Index     V   Tag   Data
000 (0)   N
001 (1)   N
010 (2)   N
011 (3)   N
100 (4)   N
101 (5)   N
110 (6)   N
111 (7)   N

◼ Access sequence (word addresses): 22, 26, 22, 26, 16, 3, 16, 18, 16

Cache Example

Word addr   Binary addr   Hit/miss   Cache block
22          10 110        Miss       110

Index   V   Tag   Data
000     N
001     N
010     N
011     N
100     N
101     N
110     Y   10    Mem[10110]
111     N

Cache Example

Word addr   Binary addr   Hit/miss   Cache block
26          11 010        Miss       010

Index   V   Tag   Data
000     N
001     N
010     Y   11    Mem[11010]
011     N
100     N
101     N
110     Y   10    Mem[10110]
111     N

Cache Example

Word addr   Binary addr   Hit/miss   Cache block
22          10 110        Hit        110
26          11 010        Hit        010

Index   V   Tag   Data
000     N
001     N
010     Y   11    Mem[11010]
011     N
100     N
101     N
110     Y   10    Mem[10110]
111     N


Cache Example

Word addr   Binary addr   Hit/miss   Cache block
16          10 000        Miss       000
3           00 011        Miss       011

Index   V   Tag   Data
000     Y   10    Mem[10000]
001     N
010     Y   11    Mem[11010]
011     Y   00    Mem[00011]
100     N
101     N
110     Y   10    Mem[10110]
111     N

Cache Example

Word addr   Binary addr   Hit/miss   Cache block
16          10 000        Hit        000
18          10 010        Miss       010

Index   V   Tag   Data
000     Y   10    Mem[10000]
001     N
010     Y   10    Mem[10010]
011     Y   00    Mem[00011]
100     N
101     N
110     Y   10    Mem[10110]
111     N

Address Subdivision

[Figure: a direct-mapped cache datapath showing the subdivision of the address into tag, index, and offset fields, with the tag comparison producing the hit signal]

Larger Block Size

◼ 64 blocks, 16 bytes/block
◼ To what block number does address 1200 map?
◼ Block address = 1200/16 = 75
◼ Block number = 75 modulo 64 = 11

Bits 31–10: Tag (22 bits) | Bits 9–4: Index (6 bits) | Bits 3–0: Offset (4 bits)

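A sketch of the field extraction for this configuration (64 blocks give 6 index bits, 16-byte blocks give 4 offset bits; names are illustrative):

    def split_address(addr, index_bits=6, offset_bits=4):
        offset = addr & ((1 << offset_bits) - 1)
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        return tag, index, offset

    print(split_address(1200))  # (1, 11, 0): block 1200/16 = 75, 75 mod 64 = 11
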
Question 1

◼ 100 blocks, 16 bytes/block
◼ To what block number does address 2000 map?
◼ Find the tag size, index size, and offset size.

Question 2

◼ 1024 blocks, 32 bytes/block
◼ Find the tag size, index size, and offset size.

Multiword Block Direct Mapped Cache

[Figure: a direct-mapped cache containing 256 blocks with 16 words per block]

Direct Mapped Cache Address Bits

• Cache with 2^n blocks, 2^m bytes/block

Bits 31–(m+n): Tag (32−m−n bits) | Bits (m+n−1)–m: Block index (n bits) | Bits (m−1)–2: Block offset (m−2 bits) | Bits 1–0: Byte offset (2 bits)

Size of a direct-mapped cache:
– Data bits = 2^n × 2^m × 8
– Valid bits = 2^n (1 bit per block)
– Tag bits = 2^n × (32 − m − n) (1 tag per block)
– Total bits = 2^n × (1 + [32 − m − n] + 2^m × 8)
– Efficiency = data bits / total bits = 2^m × 8 / (1 + [32 − m − n] + 2^m × 8)

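The formulas above, transcribed directly into code (assumed helper, not from the slides):

    def cache_bits(n, m):
        data_bits = (2 ** n) * (2 ** m) * 8
        valid_bits = 2 ** n                     # 1 valid bit per block
        tag_bits = (2 ** n) * (32 - m - n)      # 1 tag per block
        total = valid_bits + tag_bits + data_bits
        return total, data_bits / total         # total bits, efficiency

    print(cache_bits(10, 5))  # 1024 blocks of 32 bytes: (280576, ~0.93)
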
Impact of Block Size
• Advantages of larger blocks:
– Reduces the miss rate, due to spatial locality
– Amortizes the overhead of the tag bits
• Disadvantages (assuming a fixed-size cache):
– Fewer blocks (increases the miss rate)
– Underutilizes blocks (pollution) if there is no spatial locality
– Larger miss penalty (more bytes to fetch)
– Can outweigh the benefit of the reduced miss rate
– Early restart and critical-word-first can help

Writing to Caches

• On a write hit, if we just update the block in the cache, the cache and memory would be inconsistent.
• Need to ensure that both are eventually updated.

Write-Through Cache

• A write-through cache updates both the cache and memory at the time of the write.
• Disadvantage:
– Makes writes take longer, because they must wait for the lower levels of the hierarchy.
• Solution: use a write buffer
– Buffers data that is waiting to be written to memory.
– The processor continues while the write buffer writes in the background.
• It only stalls if the write buffer is already full.

Write-Back Cache
• A write-back cache updates only the cache.
• The value is written to memory when the block is evicted.
– Need to know which blocks have changed
– Use a dirty bit.
• Now evictions take longer
– Solution: use a write buffer for evicted dirty blocks.

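A minimal sketch of the write-back bookkeeping (class and names are illustrative):

    class WriteBackLine:
        def __init__(self):
            self.valid = self.dirty = False
            self.tag = self.data = None

    def write_hit(line, data):
        line.data = data
        line.dirty = True                    # defer the memory update

    def evict(line, memory, block_addr):
        if line.valid and line.dirty:
            memory[block_addr] = line.data   # write back only on eviction
        line.valid = line.dirty = False
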
Block Placement in Various Mappings

[Figure: where a block can be placed under direct-mapped, set-associative, and fully associative mapping]

Associative Caches
▪ Fully associative
▪ Allow a given block to go in any cache entry
▪ Requires all entries to be searched at once
▪ Comparator per entry (expensive)
▪ n-way set associative
▪ Each set contains n entries
▪ Search all entries in a given set at once
▪ Block address determines which set
▪ (Block address) modulo (#Sets in cache)
▪ n comparators (less expensive)

Spectrum of Associativity

[Figure: the same cache organized at increasing degrees of associativity, from direct mapped to fully associative]

Set Associative Cache Organization

[Figure: a four-way set-associative cache with 256 sets and 1 word/block]

Example: 2-Way Set Associative

Address     Hit/Miss   Tag (3 bits)   Set index (2 bits)   Block offset (1 bit)   Byte offset (2 bits)
10110000    Miss       101            10                   0                      00
11101100    Miss       111            01                   1                      00
10010000    Miss       100            10                   0                      00
10110100    Hit        101            10                   1                      00
11010000

            Way 0                             Way 1
Index   V   Tag   Word 0   Word 1     V   Tag   Word 0   Word 1
00      0                             0
01      1   111   x        x          0
10      1   101   x        x          1   100   x        x
11      0                             0

Size of Tags versus Set Associativity
Problem: Increasing associativity requires more comparators and more
tag bits per cache block. Assuming a cache of 4096 blocks, a 4-word
block size, and a 32-bit address, find the total number of sets and the
total number of tag bits for caches that are direct mapped, two-way and
four-way set associative, and fully associative.

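A sketch that works the problem numerically (loop and names are illustrative):

    import math

    BLOCKS, OFFSET_BITS, ADDR_BITS = 4096, 4, 32   # 4-word = 16-byte blocks
    for ways in (1, 2, 4, BLOCKS):                 # DM ... fully associative
        sets = BLOCKS // ways
        index_bits = int(math.log2(sets))
        tag_bits = ADDR_BITS - index_bits - OFFSET_BITS
        print(ways, sets, BLOCKS * tag_bits)
    # direct mapped: 4096 sets, 16-bit tags -> 65536 tag bits;
    # fully associative: 1 set, 28-bit tags -> 114688 tag bits
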
Range of Set Associative Caches

For a fixed-size cache, each factor-of-two increase in associativity doubles the number of blocks per set (i.e., the number of ways) and halves the number of sets; it decreases the size of the index by 1 bit and increases the size of the tag by 1 bit.

Replacement Policy
• Direct mapped: no choice
• Set associative
– Prefer a non-valid entry, if there is one
– Otherwise, choose among the entries in the set
– Least recently used (LRU)
• Choose the one unused for the longest time (see the sketch after this list)
– Simple for 2-way, manageable for 4-way, too hard beyond that
– Random
• Gives approximately the same performance as LRU for high associativity
– FIFO
• Replace the block that has been in the cache longest.

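A minimal LRU sketch for one set, as promised above (an OrderedDict stands in for the recency bookkeeping; names are illustrative):

    from collections import OrderedDict

    class LRUSet:
        def __init__(self, ways):
            self.ways = ways
            self.lines = OrderedDict()          # tag -> data, oldest first

        def access(self, tag):
            if tag in self.lines:
                self.lines.move_to_end(tag)     # hit: mark most recently used
                return True
            if len(self.lines) == self.ways:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[tag] = None
            return False
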
Sources of Cache Misses

▪ Compulsory (cold start or process migration, first reference): the first access to a block; a "cold" fact of life, not a whole lot you can do about it. If you are going to run millions of instructions, compulsory misses are insignificant.
▪ Solution: increase block size
▪ Capacity: the cache cannot contain all the blocks accessed by the program
▪ Solution: increase cache size
▪ Conflict (collision): multiple memory locations map to the same cache location
▪ Solution 1: increase cache size
▪ Solution 2: increase associativity

Reducing Cache Miss Rates

• Use multiple levels of caches
– Primary (L1) cache attached to the CPU
• Small, but fast
• Separate L1 I$ and L1 D$
– Level-2 cache services misses from the L1 cache
• Larger and slower, but still faster than main memory
• Unified cache for both instructions and data
• Main memory services L2 cache misses
• Some high-end systems include an L3 cache

Multilevel Cache Considerations

Primary cache
– Focus on minimal hit time
– Smaller total size with smaller block size
L2 cache
– Focus on a low miss rate to avoid main memory accesses
– Hit time has less overall impact
– Larger total size with larger block size
– Higher levels of associativity

Global vs. Local Miss Rate

Global miss rate
– The fraction of references that miss in all levels of a multilevel cache
– Dictates how often main memory is accessed
Local miss rate
– The fraction of references to one level of a cache that miss

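A sketch combining the two notions with the AMAT formula from earlier (the numbers and names are illustrative):

    def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss, mem_time):
        # The global L2 miss rate is l1_miss_rate * l2_local_miss.
        return l1_hit + l1_miss_rate * (l2_hit + l2_local_miss * mem_time)

    print(amat_two_level(1, 0.05, 10, 0.20, 100))  # 1 + 0.05*(10+20) = 2.5 cycles
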
Virtual Memory

◼ A memory management technique developed for multitasking computer architectures
◼ Virtualizes various forms of data storage
◼ Allows a program to be designed as if there is only one type of memory, i.e., "virtual" memory
◼ Each program runs in its own virtual address space
◼ Uses main memory as a "cache" for secondary (disk) storage
◼ Allows efficient and safe sharing of memory among multiple programs
◼ Provides the ability to easily run programs larger than the size of physical memory
◼ Simplifies loading a program for execution by providing for code relocation

Two Programs Sharing Physical Memory
▪ A program's address space is divided into pages (fixed size)
▪ The starting location of each page (either in main memory or in secondary memory) is contained in the program's page table

[Figure: the virtual address spaces of Program 1 and Program 2 both mapping pages into main memory, which serves as a "cache" of the hard drive]

Address Translation
▪ A virtual address is translated to a physical address by a combination of hardware and software

Virtual Address (VA):    bits 31 . . . 12: Virtual page number | bits 11 . . . 0: Page offset
                                          | Translation
Physical Address (PA):   bits 29 . . . 12: Physical page number | bits 11 . . . 0: Page offset

◼ So each memory request first requires an address translation from the virtual space to the physical space
◼ A virtual memory miss (i.e., when the page is not in physical memory) is called a page fault

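A sketch of the translation step, assuming 4 KiB pages (the 12-bit offset above) and a dictionary standing in for the page table:

    PAGE_OFFSET_BITS = 12        # 4 KiB pages

    def translate(va, page_table):
        vpn = va >> PAGE_OFFSET_BITS
        offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
        if vpn not in page_table:
            raise RuntimeError("page fault")   # page not in physical memory
        return (page_table[vpn] << PAGE_OFFSET_BITS) | offset

    print(hex(translate(0x12345, {0x12: 0x99})))  # 0x99345
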
Page Tables

◼ Store placement information
◼ An array of page table entries, indexed by virtual page number
◼ A page table register in the CPU points to the page table in physical memory
◼ If the page is present in memory
◼ The page table entry stores the physical page number
◼ Plus other status bits (referenced, dirty, …)
◼ If the page is not present
◼ The page table entry can refer to a location in swap space on disk
◼ Swap space: the space on disk reserved for the full virtual memory space of a process

Address Translation Mechanisms

[Figure: the page table register points to the page table in main memory; the virtual page number indexes the table, and a valid entry supplies the physical page base address that is combined with the page offset, while invalid entries refer to disk storage]

Address Translation Example

Translation Using a Page Table

Page Fault Penalty

◼ On a page fault, the page must be fetched from disk
◼ Takes millions of clock cycles
◼ Handled by OS code
◼ Try to minimize the page fault rate
◼ Fully associative placement
◼ Smart replacement algorithms

Replacement and Writes

◼ To reduce the page fault rate, prefer least-recently used (LRU) replacement
◼ A reference bit (aka use bit) in the PTE is set to 1 on access to the page
◼ Periodically cleared to 0 by the OS
◼ A page with reference bit = 0 has not been used recently
◼ Disk writes take millions of cycles
◼ Write a block at once, not individual locations
◼ Write-through is impractical
◼ Use write-back
◼ A dirty bit in the PTE is set when the page is written

Virtual Addressing with a Cache

◼ Thus it takes an extra memory access to translate a VA to a PA

CPU --VA--> Translation --PA--> Cache --miss--> Main Memory
(on a hit, the cache returns the data directly to the CPU)

▪ This makes memory accesses very expensive (if every access were really two accesses)
▪ The hardware fix is to use a Translation Lookaside Buffer (TLB): a small cache that keeps track of recently used address mappings to avoid having to do a page table lookup

Fast Translation Using a TLB

◼ Address translation would appear to require extra memory references
◼ One to access the PTE
◼ Then the actual memory access
◼ But access to page tables has good locality
◼ So use a fast cache of PTEs within the CPU
◼ Called a Translation Look-aside Buffer (TLB)
◼ Misses can be handled by hardware or software

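A sketch of the TLB-in-front-of-the-page-table lookup (a dict stands in for the TLB; names are illustrative):

    tlb = {}                               # vpn -> ppn, a tiny cache of PTEs

    def translate_with_tlb(vpn, page_table):
        if vpn in tlb:
            return tlb[vpn]                # TLB hit: no page table access
        ppn = page_table[vpn]              # TLB miss: extra memory reference
        tlb[vpn] = ppn                     # cache the mapping for next time
        return ppn
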
Making Address Translation Fast

[Figure: the TLB holds valid, tagged copies of page table entries (virtual page number tag mapping to physical page base address); on a TLB miss, the page table register locates the full page table in physical memory, whose invalid entries refer to disk storage]

Translation Lookaside Buffers (TLBs)

◼ Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped

TLB entry: V | Tag | Physical Page # | Dirty | Ref | Access

▪ TLB access time is typically smaller than cache access time (because TLBs are much smaller than caches)
▪ TLBs are typically not more than 512 entries, even on high-end machines

TLB Misses

◼ If the page is in memory
◼ Load the PTE from memory and retry
◼ Could be handled in hardware
◼ Can get complex for more complicated page table structures
◼ Or in software
◼ Raise a special exception, with an optimized handler
◼ If the page is not in memory (page fault)
◼ The OS handles fetching the page and updating the page table
◼ Then restart the faulting instruction

TLB Miss Handler

◼ A TLB miss indicates either
◼ Page present, but PTE not in the TLB
◼ Page not present
◼ The handler copies the PTE from memory to the TLB
◼ Then restarts the instruction
◼ If the page is not present, a page fault will occur

Page Fault Handler

◼ Use the faulting virtual address to find the PTE
◼ Locate the page on disk
◼ Choose a page to replace
◼ If dirty, write it to disk first
◼ Read the page into memory and update the page table
◼ Make the process runnable again
◼ Restart from the faulting instruction

Modes of Data Transfer

• Programmed IO
• Interrupt Driven IO
• DMA Transfer

Programmed IO

• Each data item transfer is initiated by an instruction in the program. Usually the transfer is to and from a CPU register and the peripheral.
• Transferring data under program control requires constant monitoring of the peripheral by the CPU.
• Once the data transfer is initiated, the CPU is required to monitor the interface to see when the transfer can be made.
• In the programmed IO method the CPU stays in a program loop until the IO unit indicates that it is ready for data transfer. This is called polling.
• This is a time-consuming process, since it keeps the processor busy needlessly.
• In the programmed IO method the IO device does not have direct access to memory.
• A transfer from an IO device to memory requires the execution of several instructions by the CPU, including an input instruction to transfer the data from the device to the CPU, and a store instruction to transfer the data from the CPU to memory.
• Other instructions are needed to verify that the data are available from the device and to count the number of words transferred.

Programmed IO
• When a byte of data is available, the device places it on the IO bus and enables its "data valid" line.
• The interface accepts the byte into its data register and enables the "data accepted" line.
• The interface sets a bit in the status register, the F (flag) bit. The device can now disable the "data valid" line, but will not transfer another byte until the "data accepted" line is disabled by the interface.
• A program is written to check the flag in the status register to determine whether a byte has been placed in the data register by the IO device. This is done by reading the status register into a CPU register and checking the value of the flag bit.
• If the flag is equal to 1, the CPU reads the data from the data register. The flag bit is reset to 0 by either the CPU or the interface, depending on how the interface circuits are designed.

Programmed IO

• The transfer of each byte requires three instructions:
1. Read the status register.
2. Check the status of the flag bit; branch to step 1 if not set, or to step 3 if set.
3. Read the data register.
• The programmed IO method is particularly useful in small, low-speed computers.

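A sketch of the three-step polling loop (read_status and read_data stand in for IO register reads; the flag-bit position is illustrative):

    FLAG_BIT = 0x01

    def read_byte_polled(read_status, read_data):
        # Steps 1-2: read the status register and branch on the flag bit.
        while not (read_status() & FLAG_BIT):
            pass                           # the CPU busy-waits (polls) here
        return read_data()                 # step 3: read the data register
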
Interrupt driven data transfer
• The daisy-chaining method of establishing priority consists of a serial connection of all the devices that can request an interrupt.
• The device with the highest priority is placed in the first position, followed by lower-priority devices, down to the lowest-priority device, which is placed last in the chain.
• The interrupt request line is common to all the devices and forms a wired-logic connection.
• If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and enables the interrupt input in the CPU.
• When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU. This is equivalent to a negative-logic OR operation.

Interrupt driven data transfer
• The CPU responds to an interrupt request by enabling the interrupt acknowledge line.
• This signal is received by device 1 at its PI (priority in) input.
• The acknowledge signal passes to the next device through the PO (priority out) output only if device 1 is not requesting an interrupt.
• If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 on its PO output.
• It then proceeds to insert its own interrupt vector address (VAD) onto the data bus for the CPU to use during the interrupt cycle.

Interrupt driven data transfer

• A device with a 0 at its PI input generates a 0 at its PO output to inform the next lower-priority device that the acknowledge signal has been blocked.
• A device that is requesting an interrupt and has a 1 at its PI input intercepts the acknowledge signal by placing a 0 on its PO output.
• If the device does not have any pending interrupts, it transmits the acknowledge signal to the next device by placing a 1 on its PO output.
• Thus the device with PI = 1 and PO = 0 is the highest-priority device that is requesting an interrupt, and this device places its vector address on the data bus.
• The daisy-chain arrangement gives the highest priority to the device that receives the acknowledge signal from the CPU.

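A sketch of the PI/PO priority logic over a chain of devices (list order is chain order; the vector addresses are illustrative):

    def daisy_chain(requests, vads):
        pi = 1                        # the CPU's acknowledge enters device 0
        for req, vad in zip(requests, vads):
            if pi and req:
                return vad            # PI = 1, PO = 0: this device wins
            pi = pi and not req       # pass the acknowledge along if idle
        return None                   # no pending interrupt

    print(hex(daisy_chain([0, 1, 1], [0x10, 0x20, 0x30])))  # 0x20: device 1 wins
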
DMA mode of data transfer
• The transfer of data between a fast storage device such as a magnetic disk and memory is limited by the speed of the CPU.
• Removing the CPU from the path and letting the peripheral device manage the memory buses directly improves the speed of the transfer.
• This transfer technique is called direct memory access (DMA). During a DMA transfer the CPU is idle and has no control of the memory buses.
• A DMA controller takes control of the buses to manage the transfer directly between the IO device and the memory.

DMA mode of data transfer
• The bus request (BR) line is used by the DMAC (DMA controller) to request that the CPU relinquish control of the buses.
• When this input is active, the CPU terminates the execution of the current instruction and places the address bus, data bus, and R/W lines into a high-impedance state.
• The high-impedance state behaves like an open circuit, which means that the output is disconnected and has no logic significance.
• The CPU activates the bus grant (BG) output to inform the external DMAC that the buses are in the high-impedance state.
• The DMAC that originated the bus request can now take control of the buses to conduct memory transfers without processor intervention.

DMA mode of data transfer
• A DMA transfer can be:
• Burst mode: a block sequence consisting of a number of memory words is transferred in a continuous burst while the DMA controller is master of the memory buses. This mode of transfer is needed for fast devices such as magnetic disks, where the data transfer cannot be stopped or slowed down until an entire block is transferred.
• Cycle stealing: allows the DMA controller to transfer one data word at a time, after which it must return control of the buses to the CPU. The CPU merely delays its operation for one memory cycle to allow the direct memory IO transfer to "steal" one memory cycle.

DMA mode of data transfer

• When the peripheral device asserts the DMA request line, the DMAC activates the BR line, informing the CPU to relinquish the buses.
• The CPU responds with its BG line, informing the DMAC that its buses are disabled.
• The DMAC then puts the current value of its address register onto the address bus, initiates the RD or WR signal, and sends a DMA acknowledge (ACK) to the peripheral device.
• The RD and WR lines of the DMAC are bidirectional. The direction of transfer depends on the status of the BG signal.
• If BG = 0, RD and WR are input lines allowing the CPU to communicate with the internal DMA registers.

DMA mode of data transfer

• If BG = 1, RD and WR are output lines from the DMAC to the RAM specifying the read or write operation for the data.
• When the peripheral device receives the DMA ACK, it puts a word on the data bus (for a write) or receives a word from the data bus (for a read).
• Thus the DMAC controls the read/write operation and supplies the addresses for the memory.
• The peripheral unit can then communicate with the memory through the data bus for direct transfer between the two units while the CPU is momentarily disabled.

DMA mode of data transfer

• For each word that is transferred, the DMAC increments its address register and decrements its word count register.
• When the word count register reaches zero, the DMAC stops any further transfers and removes its bus request.
• It also informs the CPU of the termination by means of an interrupt.
• When the CPU responds to the interrupt, it reads the contents of the word count register.
• A zero value in the register indicates that all words were transferred successfully.

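A sketch of the DMAC's per-word bookkeeping described above (the register names are illustrative):

    def dma_transfer(memory, words, start_addr):
        addr, count = start_addr, len(words)   # address and word count registers
        for w in words:
            memory[addr] = w                   # one word per stolen memory cycle
            addr += 1                          # increment the address register
            count -= 1                         # decrement the word count register
        # count == 0: remove the bus request and interrupt the CPU; the CPU
        # then reads the zero word count to confirm a successful transfer
        return count

    mem = {}
    print(dma_transfer(mem, [7, 8, 9], 0x100))  # 0; mem now holds three words
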
