Memory Hierarchy
[Figure: memory hierarchy pyramid - registers, cache (SRAM), main memory, magnetic disks, magnetic tape]
Memory hierarchy according to speed, size and cost
* The CPU is faster than memory access.
* A hierarchical memory system can be used to close the speed gap.
* The higher levels are expensive, but they are fast.
* As we move down the hierarchy, the cost generally decreases, whereas the access time increases.
Memory Hierarchy contd.
* Cache is very high-speed memory, used to increase the speed of processing by making the current program and data available to the CPU at a rapid rate. It is employed in computer systems to compensate for the speed difference between main memory and the processor. Cache memory consists of static RAM cells.
* Main memory or primary memory stores the programs
and data that are currently needed by the processor. All
other information is stored in secondary memory and
transferred to main memory when needed.
* Secondary memory provides backup storage. The most common secondary memories used in computer systems are magnetic disks and magnetic tapes. They are used for storing system programs, large data files and other backup information.
Locality of reference
The references to memory at any given interval of time tend to be
confined within a few localized areas in memory. This phenomenon is
known as the property of locality of reference.
There are two types of locality of reference.
i) Temporal locality
ii) Spatial locality
i) Temporal locality: Recently referenced instructions are likely to be referenced again in the near future. This is called temporal locality. In iterative loops and subroutines, a small code segment is referenced repeatedly.
ii) Spatial locality: This refers to the tendency of a program to access instructions whose addresses are near one another. For example, in the case of arrays, memory accesses generally use consecutive addresses.
Cache memory
What is cache?
* Cache memory is a small-sized type of volatile memory.
* Cache is temporary memory.
* The data from the programs and files you use the most is stored in this temporary memory, which is also the fastest memory in your computer.
* Cache storage is limited and very expensive for its space.
Cache Memory
The data or contents of the main memory
that are used frequently by CPU are stored
in the cache memory so that the processor
can easily access that data in a shorter
time. Whenever the CPU needs to access memory, it first checks the cache memory. If the data is not found in the cache memory, the CPU then accesses main memory.
Cache memory is placed between the CPU
and the main memory. The block diagram
for a cache memory can be represented as:
[Figure: CPU <-> cache (byte or word transfer) <-> main memory (block transfer)]
Cache Memory
The cache is the fastest component in the memory hierarchy and approaches the speed of CPU components.
Cache memory
When the CPU references memory for a word, if the word is found in the cache it is called a cache hit, and if it is not found it is a cache miss.
The ratio of the number of hits to the total number of CPU references to memory is called the hit ratio.
Total number of CPU references to memory = number of hits + number of misses
Hit ratio = number of hits / (number of hits + number of misses)
Miss ratio = number of misses / (number of hits + number of misses) = 1 - Hit ratio
Cache memory contd.
Cache access time = t_c
Main memory access time = t_m
Hit ratio = h
Average Memory Access Time = h x t_c + (1 - h) x (t_m + t_c)
For example, a computer has a cache access time of 100 ns, a main memory access time of 1000 ns and a hit ratio of 0.9. Calculate the Average Memory Access Time.
Average Memory Access Time = 0.9 x 100 + (1 - 0.9) x (1000 + 100) ns
= 90 + 110 ns = 200 ns
Example: A three-level memory system has a cache access time of 15 ns, a main memory access time of 25 ns and a disk access time of 40 ns, with a cache hit ratio of 0.96 and a main memory hit ratio of 0.9. What is the Average Memory Access Time?
Cache access time = t_c = 15 ns
Main memory access time = t_m = 25 ns
Disk access time = t_d = 40 ns
Hit ratio of cache = h_c = 0.96
Hit ratio of main memory = h_m = 0.9
Average Memory Access Time = h_c x t_c + (1 - h_c) x h_m x (t_m + t_c) + (1 - h_c) x (1 - h_m) x (t_d + t_m + t_c)
= 0.96 x 15 + 0.04 x 0.9 x (25 + 15) + 0.04 x 0.1 x (40 + 25 + 15) ns
= 16.16 ns
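The two worked examples above can be checked with a short calculation. The following C sketch (illustrative only; the function names amat_two_level and amat_three_level are ours) evaluates the same formulas:

#include <stdio.h>

/* Average memory access time for a cache + main memory hierarchy */
static double amat_two_level(double h_c, double t_c, double t_m) {
    return h_c * t_c + (1.0 - h_c) * (t_m + t_c);
}

/* Average memory access time for cache + main memory + disk */
static double amat_three_level(double h_c, double t_c,
                               double h_m, double t_m, double t_d) {
    return h_c * t_c
         + (1.0 - h_c) * h_m * (t_m + t_c)
         + (1.0 - h_c) * (1.0 - h_m) * (t_d + t_m + t_c);
}

int main(void) {
    /* Example 1: t_c = 100 ns, t_m = 1000 ns, hit ratio 0.9 -> 200 ns */
    printf("Two-level AMAT   = %.2f ns\n", amat_two_level(0.9, 100.0, 1000.0));
    /* Example 2: t_c = 15 ns, t_m = 25 ns, t_d = 40 ns -> 16.16 ns */
    printf("Three-level AMAT = %.2f ns\n",
           amat_three_level(0.96, 15.0, 0.9, 25.0, 40.0));
    return 0;
}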
Cache performance
One method to evaluate cache performance is to expand the CPU execution time equation.
The CPU execution time is the product of the clock cycle time and the sum of the CPU clock cycles and the memory stall cycles.
CPU execution time = (CPU clock cycles + Memory stall cycles) x clock cycle time
CPU clock cycles = IC x CPI
CPU clock cycles is the product of IC (Instruction Count) and the number of clock cycles needed per instruction (CPI). Here IC means Instruction Count, that is, the total number of instructions executed.
Memory stall cycles are the number of cycles during which the CPU is stalled waiting for a
memory access.
The number of memory stall cycles depends on both the number of misses and miss penalty.
* In general, the miss penalty is the time needed to bring the desired information from a slower unit in the memory hierarchy to a faster unit.
* Here we consider the miss penalty to be the time needed to bring a block of data from main memory to the cache.
Memory stall cycles = Number of misses x Miss penalty
= IC x (Misses / Instruction) x Miss penalty
= IC x (Memory accesses / Instruction) x Miss rate x Miss penalty
CPI Example
Example: Assume a benchmark has 100 instructions:
* 25 instructions are loads/stores (each takes 2 cycles)
* 50 instructions are adds (each takes 1 cycle)
* 25 instructions are square roots (each takes 3 cycles)
a) What is the average CPI for this benchmark?
b) How many CPU clock cycles are required for all the instructions?
a) Average CPI = ((25 x 2) + (50 x 1) + (25 x 3)) / 100
= (50 + 50 + 75) / 100
= 1.75
b) Clock cycles required for all instructions = IC x CPI
= 100 x 1.75
= 175 clock cycles
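A minimal sketch of the same CPI arithmetic, using the instruction-class counts from the example above (the array layout is ours):

#include <stdio.h>

int main(void) {
    int count[]  = {25, 50, 25};   /* loads/stores, adds, square roots      */
    int cycles[] = { 2,  1,  3};   /* cycles per instruction for each class */
    int ic = 0, total_cycles = 0;

    for (int i = 0; i < 3; i++) {
        ic           += count[i];
        total_cycles += count[i] * cycles[i];
    }
    printf("Average CPI        = %.2f\n", (double)total_cycles / ic);  /* 1.75 */
    printf("Total clock cycles = %d\n", total_cycles);                 /* 175  */
    return 0;
}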
Memory stall cycles = Number of misses x Miss penalty
* The number of misses can be represented as the product of the total number of instructions and the number of misses per instruction.
* The total number of instructions is IC (Instruction Count).
Number of misses = IC x (Misses / Instruction)
* Misses per instruction can be represented as the product of memory accesses per instruction and the miss rate:
Misses / Instruction = (Memory accesses / Instruction) x Miss rate
Memory stall cycles = IC x (Misses / Instruction) x Miss penalty
= IC x (Memory accesses / Instruction) x Miss rate x Miss penalty
Cache performance examples
Example 1:
Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these are 50% of the total instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?
CPU execution time (all hits) = (IC x CPI + 0) x clock cycle time
= IC x 1.0 x clock cycle time
Memory stall cycles = IC x (Memory accesses / Instruction) x Miss rate x Miss penalty
Memory accesses per instruction: every instruction requires one fetch from memory (100%), and data accesses per instruction, that is operand fetch (load) or write back (store) to memory, occur for 50% of instructions.
Memory stall cycles = IC x (1 + 0.5) x 0.02 x 25
= IC x 0.75
Cache performance examples contd.
CPU execution time (with both hits and misses) = (CPU clock cycles + Memory stall cycles) x clock cycle time
= (IC x 1.0 + IC x 0.75) x clock cycle time
= IC x 1.75 x clock cycle time
Performance ratio = CPU execution time with misses / CPU execution time with all hits
= 1.75 / 1.0 = 1.75
The computer with no cache misses is 1.75 times faster.
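A short sketch of Example 1 under the stated assumptions (1.5 memory accesses per instruction, 2% miss rate, 25-cycle miss penalty; variable names are ours):

#include <stdio.h>

int main(void) {
    double cpi_ideal    = 1.0;   /* CPI when every access hits in the cache */
    double mem_per_ins  = 1.5;   /* 1 instruction fetch + 0.5 data accesses */
    double miss_rate    = 0.02;
    double miss_penalty = 25.0;  /* clock cycles */

    /* Memory stall cycles per instruction */
    double stalls   = mem_per_ins * miss_rate * miss_penalty;  /* 0.75 */
    double cpi_real = cpi_ideal + stalls;                      /* 1.75 */

    printf("CPI with misses             = %.2f\n", cpi_real);
    printf("Speedup if all accesses hit = %.2f\n", cpi_real / cpi_ideal);
    return 0;
}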
Cache performance examples contd.
Example 2:
A processor has an instruction cache and a data cache with miss rates of 2% and 4% respectively. The frequency of loads and stores is on average 36% and the CPI is 2. The miss penalty can be taken to be 40 cycles for all misses. How much time is needed for CPU execution if 1000 instructions are present in the program? Consider the clock cycle time to be 2 ns.
CPU execution time = (CPU clock cycles + Memory stall cycles) x clock cycle time
= (IC x CPI + Memory stall cycles) x clock cycle time
Memory stall cycles = IC x (Memory accesses / Instruction) x Miss rate x Miss penalty
= IC x (1 x 0.02 + 0.36 x 0.04) x 40 cycles
= 1000 x 0.0344 x 40 cycles
= 1376 cycles
CPU execution time = (1000 x 2 + 1376) x 2 ns
= 6752 ns
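The same calculation as Example 2, sketched in C (it assumes one instruction fetch per instruction plus a data access for the 36% of loads/stores):

#include <stdio.h>

int main(void) {
    double ic           = 1000.0;  /* instruction count            */
    double cpi          = 2.0;
    double clock_ns     = 2.0;     /* clock cycle time in ns       */
    double i_miss_rate  = 0.02;    /* instruction cache miss rate  */
    double d_miss_rate  = 0.04;    /* data cache miss rate         */
    double load_store   = 0.36;    /* fraction of loads and stores */
    double miss_penalty = 40.0;    /* clock cycles                 */

    /* Every instruction makes 1 instruction fetch, 36% also make a data access */
    double stall_cycles = ic * (1.0 * i_miss_rate + load_store * d_miss_rate) * miss_penalty;
    double exec_time_ns = (ic * cpi + stall_cycles) * clock_ns;

    printf("Memory stall cycles = %.0f\n", stall_cycles);    /* 1376    */
    printf("CPU execution time  = %.0f ns\n", exec_time_ns); /* 6752 ns */
    return 0;
}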
* The instruction cycle has five stages:
i) IF (Instruction Fetch) (memory access)
ii) ID (Instruction Decode)
iii) OF (Operand Fetch) (memory access)
iv) EX (Execution)
v) WB (Write Back) (memory access)
* Of these five stages, only three stages (IF, OF and WB) involve memory access.
* The IF stage is mandatory for all instructions, so for IF a memory access is always required; that is, memory access is 100%.
* OF and WB are optional for instruction execution.
* OF corresponds to a load operation and WB to a store operation.
* If 50% of instructions are loads and stores, then for 50% of instructions either the OF or the WB stage is required.
* So for the execution of an instruction, IF is always required (100% memory access), and on average an additional 50% memory access is required for OF and WB.
If there are two caches, an instruction cache and a data cache, then the miss rate is considered twice:
* For instruction accesses, the instruction cache is used.
* For data accesses, the data cache is used.
Memory stalls per instruction can then be written as
(Memory accesses for instruction fetch / Instruction) x instruction cache miss rate + (Memory accesses for loads and stores / Instruction) x data cache miss rate
MIPS
* MIPS : millions of instructions per second
* It is a metric for computer performance.
* MIPS = instruction count / (execution time x 10^6)
For example, a program that executes 3 million instructions in 2 seconds has a MIPS rating of 1.5.
* Advantage: Easy to understand and measure.
* Example: Two different compilers are being tested for a 500 MHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively).
The first compiler's code uses 5 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.
The second compiler's code uses 10 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.
What are the execution times of the two compilers?
What are the MIPS ratings of the two compilers?
MIPS Example (Contd.)
Instruction counts (in billions) for each instruction class
Code from    | A  | B | C
Compiler 1   | 5  | 1 | 1
Compiler 2   | 10 | 1 | 1
Given Class A, Class B, and Class C, which require 1, 2, and 3 cycles (respectively).
Clock frequency is 500 MHz.
CPU clock cycles1 = (5 x 1 + 1 x 2 + 1 x 3) x 10^9 = 10 x 10^9
CPU clock cycles2 = (10 x 1 + 1 x 2 + 1 x 3) x 10^9 = 15 x 10^9
CPU time1 = 10 x 10^9 / (500 x 10^6) = 20 seconds
CPU time2 = 15 x 10^9 / (500 x 10^6) = 30 seconds
MIPS1 = (5 + 1 + 1) x 10^9 / (20 x 10^6) = 350
MIPS2 = (10 + 1 + 1) x 10^9 / (30 x 10^6) = 400
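The compiler comparison can be reproduced with a few lines of C (clock rate and instruction counts are the ones given above):

#include <stdio.h>

int main(void) {
    double clock_hz = 500e6;                 /* 500 MHz machine                */
    double cycles_per_class[] = {1, 2, 3};   /* Class A, B, C                  */
    double counts1[] = {5e9, 1e9, 1e9};      /* compiler 1 instruction counts  */
    double counts2[] = {10e9, 1e9, 1e9};     /* compiler 2 instruction counts  */

    double cycles1 = 0, cycles2 = 0, ic1 = 0, ic2 = 0;
    for (int i = 0; i < 3; i++) {
        cycles1 += counts1[i] * cycles_per_class[i];
        cycles2 += counts2[i] * cycles_per_class[i];
        ic1 += counts1[i];
        ic2 += counts2[i];
    }
    double time1 = cycles1 / clock_hz;       /* 20 s */
    double time2 = cycles2 / clock_hz;       /* 30 s */
    printf("Compiler 1: time = %.0f s, MIPS = %.0f\n", time1, ic1 / (time1 * 1e6));
    printf("Compiler 2: time = %.0f s, MIPS = %.0f\n", time2, ic2 / (time2 * 1e6));
    return 0;
}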
Average memory access time
* A better measure of memory performance is the average memory access time. Average memory access time is defined as
Average memory access time = Hit time + Miss rate x Miss penalty
So the average memory access time depends upon the hit time, the miss rate and the miss penalty, these three factors.
* The average memory access time is reduced if these three factors are reduced. First we describe the miss penalty reduction techniques.
The following techniques are used to reduce the miss penalty:
* Multi-level caches
* Victim caches
* Early Restart and Critical Word First
* Read Priority over Write on Miss
Multi-level caches
* Multi-level caches reduce the miss penalty.
* The first level cache (L1 cache) is smaller in size compared to the second level cache (L2 cache).
* The L1 cache is an on-chip cache, whose access time is near to the clock speed of the CPU.
* The L2 cache is an off-chip cache, large enough to capture many accesses that would otherwise go to main memory. So it reduces the miss penalty.
* The speed of the L1 cache affects the clock rate of the CPU, while the speed of the L2 cache only affects the miss penalty of the L1 cache.
Multi-level caches contd.
Multi-level inclusion:
* By the multi-level inclusion property, data present in the L1 cache are always present in the L2 cache.
* Inclusion is desirable because consistency between I/O and the caches can be determined just by checking the L2 cache.
* The disadvantage of multi-level inclusion is that the L2 cache holds a redundant copy of the L1 cache, so space is wasted in the L2 cache.
Multi-level exclusion:
* By the multi-level exclusion property, data present in the L1 cache are never found in the L2 cache.
* A cache miss in L1 results in a swap of blocks between L1 and L2 instead of a replacement of an L1 block with an L2 block.
* The advantage of multi-level exclusion is that this policy prevents wasting space in the L2 cache.
Multi-level caches contd.
The average memory access time for a two-level cache is defined by the following formulas.
Average memory access time = Hit time_L1 + Miss rate_L1 x Miss penalty_L1
Miss penalty_L1 = Hit time_L2 + Miss rate_L2 x Miss penalty_L2
Average memory access time = Hit time_L1 + Miss rate_L1 x (Hit time_L2 + Miss rate_L2 x Miss penalty_L2)
Local miss rate = number of misses in a cache / number of memory accesses to this cache
Global miss rate = number of misses in a cache / total number of memory accesses generated by the CPU
Global miss rate of L1 = Miss rate_L1
Global miss rate of L2 = Miss rate_L1 x Miss rate_L2
Memory stalls per instruction can be defined as
Average memory stalls per instruction = Misses per instruction_L1 x Hit time_L2 + Misses per instruction_L2 x Miss penalty_L2
Numerical on Multi-level caches
Example:
In 1000 memory references there are 40 misses in the first level cache and 20 misses in the second level
cache. What are the various miss rates? Assume the miss penalty from the L2 cache to memory is 100
clock cycles, the hit time of the L2 cache is 10 clock cycles, the hit time of L1 is 1 clock-cycle and there are
1.6 memory references per instruction. What is the average memory access time and average stall cycles
per instruction?
Local miss rate of L1 cache = 40 / 1000 x 100 = 4%
Global miss rate of L1 cache = 4%
Local miss rate of L2 cache = 20 / 40 x 100 = 50%
Global miss rate of L2 cache = 20 / 1000 x 100 = 2%
Average memory access time = Hit time_L1 + Miss rate_L1 x (Hit time_L2 + Miss rate_L2 x Miss penalty_L2)
= 1 + 0.04 x (10 + 0.5 x 100) clock cycles
= 1 + 2.4 clock cycles
= 3.4 clock cycles
Numerical on Multi-level caches contd.
Average memory stalls per instruction =
Misses per instruction_L1 x Hit time_L2 + Misses per instruction_L2 x Miss penalty_L2
Let x instructions be present.
1.6 x x = 1000
x = 1000 / 1.6 = 625
Average memory stalls per instruction
= (40/625) x 10 + (20/625) x 100 clock cycles
= (400/625 + 2000/625) clock cycles
= 2400/625 clock cycles
= 3.84 clock cycles
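The two-level numbers above can be reproduced directly from the formulas; a sketch in C (the 1.6 references per instruction and the miss counts are the ones from the example, variable names are ours):

#include <stdio.h>

int main(void) {
    double refs         = 1000.0;  /* memory references          */
    double l1_misses    = 40.0;
    double l2_misses    = 20.0;
    double hit_l1       = 1.0;     /* clock cycles               */
    double hit_l2       = 10.0;
    double penalty_l2   = 100.0;
    double refs_per_ins = 1.6;

    double mr_l1_local  = l1_misses / refs;       /* 0.04 */
    double mr_l2_local  = l2_misses / l1_misses;  /* 0.5  */
    double mr_l2_global = l2_misses / refs;       /* 0.02 */

    double amat = hit_l1 + mr_l1_local * (hit_l2 + mr_l2_local * penalty_l2); /* 3.4 */

    double instructions   = refs / refs_per_ins;  /* 625 */
    double stalls_per_ins = (l1_misses / instructions) * hit_l2
                          + (l2_misses / instructions) * penalty_l2;          /* 3.84 */

    printf("Global L2 miss rate = %.2f%%\n", mr_l2_global * 100.0);
    printf("AMAT                = %.2f cycles\n", amat);
    printf("Stalls/instruction  = %.2f cycles\n", stalls_per_ins);
    return 0;
}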
Victim caches
[Figure: processor, cache, victim cache and main memory]
Placement of victim cache in the memory hierarchy
* The victim cache is another miss penalty reduction technique.
* Suppose a block was discarded and is needed again after this. Since the discarded block has already been fetched, it can be used again at small cost.
* Such recycling requires a small fully associative cache placed between the original cache and its refill path.
* This small cache is called a victim cache because it contains only blocks that are discarded from a cache due to a miss.
* This cache is checked on a miss to see if it contains the desired block before accessing main memory.
* If the desired block is found in the victim cache, the victim block and the cache block are swapped.
On-chip cache and off-chip cache
* The first level cache (L1 cache) is smaller in size compared to the second level cache (L2 cache).
* The L1 cache is an on-chip cache, whose access time is near to the clock speed of the CPU.
* The L2 cache is an off-chip cache, large enough to capture many accesses that would otherwise go to main memory.
Cache Mapping
1. Direct Mapping
2. Associative Mapping
3. Set- Associative Mapping
Direct Mapping
Main memory size = 128 B = 2^7
Cache memory size = 32 B = 2^5
Block size = 4 B, so the number of words present in a block = 4 = 2^2
No. of blocks present in cache = cache memory size / block size = 32 / 4 = 8 = 2^3
No. of blocks present in main memory = main memory size / block size = 128 / 4 = 32 = 2^5
No. of tags present = main memory size / cache memory size = 128 / 32 = 4 = 2^2
Mapping function: main memory block i is mapped to cache block (i mod 8), i.e. i mod (no. of blocks present in cache)
Memory address: 7 bits
Tag | Block | Word
2 bit | 3 bit | 2 bit
[Figure: 32-block main memory mapped onto an 8-block cache]
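A tiny sketch of the direct-mapping function for this 8-block cache (the block counts are the ones from the example above):

#include <stdio.h>

int main(void) {
    int cache_blocks  = 8;    /* blocks present in the cache   */
    int memory_blocks = 32;   /* blocks present in main memory */

    /* Direct mapping: main memory block i goes to cache block (i mod 8) */
    for (int i = 0; i < memory_blocks; i++)
        printf("main memory block %2d -> cache block %d\n", i, i % cache_blocks);
    return 0;
}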
Associative Mapping
Main memory size = 128 B = 2^7
Cache memory size = 32 B = 2^5
Block size = 4 B = 2^2
No. of blocks present in cache = cache memory size / block size = 32 / 4 = 8
No. of blocks present in main memory = main memory size / block size = 128 / 4 = 32
Mapping function: any main memory block can be mapped to any block in the cache.
Memory address: 7 bits
Tag | Word
5 bit | 2 bit
[Figure: 32-block main memory mapped onto an 8-block cache]
Set-Associative Mapping
n-way Set-Associative mapping
where n = 2^x and x = 1, 2, 3, ...
If x=1
then n=2 and it is 2 way Set-Associative mapping
If x=2
then n=4 and it is 4 way Set-Associative mapping
If x=3
then n=8 and it is 8 way Set-Associative mapping
no of block = cache size / block size
no of set = no of block / no of way
2 way Set-Associative Mapping
Main memory size = 128 B
Cache memory size = 32 B
Block size = 4 B
No. of ways = 2
No. of blocks present in cache = 8
No. of sets present in cache = No. of blocks present in cache / No. of ways = 8 / 2 = 4 = 2^2
No. of tags present = (main memory size x No. of ways) / cache memory size = (128 x 2) / 32 = 8 = 2^3
Mapping function: main memory block i is mapped to set (i mod 4), i.e. i mod (no. of sets present in cache)
Memory address: 7 bits
Tag | Set | Word
3 bit | 2 bit | 2 bit
[Figure: 32-block main memory mapped onto a 2-way set-associative cache with 4 sets]
Direct Mapping
Assume a system has a 2 KB cache, 64 KB main memory and a 16-byte block.
Number of blocks present in cache = cache memory size / block size = 2^11 / 2^4 = 2^7 = 128
Number of blocks present in main memory = main memory size / block size = 2^16 / 2^4 = 2^12 = 4096
Number of tags = main memory size / cache memory size = 2^16 / 2^11 = 2^5 = 32
Memory address: 16 bits
Tag | Block | Word
5 bit | 7 bit | 4 bit
[Figure: main memory blocks mapped onto the 128-block cache]
Advantage: Very simple and easy to manage. The search space is minimum compared to other mappings.
Disadvantage: If a program requires block 0 and block 128 repeatedly, cache misses will occur because block 0 and block 128 are mapped to the same place in the cache.
Associative Mapping
Memory address: 16 bits
Tag | Word
12 bit | 4 bit
[Figure: main memory blocks mapped onto the cache]
Advantage: There is no limitation in block mapping.
Disadvantage: The search space is maximum compared to other mappings.
2 Way Set-Associative Mapping
Main memory size = 64 KB
Cache memory size = 2 KB
Block size = 16 B
No. of ways = 2
No. of blocks present in cache = 128
No. of sets present in cache = No. of blocks present in cache / No. of ways = 128 / 2 = 64 = 2^6
No. of tags present = (main memory size x No. of ways) / cache memory size = (2^16 x 2) / 2^11 = 2^6 = 64
Memory address: 16 bits
Tag | Set | Word
6 bit | 6 bit | 4 bit
[Figure: main memory blocks mapped onto the 2-way set-associative cache with 64 sets]
SoeGiven the following, determine size of the sub-fields ( in bits) in the address for direct
mapping, associative and set associative mapping cache schemes.
+ We have 256 MB main memory and 1MB cache memory.
* The block size is 128 bytes.
* There are 8 blocks in cache set.
je memory §
2x2
=o 2
for Direct Mapping “"
Number of blocks present in cache
_cache memory size _ 220
“plocksize 27
The block size is 128 bytes = 27
No. of Tags present
pres 5 I ys
for set associative mapping “te memory ste
No.of way
__ 28% 220 23
=a
No. of set present in cache
= No. of blocks present in cache
Nootway
=213/ 232210
Direct Mapping
Memory address 28 bit
Tag Block
8 bit 13 bit
Associative Mapping
Memory address 28 bit
Tag Word
21 bit 7 bit
8 way set Associative Mapping
Memory address 28 bit
Tag Set Word
10 bit 7 bit| Cone anemmcy = 512 KB
Example:
Cache memory = 512 KB
Main memory = 2 MB
Block size = 64 bytes
i) Direct mapped cache
ii) Set-associative cache
Main memory = 2 MB = 2^21, so the memory address is 21 bits.
Block size = 64 bytes = 2^6, so the word field is 6 bits.
No. of blocks present in cache = cache memory size / block size = 2^19 / 2^6 = 2^13, so the block field is 13 bits.
No. of tags present = main memory size / cache memory size = 2^21 / 2^19 = 2^2 = 4, so the tag field is 2 bits.
Direct mapping: Tag 2 bit | Block 13 bit | Word 6 bit
Set-associative mapping: No. of sets = No. of blocks present in cache / No. of ways; the set field takes log2(No. of sets) bits and the remaining high-order bits form the tag (for example, with 8 ways: Tag 5 bit | Set 10 bit | Word 6 bit).
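Under the stated sizes, the address fields for all three mapping schemes can be computed mechanically. The sketch below does that for both examples above (the helpers lg2 and fields are ours, and lg2 only handles exact powers of two):

#include <stdio.h>

/* log2 for exact powers of two */
static int lg2(unsigned long long x) {
    int n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* Print tag | set (or block) | word widths for a cache with the given geometry */
static void fields(unsigned long long mem, unsigned long long cache,
                   unsigned long long block, int ways) {
    int addr = lg2(mem);
    int word = lg2(block);
    int blocks_in_cache = lg2(cache / block);
    int set = blocks_in_cache - lg2(ways);  /* ways = 1 -> direct mapped (set = block field) */
    printf("%d-way: tag %d | set %d | word %d (address %d bits)\n",
           ways, addr - set - word, set, word, addr);
}

int main(void) {
    /* 256 MB main memory, 1 MB cache, 128 B block */
    fields(256ULL << 20, 1ULL << 20, 128, 1);        /* direct mapped         */
    fields(256ULL << 20, 1ULL << 20, 128, 8);        /* 8-way set associative */
    fields(256ULL << 20, 1ULL << 20, 128, 1 << 13);  /* fully associative     */

    /* 2 MB main memory, 512 KB cache, 64 B block, direct mapped */
    fields(2ULL << 20, 512ULL << 10, 64, 1);
    return 0;
}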
Write-through :
* The simplest and most commonly used procedure.
* During a write operation, the cache location and the main memory location are updated at the same time.
* Main memory always contains the same data as the cache.
Advantage: This characteristic is important in systems with Direct Memory Access (DMA) transfers. It ensures that the data residing in main memory are valid, and DMA will transfer the most recently updated data.
Disadvantage: For every modification of the cache, a main memory access is required.
Write-back:
* In this method only the cache location is updated during a write operation.
* The location is then marked by a flag or modified (dirty) bit so that later, when the word is removed from the cache, it is copied into main memory.
Advantage: During the time a word resides in the cache, it may be updated several times. For this reason, repeated memory accesses are not required for every word modification.
Disadvantage: DMA transfers face a problem, since main memory may hold stale data.
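A minimal sketch of the two write policies on a single cache line (the structures and names are hypothetical, just to show where the dirty bit comes in):

#include <stdio.h>
#include <string.h>

#define LINE_SIZE 16

struct line {
    int dirty;                  /* set by write-back when the line is modified */
    unsigned char data[LINE_SIZE];
};

unsigned char main_memory[1024];

/* Write-through: update the cache line and main memory together */
void write_through(struct line *l, int addr, unsigned char value) {
    l->data[addr % LINE_SIZE] = value;
    main_memory[addr] = value;              /* memory is always up to date */
}

/* Write-back: update only the cache line and mark it dirty */
void write_back(struct line *l, int addr, unsigned char value) {
    l->data[addr % LINE_SIZE] = value;
    l->dirty = 1;                           /* copied to memory only on eviction */
}

/* On eviction, a dirty line must be flushed to main memory */
void evict(struct line *l, int base_addr) {
    if (l->dirty) {
        memcpy(&main_memory[base_addr], l->data, LINE_SIZE);
        l->dirty = 0;
    }
}

int main(void) {
    struct line l = {0};
    write_through(&l, 32, 0xAA);            /* memory updated immediately      */
    write_back(&l, 33, 0xBB);               /* memory updated only at eviction */
    evict(&l, 32);
    printf("memory[32]=%02X memory[33]=%02X\n", main_memory[32], main_memory[33]);
    return 0;
}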
Different types of misses
* Compulsory miss - The very first access to a block cannot be in the cache, so the block must be brought into the cache. That means at the initial stage no blocks are present in the cache when the program begins.
* Capacity miss- If the cache can’t contain all the blocks
needed during execution of a program capacity misses will
occur. Cache is too small.
* Conflict miss - If the block placement strategy is set-associative or direct mapped, conflict misses will occur because a block may be discarded and later retrieved if too many blocks map to its set. These misses are also called collision misses.
Miss rate reduction techniques
The following techniques are used to reduce the miss rate
1) Larger block size
2) Larger caches
3) Compiler optimization
1) Larger block size:
* Using larger blocks in the cache, the miss rate can be reduced.
* Larger block sizes reduce compulsory misses.
* A larger block size takes advantage of spatial locality.
* At the same time, a larger block increases the miss penalty.
* Since it reduces the number of blocks in the cache, a larger block may increase conflict misses and even capacity misses if the cache is small.
* Choose the optimum block size such that the miss rate is reduced and the other factors do not increase.
Miss rate reduction techniques contd.
2) Larger caches:
* Larger caches reduce capacity misses.
* Drawback: longer hit time and higher cost.
* This technique is especially popular in off-chip caches.
3) Compiler optimization:
* The previous miss rate reduction techniques require changes to the hardware: larger blocks, larger caches, higher associativity, or pseudo-associativity. This technique reduces the miss rate using a software approach.
* Loop interchange
for (j = 0; j < 100; j = j + 1)
{
    for (i = 0; i < 5000; i = i + 1)
    {
        x[i][j] = 2 * x[i][j];
    }
}
Miss rate reduction techniques contd.
The previous code has nested loops that access data in memory in non-sequential order. Simply exchanging the nesting of the loops can make the code access the data in the order in which they are stored.
for (i = 0; i < 5000; i = i + 1)
{
    for (j = 0; j < 100; j = j + 1)
    {
        x[i][j] = 2 * x[i][j];
    }
}
Here the memory access is sequential, and this technique reduces misses by improving spatial locality.
Hit time reduction technique
Hit time is critical because it affects the clock rate of the processor. The following techniques are used to reduce the cache hit time.
1) Small and simple cache:
* Smaller hardware is faster, so a small cache certainly has a lower hit time.
* A smaller cache is easier to fit on-chip; otherwise off-chip access time is added.
* A simple cache means a direct mapped cache. Here the tag length is smaller than in other cache mapping techniques, so the searching time is reduced.
2) Avoid address translation to the cache:
* Translation of a virtual address to a physical address takes extra time.
* Use the virtual address for the cache, since hits are much more common than misses.
* A cache which uses virtual addresses is called a virtual cache. Protection checking is reduced, so protection information is added to the virtual cache.
Main memory organizations for improving performance
* Performance measures of main memory emphasize both latency and bandwidth.
* Memory bandwidth is the number of bytes read or written per unit time. In other words, bandwidth is how many bytes are accessed from main memory per unit time.
* Memory latency is the time gap between two consecutive word accesses. Since main memory uses DRAM cells, an extra precharge time (periodic refreshing time) is needed to access a word.
Wider main memory for higher bandwidth
[Figure: one-word-wide memory organization vs. wider memory organization (CPU, cache, bus, main memory)]
* First level caches are often organized with a physical width of 1 word because most CPU accesses are that size.
* Doubling or quadrupling the width of the cache and the memory will therefore double or quadruple the memory bandwidth.
* A wider memory organization has a narrow L1 cache and a wide L2 cache.
* There is a cost in the wider connection between the CPU and memory, typically called the memory bus.
* The CPU will still access the cache one word at a time, so there now needs to be a multiplexer between the cache and the CPU.
* A second level cache can help, since the multiplexing can be between the first and second level caches.
Simple interleaved memory for higher bandwidth
* Memory chips can be organized in banks to read or write multiple words at a time rather than a single word.
* In general, the purpose of interleaved memory is to try to take advantage of the potential memory bandwidth of all the chips in the system.
* Most memory systems activate only those chips that contain the needed words, so less power is required in the memory system.
[Figure: CPU, cache and four memory banks; bank 0 holds words 0, 4, 8, 12, ..., bank 1 holds words 1, 5, 9, 13, ..., and so on]
* The banks are often 1 word wide so that the width of the bus and the cache need not change, but sending an address to several banks permits them all to read simultaneously.
* In the above example, the addresses of the four banks are interleaved at the word level.
* Bank 0 has all words whose address modulo 4 is 0, bank 1 has all words whose address modulo 4 is 1, and so on.
Example
Assume the performance of a 1-word-wide primary memory organization is
* 4 clock cycles to send the address
* 56 clock cycles for the access time per word
* 4 clock cycles to send a word of data.
Given a cache block of 4 words and that a word is 8 bytes, calculate the miss penalty and the effective memory bandwidth. Recompute the miss penalty and the memory bandwidth assuming we have
* a main memory width of 2 words
* a main memory width of 4 words
* interleaved main memory with 4 banks, each bank 1 word wide.
Ans:
* In case of a main memory width of one word:
Miss penalty = (4 + 56 + 4) clock cycles x 4 = 256 clock cycles
Memory bandwidth = 32 bytes / 256 cycles = 0.125 bytes per clock cycle
* In case of a main memory width of 2 words:
Miss penalty = (4 + 56 + 4) clock cycles x 2 = 128 clock cycles
Memory bandwidth = 32 / 128 = 0.25 bytes per clock cycle
* In case of a main memory width of 4 words:
Miss penalty = (4 + 56 + 4) clock cycles x 1 = 64 clock cycles
Memory bandwidth = 32 / 64 = 0.5 bytes per clock cycle
* In case of interleaved main memory with 4 banks:
Miss penalty = (4 + 56 + 4 x 4) clock cycles = 76 clock cycles
Memory bandwidth = 32 / 76 = 0.42 bytes per clock cycle (approx.)
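A short sketch that reproduces the four miss-penalty and bandwidth figures above (the timing parameters are the ones given in the example):

#include <stdio.h>

int main(void) {
    double addr = 4, access = 56, transfer = 4;  /* clock cycles                 */
    double block_bytes = 4 * 8;                  /* 4 words x 8 bytes = 32 bytes */

    double p1 = (addr + access + transfer) * 4;  /* 1-word-wide memory  */
    double p2 = (addr + access + transfer) * 2;  /* 2-word-wide memory  */
    double p4 = (addr + access + transfer) * 1;  /* 4-word-wide memory  */
    double pi = addr + access + transfer * 4;    /* 4 interleaved banks */

    printf("1 word : penalty %3.0f, bandwidth %.3f B/cycle\n", p1, block_bytes / p1);
    printf("2 words: penalty %3.0f, bandwidth %.3f B/cycle\n", p2, block_bytes / p2);
    printf("4 words: penalty %3.0f, bandwidth %.3f B/cycle\n", p4, block_bytes / p4);
    printf("4 banks: penalty %3.0f, bandwidth %.3f B/cycle\n", pi, block_bytes / pi);
    return 0;
}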
———<— Aread
Paocation 000 occupied
register memory
5000 ‘space
Logical Physical 5000]
address address
ceu 0340 5340 Free
memory
space
MMU
Main memory
* The addresses generated by the CPU or program are called logical addresses.
* The corresponding addresses in the physical memory occupied by the executing program are called physical addresses.
* The memory management unit (MMU) maps each logical address to a physical address.
* For example, if the relocation register (base register) holds the value 5000, then logical address 0 is mapped to physical address 5000 and, similarly, logical address 0340 is mapped to physical address 5340.
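In the relocation-register scheme described above, the translation is a single addition plus a limit check. A minimal sketch (the limit value is an assumption added for illustration):

#include <stdio.h>

int main(void) {
    unsigned relocation = 5000;   /* base register loaded by the OS       */
    unsigned limit      = 4096;   /* assumed size of the program's space  */
    unsigned logical[]  = {0, 340, 1000};

    for (int i = 0; i < 3; i++) {
        if (logical[i] < limit)
            printf("logical %4u -> physical %u\n", logical[i], relocation + logical[i]);
        else
            printf("logical %4u -> trap: address out of range\n", logical[i]);
    }
    return 0;
}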
Paging
* The program space is known as the logical address space and the main memory space is the physical address space.
* The total logical address space (generated by the CPU) is divided into equal-size partitions and each partition is known as a page.
* Similarly, the main memory is also divided into equal-size partitions and each partition is known as a frame.
* Page size = frame size. So, when a particular page is loaded into a certain frame there will be no free space remaining.
Paging Hardware
[Figure: logical address (page number, offset) translated through the page table to a physical address (frame number, offset) in physical memory]
Paging contd.
Example: The size of the logical address space is 128 KB and the page size is 2 KB.
i) How many pages are there? How many bits are used to represent the page number and the page offset?
ii) How many frames are there if the main memory size is 2 MB? How many bits are used to represent the frame number?
To represent a logical address, 17 bits (2^17 = 128 KB) are required.
Total number of pages = logical address space / page size = 2^17 / 2^11 = 2^6 = 64
6 bits (17 - 11) represent the page number.
11 bits represent the page offset.
To represent a physical address, 21 bits (2^21 = 2 MB) are required.
Total number of frames = physical address space / page size = 2^21 / 2^11 = 2^10 = 1024
10 bits represent the frame number.
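The page and frame bit counts above follow directly from the sizes; a quick sketch (lg2 is our helper and only handles exact powers of two):

#include <stdio.h>

static int lg2(unsigned long long x) { int n = 0; while (x > 1) { x >>= 1; n++; } return n; }

int main(void) {
    unsigned long long logical  = 128ULL << 10;  /* 128 KB logical address space */
    unsigned long long physical = 2ULL << 20;    /* 2 MB physical memory         */
    unsigned long long page     = 2ULL << 10;    /* 2 KB page = frame size       */

    printf("pages       = %llu (page number %d bits)\n", logical / page, lg2(logical / page));
    printf("page offset = %d bits\n", lg2(page));
    printf("frames      = %llu (frame number %d bits)\n", physical / page, lg2(physical / page));
    return 0;
}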
Paging example for a 32 byte memory with 4 byte pages
* Logical address space is 16 bytes (2^4).
* Page size is 4 bytes (2^2).
* Physical memory is 32 bytes (2^5).
* Number of pages = logical address space / page size = 2^4 / 2^2 = 4
* Since page size = frame size, number of frames = size of physical memory / frame size = 2^5 / 2^2 = 8
* There are four pages: page0, page1, page2, page3.
* There are eight frames: frame0, frame1, ..., frame7.
* According to the page table (in the next slide), page0, page1, page2 and page3 are mapped to frame5, frame6, frame1 and frame2 respectively.
Paging example for a 32 byte memory with 4 byte pages contd.
Number of pages = 2^4 / 2^2 = 4 = 2^2, so the page number takes 2 bits and the offset (d) takes 2 bits.
Number of frames = 2^5 / 2^2 = 8 = 2^3, so the frame number (f) takes 3 bits.
Logical address = page number (2 bits) | offset (2 bits)
Physical address = frame number (3 bits) | offset (2 bits)
Page table:
page number | frame number
0 | 5
1 | 6
2 | 1
3 | 2
[Figure: logical memory, page table and physical memory for the 32-byte example]
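A sketch of the 32-byte example: the page table below is the one from the figure (page 0 -> frame 5, 1 -> 6, 2 -> 1, 3 -> 2), and the translation splits the 4-bit logical address into a 2-bit page number and a 2-bit offset:

#include <stdio.h>

int main(void) {
    int page_table[4] = {5, 6, 1, 2};   /* page -> frame, from the figure */
    int page_size = 4;                  /* bytes per page / frame         */

    for (int la = 0; la < 16; la++) {   /* all 16 logical addresses       */
        int page   = la / page_size;    /* top 2 bits                     */
        int offset = la % page_size;    /* low 2 bits                     */
        int pa = page_table[page] * page_size + offset;
        printf("logical %2d (page %d, offset %d) -> physical %2d\n", la, page, offset, pa);
    }
    return 0;
}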
Example:
Logical address = 6 bits
Page size = 8 words = 2^3
Calculate the number of pages, the number of frames and the number of bits of the page number.
Logical address space = 2^6 = 64 words
Number of pages = logical address space / page size = 2^6 / 2^3 = 2^3 = 8, so the page number takes 3 bits and the offset takes 3 bits.
The physical address is the same size as the logical address (6 bits), so the frame number also takes 3 bits and the number of frames = 2^3 = 8.
Paging Hardware with TLB
[Figure: paging hardware with TLB - the logical address (page number, offset) is first looked up in the TLB; on a TLB hit the frame number is obtained directly, on a TLB miss the page table in physical memory is consulted]
Paging with TLB
* The entire page table is kept in main memory.
* In the case of paging, the problem is that main memory is accessed two times: once for finding the frame number in the page table and once for accessing the memory address specified by the frame number.
* To overcome this problem a high-speed cache is set up for page table entries, called a Translation Lookaside Buffer (TLB).
* The Translation Lookaside Buffer (TLB) is nothing but a special cache used to keep track of recently used translations.
* For a logical address, the processor examines the TLB; if the page table entry is present then it is called a TLB hit, the frame number is retrieved and the physical address is formed.
* If the page table entry is not found in the TLB then a TLB miss occurs, and the page number is used to index the page table in main memory.
Paging with TLB example
{A paging scheme uses a Translation Look aside buffer (718).
‘ATLB access takes 20 ns and a main memory access takes 100 ns.
What isthe effective memory access time (in ns) if the TLB hit ratio is 8096?
Solution:
TLB access time = 20 ns (given)
Main memory acces time
=200 ns (given)
Hit ratio = 0.8 (given) 13
Effective memory access time
menery
Effective memory access time = hit ratio x ( TLB access time + memory access time) +
{A hit ratio) x (TLB access time + 2 x memory access time)
Effective memory access time = 0.8 ( 20+ 100) + (1- 0.8) x ( 20+ 2x 100) ns
8x 120+0.2* 220 ns
140 ns
IFTLB Is not used here then the effective memory access time will be 200s.ae
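The effective-access-time formula used above, sketched as code with the same 20 ns / 100 ns / 0.8 figures:

#include <stdio.h>

int main(void) {
    double tlb = 20.0, mem = 100.0, hit = 0.8;   /* ns, ns, TLB hit ratio */

    /* hit: TLB lookup + one memory access; miss: TLB lookup + page table access + data access */
    double emat = hit * (tlb + mem) + (1.0 - hit) * (tlb + 2.0 * mem);
    printf("Effective memory access time = %.0f ns\n", emat);      /* 140 ns */
    printf("Without a TLB                = %.0f ns\n", 2.0 * mem); /* 200 ns */
    return 0;
}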
Virtual Memory
* Virtual memory is commonly implemented by swap-in and swap-out.
* Processes reside on secondary memory. When a process is ready for execution it is brought into main memory.
* A page is not swapped into main memory unless it is needed.
* After completion of execution, the pages are swapped out from main memory.
[Figure: swap-in and swap-out of program pages between main memory and secondary memory]
Virtual memory
* Virtual memory is a technique that allows the execution of processes that may not be completely in main memory.
* It appears to the programmer as if a large amount of main memory is available, but it really does not exist.
* The size of the virtual memory is equivalent to the size of the secondary memory.
* Each address generated by the CPU, called a virtual address (logical address), is mapped to a physical address in main memory.
Advantages of virtual memory
i) A program larger than the free memory space can be executed by using the virtual memory technique.
ii) The programmer does not need to worry about the size of the program.
iii) It allows multiprogramming, which increases CPU utilization.
Page Replacement algorithms
1. FIFO (First In First Out) algorithm
2. Optimal page replacement algorithm
3. LRU (Least Recently Used) algorithm
FIFO Page replacement Algorithm
In the FIFO page replacement algorithm, the page that was brought into memory first is replaced first.
Available memory frames = 3
[Worked example traced in the slides: a reference string replayed through 3 frames using FIFO replacement]
LRU Page replacement Algorithm
{In LeU page replacement algorithm, replace the page that has not been used for
the longest period of time.
Reference String
Pare pas
‘Available memory frame = 3
pars pasreqp
ee if
(is ‘ :
‘Number of page fault = 8
Optimal Page replacement Algorithm
In Optimal page replacement algorithm, replace the page that will not be
used for the longest period of time.
Reference String
pean eae
‘Available memory frame = 3
parspqersa
HeTTT
aeglener Aae NEA, TIT
2/2 [2 |2] Faults =1
5S)
44
il
[Handwritten worked examples (tables not recoverable): FIFO, Optimal and LRU page replacement traced for the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 1, 2, 0 with 3 memory frames, along with hit-ratio calculations (hit ratio = number of hits / number of references)]
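Since the traced tables above did not survive extraction, here is a small sketch that replays a reference string through FIFO and LRU with 3 frames. The reference string is the one visible in the notes; treat the fault counts it prints as illustrative rather than as the slides' answers.

#include <stdio.h>

#define FRAMES 3

/* Count page faults for FIFO (lru = 0) or LRU (lru = 1). */
static int simulate(const int *refs, int n, int lru) {
    int frame[FRAMES], stamp[FRAMES], faults = 0, time = 0;
    for (int i = 0; i < FRAMES; i++) { frame[i] = -1; stamp[i] = -1; }

    for (int r = 0; r < n; r++, time++) {
        int hit = -1;
        for (int i = 0; i < FRAMES; i++)
            if (frame[i] == refs[r]) hit = i;
        if (hit >= 0) {
            if (lru) stamp[hit] = time;    /* LRU refreshes the timestamp on a hit */
            continue;
        }
        faults++;
        int victim = 0;                    /* empty frame (stamp -1) or oldest stamp */
        for (int i = 1; i < FRAMES; i++)
            if (stamp[i] < stamp[victim]) victim = i;
        frame[victim] = refs[r];
        stamp[victim] = time;              /* FIFO: stamp set only when the page is loaded */
    }
    return faults;
}

int main(void) {
    /* Reference string from the handwritten notes, 3 memory frames */
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 1, 2, 0};
    int n = sizeof refs / sizeof refs[0];

    printf("FIFO page faults: %d\n", simulate(refs, n, 0));
    printf("LRU  page faults: %d\n", simulate(refs, n, 1));
    return 0;
}

The same routine extends to the Optimal policy by choosing the victim whose next use lies farthest in the future, which requires scanning the remainder of the reference string instead of keeping timestamps.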