CS433: Computer
System Organization
Main Memory
Virtual Memory
Translation Lookaside Buffer
Acknowledgement
Slides from previous CS433 semesters
TLB figures taken from
http://www.cs.binghamton.edu/~kang/cs350/Chap
Memory figures from http://larc.ee.nthu.edu.tw/~cthuang/courses/ee3450/lectures/flecture/F0713.gif
Topics Today
Main Memory (chap 5.8, 5.9)
Simple main memory
Wider memory
Interleaved memory
Memory technologies
Virtual Memory (chap 5.10, 5.11)
Basics
Address translation
Simple Main Memory
Consider a memory with these
parameters:
1 cycle to send an address
6 cycles to access each word
1 cycle to send word back to CPU/cache
What is the miss penalty for a 4-word
block?
(1 + 6 + 1) per word x 4 words = 32 cycles
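The arithmetic above can be checked with a short sketch (cycle counts taken from the slide):

```python
# Miss penalty for a block on a one-word-wide memory.
# Per word: 1 cycle to send the address, 6 cycles to access,
# 1 cycle to send the word back to the CPU/cache.
ADDR, ACCESS, TRANSFER = 1, 6, 1

def simple_miss_penalty(words):
    return (ADDR + ACCESS + TRANSFER) * words

print(simple_miss_penalty(4))  # 32 cycles
```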
Wider Main Memory
Make the memory wider
Higher bandwidth by reading more
words at once
Same problem…
1 cycle to send an address
6 cycles to access each doubleword
1 cycle to send word back to
CPU/cache
Miss penalty:
(1 + 6 + 1) per doubleword x 2 doublewords = 16 cycles
Wider Main Memory
Make the memory wider
Higher bandwidth
Read more words in parallel
Cost:
Wider bus
Larger minimum expansion increment
The minimum increment must correspond to the memory width, so doubling the width doubles the minimum increment
Interleaved Main Memory
Organize memory in
banks
Consecutive words map
to different banks
Example: word A in bank
(A mod M)
Within a bank, word A in
location (A div M)
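The bank-mapping rule above can be sketched directly (M = 4 banks here is just an example value):

```python
# Interleaved memory: word A lives in bank (A mod M),
# at location (A div M) within that bank.
def bank_of(word, num_banks):
    return word % num_banks

def location_in_bank(word, num_banks):
    return word // num_banks

# Consecutive words 0..7 spread across M = 4 banks:
for a in range(8):
    print(a, bank_of(a, 4), location_in_bank(a, 4))
```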
Interleaved Main Memory
Same problem…
1 cycle to send an
address
6 cycles to access each
word
1 cycle to send word
back to CPU/cache
Miss penalty:
1 + 6 + (1 x 4) = 11 cycles
(the four bank accesses overlap; only the word transfers serialize)
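Using the same cycle counts as before (1 to send the address, 6 to access, 1 per word transferred), the three organizations can be compared in one sketch; in the four-bank interleaved case the accesses overlap, so only the transfers serialize:

```python
ADDR, ACCESS, TRANSFER = 1, 6, 1

def simple(words):            # one full access per word
    return (ADDR + ACCESS + TRANSFER) * words

def wide(words, width):       # one full access per <width>-word chunk
    return (ADDR + ACCESS + TRANSFER) * (words // width)

def interleaved(words):       # bank accesses overlap; transfers serialize
    return ADDR + ACCESS + TRANSFER * words

print(simple(4), wide(4, 2), interleaved(4))  # 32 16 11
```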
Memory Technologies
Dynamic Random Access Memory (DRAM)
Optimized for density, not speed
One transistor per cell
Cycle time about twice the access time
Destructive reads
Must refresh every few ms (by accessing every row)
Memory Technologies
Static Random Access Memory (SRAM)
Optimized for speed, then density
About 4 – 6 transistors per cell
Separate address pins
Static = no refresh
Access time = cycle time
Memory Technologies
ROM
Read-only
1 transistor per cell
Nonvolatile
Flash
Allows memory to be modified
Nonvolatile
Slow write
Virtual Memory
Motivation:
Give each running program its own private
address space
Prevent a process from modifying other processes' memory
Want programs to be protected from each other
A bug in one program can't corrupt memory in another program
Programs can run in any location in physical
memory
Simplify loading the program
Virtual Memory
Motivation:
Want programs running simultaneously to share
underlying physical memory
Want to use disk as another level in the memory
hierarchy
Treat main memory as a cache for disk
Virtual Memory
The program sees virtual addresses
Main memory sees physical addresses
Need translation from virtual to physical
Virtual Memory vs Cache
Virtual Memory
Longer miss penalty
Handled by OS
Size dependent on physical address space
Cache
Handled by hardware
Size independent of physical address space
Alias
2 virtual addresses map
to the same physical
address
Consistency problems
Virtual Memory
Block replacement strategy
LRU
Write-back strategy
With dirty bit
Allows blocks to be written to disk when the block
is replaced
Memory-Management Scheme
Segmentation
Paging
Segmentation + Paging
Segmentation
Supports user view of memory.
Virtual memory is divided into variable
length regions called segments
Virtual address consists of a segment
number and a segment offset
<segment-number, offset>
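As a sketch of segmented translation (the segment table contents here are hypothetical), the hardware adds the offset to the segment's base after a bounds check:

```python
# Segment table: segment number -> (base, limit); values are made up.
segment_table = {0: (0x1000, 0x400), 1: (0x8000, 0x200)}

def translate(seg, offset):
    base, limit = segment_table[seg]
    if offset >= limit:                 # offset past end of segment
        raise MemoryError("segment offset out of range")
    return base + offset                # physical address

print(hex(translate(1, 0x10)))  # 0x8010
```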
Fragmentation
Memory allocation is a dynamic process that
uses a best-fit or first-fit algorithm
Fragmentation is NOT segmentation
External
Internal
Data
Internal vs External
External fragmentation:
free space divided into many small pieces
result of allocating and deallocating pieces of storage of many different sizes
one may have plenty of free space, but it may not all be usable, or at least not as efficiently as one would like
Unused portion of main memory
Internal fragmentation:
result of reserving a piece of space without ever using all of it
Unused portion of page
Paging
Virtual memory is divided into fixed-size
blocks called pages
typically a few kilobytes
should be a natural unit of transfer to/from disk
Page replacement
LRU, MRU, Clock… etc
Page placement
Fully associative - efficient
Paging
Page Identification
Virtual address is divided into
<virtual page number, page offset>
The virtual page number is translated into a
physical page number
Provides indirection
Indirection is always good
Translation cached in a buffer
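The <virtual page number, page offset> split is just a shift and a mask; this sketch assumes 4 KB pages (12 offset bits):

```python
# Split a virtual address into VPN and page offset.
# 4 KB pages are an assumption for the example.
PAGE_OFFSET_BITS = 12
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

def split(vaddr):
    return vaddr >> PAGE_OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

vpn, off = split(0x00403ABC)
print(hex(vpn), hex(off))  # 0x403 0xabc
```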
Paging vs Segmentation
Paging:
Block replacement easy
Fixed-length blocks
Segmentation:
Block replacement hard
Variable-length blocks
Need to find a contiguous, variable-sized, unused part of main memory
Paging vs Segmentation
Paging:
Invisible to the application programmer
No external fragmentation
There is internal fragmentation (unused portion of page)
Units of code and data are broken up into separate pages
Segmentation:
Visible to the application programmer
No internal fragmentation
There is external fragmentation (unused pieces of main memory)
Keeps blocks of code or data as single units
Segmentation + Paging
Segmentation is typically combined with
paging
Paging is transparent to the programmer
Segmentation is visible to the programmer
Each segment is broken into fixed-size pages
Simplify page replacement
Memory doesn’t have to be contiguous
Page Table
A page table is a collection of page table entries (PTEs), indexed by virtual page number
Each PTE gives the corresponding physical page number
The page table tells where a particular virtual
page address is stored on disk and in the main
memory
Each process has its own page table
Size depends on # of pages, page size, and
virtual address space
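As a worked example with assumed parameters (32-bit virtual addresses, 4 KB pages, 4-byte PTEs), a flat page table needs 2^20 entries, or 4 MB per process:

```python
# Page table size: illustrative numbers, not from the slides.
vaddr_bits = 32
page_size = 4 * 1024          # 4 KB -> 12 offset bits
pte_size = 4                  # bytes per page table entry

num_pages = 2 ** vaddr_bits // page_size   # 2^20 virtual pages
table_bytes = num_pages * pte_size         # 4 MB per process
print(num_pages, table_bytes)              # 1048576 4194304
```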
Page Table
The virtual address is divided into
virtual page number
page offset
Page Table
The page-table base register tells where the page table starts
Page offset directly maps to physical address page offset
Reference bit is set when a page is accessed
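Putting the slide together, a one-level lookup indexes the page table with the VPN and carries the page offset over unchanged (the table contents here are hypothetical):

```python
PAGE_OFFSET_BITS = 12                     # assumes 4 KB pages

page_table = {0x403: 0x1F2}               # made-up VPN -> physical page number

def translate(vaddr):
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    ppn = page_table[vpn]                 # would page-fault if missing
    return (ppn << PAGE_OFFSET_BITS) | offset   # offset maps through directly

print(hex(translate(0x403ABC)))  # 0x1f2abc
```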
Page Table
Each virtual memory reference can cause two physical memory accesses
One to fetch the page table
One to fetch the data
Page Table
Address translation must be performed for every memory
access
Need optimization!
TLB
Problems:
Every virtual memory reference requires 2 physical memory accesses
Every memory access requires an address translation
To overcome this problem, a high-speed
cache is set up for page table entries
Called a Translation Lookaside Buffer (TLB)
TLB
The Translation Lookaside Buffer
a small cache
contains translations for the most recently used
pages
The processor consults the TLB first when
accessing memory
Address Translation
Virtual page number (Tag, Index)
Index tells which row to access
Address Translation
TLB is a cache
Fully associative for efficiency
TLB
Given a virtual address, processor examines
the TLB
If page table entry is present (TLB hit)
the frame number is retrieved and the real
address is formed
TLB
If the page table entry is not found in the TLB
(TLB miss)
Either: hardware walks the page table and loads the new page table entry into the TLB
Or: hardware traps to the OS, and it is up to the OS to decide what to do
The OS knows which program caused the TLB fault
The software handler performs the address mapping
TLB Miss
If page is in main memory (page hit)
If the mapping is in the page table, we add the entry
to the TLB, evicting an old entry from the TLB
TLB
If page is on disk (page fault)
load the page off the disk into a free block of
memory
update the process's page table
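The whole hit / miss / fault flow from the last few slides can be sketched as follows (plain dicts stand in for the hardware TLB and the in-memory page table):

```python
PAGE_OFFSET_BITS = 12                    # assumes 4 KB pages

tlb = {}                                 # VPN -> PPN (small, fast)
page_table = {0x1: 0xA, 0x2: 0xB}        # VPN -> PPN (made-up, in main memory)

def access(vaddr):
    vpn = vaddr >> PAGE_OFFSET_BITS
    off = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn in tlb:                       # TLB hit
        ppn = tlb[vpn]
    elif vpn in page_table:              # TLB miss, page hit
        ppn = page_table[vpn]
        tlb[vpn] = ppn                   # refill the TLB
    else:                                # page fault: OS loads page from disk
        raise MemoryError("page fault")
    return (ppn << PAGE_OFFSET_BITS) | off

print(hex(access(0x1123)))  # 0xa123 (TLB miss, refilled)
print(hex(access(0x1123)))  # 0xa123 (now a TLB hit)
```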