UNIT 5 Notes
Memory Concepts and Hierarchy – Cache Memories: Mapping and Replacement Techniques –
Virtual Memory – DMA – I/O – Accessing I/O: Parallel and Serial Interface – Interrupt I/O
Memory Concept
• Memories are made up of registers. Each register in the memory is one storage
location also called memory location.
• Generally, the total number of bits that a memory can store is its capacity. In most cases the capacity is specified in terms of bytes (a group of eight bits).
• Data are stored in a memory by a process called writing and are retrieved from the memory by a process called reading. Fig. 5.1.1 illustrates in a very simplified way the concept of write, read, address and storage capacity for a generalized memory.
• As shown in the Fig. 8.1.1 memory unit stores binary information in groups of
bits called words. A word in memory is an entity of bits that moves in and out of
storage as a unit. A word having a group of 8 bits is called a byte. Most computer memories use words that are multiples of eight bits in length. Thus, a 16-bit word contains two bytes, and a 32-bit word is made up of four bytes.
• The Fig. 8.1.2 shows the block diagram of a memory unit. The n data lines provide the information to be stored in memory and the k address lines specify the particular word chosen among the many available. The two control inputs specify the direction of transfer.
• When there are k address lines we can access 2^k memory words. For example, if k = 10 we can access 2^10 = 1024 memory words.
Illustrative Examples
Example 8.1.1 A bipolar RAM chip is arranged as 16 words. How many bits are
stored in the chip?
Example 8.1.2 How many address bits are needed to operate a 2 K × 8 ROM ?
Example 8.1.3 How many locations are addressed using 18 address bits?
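• The relationship between address lines and addressable words can be checked with a short Python sketch (the function names below are just for illustration); it reproduces the kind of figures asked for in Examples 8.1.2 and 8.1.3:

import math

def address_bits(num_locations):
    # Number of address lines k needed so that 2**k >= num_locations
    return math.ceil(math.log2(num_locations))

def locations(k):
    # Number of addressable locations with k address lines
    return 2 ** k

# Example 8.1.2: a 2 K x 8 ROM has 2 * 1024 = 2048 locations
print(address_bits(2 * 1024))   # 11 address bits

# Example 8.1.3: 18 address bits
print(locations(18))            # 262144 locations (256 K)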
Characteristics of Memory
The Table 8.1.2 shows the characteristics of some common memory technologies.
Characteristics of some common memory technologies
• The processor of a computer can usually process instructions and data faster than
they are fetched from the memory unit. The memory cycle time, then, is the bottleneck in the system. One way to avoid this bottleneck is to use a cache
memory. Cache memory is a small, fast memory that is inserted between the
larger, slower main memory and the processor. It usually holds the currently active
segments of a program and their data.
• In most modern computers, the physical main memory is not as large as the
address space spanned by an address issued by the processor. Here, the virtual
memory technique is used to extend the apparent size of the physical memory. It
uses secondary storage such as disks, to extend the apparent size of the physical
memory.
• Semiconductor memories can be classified on the basis of:
• Principle of operation
• Physical characteristics
• Read/Write Memories (RWMs) are those memories which allow both read and write operations. They are used in applications where data has to change
continuously. They are also used for temporary storage of data. ROM memories
allow only read operation. They are used to store monitor programs and constants
used in the program.
• The volatile memories which can hold data as long as power is ON are called Static RAMs (SRAMs). Dynamic RAMs (DRAMs) store the data as a charge on a capacitor, and they need refreshing of the charge on the capacitor every few milliseconds to hold the data even when power is ON.
• EPROM and EEPROM are erasable memories in which the stored data can be
erased and new data can be stored.
• The semiconductor memories are also classified as Bipolar and MOS memories
depending upon the type of transistors used to construct the individual cell.
• A primary memory is costly and has a limited size. This memory is mainly used
for storing the currently processing data.
• Secondary storage is used to store data and instructions (programs) when they are
not being processed.
• The devices that are used as secondary storage are non-volatile and have a larger storage capacity. Also, they are less expensive as compared to primary storage devices.
• However, they are slower in comparison. The examples are hard disks, floppies,
CD-ROMs, magnetic tapes etc. This type of memory is also called secondary
memory, auxiliary memory or peripheral storage.
• Fig. 8.1.4 shows the classification of secondary storage devices. They can be
categorized broadly according to their access types as sequential and random
(direct).
Memory Hierarchy AU: Dec.-14, 18, May-19
• A very fast memory system can be achieved if SRAM chips are used. These chips are expensive, and for cost reasons it is impractical to build a large main memory using SRAM chips. The only alternative is to use DRAM chips for large main memories.
• Processor fetches the code and data from the main memory to execute the
program. The DRAMs that form the main memory are slower devices, so it is
necessary to insert wait states in memory read/write cycles. This reduces the speed
of execution.
• The solution to this problem comes from the fact that most computer programs work with only small sections of code and data at a particular time. In the memory system, a small section of SRAM is added along with the main memory, referred to as cache memory.
• The program which is to be executed is loaded in the main memory, but the part
of program (code) and data that work at a particular time is usually accessed from
the cache memory.
• This is accomplished by loading the active part of code and data from main
memory to cache memory. The cache controller looks after this swapping between
main memory and cache memory with the help of DMA controller.
• The cache memory just discussed is called secondary cache. Recent processors
have the built-in cache memory called primary cache.
• DRAMs along with a cache allow main memories in the range of tens of megabytes to be implemented at a reasonable cost with better speed performance. But the size of memory is still small compared to the demands of large programs with voluminous data. A solution is provided by using secondary storage, mainly magnetic disks and magnetic tapes, to implement large memory spaces. Very large disks are available at a reasonable price, sacrificing speed.
• From the above discussion, we can realize that to build an efficient computer system it is not possible to rely on a single memory component; instead a memory hierarchy is employed, in which all of the different types of memory units are used together. A typical memory hierarchy is shown in Fig. 8.2.1.
Memory hierarchy
Fig. 8.2.3 shows common memory hierarchies with two, three and four levels.
Cache Memories
• Definition: The part of program (code) and data that work at a particular time is
usually accessed from the SRAM memory. This is accomplished by loading the
active part of code and data from main memory to SRAM memory. This small
section of SRAM memory added between processor and main memory to speed up
execution process is known as cache memory.
• A cache memory system includes a small amount of fast memory (SRAM) and a large amount of slow memory (DRAM). This system is configured to simulate a large amount of fast memory.
• Cache controller implements the cache logic. If processor finds that the addressed
code or data is not available in cache - the condition referred to as cache miss, the
desired memory block is copied from main memory to cache using cache
controller. The cache controller decides which memory block should be moved in
or out of the cache and in or out of main memory, based on the requirements. (The
cache block is also known as cache slot or cache line.)
• The percentage of accesses where the processor finds the code or data word it
needs in the cache memory is called the hit rate/hit ratio. The hit rate is normally
greater than 90 percent.
Example 8.4.1 The application program in a computer system with cache uses
1400 instruction acquisition bus cycle from cache memory and 100 from main
memory. What is the hit rate? If the cache memory operates with zero wait state
and the main memory bus cycles use three wait states, what is the average number
of wait states experienced during the program execution ?
Average wait states = Total wait states / Number of memory bus cycles
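• A minimal Python sketch of the calculation in Example 8.4.1 (the variable names are assumptions for illustration):

cache_cycles, main_cycles = 1400, 100          # bus cycles from Example 8.4.1
cache_wait, main_wait = 0, 3                   # wait states per bus cycle

total_cycles = cache_cycles + main_cycles
hit_rate = cache_cycles / total_cycles
avg_wait = (cache_cycles * cache_wait + main_cycles * main_wait) / total_cycles

print(f"Hit rate           : {hit_rate:.2%}")   # about 93.33 %
print(f"Average wait states: {avg_wait:.2f}")   # 0.20 wait states per bus cycle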
• Two most commonly used system organizations for cache memory are :
• Look-aside and
• Look-through
• The Fig. 8.4.2 shows the system of look-aside cache organization. Here, the cache and the main memory are directly connected to the system bus.
• In this system, the CPU initiates a memory access by placing a physical address
on the memory address bus at the start of read or write cycle.
• If a match is not found, i.e., in case of a cache miss, the desired access is completed by a read or write operation directed to M2. In response to the cache miss, a block of data that includes the target address is transferred from M2 to M1. The system bus is used for this transfer and hence it is unavailable for other uses like I/O operations.
• Unlike the look-aside system, look-through cache system does not automatically
send all memory requests to main memory; it does so only after a cache miss.
• A look-through cache system uses a wider local bus to link M1 and M2, thus speeding up cache-main-memory transfers (block transfers).
Disadvantages:
• It is complex.
• It is costly.
• It takes longer time for M2 to respond to the CPU when a cache miss occurs.
• The Fig. 8.4.4 shows a small cache system. Here each cache block is 4 bytes and each memory address is 10 bits long. Thus the 8 high-order bits form the tag or block address and the 2 low-order bits define a displacement address within the block.
• When a block is assigned to the cache data memory, its tag is also placed in the
cache tag memory.
• During read operation, the 8 high-order bits of an address are compared with
stored tags in the cache tag memory to find match (cache hit). The stored tag
pinpoints the corresponding block in cache data memory and the 2-bit
displacement is used to read the target word.
• The Fig. 8.4.5 shows execution of cache write operation. It uses same addressing
technique as in case of read operation.
• When a cache hit occurs, the new data, in this case E6, is stored at the location pointed to by the address in the cache data memory, thereby overwriting the old data 5A.
• Now data in the cache data memory and data in the main memory for given
address is different. This causes cache consistency problem.
Program Locality
• In cache memory system, prediction of memory location for the next access is
essential. This is possible because computer systems usually access memory from
the consecutive locations. This prediction of next memory address from the current
memory address is known as program locality.
• The principle of program locality may not work properly when program executes
JUMP and CALL instructions. In case of these instructions, program code is not in
sequence.
Locality of Reference
• We know that program may contain a simple loop, nested loops or a few
procedures that repeatedly call each other. The point is that many instructions in
localized area of the program are executed repeatedly during some time period and
the remainder of the program is accessed relatively infrequently. This is referred to
as locality of reference.
• The spatial aspect means that instructions stored near the recently executed instruction are also likely to be executed soon.
• The spatial aspect suggests that instead of bringing just one instruction or data
from the main memory to the cache, it is wise to bring several instructions and data
items that reside at adjacent addresses as well. We use the term block to refer to a set
of contiguous addresses of some size.
Block Fetch
• A block fetch can retrieve the data located before the requested byte (look
behind) or data located after the requested byte (look ahead) or both.
• When CPU needs to access any byte from the block, entire block that contains the
needed byte is copied from main memory into cache.
• The size of the block is one of the most important parameters in the design of a
cache memory system.
Block size
1. If the block size is too small, the look-ahead and look-behind are reduced and
therefore the hit rate is reduced.
2. Larger blocks reduce the number of blocks that fit into a cache. As the number of blocks decreases, block rewrites from main memory become more likely.
3. With a very large block, the ratio of required data to useless data in the cache decreases.
4. Bus size between the cache and the main memory increases with block size to
accommodate larger data transfers between main memory and the cache, which
increases the cost of cache memory system.
• The cache design elements include cache size, mapping function, replacement algorithm, write policy, block size and number of caches.
Cache size: The size of the cache should be small enough so that the overall
average cost per bit is close to that of main memory alone and large enough so that
the overall average access time is close to that of the cache alone.
Mapping function: The cache memory can store a reasonable number of blocks at any given time, but this number is small compared to the total number of blocks in the main memory. Thus we have to use mapping functions to relate the main memory blocks and cache blocks. The commonly used mapping functions are direct mapping, associative mapping and set-associative mapping. A detailed description of mapping functions is given in section 8.4.6.
Replacement algorithm: When a new block is brought into the cache, one of the existing blocks must be replaced by the new block.
• There are four most common replacement algorithms:
• Least-Recently-Used (LRU)
• First-In-First-Out (FIFO)
• Least-Frequently-Used (LFU)
• Random
Write policy: It is also known as cache updating policy. In cache system, two
copies of the same data can exist at a time, one in cache and one in main memory.
If one copy is altered and other is not, two different sets of data become associated
with the same address. To prevent this, the cache system has updating systems
such as: write through system, buffered write through system and write-back
system. The choice of cache write policy also changes the design of cache.
Mapping
• Usually, the cache memory can store a reasonable number of memory blocks at
any given time, but this number is small compared to the total number of blocks in
the main memory. The correspondence between the main memory blocks and
those in the cache is specified by a mapping function.
• The mapping techniques are classified as:
1. Direct-mapping technique
2. Associative-mapping technique
3. Set-associative mapping technique.
1. Direct-Mapping
• In this technique, each block from the main memory has only one possible
location in the cache organization. In this example, the block i of the main memory
maps onto block j (j = i modulo 128) of the cache, as shown in Fig. 8.4.6.
Therefore, whenever one of the main memory blocks 0, 128, 256, ... is loaded into the cache, it is stored in cache block 0; blocks 1, 129, 257, ... are stored in cache block 1, and so on. In general,
j = i modulo m
where
j = cache block number
i = main memory block number
m = number of blocks in the cache.
• To implement such cache system, the address is divided into three fields, as
shown in Fig. 8.4.6.
• The lower-order 4 bits select one of the 16 words in a block. This field is known as the word field.
• The second field, known as the block field, is used to distinguish a block from other blocks. Its length is 7 bits since 2^7 = 128.
• When a new block enters the cache, the 7-bit cache block field determines the
cache position in which this block must be stored.
• The third field is the tag field. It is used to store the high-order 5 bits of the memory address of the block. These 5 bits (tag bits) are used to identify which of the 32 blocks (pages) is mapped into the cache.
• When memory is accessed, the 7-bit cache block field of each address generated
by CPU points to a particular block location in the cache. The high-order 5-bits of
the address are compared with the tag bits associated with that cache location. If
they match, then the desired word is in that block of the cache. If there is no match,
then the block containing the required word must first be read from the main
memory and loaded into the cache.
• This means that to determine whether the requested word is in the cache, only the tag field needs to be compared. This needs only one comparison.
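• The address division described above can be illustrated with a small Python sketch; the field widths (5-bit tag, 7-bit block, 4-bit word in a 16-bit address) are taken from the configuration discussed above, and the function name is just illustrative:

def split_direct_mapped(addr, word_bits=4, block_bits=7):
    # Split a 16-bit main-memory address into tag / cache-block / word fields
    # for a direct-mapped cache of 128 blocks with 16 words per block.
    word  = addr & ((1 << word_bits) - 1)
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag   = addr >> (word_bits + block_bits)
    return tag, block, word

tag, block, word = split_direct_mapped(0b10110_0000001_0011)
print(tag, block, word)   # 22 1 3 : compare tag 22 with the tag stored for cache block 1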
• The main drawback of a direct-mapped cache is that if the processor needs to access the same memory locations from two different pages of the main memory frequently, the controller has to access the main memory frequently, since only one of these locations can be in the cache at a time. For example, if the processor wants to access memory location 100H from page 0 and then from page 2, the cache controller has to access page 2 of the main memory. Therefore, we can say that a direct-mapped cache is easy to implement; however, it is not very flexible.
2. Associative Mapping
• In this technique, the 12 tag bits are required to identify a memory block when it is resident in the cache. The high-order 12 bits of an address received from the CPU are compared to the tag bits of each block of the cache to see if the desired block is present.
• Once the desired block is present, the 4-bit word is used to identify the necessary
word from the cache.
• This technique gives complete freedom in choosing the cache location in which
to place the memory block. Thus, the memory space in the cache can be used more
efficiently.
• A new block that has to be loaded into the cache has to replace (remove) an
existing block only if the cache is full.
3. Set-Associative Mapping
• A block of data from any page in the main memory can go into a particular block location of any of the direct-mapped caches. Hence the contention problem of the direct-mapped technique is eased by having a few choices for block placement.
• The required address comparisons depend on the number of direct-mapped
caches in the cache system. These comparisons are always less than the
comparisons required in the fully-associative mapping.
• Fig. 8.4.8 shows two way set-associative cache. Each page in the main memory is
organized in such a way that the size of each page is same as the size of one
directly mapped cache. It is called two-way set-associative cache because each
block from main memory has two choices for block placement.
• As there are two choices, it is necessary to compare address of memory with the
tag bits of corresponding two block locations of particular set. Thus for two-way
set-associative cache, we require two comparisons to determine whether a given
block is in the cache.
• Since there are two direct-mapped caches, any two bytes having same offset from
different pages can be in the cache at a time. This improves the hit rate of the cache
system.
• The set field needs 6 bits to determine the desired set from the 64 sets. However, there are now 64 pages. To identify whether a block belongs to a particular page out of the 64 pages, six tag bits are required.
Example 8.4.2 Consider a cache consisting of 256 blocks of 16 words each, for a
total of 4096 (4 K) words and assume that the main memory is addressable by a
16-bit address and it consists of 4 K blocks. How many bits are there in each of the
TAG, BLOCK/SET and word fields for different mapping techniques ?
Solution: We know that memory address is divided into three fields. We will now
find the exact bits required for each field in different mapping techniques.
a) Direct-mapping
Word bits: We know that each block consists of 16 words. Therefore, to identify each word we must have four bits (2^4 = 16) reserved for it.
Block bits: The cache memory consists of 256 blocks and, using the direct-mapped technique, block k of the main memory maps onto block k modulo 256 of the cache. It has a one-to-one correspondence and requires a unique address for each block. To address 256 blocks we require eight bits (2^8 = 256).
Tag bits: The remaining 4 (16 − 4 − 8) address bits are tag bits, which store the higher-order address bits of the main memory.
b) Associative-mapping
Word bits: The word length will remain same i.e. 4 bits.
Tag bits: To address each block in the main memory (2^12 = 4096) 12 bits are required and therefore there are 12 tag bits.
The main memory address for the associative-mapping technique is therefore divided as shown below:
Tag (12 bits) | Word (4 bits)
c) Set-associative mapping
Let us assume that there is a 2-way set-associative mapping. Here, the cache memory is mapped with two blocks per set. The set field of the address determines which set of the cache might contain the desired block.
Word bits: The word length will remain same i.e. 4 bits.
Set bits: There are 128 sets (256/2). To identify each set (2^7 = 128) seven bits are required.
Tag bits: The remaining 5 (16 − 4 − 7) address bits are the tag bits, which store the higher-order address bits of the main memory.
The main memory address for the 2-way set-associative mapping technique is divided as shown below:
Tag (5 bits) | Set (7 bits) | Word (4 bits)
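• The field widths worked out in Example 8.4.2 can be checked with a short Python sketch (the constant names are assumed for illustration):

import math

ADDRESS_BITS = 16          # main memory addressable with a 16-bit address
CACHE_BLOCKS = 256
WORDS_PER_BLOCK = 16

word_bits = int(math.log2(WORDS_PER_BLOCK))               # 4

# a) Direct mapping: block field addresses one of the 256 cache blocks
block_bits = int(math.log2(CACHE_BLOCKS))                  # 8
tag_direct = ADDRESS_BITS - word_bits - block_bits          # 4

# b) Associative mapping: the tag identifies any of the 4096 memory blocks
tag_assoc = ADDRESS_BITS - word_bits                         # 12

# c) 2-way set-associative mapping: 256 / 2 = 128 sets
set_bits = int(math.log2(CACHE_BLOCKS // 2))                 # 7
tag_set = ADDRESS_BITS - word_bits - set_bits                # 5

print(word_bits, block_bits, tag_direct)   # 4 8 4
print(word_bits, tag_assoc)                # 4 12
print(word_bits, set_bits, tag_set)        # 4 7 5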
Example 8.4.3 A block set-associative cache consists of 64 blocks divided into sets of 4 blocks each. The main memory contains 4096 blocks, each consisting of 128 words of 16 bits each:
ii) How many bits are there in each of the TAG, SET and WORD fields?
i) Capacity of the main memory
= Number of blocks × Number of words per block × Number of bits per word
= 4096 × 128 × 16
= 8,388,608 bits
ii) Number of bits in word field: There are 128 words in each block. Therefore, to identify each word (2^7 = 128) 7 bits are required.
iii) Number of bits in set field: There are 64 blocks and each set consists of 4 blocks. Therefore, there are 16 (64/4) sets. To identify each set (2^4 = 16) four bits are required.
iv) Number of bits in tag field: The total number of words in the memory is 4096 × 128 = 524,288. To address these words we require 19 address lines (2^19 = 524,288). Therefore, the tag bits are eight (19 − 7 − 4).
Example 8.4.5 A two-way set-associative cache memory uses blocks of four words. The cache can accommodate a total of 2048 words from main memory. The main memory size is 128 K × 32.
i) How many bits are there in the tag, index, block and word fields of the address format?
Solution: Number of bits in the main memory address = log2(128 K) = log2(2^17) = 17 bits
Example 8.4.6 A direct mapped cache has the following parameters: cache size =
1 K words, Block size 128 words and main memory size is 64 K words. Specify
the number of bits in TAG, BLOCK and WORD in main memory address.
Example 8.4.7 How many total bits are required for a direct-mapped cache with 16
kb of data and 4-word blocks, assuming a 32-bit address?
Solution:
Example 8.4.8 You have been asked to design a cache with the following
properties:
1) Data words are 32 bits each.
In Fig. 8.4.11 below, there are 8 fields (labeled a, b, c, d, e, f, g and h); you need to indicate the proper name or number of bits for each labeled portion of this cache configuration. AU: May-19, Marks 8
Solution:
f. (name) -You are being asked to show what part of a physical address form the
index, offset, and tag. < f > refers to the most significant bits of the address - so
this is the tag.
g. (name) - It follows that the next part of the address is the index.
c. (number) - There are 2^11 bits / block and there are 2^5 bits / word. Thus there are 2^6 words / block, so we need 6 bits of offset.
b. (number) - There are 2^11 blocks and the cache is direct mapped (or "1-way set associative"). Therefore, we need 11 bits of index.
a. (number) - The remaining bits form the tag. Thus, 32 - 6 - 11 = 15 bits of tag.
d. (number) - Field < d > refers to the fact that a tag must be stored in each block.
Thus, 15 bits are kept in each block.
e. (number) - Field < e > asks you to specify the total number of bits / block. This is 2048.
We need to compare the valid bit associated with the block, the tag stored in the
block, and the tag associated with the physical address to determine if the cache
entry is useable or not. The tags should be the same and the valid bit should be 1.
Cache Size
There are 2048 blocks in the cache and there are 2048 bits / block. There are 8 bits / byte. Thus, there are 256 bytes / block.
2048 blocks × 256 bytes / block = 2^19 bytes (or 0.5 MB)
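• A quick Python check of the cache-size arithmetic above (the tag/valid accounting at the end is an additional note under the stated field sizes, not part of the original solution):

blocks = 2048                      # 2**11 blocks (11 index bits)
bits_per_block = 2048              # 2**11 data bits per block (64 words x 32 bits)
bytes_per_block = bits_per_block // 8

data_size_bytes = blocks * bytes_per_block
print(data_size_bytes, data_size_bytes / 2**20)   # 524288 bytes, i.e. 0.5 MB

# If the 15 tag bits and the valid bit are also counted, each block actually
# occupies 2048 + 15 + 1 = 2064 bits of storage.
total_bits = blocks * (bits_per_block + 15 + 1)
print(total_bits)                  # 4227072 bits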
• When a new block is brought into the cache, one of the existing blocks must be replaced by the new block.
• In case of direct-mapping cache, we know that each block from the main memory
has only one possible location in the cache, hence there is no choice. The previous
data is replaced by the data from the same memory location from new page of the
main memory.
• For the associative and set-associative mapping techniques, the commonly used replacement algorithms are:
• Least-Recently-Used (LRU)
• First-In-First-Out (FIFO)
• Least-Frequently-Used (LFU)
• Random
• Least-Recently-Used: In this technique, the block in the set which has been in the cache longest with no reference to it is selected for replacement. The assumption is that more-recently used memory locations are more likely to be referenced again. This technique can be easily implemented in the two-way set-associative cache organization.
• First-In-First-Out: In this technique, the block which was loaded into the cache earliest amongst the blocks currently present is selected for replacement, in the manner of a queue.
• Least-Frequently-Used: In this technique, the block in the set which has the
fewest references is selected for the replacement.
• Random: Here, there are no specific criteria for replacement of any block. The existing blocks are replaced randomly. Simulation studies have shown that the random replacement algorithm provides only slightly inferior performance to the algorithms just discussed.
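• The four replacement algorithms can be illustrated with a small Python sketch of a fully-associative cache; this is only a toy model with an assumed access trace, not a real cache controller:

import random
from collections import OrderedDict

def simulate(accesses, capacity, policy="LRU"):
    """Count hits for a small fully-associative cache. `accesses` is a
    sequence of block numbers; `policy` selects the replacement algorithm."""
    cache = OrderedDict()                 # block -> reference count
    load_order = []                       # load order, used by FIFO
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache[block] += 1
            if policy == "LRU":
                cache.move_to_end(block)  # most recently used kept at the end
            continue
        if len(cache) >= capacity:        # cache full: choose a victim block
            if policy == "LRU":
                victim = next(iter(cache))              # least recently used
            elif policy == "FIFO":
                victim = load_order[0]                  # oldest loaded block
            elif policy == "LFU":
                victim = min(cache, key=cache.get)      # fewest references
            else:                                       # Random
                victim = random.choice(list(cache))
            load_order.remove(victim)
            del cache[victim]
        cache[block] = 1
        load_order.append(block)
    return hits

trace = [0, 1, 2, 0, 3, 0, 4, 2, 3, 0]    # assumed toy access trace
for policy in ("LRU", "FIFO", "LFU", "Random"):
    print(policy, simulate(trace, capacity=3, policy=policy))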
Example 8.4.9 Consider a web browsing application. Assuming both client and server are involved in the process of web browsing, where can caches be placed to speed up the process? Design a memory hierarchy for the system. Show the typical size and latency at various levels of the hierarchy. What is the relationship between cache size and its access latency? What are the units of data transfers between hierarchies? What is the relationship between the data location, data size, and transfer latency? AU: Dec.-18, Marks 7
Solution: a) Assuming both client and server are involved in the process of web
browsing application, caches can be placed on both sides - web browser and server.
1. Browser cache, size = fraction of client computer disk, latency = local disk latency.
4. Server storage, size = server storage, latency = WAN + server storage. Latency
is not directly related to cache size.
• If the required page is not found in the main memory, the page fault occurs and
the required page is loaded into the main memory from the secondary memory.
• However, if there is no vacant space in the main memory to copy the required page, it is necessary to replace one of the existing pages in the main memory, which is currently not in use, with the required page.
• Thus we can say that the page replacement is a basic requirement of demand
paging.
FIFO Algorithm
The next reference (2) replaces page 6, because page 6 was brought in first. Since 0
is the next reference and 0 is already in memory, we have no fault for this
reference. The first reference to 3 results in page 0 being replaced, since it was the
first of the three pages in memory (0, 1 and 2) to be brought in. This process
continues as shown in Fig. 8.5.9.
Optimal algorithm
• It simply states that "Replace the page that will not be used for the longest period
of time".
• For example, on our sample reference string, the optimal page replacement
algorithm would yield nine page faults, as shown in Fig. 8.5.10. The first three
references cause faults that fill the three empty frames.
Reference string
• The reference to page 2 replaces page 6, because 6 will not be used until
reference 18, whereas page 0 will be used at 5, and page 1 at 14. The reference to
page 3 replaces page 1, as page 1 will be the last of the three pages in memory to
be referenced again. With only nine page faults, optimal replacement is much
better than a FIFO algorithm, which had 15 faults. (If we ignore the first three,
which all algorithms must suffer, then optimal replacement is twice as good as
FIFO replacement.) In fact, no replacement algorithm can process this reference
string in three frames with less than nine faults.
LRU algorithm
• The algorithm which replaces the page that has not been used for the longest
period of time is referred to as the Least Recently Used (LRU) algorithm.
• The result of applying LRU replacement to our example reference string is shown
in Fig. 8.5.11.
• When the reference to page 4 occurs, LRU replacement sees that, of the three frames in memory, page 2 was used least recently. The most recently used page is page 0, and just before that page 3 was used. Thus, the LRU algorithm replaces page 2, not knowing that page 2 is about to be used again.
• When it then faults for page 2, the LRU algorithm replaces page 3 since, of the
three pages in memory {0, 3, 4}, page 3 is the least recently used.
• LRU replacement with 12 faults is still much better than FIFO replacement with
15.
Counting algorithms
• In the counting algorithm the a counter of the number of references that have
been made to each pages are kept, and based on these counts the following two
schemes work.
• LFU algorithm: The Least Frequently Used (LFU) page replacement algorithm
requires that the page with the smallest count be replaced.
• The reason for this selection is that an actively used page should have a large
reference count.
• This algorithm suffers from the situation in which a page is used heavily during
the initial phase of a process, but then is never used again.
• MFU algorithm: The Most Frequently Used (MFU) page replacement algorithm
is based on the argument that the page with the smallest count was probably just
brought in and has yet to be used.
Example 8.5.5: Explain page replacement algorithms. Find out the number of page faults for the following string using the LRU method.
60 12 0 30 4 2 30 321 20 15
Solution:
Example 8.5.6: Explain page replacement algorithms. Find out the number of page faults for the following string using the LRU method. Consider a page frame size of 3.
7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0
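• The page-fault counting used in the examples above can be sketched in Python; the reference string below is an assumed single-digit reading of the string in Example 8.5.6, so the printed counts are illustrative only:

def page_faults(refs, frames, policy="FIFO"):
    """Count page faults for a reference string with a given number of frames.
    Supports FIFO, LRU and OPT (optimal) replacement."""
    memory, faults = [], 0
    for i, page in enumerate(refs):
        if page in memory:
            continue
        faults += 1
        if len(memory) == frames:                 # must evict a victim page
            if policy == "FIFO":
                victim = memory[0]                # memory is kept in load order
            elif policy == "LRU":
                last_use = {p: max(j for j in range(i) if refs[j] == p) for p in memory}
                victim = min(memory, key=last_use.get)
            else:                                 # OPT: not needed for longest
                def next_use(p):
                    later = [j for j in range(i + 1, len(refs)) if refs[j] == p]
                    return later[0] if later else float("inf")
                victim = max(memory, key=next_use)
            memory.remove(victim)
        memory.append(page)                       # newly loaded page is youngest
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0]   # assumed reading of Example 8.5.6
for policy in ("FIFO", "LRU", "OPT"):
    print(policy, page_faults(refs, frames=3, policy=policy))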
Virtual Memory
• The virtual memory technique is used to extend the apparent size of the physical memory.
• It uses secondary storage such as disks, to extend the apparent size of the physical
memory.
• When a program does not completely fit into the main memory, it is divided into
segments. The segments which are currently being executed are kept in the main
memory and remaining segments are stored in the secondary storage devices, such
as a magnetic disk.
• If an executing program needs a segment which is not currently in the main
memory, the required segment is copied from the secondary storage device.
• In virtual memory, the addresses that processor issues to access either instruction
or data are called virtual or logical address. The set of such addresses is called
address space. These addresses are translated into physical addresses by a
combination of hardware and software components.
Concept of Paging
• Fig. 8.5.1 shows a typical memory organization that implements virtual memory.
The memory management unit controls this virtual memory system. It translates
virtual address into physical addresses.
Segment translation :
• Every segment selector has a linear base address associated with it, and it is
stored in the segment descriptor. A selector is used to point to a descriptor for the
segment in a table of descriptors. The linear base address from the descriptor is
then added to the offset to generate the linear address. This process is known as
segmentation or segment translation. If paging unit is not enabled then the linear
address corresponds to the physical address. But if paging unit is enabled, paging
mechanism translates the linear address space into the physical address space by
paging translation.
Page translation
Fig. 8.5.4 shows the translation of the virtual page number to a physical page
number. The physical page number constitutes the upper portion of the physical
address, while the page offset, which is not changed, constitutes the lower portion.
The number of bits in the page offset field decides the page size.
• The page table is used to keep the information about the main memory location of
each page. This information includes the main memory address where the page is
stored and the current status of the page.
• To obtain the address of the corresponding entry in the page table, the virtual page number is added to the contents of the page table base register, in which the starting address of the page table is stored. The entry in the page table gives the physical page number, to which the offset is added to get the physical address in the main memory.
Example 8.5.1 The logical address space in a computer system consists of 128
segments. Each segment can have up to 32 pages of 4 K words each. Physical
memory consists of 4 K blocks of 4 K words in each. Formulate the logical and
physical address formats.
Solution :
iii) If a page consists of 4 K words, how many pages and blocks are there in the system?
Solution :
i) Words in the address space = 2^32 = 4 G words
= 2^24 = 16 M words
= 2^16 = 64 K words
• To support demand paging and virtual memory processor has to access page table
which is kept in the main memory. To reduce the access time and degradation of
performance, a small portion of the page table is accommodated in the memory
management unit. This portion is called Translation Lookaside Buffer (TLB).
• It is used to hold the page table entries that corresponds to the most recently
accessed pages. When processor finds the page table entries in the TLB it does not
have to access page table and saves substantial access time.
• The Fig. 8.5.6 shows an organization of a TLB where the associative mapping technique is used.
• As shown in the Fig. 8.5.6, a given virtual address is compared with the TLB entries for the referenced page by the MMU. If the page table entry for this page is found in the TLB, the physical address is obtained immediately.
• If entry is not found, there is a miss in the TLB, then the required entry is
obtained from the page table in the main memory and the TLB is updated.
• It is important that the contents of the TLB be coherent with the contents of page
tables in the memory. When page table entries are changed by an operating system,
it must invalidate the corresponding entries in the TLB. One of the control bits in
the TLB is provided for this purpose. When an entry is invalidated, the TLB
acquires the new information as a part of the MMU's normal response to access
misses.
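• The TLB behaviour described above can be sketched in Python as a small, LRU-managed cache of page-table entries (the sizes and tables below are assumptions for illustration):

from collections import OrderedDict

TLB_ENTRIES = 4
tlb = OrderedDict()                       # virtual page number -> physical frame

page_table = {0: 5, 1: 2, 2: 7, 3: 9}     # page table held in main memory

def lookup(vpn):
    """Return the frame for a virtual page, consulting the TLB first."""
    if vpn in tlb:                        # TLB hit: no main-memory page-table access
        tlb.move_to_end(vpn)
        return tlb[vpn]
    frame = page_table[vpn]               # TLB miss: read the page table in memory
    if len(tlb) == TLB_ENTRIES:           # replace the least recently used entry
        tlb.popitem(last=False)
    tlb[vpn] = frame
    return frame

def invalidate(vpn):
    """Called when the OS changes a page-table entry (keeps the TLB coherent)."""
    tlb.pop(vpn, None)

print(lookup(1), lookup(1))               # miss then hit, both return frame 2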
• If the page required by the processor is not in the main memory, the page fault
occurs and the required page is loaded into the main memory from the secondary
storage memory by special routine called page fault routine.
• This technique of getting the desired page in the main memory is called demand
paging.
• When a program is to be swapped in, the pager guesses which pages will be used
before the program is swapped out again. Instead of swapping in a whole process,
the pager brings only those necessary pages into memory. Thus, it avoids reading
into memory pages that will not be used any way, decreasing the swap time and the
amount of physical memory needed.
• With this scheme, we need some form of hardware support to distinguish
between those pages that are in memory and those pages that are on the disk. Fig.
8.5.8 shows hardware to implement demand paging.
• As shown in the Fig. 8.5.8, the valid-invalid bits are used to distinguish between
those pages that are in memory and those pages that are on the disk. When bit is set
to "valid", it indicates that the associated page is both legal and in memory. It the
bit is set to "invalid", it indicates that the page either is not valid (that is, not in the
logical address space of the process), or is valid but is currently on the disk. If the
process tries to use a page that was not brought into memory, access to a page
marked invalid causes a page-fault trap. This trap is the result of the operating
system's failure to bring the desired page into memory. The procedure for handling
this page fault is illustrated in Fig. 8.5.8.
1. It checks an internal table for this program, to determine whether the reference
was a valid or invalid memory access.
4. It schedules a disk operation to read the desired page into the newly allocated
frame.
5. When the disk read is complete, it modifies the internal table kept with the
program and the page table to indicate that the page is now in memory.
6. It restarts the instruction that was interrupted by the illegal address trap. The
program can now access the page as though it had always been in memory.
• The hardware to implement demand paging is the same as the hardware for
paging and swapping :
Page table: This table has the ability to mark an entry invalid through a valid-
invalid bit or special value of protection bits.
Secondary memory: This memory holds those pages that are not present in main
memory. The secondary memory is usually a hard-disk. It is known as the swap
device, and the section of disk used for this purpose is known as swap space or
backing store.
Example 8.5.4: Calculate the effective access time if the average page-fault service time is 20 milliseconds and the memory access time is 80 nanoseconds. Let us assume the probability of a page fault is 10 %. Dec.-09, Marks 2
Effective access time = (1 − p) × memory access time + p × page-fault service time
= 0.9 × 80 ns + 0.1 × 20,000,000 ns
= 2,000,072 nanoseconds
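• A short Python check of Example 8.5.4 (variable names are assumed for illustration):

memory_access_ns = 80
page_fault_service_ns = 20e6        # 20 milliseconds expressed in nanoseconds
p_fault = 0.10                      # probability of a page fault

effective_access_ns = (1 - p_fault) * memory_access_ns + p_fault * page_fault_service_ns
print(effective_access_ns)          # 2000072.0 nanoseconds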
DMA
• Data transfer using programmed I/O or interrupt-driven I/O has the following drawbacks:
1. The I/O transfer rate is limited by the speed with which the CPU can test and
service a device.
2. The time that the CPU spends testing I/O device status and executing a number
of instructions for I/O data transfers can often be better spent on other processing
tasks.
DMA Operation
• DMA controlled data transfer is used for large data transfer. For example to read
bulk amount data from disk to main memory.
• To read a block of data from the disk processor sends a series of commands to the
disk controller device telling it to search and read the desired block of data from
the disk.
• When disk controller is ready to transfer first byte of data from disk, it sends
DMA request DRQ signal to the DMA controller.
• Then the DMA controller sends a hold request (HRQ) signal to the processor's HOLD input. The processor responds to this HOLD signal by floating its buses and sending out a hold acknowledge signal (HLDA) to the DMA controller.
• When the DMA controller receives the HLDA signal, it takes the control of
system bus.
• When DMA controller gets control of the buses, it sends the memory address
where the first byte of data from the disk is to be written. It also sends a DMA
acknowledge, DACK signal to the disk controller device telling it to get ready to
output the byte.
• Finally, it asserts both the I/O read and memory write signals on the control bus.
Asserting the I/O read signal enables the disk controller to output the byte of data
from the disk on the data bus and asserting the memory write signal enables the
addressed memory to accept data from the data bus. In this technique data is
transferred directly from the disk controller to the memory location without
passing through the processor or the DMA controller.
• Thus, the CPU is involved only at the beginning and end of the transfer.
• After completion of data transfer, the HOLD signal is deasserted to give control
of all buses back to the processor.
• The Fig. 8.10.2 shows the interaction between processor and DMA discussed
above.
Comparison of I/O Program Controlled Transfer and DMA
Transfer
• For performing the DMA operation, the basic blocks required in a DMA
channel/controller are shown in Fig. 8.10.3.
• DMA controller communicates with the CPU via the data bus and control lines.
• The registers in DMA are selected by the CPU through the address bus by
enabling the DS (DMA select) and RS (Register select) inputs.
• The RD (Read) and WR (write) inputs are bidirectional.
• When the BG (bus grant) input is 0, the CPU can communicate with the DMA registers through the data bus to read from or write to the DMA registers. In this case RD and WR are input signals for the DMA.
• When BG = 1, the CPU has relinquished the buses and the DMA can
communicate directly with the memory by specifying an address in the address bus
and activating the RD or WR signals (RD and WR are now output signals for
DMA). The DMA controller consists of a data counter, a data register, an address register and control logic.
• The data counter register stores the number of data transfers to be done in one DMA cycle. It is automatically decremented after each word
transfer.
• Data register acts as buffer whereas address register initially holds the starting
address of the device. Actually, it stores the address of the next word to be
transferred. It is automatically incremented or decremented after each word
transfer.
• After each transfer, data counter is tested for zero. When the data count reaches
zero, the DMA transfer halts.
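• A minimal Python sketch of the address register / data counter behaviour described above (class and register names are assumptions for illustration):

class DMAChannel:
    """Sketch of one DMA channel: an address register holding the address of the
    next word and a data counter decremented after every transfer until zero."""

    def __init__(self, start_address, count):
        self.address_register = start_address   # address of the next word
        self.data_counter = count               # transfers left in this DMA cycle

    def transfer_block(self, memory, device_words):
        """Move words from a device straight into `memory` (a list standing in
        for main memory), the way a DMA write to memory would proceed."""
        for word in device_words:
            if self.data_counter == 0:          # terminal count reached: halt
                break
            memory[self.address_register] = word
            self.address_register += 1          # address auto-incremented
            self.data_counter -= 1              # count auto-decremented

memory = [0] * 16
dma = DMAChannel(start_address=4, count=3)
dma.transfer_block(memory, device_words=[0xAA, 0xBB, 0xCC, 0xDD])
print(memory[4:8], dma.data_counter)            # [170, 187, 204, 0] 0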
Single transfer mode
In this mode the device can make only one transfer (byte or word). After each transfer the DMAC gives the control of all buses back to the processor. Due to this, the processor can have access to the buses on a regular basis.
It allows the DMAC to time-share the buses with the processor; hence this mode is most commonly used.
2. The DMAC asserts the HOLD line to request use of the buses from the processor.
3. The processor asserts HLDA, granting the control of buses to the DMAC.
4. The DMAC asserts DACK to the requesting I/O device and executes a DMA bus cycle, resulting in data transfer.
5. I/O device deasserts its DRQ after data transfer of one byte or word.
9. HOLD signal is reasserted to request the use of buses when I/O device is ready
to transfer another byte or word. The same process is then repeated until the last
transfer.
10. When the transfer count is exhausted, terminal count is generated to indicate
the end of the transfer.
Block transfer mode
In this mode the device can make the number of transfers programmed in the word count register. After each transfer the word count is decremented by 1 and the address is decremented or incremented by 1. The DMA transfer is continued until the word count "rolls over" from zero to FFFFH, a Terminal Count (TC), or an external End Of Process (EOP) is encountered. Block transfer mode is used when the DMAC needs to transfer a block of data.
2. The DMAC asserts the HOLD line to request use of the buses from the microprocessor.
3. The microprocessor asserts HLDA, granting the control of buses to the DMAC.
4. The DMAC asserts DACK to the requesting I/O device and executes a DMA bus cycle, resulting in data transfer.
5. I/O device deasserts its DRQ after data transfer of one byte or word.
6. The DMAC deasserts the DACK line.
8. If the transfer count is not exhausted, the data transfer is not complete and the DMAC waits for another DMA request from the I/O device, indicating that it has another byte or word to transfer. When the DMAC receives the DMA request, the steps are repeated.
9. When the transfer count is exhausted, the data transfer is complete and the DMAC deasserts HOLD to tell the microprocessor that it no longer needs the buses.
10. Microprocessor then deasserts the HLDA signal to tell the DMAC that it has
resumed control of the buses.
Demand transfer mode
2. The DMAC asserts the HOLD line to request use of the buses from the microprocessor.
3. The microprocessor asserts HLDA, granting the control of buses to the DMAC.
4. The DMAC asserts DACK to the requesting I/O device and executes a DMA bus cycle, resulting in data transfer.
5. I/O device deasserts its DRQ after data transfer of one byte or word.
8. The DMAC continues to execute transfer cycles until the I/O device deasserts DRQ, indicating its inability to continue delivering data. The DMAC then deasserts the HOLD signal, giving the buses back to the microprocessor. It also deasserts DACK.
10. Transfer continues in this way until the transfer count has been exhausted.
The flowcharts in the Fig. 8.10.4 summarized the three data transfer modes of
DMA. (See Fig. 8.10.4 on next page.)
• The Fig. 8.10.5 (a) shows the use of DMA in a computer system.
• The DMA is used to connect a high-speed network to the computer bus. The
DMA control handles the data transfer between high-speed network and the
computer system.
• It is also used to transfer data between processor and floppy disk with the help of
floppy disk controller.
• Let us see how DMA controller does the data transfer between floppy disk
and the processor. The Fig. 8.10.5 (b) shows the interface required for such
transfer.
• The sequence of events that takes place during the data transfer are as follows:
• When processor needs some data from the disk, it sends a series of command
words to registers inside the floppy disk controller.
• The floppy disk controller then proceeds to find the specified track and sector on
the disk.
• In the mean while processor loads the DMA data counter and address register.
The data counter is loaded with count equal to the number of bytes to be
transferred to or from the memory. The address register is loaded with the starting
address of the memory.
• When the controller reads the first byte of data from a sector, it sends a DMA request (DRQ) signal to the DMA controller. The DMA controller in turn sends a hold request signal to the HOLD input of the processor.
• The processor then floats its buses and sends a hold-acknowledge signal to the
DMA controller.
• The DMA controller then sends out the first transfer address on the bus and asserts the DACK input of the FDC to tell it that the DMA transfer is underway.
• When the number of bytes specified in the DMA controller initialization has been transferred, the DMA controller asserts the EOP (end of process) signal, which is connected to the TC (Terminal Count) input of the FDC.
• This causes FDC to generate interrupt signal to tell the processor that the
requested block of data has been read from disk to a buffer in memory.
Accessing I/O
• The important point to be noted here is, I/O devices (peripherals) cannot be
connected directly to the system bus. The reasons are discussed here.
2. The data transfer rate of peripherals is often much slower than that of the
memory or CPU. So it is impractical to use the high speed system bus to
communicate directly with the peripherals.
3. Generally, the peripherals used in a computer system have different data formats
and word lengths than that of CPU used in it.
1. Handle data transfer between much slower peripherals and CPU or memory.
2. Handle data transfer between CPU or memory and peripherals having different
data formats and word lengths.
3. Match signal levels of different I/O protocols with computer signal levels.
• The I/O system is nothing but the hardware required to connect an I/O device to the bus. It is also called an I/O interface. The major requirements of an I/O interface are:
2. Processor communication
3. Device communication
4. Data buffering
5. Error detection
• The important blocks necessary in any I/O interface are shown in Fig. 8.6.1.
• As shown in the Fig. 8.6.1, I/O interface consists of data register, status/control
register, address decoder and external device interface logic.
• The data register holds the data being transferred to or from the processor.
• Address lines drive the address decoder. The address decoder enables the device
to recognize its address when address appears on the address lines.
• The external device interface logic accepts inputs from address decoder,
processor control lines and status signal from the I/O device and generates control
signals to control the direction and speed of data transfer between processor and
I/O devices.
• The Fig. 8.6.2 shows the I/O interface for input device and output device. Here,
for simplicity block schematic of I/O interface is shown instead of detail
connections.
• The address decoder enables the device when its address appears on the address
lines.
• The data register holds the data being transferred to or from the processor.
• The status register contains information relevant to the operation of the I/O
device.
• Both the data and status registers are assigned with unique addresses and they are
connected to the data bus.
I/O devices can be interfaced to a computer system in two ways, which are called interfacing techniques:
1. Memory-mapped I/O
2. I/O-mapped I/O
Memory-mapped I/O
• In this technique, the total memory address space is partitioned and part of this space is devoted to I/O addressing, as shown in Fig. 8.6.3.
• When this technique is used, a memory reference instruction that causes data to
be fetched from or stored at address specified, automatically becomes an I/O
instruction if that address is made the address of an I/O port.
Advantage
• The usual memory related instructions are used for I/O related operations. The
special I/O instructions are not required.
Disadvantage
• Part of the memory address space is given over to I/O addressing, reducing the address space available for memory.
I/O-mapped I/O
• If we do not want to reduce the memory address space, we allot a different I/O address space, apart from the total memory space, which is called the I/O-mapped I/O technique, as shown in Fig. 8.6.4.
Advantage
• The complete memory address space remains available for memory.
Disadvantage
• The memory related instructions do not work. Therefore, processor can only use
this mode if it has special instructions for I/O related operations such as I/O read,
I/O write.
• In I/O data transfer, the system requires the transfer of data between external circuitry and the processor. The different ways of I/O data transfer are:
1. Program controlled I/O or polling control
2. Interrupt controlled I/O
3. Hardware controlled I/O (DMA)
4. I/O controlled by handshake signals
• In program controlled I/O, the transfer of data is completely under the control of the processor program. This means that the data transfer takes place only when an I/O transfer instruction is executed. In most cases it is necessary to check whether the device is ready for data transfer or not. To check this, the processor polls the status bit associated with the I/O device.
Interrupt controlled I/O
• When interrupted, the processor stops the execution of the program and transfers
the program control to an interrupt service routine.
• After the data transfer, it returns control to the main program at the point it was
interrupted.
Hardware controlled I/O
• To increase the speed of data transfer between processors memory and I/O, the
hardware controlled I/O is used. It is commonly referred to as Direct Memory
Access (DMA). The hardware which controls this data transfer is commonly
known as DMA controller.
• The DMA controller sends a HOLD signal to the processor to initiate data
transfer. In response to HOLD signal, processor releases its data, address and
control buses to the DMA controller. Then the data transfer is controlled at high
speed by the DMA controller without the intervention of the processor.
• After data transfer, DMA controller sends low on the HOLD pin, which gives the
control of data, address, and control buses back to the processor.
I/O controlled by handshake signals
• The handshake signals are used to ensure the readiness of the I/O device and to
synchronize the timing of the data transfer. In this data transfer, the status of
handshaking signals are checked between the processor and an I/O device and
when both are ready, the actual data is transferred.
• An I/O interface consists of circuits which connect an I/O device to a computer bus.
• As shown in Fig. 8.9.1 on one side of the interface we have the bus signals for
address, data and control. On the other side we have a data path with its associated
controls to transfer data between the interface and the I/O device.
Parallel Interface
• Parallel interface is used to send or receive data having group of bits (8-bits or
16-bits) simultaneously.
• Input interfaces are used to receive the data whereas output interfaces are used to
send the data.
Input Interface
• Commonly used input device is a keyboard. Fig. 8.9.1 shows the hardware
components needed for connecting a keyboard to a processor.
• A typical keyboard consists of mechanical switches that are normally open. When a key is pressed, the corresponding signal changes and an encoder circuit generates the ASCII code for the corresponding key.
• When debouncing is implemented in software, the I/O routine that reads the ASCII code of a character from the keyboard waits long enough to ensure that bouncing has subsided.
• Fig. 8.9.3 shows the hardware approach to prevent key bouncing. It consists of a flip-flop. The output of the flip-flop shown in Fig. 8.9.3 is logic 1 when the key is at position A (unpressed) and it is logic 0 when the key is at position B, as shown in Table 8.9.1. It is important to note that when the key is in between A and B, the output does not change, preventing bouncing of the key output. In other words, the output does not change during the transition period, eliminating key bounce.
• The output of encoder in Fig. 8.9.4 consists of the bits that represent the encoded
character and one control signal called valid, which indicates that a key is being
pressed. The encoder circuit sends this information to the interface circuit.
• The interface circuit consists of a data register, DATA IN, and a status flag, SIN.
•When key is pressed, the valid signal is activated, (changes from 0 to 1) causing
the ASCII code to be loaded into DATA IN and SIN to be set to 1. The status flag
SIN is cleared to 0 when the processor reads the contents of the DATA IN register.
• The interface circuit is connected to an asynchronous bus on which transfers are
controlled by using the handshake signals master-ready and slave-ready.
• The Fig. 8.9.4 shows the typical internal circuitry for input interface. Here, we
receive the data input from the keyboard input device.
• When the key is pressed, its switch closes and establishes a path for an electrical
signal. This signal is detected by an encoder circuit that generates ASCII code for
the corresponding character.
• The input interface consists of data register, DATA IN, and status flag, SIN.
When a key is pressed, the valid signal activates and causes the ASCII code to be
loaded into DATA IN and SIN to be set to 1.
• The status flag SIN is cleared to 0 when the processor reads the contents of the
DATA IN register.
• An address decoder is used to select the input interface when the high-order 31 bits of an address correspond to any of the addresses assigned to this interface.
• The remaining address bit determines whether the status or the data register is to be read when the Master-ready signal is active.
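• The DATAIN register and SIN flag behaviour described above, together with a program-controlled (polling) read, can be sketched in Python as follows (class and method names are assumptions for illustration):

class InputInterface:
    """Sketch of the DATA IN register and SIN status flag of the input interface."""
    def __init__(self):
        self.DATAIN = 0
        self.SIN = 0

    def key_pressed(self, ascii_code):       # driven by the encoder's Valid signal
        self.DATAIN = ascii_code
        self.SIN = 1                          # a new character is available

    def read_status(self):
        return self.SIN

    def read_data(self):                      # processor reads DATA IN
        self.SIN = 0                          # reading clears the status flag
        return self.DATAIN

keyboard = InputInterface()
keyboard.key_pressed(ord('A'))

# Program-controlled (polling) input loop executed by the processor
while keyboard.read_status() == 0:
    pass                                      # wait until SIN becomes 1
print(chr(keyboard.read_data()))              # 'A'; SIN is now cleared to 0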
Output Interface
• The Fig. 8.9.5 shows typical example of output interface which is used to
interface parallel printer.
• The output interface contains a data register, DATAOUT, and a status flag,
SOUT.
• The SOUT flag is set to 1 when the printer is ready to accept another character,
and it is cleared to 0 when a new character is loaded into DATAOUT by the
processor.
• The Fig. 8.9.6 shows the detail internal circuit for output interface.
Combined Input/Output Interface
• The 30 bits of the higher-order address (A31 - A2) are used to select the overall interface. The low-order two bits of the address (A0 - A1) are used to select one of the three
addressable locations in the interface. These are: Two data registers and one status
register.
• Flags SIN and SOUT are in the status register.
• Labels RS1 and RS0 are used for the inputs that determine the selection of the desired register.
• The input and output interfaces can be combined into a single interface and the
direction of data flow can be controlled by data direction register.
• Single interface can be programmed to use for input the data or output the data.
Such a interface is known as programmable parallel interface.
• The Fig. 8.9.8 shows the simplified block diagram of 8-bit programmable parallel
interface.
• The data and interface lines are bidirectional. Their direction is controlled by the Data Direction Register (DDR).
• Two lines M0 and M1 are connected to the status and control block. These two lines decide the mode of operation of the parallel interface, i.e. whether to operate the parallel interface as a simple input/output or as a handshake input/output.
• Ready and Accept signals are provided as handshaking signals.
• An interrupt request signal is also provided to allow interrupt-driven I/O data transfer.
Serial Interface
• A serial interface is used to transmit / receive data serially, i.e., one bit at a time.
• A shift register is used to transform information between the parallel and serial
formats. The Fig. 8.9.9 shows the block diagram of typical internal circuit for serial
interface.
• As shown in the Fig. 8.9.9, the input shift register accepts serial data bit by bit
and converts it into the parallel data. The converted parallel data is loaded in the
data register and it is then read by the processor using data bus.
• When it is necessary to send data serially, the data is loaded into DATAOUT
register. It is then loaded into output shift register. Output shift register converts
this parallel data into serial data.
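• A minimal Python sketch of the parallel-to-serial and serial-to-parallel conversion performed by the shift registers (the bit ordering is an assumption made for illustration):

def parallel_to_serial(byte):
    """Output shift register: shift an 8-bit value out one bit at a time
    (least significant bit first, as a simple convention for this sketch)."""
    return [(byte >> i) & 1 for i in range(8)]

def serial_to_parallel(bits):
    """Input shift register: assemble the received bits back into a byte."""
    value = 0
    for i, bit in enumerate(bits):
        value |= (bit & 1) << i
    return value

data = 0x5A
line_bits = parallel_to_serial(data)      # what travels over the single line
print(line_bits)                          # [0, 1, 0, 1, 1, 0, 1, 0]
print(hex(serial_to_parallel(line_bits))) # 0x5a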
Comparison between Serial and Parallel Interface
• For transmitting data over a long distance, using parallel interface is impractical
due to the increase in cost of cabling.
• Parallel interface is also not practical for devices such as cassette tapes or a CRT
terminal. In such situations, serial interface is used.
• In serial interface one bit is transferred at a time over a single line.
Interrupt I/O
AU: Dec.-06, 07, 08, 09, 10, 11, 12, 16, 18, May-06, 07, 08, 09, 12, 13
• This method provides an external asynchronous input that would inform the
processor that it should complete whatever instruction that is currently being
executed and fetch a new routine (Interrupt Service Routine) that will service the
requesting device. Once this servicing is completed, the processor would resume
exactly where it left off. The event that causes the interruption is called interrupt
and the special routine executed to service the interrupt is called Interrupt Service
Routine (ISR).
• The interrupt service routine is different from a subroutine because the address of the ISR is predefined or is available in the Interrupt Vector Table (IVT), whereas a subroutine address must be given in the subroutine CALL instruction. The IRET instruction is used to return from an ISR, whereas the RET instruction is used to return from a subroutine. In the IA-32 architecture, the IRET instruction restores the flag contents along with CS and IP, whereas the RET instruction restores only the CS and IP contents.
• Most of the processors provide the masking facility. In the processor those
interrupts which can be masked under software control are called maskable
interrupts.
• The interrupts which can not be masked under software control are called non-
maskable interrupts.
• Maskable interrupts are enabled and disabled under program control. By setting
or resetting particular flip-flops in the processor, interrupts can be masked or
unmasked, respectively.
• When masked, processor does not respond to the interrupt even though the
interrupt is activated.
Vectored interrupts
• When an external device interrupts the processor (interrupt request), the processor has to execute an interrupt service routine to service that interrupt. If the internal control circuit of the processor produces a CALL to a predetermined memory location which is the starting address of the interrupt service routine, then that address is called the vector address and such interrupts are called vectored interrupts. Vectored interrupts give the fastest and most flexible response, since such an interrupt causes a direct hardware-implemented transition to the correct interrupt-handling program. This technique is called vectoring. When the processor is interrupted, it reads the vector address and loads it into the PC.
Interrupt nesting
• For some devices, a long delay in responding to an interrupt request may cause
error in the operation of computer. Such interrupts are acknowledged and serviced
even though processor is executing an interrupt service routine for another device.
•A system of interrupts that allows an interrupt service routine to be interrupted is
known as nested interrupts.
Interrupt priority
• When interrupt requests arrive from two or more devices simultaneously, the
processor has to decide which request should be serviced first and which one
should be delayed. The processor takes the decision with the help of interrupt
priorities.
• It accepts the request having the highest priority.
• The CPU recognizes the interrupt when the external asynchronous input
(interrupt input) is asserted (a signal is sent to the interrupt input) by an I/O device.
• When a processor is interrupted, it stops executing its current program and calls a
special routine which "services" the interrupt. The event that causes the
interruption is called interrupt and the special routine which is executed is called
interrupt service routine.
2. The program counter's current contents are stored on the stack. Remember,
during the execution of an instruction the program counter is pointing to the
memory location for the next instruction.
3. The program counter is loaded with the address of an interrupt service routine.
4. Program execution continues with the instruction taken from the memory
location pointed by the new program counter contents.
6. After execution of the return instruction, the processor gets the old contents of the program counter (the address of the next instruction of the program at the point where the interrupt occurred) from the stack and puts them back into the program counter. This allows the interrupted program to continue executing at the instruction following the one where it was interrupted. Fig. 8.8.2 shows the response to an interrupt with a flowchart and diagram.
The Table 8.8.1 gives the comparison between programmed I/O and interrupt
driven I/O.
Interrupt Priority Schemes
• The Fig. 8.8.4 shows another arrangement for handling priority interrupts. Here, devices are organised in groups, and each group is connected at a different priority level. Within a group, devices are connected in a daisy-chain.