Unit 4
Bare Machine
• A Bare Machine is the raw hardware of the computer system on which programs execute directly on the processor without using an Operating System.
• In the early days, before operating systems were developed, instructions were executed directly on the hardware without any intervening software.
• The main drawback was that a Bare Machine accepts programs and instructions only in machine language. As a result, only trained people who were qualified in the computer field and could instruct the computer in machine language were able to operate it.
• For this reason, the Bare Machine was considered inefficient and cumbersome: in the absence of an operating system, all tasks had to be performed manually. There are no operating system services, so the OS manages nothing; instead, the entire address space is given to the program and all management is left to it.
• It is the operating system that makes the bare machine work like a computer system: it acts as an interface and provides a friendly environment and ease of work to the system users.
Resident Monitor
• In computing, a resident monitor is a type of system software program that was used in many early computers from the 1950s to the 1970s. The name is derived from the program being always present in the computer's memory, thus being "resident".
• After scheduling the jobs, the resident monitor loads the programs one by one into main memory according to their sequence. An important property of the resident monitor is that there is no gap between the execution of one program and the next, so processing is faster.
The Resident monitors are divided into 4 parts as:
1. Control Language Interpreter:
The job of the Control Language Interpreter is to read and carry out the instructions of the job line by line and pass them to the next level.
2. Loader:
The second part of the Resident monitor which is the main part of the Resident Monitor is
Loader which Loads all the necessary system and application programs into the main memory.
3. Device Driver:
The third part of the Resident Monitor is the Device Driver. It takes care of all the input-output devices connected to the system, so all the communication that takes place between the user and the system is handled by the Device Driver. It simply acts as an intermediary between requests and responses: the requests that are made by the user to the system, and the responses that the system produces to fulfill those requests.
4. Interrupt Processing:
The fourth part, as the name suggests, processes all the interrupts that occur in the system.
Memory Management
• Memory management keeps track of each and every memory location, regardless of whether it is allocated to some process or free.
Main Memory
• Main Memory refers to a physical memory that is the internal memory to the computer. The
word main is used to distinguish it from external mass storage devices such as disk drives.
Main memory is also known as RAM.
• The computer is able to change only data that is in main memory. Therefore, every program
we execute and every file we access must be copied from a storage device into main memory.
• It decides how much memory is to be allocated to each process, and which process should get memory and when.
• It tracks whenever memory gets freed or unallocated and updates the status accordingly.
• It allocates space to application routines.
• It also makes sure that these applications do not interfere with each other.
• It helps protect different processes from each other.
• It places programs in memory so that memory is utilized to its full extent.
• A Logical Address is generated by the CPU while a program is running. The logical address is a virtual address, as it does not exist physically; therefore, it is also known as a Virtual Address. This address is used by the CPU as a reference to access the physical memory location. The term Logical Address Space is used for the set of all logical addresses generated from a program's perspective.
The hardware device called Memory-Management Unit is used for mapping logical address to
its corresponding physical address.
• A Physical Address identifies a physical location of required data in memory. The user never directly deals with the physical address but can access it through its corresponding logical address. The user program generates the logical address and thinks that the program is running in this logical address space, but the program needs physical memory for its execution; therefore, the logical address must be mapped to the physical address by the MMU before it is used. The term Physical Address Space is used for the set of all physical addresses corresponding to the logical addresses in a logical address space.
Mapping Virtual Addresses to Physical Addresses
Address binding is the process of mapping from one address space to another. A logical address is the address generated by the CPU during execution, whereas a physical address refers to a location in the memory unit (the one that is loaded into memory). Note that the user deals only with logical (virtual) addresses. The logical address undergoes translation by the MMU, the address translation unit in particular. The output of this process is the appropriate physical address, i.e. the location of the code/data in RAM.
Virtual and physical addresses are the same in compile-time and load-time address-binding schemes.
Virtual and physical addresses differ in execution-time address-binding scheme.
The choice between static or dynamic loading is made at the time the computer program is developed. If your program is to be loaded statically, then at compile time the complete program will be compiled and linked without leaving any external program or module dependency.
The linker combines the object program with other necessary object modules into an absolute
program, which also includes logical addresses.
If you are writing a Dynamically loaded program, then your compiler will compile the program and
for all the modules which you want to include dynamically, only references will be provided and rest
of the work will be done at the time of execution.
At the time of loading, with static loading, the absolute program (and data) is loaded into memory in
order for execution to start.
If you are using dynamic loading, dynamic routines of the library are stored on a disk in relocatable
form and are loaded into memory only when they are needed by the program.
As explained above, when static linking is used, the linker combines all other modules needed by a
program into a single executable program to avoid any runtime dependency.
When dynamic linking is used, it is not required to link the actual module or library with the
program, rather a reference to the dynamic module is provided at the time of compilation and
linking. Dynamic Link Libraries (DLL) in Windows and Shared Objects in Unix are good examples
of dynamic libraries.
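As a rough analogy for dynamic loading, Python can resolve and load a module only when it is first needed; the module name report_generator below is hypothetical and only illustrates the idea of deferring the load to execution time:

```python
import importlib

def run_report(data):
    try:
        # Loaded on demand, mirroring dynamic loading: until this call,
        # only a reference (the module name) is carried around.
        report = importlib.import_module("report_generator")  # hypothetical module
    except ImportError:
        # If the dynamically loaded module is absent, fall back gracefully.
        return f"report module unavailable, raw data: {data}"
    return report.generate(data)

print(run_report([1, 2, 3]))
```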
Swapping
A process needs to be in memory for execution. But sometimes there is not enough main memory to hold all the currently active processes in a timesharing system. So, excess processes are kept on disk and brought in to run dynamically. Swapping is the process of bringing each process into main memory, running it for a while, and then putting it back on the disk.
Fragmentation
Processes are stored and removed from memory, which creates free memory space, which are too
small to use by other processes.
After some time, processes cannot be allocated to these memory blocks because the blocks are too small, and the memory blocks therefore remain unused; this is called fragmentation. This type of problem happens in a dynamic memory allocation system when the free blocks are quite small, so they cannot fulfil any request.
• External fragmentation can be reduced by rearranging memory contents to place all free
memory together in a single block.
• The internal fragmentation can be reduced by assigning the smallest partition, which is still
good enough to carry the entire process.
Memory Allocation
• It is the easiest memory management technique. In this scheme, the memory is divided into
two parts. One part holds the operating system and the remaining holds the user processes
which are loaded and executed one at a time.
• When user process completes its task, it is taken out of main memory and another requesting
process is brought into main memory by the operating system. For example, MS-DOS
operating system allocates memory in this way.
Advantages:
• It is simple.
• It is easy to understand and use.
Disadvantages
• Only one process can be executed at a time, and any part of main memory not used by that process is wasted.
Fixed Partitioning
• One of the simplest methods for allocating memory is to divide memory into several fixed partitions, which are mostly contiguous areas of memory.
➢ Suppose a process P5 of size 7 MB arrives. This process cannot be accommodated in spite of the available free space, because of contiguous allocation (spanning is not allowed). Hence, 7 MB becomes part of external fragmentation.
1. Easy to implement:
The algorithms needed to implement Fixed Partitioning are easy to implement. It simply requires putting a process into a certain partition, without worrying about the emergence of internal and external fragmentation.
2. Little OS overhead:
Fixed Partitioning requires little excess or indirect computational power.
For example, suppose in the above example process P1 (2 MB) and process P3 (1 MB) complete their execution. Two free spaces are left, i.e. 2 MB and 1 MB. Now suppose a process P5 of size 3 MB arrives. The empty space in memory cannot be allocated, as no spanning is allowed in contiguous allocation; the rule says that a process must be contiguously present in main memory to get executed. Hence it results in external fragmentation.
Memory is divided into different blocks or partitions. Each process is allocated memory according to its requirement. Such partition allocation is an ideal method to avoid internal fragmentation. Below are the various partition allocation schemes:
First Fit:
• In this type of fit, the partition allocated is the first sufficiently large block from the beginning of main memory. This method keeps the free/busy list of jobs organized by memory location, from low-order to high-order memory.
• In this method, the first job claims the first available memory block with space greater than or equal to its size.
• The operating system does not search for the most appropriate partition, but just allocates the job to the nearest memory partition available with sufficient size. Example:
As illustrated above, the system assigns J1 the nearest partition in the memory. As a result, no partition with sufficient space is available for J3 and it is placed in the waiting list.
• It is fast in processing: As the processor allocates the nearest available memory partition to
the job, it is very fast in execution.
• It wastes a lot of memory: The processor ignores whether the size of the partition allocated to the job is very large compared to the size of the job. It just allocates the memory. As a result, a lot of memory is wasted, and many jobs may not get space in memory and would have to wait for another job to complete.
Best Fit:
• It allocates the process to the smallest sufficient partition among the free partitions. Using Best Fit has some disadvantages:
• It is slower because it scans the entire list every time and tries to find the smallest hole which can satisfy the requirement of the process.
• Because the difference between the hole size and the process size is very small, the holes produced are so small that they cannot be used to load any other process and therefore remain useless. Despite the fact that the name of the algorithm is Best Fit, it is not the best algorithm of all. Example:
As illustrated in the above figure, the operating system first searches throughout the memory and allocates the job to the smallest possible memory partition, making the memory allocation efficient.
• The operating system allocates the job the minimum possible space in memory, making memory management very efficient. To save memory from being wasted, it is the best method.
• It is a Slow Process. Checking the whole memory for each job makes the working of the
operating system very slow. It takes a lot of time to complete the work.
Worst Fit:
• It allocates the process to the partition, which is the largest sufficient freely available
partition in the main memory.
• The worst fit algorithm scans the entire list every time and tries to find the biggest hole in the list which can fulfil the requirement of the process.
• Despite the fact that this algorithm leaves behind larger holes to load other processes, this is not a better approach, because it is slower: it searches the entire list again and again every time.
• A large amount of space may be wasted. It requires sorting and searching of free holes. It does not provide the optimal solution and is time consuming.
Next Fit:
It is similar to First Fit, but this fit searches for the first sufficient partition starting from the last allocation point.
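A minimal sketch of the four fits over a list of free hole sizes (the sizes below are example values, and hole addresses and hole splitting are ignored for brevity):

```python
# Each function returns the index of the chosen hole, or None if no hole fits.

def first_fit(holes, size):
    for i, h in enumerate(holes):
        if h >= size:
            return i
    return None  # no hole large enough

def best_fit(holes, size):
    candidates = [(h, i) for i, h in enumerate(holes) if h >= size]
    return min(candidates)[1] if candidates else None  # smallest sufficient hole

def worst_fit(holes, size):
    candidates = [(h, i) for i, h in enumerate(holes) if h >= size]
    return max(candidates)[1] if candidates else None  # largest sufficient hole

def next_fit(holes, size, start):
    n = len(holes)
    for k in range(n):                 # wrap around from the last allocation point
        i = (start + k) % n
        if holes[i] >= size:
            return i
    return None

holes = [100, 500, 200, 300, 600]      # hole sizes in KB (example values)
for name, fn in [("first", first_fit), ("best", best_fit), ("worst", worst_fit)]:
    print(name, "fit for 212 KB ->", fn(holes, 212))
print("next fit for 212 KB from index 3 ->", next_fit(holes, 212, 3))
```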
• In non-contiguous memory allocation, the available free memory space is scattered here and there; all the free memory space is not in one place.
• In non-contiguous memory allocation, a process acquires memory space that is not in one place; it is at different locations according to the process requirement.
• This technique of non-contiguous memory allocation reduces the wastage of memory caused by internal and external fragmentation, as it utilizes all the free memory space created by different processes.
Paging: This method divides the computer's main memory into fixed-size units known as page
frames. This hardware memory management unit maps pages into frames which should be allocated
on a page basis.
Segmentation: Segmented memory is the only memory management method that does not provide
the user's program with a linear and contiguous address space. Segments need hardware support in the
form of a segment table. It contains the physical address of the section in memory, size, and other
data like access protection bits and status.
Paging
Non-Contiguous Memory Allocation-
Techniques-
There are two popular techniques used for non-contiguous memory allocation-
Paging
• Paging is a fixed size partitioning scheme. In paging, secondary memory and main
memory are divided into equal fixed size partitions.
• The size of a frame should be kept the same as that of a page to have maximum
utilization of the main memory and to avoid external fragmentation. Paging is used
for faster access to data, and it is a logical concept.
• Each process is divided into parts where size of each part is same as page size. The
size of the last part may be less than the page size.
• The pages of process are stored in the frames of main memory depending upon their
availability.
Example-
Consider a process divided into 4 pages P0, P1, P2 and P3. Depending upon availability, these pages may be stored in the main memory frames in a non-contiguous fashion as shown-
Translating Logical Address into Physical Address-
Step-01: CPU generates a logical address consisting of two parts: page number and page offset.
• Page Number specifies the specific page of the process from which the CPU wants to read the data.
• Page Offset specifies the specific word on the page that the CPU wants to read.
Step-02: For the generated page number, the page table provides the corresponding frame number (base address of the frame) where that page is stored in the main memory.
Step-03: The frame number combined with the page offset forms the required physical address.
• Frame number specifies the specific frame where the required page is stored.
• Page Offset specifies the specific word that has to be read from that page.
The following diagram illustrates the above steps of translating logical address into
physical address-
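A minimal sketch of these three steps, assuming a 1 KB page size and a made-up page table (both are assumptions for illustration):

```python
PAGE_SIZE = 1024                       # 1 KB pages (assumption)
page_table = {0: 5, 1: 2, 2: 7, 3: 0}  # hypothetical frame numbers for P0..P3

def translate(logical_address):
    page_number = logical_address // PAGE_SIZE      # Step 1: split the address
    page_offset = logical_address % PAGE_SIZE
    frame_number = page_table[page_number]          # Step 2: page table lookup
    return frame_number * PAGE_SIZE + page_offset   # Step 3: frame base + offset

print(hex(translate(2 * PAGE_SIZE + 100)))  # page 2 -> frame 7, offset 100
```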
Page Table-
• Page table is a data structure.
• It maps the page number referenced by the CPU to the frame number where that page
is stored.
Characteristics-
• The page number referenced by the CPU is used as an index into the page table; it gives the entry of the page table containing the frame number where the referenced page is stored.
A page table entry contains information about the page, which varies from operating system to operating system. The most important information in a page table entry is the frame number. In general, each entry of a page table contains the following information-
1. Frame Number-
• Frame number specifies the frame where the page is stored in the main memory.
• The number of bits in frame number depends on the number of frames in the main
memory.
2. Present / Absent Bit-
• This bit is also sometimes called as valid / invalid bit. This bit specifies whether that page is present in the main memory or not.
• If the page is not present in the main memory, then this bit is set to 0, otherwise it is set to 1.
• If the required page is not present in the main memory, then it is called as Page Fault.
The required page has to be fetched from the secondary memory and brought into the
main memory.
3. Protection Bit-
• This bit is also sometimes called as “Read / Write bit“. This bit is concerned with the
page protection.
• It specifies the permission to perform read and write operation on the page. If only
read operation is allowed to be performed and no writing is allowed, then this bit is set to
0.
• If both read and write operations are allowed to be performed, then this bit is set to
1.
4. Reference Bit-
• Reference bit specifies whether that page has been referenced in the last clock cycle
or not. If the page has been referenced recently, then this bit is set to 1 otherwise set to
0.
• A page that has not been referenced recently is considered a good candidate for page
replacement in LRU page replacement policy.
5. Caching Enabled / Disabled Bit-
• This bit enables or disables the caching of the page. Whenever freshness in the data is required, caching is disabled and this bit is set to 1, otherwise it is set to 0.
6. Dirty Bit-
• This bit is also sometimes called as “Modified bit”. This bit specifies whether that
page has been modified or not.
• If the page has been modified, then this bit is set to 1, otherwise set to 0.
• In case the page is modified, before replacing the modified page with some other
page, it has to be written back in the secondary memory to avoid losing the data.
• This is because if the page is not modified, then it can be directly replaced by another
page without any need of writing it back to the disk.
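As an illustration of how these bits might be packed into a single page table entry, here is a small sketch; the bit layout below is an assumption for illustration, not a real hardware format:

```python
# Hypothetical bit layout: 5 flag bits in the low positions, frame number above them.
PRESENT          = 1 << 0   # present / absent (valid / invalid) bit
WRITABLE         = 1 << 1   # protection (read / write) bit
REFERENCED       = 1 << 2   # reference bit
CACHING_DISABLED = 1 << 3   # caching enabled / disabled bit
DIRTY            = 1 << 4   # dirty (modified) bit
FRAME_SHIFT      = 5        # frame number stored in the remaining high bits

def make_pte(frame_number, flags):
    return (frame_number << FRAME_SHIFT) | flags

def frame_of(pte):
    return pte >> FRAME_SHIFT

pte = make_pte(42, PRESENT | WRITABLE)
print(frame_of(pte), bool(pte & PRESENT), bool(pte & DIRTY))  # 42 True False
```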
Advantages of Paging
• It allows memory to be allocated to a process in a non-contiguous manner and avoids external fragmentation.
Disadvantages of Paging
• The time taken to fetch an instruction increases, since two memory accesses are now required.
• In an operating system, a page table is created for each process, which contains page table entries (PTE). A PTE contains information like the frame number (the address in main memory which we want to refer to) and some other useful bits (e.g., valid/invalid bit, dirty bit, protection bit, etc.).
• This page table entry (PTE) will tell where in the main memory the actual page is
residing.
• Now the question is where to place the page table, such that overall access time
(or reference time) will be less.
• Initially, some people thought of using registers to store the page table, as registers are high-speed memory, so the access time would be less. The idea is to place the page table entries in registers; for each request generated from the CPU (a virtual address), the address is matched against the appropriate page number of the page table, which then tells where in main memory the corresponding page resides. Everything seems right here, but the problem is that the register set is small (in practice it can accommodate a maximum of 0.5K to 1K page table entries), while the process size may be big, hence the required page table will also be big (let's say the page table contains 1M entries), so registers cannot hold all the PTEs of the page table. So this is not a practical approach.
• To overcome this size issue, the entire page table was kept in main memory. But the problem here is that two main memory references are required: one to fetch the frame number from the page table and one to access the actual data.
• TLB contains page table entries that have been most recently used. Given a virtual
address, the processor examines the TLB if a page table entry is present (TLB hit),
the frame number is retrieved and the real address is formed. If a page table entry is
not found in the TLB (TLB miss), the page number is used to index the process page
table.
• If the page is not already in main memory, a page fault is issued; once the page is brought in, the TLB is updated to include the new page entry.
• Translation Lookaside Buffer (TLB) is a solution that tries to reduce the effective
access time.
• Being hardware, the access time of TLB is very less as compared to the main
memory.
Steps in TLB miss:
1. If TLB does not contain an entry for the referenced page number, a TLB miss occurs.
2. In this case, page table is used to get the corresponding frame number for the referenced
page number.
3. The TLB is updated with new PTE (if space is not there, one of the replacement
technique comes into picture i.e either FIFO, LRU or MFU etc).
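A minimal sketch of the TLB hit/miss handling described above; the TLB capacity, the page table contents and the choice of FIFO replacement are all assumptions for illustration:

```python
from collections import OrderedDict

TLB_CAPACITY = 2
tlb = OrderedDict()                       # page number -> frame number
page_table = {0: 8, 1: 3, 2: 6, 3: 1}     # hypothetical process page table

def lookup(page_number):
    if page_number in tlb:                # TLB hit: frame comes straight from the TLB
        return tlb[page_number], "hit"
    frame = page_table[page_number]       # TLB miss: fall back to the page table
    if len(tlb) >= TLB_CAPACITY:          # simple FIFO replacement when the TLB is full
        tlb.popitem(last=False)
    tlb[page_number] = frame              # update the TLB with the new entry
    return frame, "miss"

for p in [0, 1, 0, 2, 1]:
    print(p, lookup(p))
```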
Important Points-
Point-01:
• Unlike page table, there exists only one TLB in the system.
• So, whenever context switching occurs, the entire content of TLB is flushed and deleted.
• TLB is then again updated with the currently running process.
Point-02:
When a new process gets scheduled-
• Initially, TLB is empty. So, TLB misses are frequent.
• With every access from the page table, TLB is updated.
• After some time, TLB hits increases and TLB misses reduces.
Point-03:
• The time taken to update TLB after getting the frame number from the page table is
negligible.
• Also, TLB is updated in parallel while fetching the word from the main memory.
Advantages of TLB-
Disadvantages of TLB-
Effective memory access time(EMAT) : TLB is used to reduce effective memory access
time as it is a high speed associative cache.
Q.1. A paging scheme uses a Translation Lookaside Buffer (TLB). The TLB access time is 10 ns, the main memory access time is 50 ns and the TLB hit ratio is 90%. What is the effective memory access time, assuming there is no page fault?
Solution-
• TLB access time = 10 ns
• Main memory access time = 50 ns
• TLB Hit ratio = 90% = 0.9
• TLB Miss ratio = 1 – 0.9 = 0.1
Effective Access Time= hit ratio of TLB *(TLB access time + Memory access time) + miss
ratio of TLB * (TLB access time +2* Memory access time)
= 0.9 x { 10 ns + 50 ns } + 0.1 x { 10 ns + 2 x 50 ns }
= 54 ns + 11 ns
= 65 ns
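A quick check of this arithmetic, using the values from the solution above:

```python
tlb_time, mem_time, hit_ratio = 10, 50, 0.9

# Effective access time = h*(t + m) + (1 - h)*(t + 2*m)
emat = hit_ratio * (tlb_time + mem_time) + (1 - hit_ratio) * (tlb_time + 2 * mem_time)
print(emat, "ns")  # 65.0 ns
```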
Q.2. A paging scheme uses a Translation Lookaside Buffer (TLB). The effective memory access time is 160 ns and a main memory access takes 100 ns. What is the TLB access time (in ns) if the TLB hit ratio is 60% and there is no page fault?
Solution-
• Main memory access time = 100 ns
• TLB Hit ratio = 60% = 0.6
• Effective access time = 160 ns
• TLB Miss ratio = 1 – 0.6 = 0.4
Let the TLB access time be T. Substituting in the formula:
160 = 0.6 × (T + 100) + 0.4 × (T + 2 × 100)
160 = T + 140
T = 160 – 140 = 20 ns
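A quick check of this result:

```python
# Solve 160 = h*(T + m) + (1 - h)*(T + 2*m) for the TLB access time T.
h, m, emat = 0.6, 100, 160

# The equation simplifies to emat = T + m + (1 - h)*m, so:
t = emat - m - (1 - h) * m
print(t, "ns")  # 20.0 ns
```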
1. MULTILEVEL PAGING
• Multilevel Paging is a paging scheme, which consists of two or more levels of page
tables in a hierarchical manner. It is also known as hierarchical paging.
• The page table might be too big to fit in a contiguous space, so we may have a
hierarchy with several levels. We break up the logical address space into multiple
page tables. We then page the page table.
• The entries of the level 1 page table are pointers to a level 2 page table, entries of the level 2 page tables are pointers to level 3 page tables, and so on.
• The entries of the last-level page table store the actual frame information. Level 1 contains a single page table, and the address of that table is stored in the PTBR (Page Table Base Register).
• In multilevel paging, whatever the number of levels, all the page tables are stored in main memory. So it requires more than one memory access to get the physical address of a page frame: one access for each level is needed. Each page table entry, except the last-level page table entry, contains the base address of the next-level page table.
The simple techniques used are: Two level paging, Three level paging
1. Two level paging
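As a rough illustration of a two-level lookup, here is a minimal Python sketch; the 10 + 10 + 12 bit address split and the table contents are assumptions chosen to resemble classic 32-bit paging, not values from these notes:

```python
PAGE_OFFSET_BITS = 12
LEVEL_BITS = 10
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

# Hypothetical tables: the level-1 table points to level-2 tables, and the
# level-2 tables hold frame numbers.
level2_a = {0: 7, 1: 3}
level1 = {0: level2_a}

def translate(vaddr):
    offset = vaddr & (PAGE_SIZE - 1)
    l2_index = (vaddr >> PAGE_OFFSET_BITS) & ((1 << LEVEL_BITS) - 1)
    l1_index = vaddr >> (PAGE_OFFSET_BITS + LEVEL_BITS)
    level2 = level1[l1_index]          # first memory access: level-1 table
    frame = level2[l2_index]           # second memory access: level-2 table
    return frame * PAGE_SIZE + offset  # frame base + offset

print(hex(translate(0x0000_1234)))  # l1 index 0, l2 index 1 -> frame 3, offset 0x234
```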
Inverted Page Table-
• Inverted Page Table is a global page table which is maintained by the Operating System for all the processes.
• In inverted page table, the number of entries is equal to the number of
frames in the main memory. It can be used to overcome the drawbacks
of page table.
• In a conventional page table, there is always space reserved for a page regardless of whether it is present in the main memory or not. However, this is simply a wastage of memory if the page is not present. We can save this wastage by just inverting the page table.
• We can save the details only for the pages which are present in the main
memory. Frames are the indices and the information saved inside the
block will be Process ID and page number.
Inverted paging scheme
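A minimal sketch of an inverted page table lookup; the table contents are made up, and a real system would hash on (process id, page number) rather than scan linearly:

```python
# One entry per physical frame, each holding (process id, page number).
inverted_table = [
    ("P1", 0),   # frame 0 holds page 0 of process P1
    ("P2", 3),   # frame 1 holds page 3 of process P2
    ("P1", 2),   # frame 2 holds page 2 of process P1
]

def lookup(pid, page_number):
    for frame, entry in enumerate(inverted_table):
        if entry == (pid, page_number):
            return frame
    return None  # not in memory -> page fault

print(lookup("P1", 2))   # 2
print(lookup("P2", 9))   # None -> page fault
```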
Segmentation
• Segmentation is another non-contiguous memory allocation technique.
• In segmentation, process is not divided blindly into fixed size pages.
• Rather, the process is divided into modules for better visualization.
Characteristics-
• Segmentation is a variable size partitioning scheme.
• In segmentation, secondary memory and main memory are divided into partitions of unequal
size.
• The size of partitions depends on the length of modules.
• The partitions of secondary memory are called as segments.
Consider a program is divided into 5 segments as-
Segment Table-
• Segment table is a table that stores the information about each segment of the process.
• It has two columns. The first column stores the size or length of the segment.
• The second column stores the base address or starting address of the segment in the main memory.
• Segment table is stored as a separate segment in the main memory.
• Segment table base register (STBR) stores the base address of the segment table.
For the above illustration, consider the segment table is-
• Limit indicates the length or size of the segment.
• Base indicates the base address or starting address of the segment in the main memory.
In accordance to the above segment table, the segments are stored in the main memory as-
Step-01:
CPU generates a logical address consisting of two parts-
1. Segment Number
2. Segment Offset
• Segment Number specifies the specific segment of the process from which the CPU wants to read the data.
• Segment Offset specifies the specific word in the segment that the CPU wants to read.
Step-02:
• For the generated segment number, the corresponding entry is located in the segment table.
• Then, the segment offset is compared with the limit (size) of the segment.
• If the segment offset is found to be greater than or equal to the limit, a trap is generated (invalid address).
• Otherwise, the segment offset is added to the base address of the segment to obtain the physical address of the required word in the main memory.
Diagram-
The following diagram illustrates the above steps of translating logical address into physical address-
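A minimal sketch of Steps 01 and 02 in Python; the limit/base values in the segment table are hypothetical:

```python
segment_table = {
    0: (1500, 1000),   # segment 0: limit 1500, base 1000
    1: (400,  5000),   # segment 1: limit 400,  base 5000
}

def translate(segment_number, offset):
    limit, base = segment_table[segment_number]
    if offset >= limit:
        raise MemoryError("trap: segment offset beyond limit")  # invalid address
    return base + offset        # physical address = base + offset

print(translate(1, 100))        # 5100
# translate(1, 400) would raise, since the offset equals the limit
```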
Advantages-
The advantages of segmentation are-
• It allows to divide the program into modules which provides better visualization.
• Segment table consumes less space as compared to Page table in paging.
• It solves the problem of internal fragmentation.
Disadvantages-
The disadvantages of segmentation are-
• As segments are of variable size, it suffers from external fragmentation.
Paged Segmentation
• Paging and Segmentation are the non-contiguous memory allocation techniques.
• Paging divides the process into equal size partitions called as pages.
• Segmentation divides the process into unequal size partitions called as segments.
Segmented paging is a scheme that implements the combination of segmentation and paging
Working-
In segmented paging,
• Process is first divided into segments and then each segment is divided into pages.
• These pages are then stored in the frames of main memory.
• A page table exists for each segment that keeps track of the frames storing the pages
of that segment.
• Each page table occupies one frame in the main memory.
• Number of entries in the page table of a segment = Number of pages into which that segment is divided.
• A segment table exists that keeps track of the frames storing the page tables of
segments.
• Number of entries in the segment table of a process = Number of segments into which that process is divided.
• The base address of the segment table is stored in the segment table base register.
Translating Logical Address into Physical Address-
Step-01:
• CPU generates a logical address consisting of three parts: segment number, page number and page offset.
Step-02:
• For the generated segment number, the corresponding entry is located in the segment table. The segment table provides the frame number of the frame storing the page table of the referred segment.
• The frame containing the page table is located.
Step-03:
• For the generated page number, corresponding entry is located in the page table.
• Page table provides the frame number of the frame storing the required page of the
referred segment.
• The frame containing the required page is located.
Step-04:
• The frame number combined with the page offset forms the required physical
address.
• For the generated page offset, corresponding word is located in the page and read.
Diagram-
Advantages-
The advantages of segmented paging are-
• Segment table contains only one entry corresponding to each segment.
• It reduces memory usage.
• The size of Page Table is limited by the segment size.
• It solves the problem of external fragmentation.
Disadvantages-
The disadvantages of segmented paging are-
• Segmented paging suffers from internal fragmentation.
• The complexity level is much higher as compared to paging.
• Virtual memory is a technique that allows the execution of processes that may not be completely in memory. The main visible advantage of this scheme is that programs can be larger than physical memory.
• Virtual memory – separation of user logical memory from physical memory. Only
part of the program needs to be in memory for execution. Logical address space can
therefore be much larger than physical address space. Allows address spaces to be
shared by several processes.
• Allows for more efficient process creation. More programs running concurrently. Less
I/O needed to load or swap processes.
Following are the situations, when entire program is not required to load fully:
1. Error handling code is not needed unless that specific error occurs, some of which are
quite rare.
2. Arrays are often over-sized for worst-case scenarios, and only a small fraction of the
arrays are actually used in practice.
3. More physical memory is available, as programs are stored in virtual memory, so they occupy very little space in actual physical memory.
Note: Virtual Memory is a storage scheme that provides user an illusion of having a very
big main memory. This is done by treating a part of secondary memory as the main memory.
• In this scheme, User can load the bigger size processes than the available main
memory by having the illusion that the memory is available to load the process.
• Instead of loading one big process in the main memory, the Operating System loads
the different parts of more than one process in the main memory.
• By doing this, the degree of multiprogramming will be increased and therefore, the
CPU utilization will also be increased.
Fig: Diagram showing that Virtual memory is larger than physical memory
Virtual memory can be implemented via:
1. Demand paging
2. Demand segmentation
Demand Paging
• The basic idea behind demand paging is that when a process is swapped in, its pages
are not swapped in all at once. Rather they are swapped in only when the process
needs them (on demand). This is termed as Lazy Swapper, although a Pager is a
more accurate term. A swapper manipulates entire processes, whereas a pager is
concerned with the individual pages of a process. We thus use pager, rather than
swapper, in connection with demand paging.
• A page is copied to the main memory when its demand is made or page fault occurs.
• There are various page replacement algorithms which are used to determine the pages
which will be replaced.
• The basic idea behind paging is that when a process is swapped in, the pager only
loads into memory those pages that it expects the process to need (right away.)
• Pages that are not loaded into memory are marked as invalid in the page table,
using the invalid bit. (The rest of the page table entry may either be blank or contain
information about where to find the swapped-out page on the hard drive.)
• If the process only ever accesses pages that are loaded in memory (memory
resident pages), then the process runs exactly as if all the pages were loaded in to
memory.
Steps for handling page fault
On the other hand, if a page is needed that was not originally loaded up then a page fault
trap is generated, which must be handled in a series of steps:
1. The memory address requested is first checked, to make sure it was a valid memory
request.
2. If the reference was invalid, the process is terminated. Otherwise, the page must be paged in.
3. A free frame is located, possibly from a free-frame list.
4. A disk operation is scheduled to bring in the necessary page from disk. (This will usually block the process on an I/O wait, allowing some other process to use the CPU in the meantime.)
5. When the I/O operation is complete, the process's page table is updated with the new
frame number, and the invalid bit is changed to indicate that this is now a valid page
reference.
6. The instruction that caused the page fault must now be restarted from the beginning,
(as soon as this process gets another turn on the CPU.)
• In an extreme case, NO pages are swapped in for a process until they are requested
by page faults. This is known as pure demand paging. In theory each instruction
could generate multiple page faults. In practice this is very rare, due to locality of
reference.
• The hardware necessary to support virtual memory is the same as for paging and swapping: a page table (with a valid/invalid bit for each entry) and secondary memory (swap space).
Demand paging can significantly affect the performance of a computer system. We compute the effective access time for it, where:
• Page fault service time is: s (The time taken to service the page fault is called the page fault service time. It includes the time taken to perform all the above steps.)
• Page fault rate is: p (the probability of a page fault, 0 ≤ p ≤ 1. We would expect p to be close to 0, that is, to have only a few page faults.)
Effective access time = (1 – p) × memory access time + p × s
There are various constraints to the strategies for the allocation of frames:
You cannot allocate more than the total number of available frames.
At least a minimum number of frames should be allocated to each process. This constraint is supported by two reasons. The first reason is that, as fewer frames are allocated, the page fault rate increases, decreasing the performance of the execution of the process. Secondly, there should be enough frames to hold all the different pages that any single instruction can reference.
1. Equal allocation: In a system with x frames and y processes, each process gets equal
number of frames, i.e. x/y. For instance, if the system has 48 frames and 9 processes,
each process will get 5 frames. The three frames which are not allocated to any process
can be used as a free-frame buffer pool.
Disadvantage: In systems with processes of varying sizes, it does not make much
sense to give each process equal frames. Allocation of a large number of frames
to a small process will eventually lead to the wastage of a large number of
allocated unused frames.
2. Proportional allocation: Frames are allocated to each process according to the process
size. For a process pi of size si,
the number of allocated frames is ai = (si/S)*m, where S is the sum of the sizes of all
the processes and m is the number of frames in the system.
For instance, in a system with 62 frames, if there is a process of 10KB and another
process of 127KB, then the first process will be allocated (10/137)*62 = 4 frames and
the other process will get (127/137)*62 = 57 frames.
Advantage: All the processes share the available frames according to their needs,
rather than equally.
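A one-function check of the proportional-allocation arithmetic above:

```python
# a_i = (s_i / S) * m, with 62 frames and processes of 10 KB and 127 KB.
def proportional_allocation(sizes, total_frames):
    total_size = sum(sizes)
    return [int(s / total_size * total_frames) for s in sizes]

print(proportional_allocation([10, 127], 62))  # [4, 57]
```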
1. Local replacement: When a process needs a page which is not in the memory, it can
bring in the new page and allocate it a frame from its own set of allocated frames
only.
Advantage: The pages in memory for a particular process and the page fault ratio
are affected by the paging behavior of only that process.
Disadvantage: A low priority process may hinder a high priority process by not
making its frames available to the high priority process.
2. Global replacement: When a process needs a page which is not in the memory, it can
bring in the new page and allocate it a frame from the set of all frames, even if that
frame is currently allocated to some other process; that is, one process can take a
frame from another.
Advantage: Does not hinder the performance of processes and hence results in
greater system throughput.
If the referred page is not present in the main memory then there will be a miss and
the concept is called Page miss or page fault. Since actual physical memory is much
smaller than virtual memory, page faults happen.
In case of page fault, Operating System might have to replace one of the existing
pages with the newly needed page.
Different page replacement algorithms suggest different ways to decide which page to
replace. The target for all algorithms is to reduce the number of page faults.
This is the simplest page replacement algorithm. In this algorithm, the operating system keeps track of all pages in memory in a queue.
The oldest page is at the front of the queue. When a page needs to be replaced, the page at the front of the queue is selected for removal, i.e. the oldest page is selected as the victim.
Example: Consider page reference string 1, 3, 0, 3, 5, 6, 3 with 3 page frames. Find the number of page faults.
Using FIFO, faults occur on 1, 3, 0, 5, 6 and the final 3 (only the second reference to 3 is a hit), so the total number of page faults = 6.
Example: A system uses 3 page frames for storing process pages in main memory. It uses the
First in First out (FIFO) page replacement policy. Assume that all the page frames are
initially empty. What is the total number of page faults that will occur while processing the
page reference string given below? Also calculate the hit ratio and miss ratio.
4 , 7, 6, 1, 7, 6, 1, 2, 7, 2
Total number of page faults = 6
Total number of page hits = Total number of references – Total number of page faults = 10 – 6 = 4
Thus, Hit ratio= Total number of page hits / Total number of references
= 4 / 10 = 0.4 or 40%
Thus, Miss ratio= Total number of page misses / Total number of references
= 6 / 10 = 0.6 or 60%
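A minimal FIFO simulation that reproduces the counts above:

```python
from collections import deque

def fifo_faults(reference_string, num_frames):
    frames = deque()                 # oldest page at the left
    faults = 0
    for page in reference_string:
        if page not in frames:       # page fault
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()     # evict the oldest page
            frames.append(page)
    return faults

refs = [4, 7, 6, 1, 7, 6, 1, 2, 7, 2]
faults = fifo_faults(refs, 3)
print(faults, "faults,", len(refs) - faults, "hits")  # 6 faults, 4 hits
```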
Belady’s Anomaly: Normally it is expected that increasing the number of frames will decrease the number of page faults; we would expect that giving more memory to a process would improve its performance. This assumption is not always true, and Belady’s anomaly was discovered as a result.
FIFO Illustrating Belady’s Anomaly
Optimal page replacement (OPR)
This algorithm replaces the page that will not be referred by the CPU in future for
the longest time.
It is practically impossible to implement this algorithm because the pages that will
not be used in future for the longest time can not be predicted.
However, it is the best known algorithm and gives the least number of page
faults. Hence, it is used as a performance measure criterion for other algorithms.
Disadvantages of Optimal Page Replacement Algorithm :-
i) Difficult to implement.
ii) It needs forecasting, i.e. future knowledge.
Example: Consider the page references 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2 with 4 page frames.
Find the number of page faults using OPR. Faults occur on 7, 0, 1, 2, and then on 3 and 4; all remaining references are hits, so the total number of page faults = 6.
Least Recently Used (LRU) page replacement
In this algorithm, the page that is least recently used is replaced; it uses past knowledge rather than future knowledge.
Replace the page that has not been used for the longest amount of time. Associate the time of last use with each page.
The advantage of LRU page replacement algorithm is that it does not suffer from Belady's
anomaly and the disadvantage is that it needs expensive hardware support or additional data
structure to implement.
Example: Consider the same reference string 4, 7, 6, 1, 7, 6, 1, 2, 7, 2 with 3 page frames and LRU replacement.
Total number of page faults = 6
Total number of page hits = Total number of references – Total number of page faults = 10 – 6 = 4
Thus, Hit ratio= Total number of page hits / Total number of references
= 4 / 10
= 0.4 or 40%
Thus, Miss ratio= Total number of page misses / Total number of references
= 6 / 10
= 0.6 or 60%
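A minimal LRU simulation for the same reference string (an OrderedDict stands in for the recency information):

```python
from collections import OrderedDict

def lru_faults(reference_string, num_frames):
    frames = OrderedDict()            # least recently used page at the left
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)  # hit: mark as most recently used
        else:
            faults += 1               # page fault
            if len(frames) == num_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = True
    return faults

refs = [4, 7, 6, 1, 7, 6, 1, 2, 7, 2]
faults = lru_faults(refs, 3)
print(faults, "faults,", len(refs) - faults, "hits")  # 6 faults, 4 hits
```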
LRU Approximation Algorithms
Reference bit
Second-chance algorithm
Clock replacement
Enhanced second-chance: improve the algorithm by using the reference bit and modify bit (if available) in concert. Each page falls into one of four classes:
1. (0, 0) neither recently used nor modified – best page to replace
2. (0, 1) not recently used but modified – not quite as good, must write out before replacement
3. (1, 0) recently used but clean – probably will be used again soon
4. (1, 1) recently used and modified – probably will be used again soon and needs to be written out before replacement
When page replacement is called for, use the clock scheme but consider the four classes: replace a page in the lowest non-empty class.
Counting-based algorithms keep a counter of the number of references that have been made to each page. They are not common.
Least Frequently Used (LFU) Algorithm: replaces the page with the smallest count.
Most Frequently Used (MFU) Algorithm: based on the argument that the page with the smallest count was probably just brought in and has yet to be used.
Page-Buffering Algorithms:
o Read the page into a free frame, and select a victim to evict and add to the free pool.
o When the backing store is otherwise idle, write pages there and set them to non-dirty.
o Possibly, keep free frame contents intact and note what is in them:
o If referenced again before being reused, there is no need to load the contents again from disk.
Page fault and swapping: We know every program is divided into pages. When a program needs a page which is not in RAM, that is called a page fault. Whenever a page fault happens, the operating system tries to fetch that page from secondary memory and swap it with one of the pages in RAM. This is called swapping.
Thrashing in OS
If page faults and the resulting swapping happen very frequently, at a high rate, then the operating system has to spend more time swapping these pages. This state is called thrashing. Because of this, CPU utilization is reduced.
The basic concept involved is that if a process is allocated too few frames, then there
will be too many and too frequent page faults. As a result, no useful work would be
done by the CPU and the CPU utilisation would fall drastically.
The long-term scheduler would then try to improve the CPU utilisation by loading
some more processes into the memory thereby increasing the degree of
multiprogramming.
This would result in a further decrease in the CPU utilization triggering a chained
reaction of higher page faults followed by an increase in the degree of
multiprogramming, called Thrashing.
Locality Model –
A locality is a set of pages that are actively used together. The locality model states that as a
process executes, it moves from one locality to another. A program is generally composed of
several different localities which may overlap.
For example when a function is called, it defines a new locality where memory references are
made to the instructions of the function call, it’s local and global variables, etc. Similarly, when
the function is exited, the process leaves this locality.
Effect of Thrashing
Whenever thrashing starts, operating system tries to apply either Global page
replacement Algorithm or Local page replacement algorithm.
Since global page replacement can access to bring any page, it tries to bring more
pages whenever thrashing found.
But what actually will happen is, due to this, no process gets enough frames and by
result thrashing will be increase more and more. So global page replacement
algorithm is not suitable when thrashing happens.
Unlike global page replacement algorithm, local page replacement will select pages
which only belong to that process. So there is a chance to reduce the thrashing.
But it is proven that there are many disadvantages if we use local page replacement.
So local page replacement is just alternative than global page replacement in
thrashing scenario.
Techniques to handle thrashing:
1. Working Set Model –
If there are enough extra frames, then some more processes can be loaded into memory. On the other hand, if the summation of working-set sizes exceeds the number of available frames, then some of the processes have to be suspended (swapped out of memory).
This technique prevents thrashing along with ensuring the highest degree of
multiprogramming possible. Thus, it optimizes CPU utilization.
2. Page Fault Frequency –
A more direct approach to handle thrashing is the one that uses Page-Fault
Frequency concept. The problem associated with Thrashing is the high page fault
rate and thus, the concept here is to control the page fault rate.
If the page fault rate is too high, it indicates that the process has too few frames
allocated to it. On the contrary, a low page fault rate indicates that the process has
too many frames.
Upper and lower limits can be established on the desired page fault rate as shown in
the diagram:
o If the page fault rate falls below the lower limit, frames can be removed
from the process.
o Similarly, if the page fault rate exceeds the upper limit, more number of
frames can be allocated to the process.
In other words, the graphical state of the system should be kept limited to
the rectangular region formed in the given diagram.
Here too, if the page fault rate is high with no free frames, then some of the
processes can be suspended and frames allocated to them can be reallocated to
other processes. The suspended processes can then be restarted later.
Cache memory organization
Cache Memory
Cache memory is a very high-speed semiconductor memory which speeds up the CPU. It acts as a buffer between the CPU and the main memory.
It is used to hold those parts of data and program which are most frequently used by the CPU. The parts of data and programs are transferred from the disk to cache memory by the operating system, from where the CPU can access them.
Purpose
Cache memory is used to reduce the average time to access data from the Main memory.
It is used for bridging the speed mismatch between the fastest CPU and the main memory.
The cache is a smaller and faster memory which stores copies of the data from frequently used
main memory locations.
Execution of Program-
Whenever any program has to be executed, it is first loaded in the main memory.
The portion of the program that is most probably going to be executed in the near future is kept in the cache memory.
This allows the CPU to access the most probable portion at a faster speed.
Step-01:
Whenever the CPU requires any word of memory, it is first searched for in the CPU registers. Now, there are two cases possible-
Case-01:
If the required word is found in the CPU registers, it is read from there.
Case-02:
If the required word is not found in the CPU registers, Step-02 is followed.
Step-02:
When the required word is not found in the CPU registers, it is searched in the cache memory.
Tag directory of the cache memory is used to search whether the required word is present in the
cache memory or not. Now, there are two cases possible-
Case-01:
If the required word is found in the cache memory, the word is delivered to the CPU. This is
known as Cache hit.
Case-02:
If the required word is not found in the cache memory, Step-03 is followed. This is known
as Cache miss.
Step-03:
When the required word is not found in the cache memory, it is searched in the main memory.
Page Table is used to determine whether the required page is present in the main memory or not.
Now, there are two cases possible-
Case-01:
If the page containing the required word is found in the main memory,
The page is mapped from the main memory to the cache memory. This mapping is performed
using cache mapping techniques.
Then, the required word is delivered to the CPU.
Case-02:
If the page containing the required word is not found in the main memory,
A page fault occurs. The page containing the required word is mapped from the secondary
memory to the main memory.
Then, the page is mapped from the main memory to the cache memory. Then, the required word
is delivered to the CPU.
Multilevel Cache Organization-
A multilevel cache organization is an organization where cache memories of different sizes are
organized at multiple levels to increase the processing speed to a greater extent.
The smaller the size of cache, the faster its speed. The smallest size cache memory is placed
closest to the CPU. This helps to achieve better performance in terms of speed.
Example-
Cache Mapping-
Cache mapping is a technique by which the contents of main memory are brought into the cache
memory.
The following diagram illustrates the mapping process-
NOTE:
Main memory is divided into equal size partitions called as blocks or frames.
Cache memory is divided into partitions having same size as that of blocks called as lines.
During cache mapping, a block of main memory is simply copied to the cache; the block is not actually removed from the main memory.
1. Direct Mapping-
In direct mapping,
A particular block of main memory can map only to a particular line of the cache.
The line number of cache to which a particular block can map is given by-
Cache line number (i) = Main memory block number (j) modulo Number of lines in cache (m), i.e. i = j modulo m
2. K-way Set Associative Mapping-
In k-way set associative mapping, the cache lines are grouped into sets, with k lines in each set. A particular block of main memory can map to only one particular set of the cache, and within that set it may occupy any line. For example, consider a cache with 6 lines and k = 2:
k = 2 suggests that each set contains two cache lines.
Since the cache contains 6 lines, the number of sets in the cache = 6 / 2 = 3 sets.
Block ‘j’ of main memory can map to set number (j mod 3) only of the cache.
Within that set, block ‘j’ can map to any cache line that is freely available at that moment.
If all the cache lines are occupied, then one of the existing blocks will have to be replaced.
Special Cases-
If k = 1, then k-way set associative mapping becomes direct mapping i.e.
1-way Set Associative Mapping ≡ Direct Mapping
If k = Total number of lines in the cache, then k-way set associative mapping becomes fully
associative mapping.
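The worked questions below all reduce to splitting the physical address into tag, line/set and block-offset fields. Here is a small sketch of that computation, assuming byte-addressable memory and power-of-two sizes:

```python
from math import log2

def address_split(main_memory_bytes, cache_bytes, block_bytes, k):
    physical_bits = int(log2(main_memory_bytes))
    offset_bits = int(log2(block_bytes))
    lines = cache_bytes // block_bytes
    if k >= lines:                            # fully associative: no line/set field
        index_bits = 0
    else:
        index_bits = int(log2(lines // k))    # line bits (k = 1) or set bits (k > 1)
    tag_bits = physical_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# Direct mapped cache of 16 KB, block size 256 bytes, main memory 128 KB:
print(address_split(128 * 1024, 16 * 1024, 256, 1))   # (3, 6, 8)
# Same cache, 2-way set associative:
print(address_split(128 * 1024, 16 * 1024, 256, 2))   # (4, 5, 8)
```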
Direct Mapped Cache Questions-
Q. Consider a direct mapped cache of size 16 KB with block size 256 bytes. The size of main memory is
128 KB. Find-
1. Number of bits in tag
2. Tag directory size
Solution-
Given- Cache memory size = 16 KB, Block size = Frame size = Line size = 256 bytes, Main memory size
= 128 KB
Number of Bits in Physical Address-
Size of main memory = 128 KB = 2^17 bytes
Thus, Number of bits in physical address = 17 bits
Number of Bits in Block Offset-
Block size = 256 bytes = 2^8 bytes
Thus, Number of bits in block offset = 8 bits
Number of Bits in Line Number-
Total number of lines in cache = Cache size / Line size
= 16 KB / 256 bytes = 2^14 bytes / 2^8 bytes
= 2^6 lines = 64 lines
Thus, Number of bits in line number = 6 bits
Number of Bits in Tag-
Number of bits in tag = Number of bits in physical address – (Number of bits in line number +
Number of bits in block offset)
= 17 bits – (6 bits + 8 bits) = 3 bits
Thus, Number of bits in tag = 3 bits
Tag Directory Size-
Tag directory size= Number of tags x Tag size = Number of lines in cache x Number of bits in tag
= 2^6 x 3 bits = 192 bits
= 24 bytes
Thus, size of tag directory = 24 bytes
Q. Consider a direct mapped cache of size 512 KB with block size 1 KB. There are 7 bits in the tag. Find-
1. Size of main memory
2. Tag directory size
Solution-
Given-Cache memory size = 512 KB, Block size = Frame size = Line size = 1 KB, Number of bits in tag
= 7 bits
Number of Bits in Block Offset-
Block size = 1 KB = 2^10 bytes
Thus, Number of bits in block offset = 10 bits
Number of Bits in Line Number-
Total number of lines in cache = Cache size / Line size = 512 KB / 1 KB = 2^9 lines
Thus, Number of bits in line number = 9 bits
Number of Bits in Physical Address-
Number of bits in physical address = Number of bits in tag + Number of bits in line number + Number of bits in block offset = 7 bits + 9 bits + 10 bits = 26 bits
Size of Main Memory-
Size of main memory = 2^26 bytes = 64 MB
Tag Directory Size-
Tag directory size = Number of lines in cache x Number of bits in tag = 2^9 x 7 bits = 3584 bits = 448 bytes
Q. Consider a fully associative mapped cache of size 16 KB with block size 256 bytes. The size of main
memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size
Solution-
Given- Cache memory size = 16 KB, Block size = Frame size = Line size = 256 bytes, Main memory
size = 128 KB
Number of Bits in Physical Address-
Size of main memory = 128 KB = 2^17 bytes
Thus, Number of bits in physical address = 17 bits
Q. Consider a fully associative mapped cache with block size 4 KB. The size of main memory is 16 GB.
Find the number of bits in tag.
Solution-
Given- Block size = Frame size = Line size = 4 KB , Size of main memory = 16 GB
Number of Bits in Physical Address-
Size of main memory = 16 GB
= 2^34 bytes
Thus, Number of bits in physical address = 34 bits
Q. Consider a 2-way set associative mapped cache of size 16 KB with block size 256 bytes. The size of
main memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size
Solution-
Given- Set size = 2, Cache memory size = 16 KB, Block size = Frame size = Line size = 256 bytes, Main
memory size = 128 KB
Number of Bits in Physical Address-
Size of main memory = 128 KB
= 2^17 bytes
Thus, Number of bits in physical address = 17 bits
Q. Consider a 4-way set associative mapped cache with block size 4 KB. The size of main memory is 16
GB and there are 10 bits in the tag. Find-
1. Size of cache memory
2. Tag directory size
Solution-
Given- Set size = 4, Block size = Frame size = Line size = 4 KB, Main memory size = 16 GB, Number
of bits in tag = 10 bits
Number of Bits in Physical Address-
Size of main memory = 16 GB
= 2^34 bytes
Thus, Number of bits in physical address = 34 bits
Number of Bits in Block Offset-
Block size = 4 KB
= 2^12 bytes
Thus, Number of bits in block offset = 12 bits
Q. Consider a 8-way set associative mapped cache. The size of cache memory is 512 KB and there are 10
bits in the tag. Find the size of main memory.
Solution-
Given- Set size = 8, Cache memory size = 512 KB, Number of bits in tag = 10 bits
Let-
Number of bits in set number field = x bits
Number of bits in block offset field = y bits
Sum of Number Of Bits Of Set Number Field And Block Offset Field-
Cache memory size = Number of sets in cache x Number of lines in one set x Line size
Now, substituting the values, we get-
512 KB = 2^x x 8 x 2^y bytes
2^19 bytes = 2^(3 + x + y) bytes
19 = 3 + x + y
x + y = 19 – 3 = 16
Number of Bits in Physical Address-
Number of bits in physical address = Number of bits in tag + Number of bits in set number +
Number of bits in block offset
= 10 bits + x bits + y bits
= 10 bits + (x + y) bits
= 10 bits + 16 bits
= 26 bits
Thus, Number of bits in physical address = 26 bits
For higher performance in a multiprocessor system, each processor will usually have its own
cache. Cache coherence refers to the problem of keeping the data in these caches consistent.
The main problem is dealing with writes by a processor.
In a multiprocessor system, data inconsistency may occur among adjacent levels or within the
same level of the memory hierarchy. For example, the cache and the main memory may have
inconsistent copies of the same object. As multiple processors operate in parallel, and
independently multiple caches may possess different copies of the same memory block, this
creates cache coherence problem.
Cache coherence schemes help to avoid this problem by maintaining a uniform state for each
cached block of data.
There are two general strategies for dealing with writes to a cache:
Write-through - all data written to the cache is also written to memory at the same time.
Write-back - when data is written to a cache, a dirty bit is set for the affected block. The
modified block is written to memory only when the block is replaced.
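A toy sketch contrasting the two policies; plain dictionaries stand in for the cache and main memory, whereas real caches operate on whole blocks:

```python
memory = {0x10: 1, 0x20: 2}
cache = {}          # address -> (value, dirty)

def write(addr, value, policy):
    if policy == "write-through":
        cache[addr] = (value, False)
        memory[addr] = value          # memory is updated on every write
    else:  # write-back
        cache[addr] = (value, True)   # only the dirty bit records the change

def evict(addr):
    value, dirty = cache.pop(addr)
    if dirty:                         # write-back: memory updated only now
        memory[addr] = value

write(0x10, 99, "write-through")
print(memory[0x10])                   # 99 immediately
write(0x20, 77, "write-back")
print(memory[0x20])                   # still 2 until the block is evicted
evict(0x20)
print(memory[0x20])                   # 77 after eviction
```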
Software solution:
In the software approach, the detection of potential cache coherence problems is transferred from run time to compile time, and the design complexity is transferred from hardware to software.
On the other hand, compile-time software approaches generally make conservative decisions, leading to inefficient cache utilization. Compiler-based cache coherence mechanisms perform an analysis on the code to determine which data items may become unsafe for caching, and they mark those items accordingly; the operating system or hardware then does not cache those items.
The simplest approach is to prevent any shared data variables from being cached. This is too
conservative, because a shared data structure may be exclusively used during some periods and
may be effectively read-only during other periods.
It is only during periods when at least one process may update the variable and at least one other
process may access the variable then cache coherence is an issue. More efficient approaches
analyze the code to determine safe periods for shared variables. The compiler then inserts
instructions into the generated code to enforce cache coherence during the critical periods.
Hardware solutions:
Hardware schemes can be divided into two categories: directory protocol and snoopy
protocols.
Directory protocols:
Directory protocols collect and maintain information about where copies of lines reside.
Typically, there is centralized controller that is part of the main memory controller, and a
directory that is stored in main memory.
The directory contains global state information about the contents of the various local caches.
When an individual cache controller makes a request, the centralized controller checks and issues
necessary commands for data transfer between memory and caches or between caches
themselves.
It is also responsible for keeping the state information up to date, therefore, every local action
that can effect the global state of a line must be reported to the central controller.
The controller maintains information about which processors have a copy of which lines.
Before a processor can write to a local copy of a line, it must request exclusive access to the line
from the controller.
Before granting this exclusive access, the controller sends a message to all processors with a cached copy of this line, forcing each processor to invalidate its copy.
After receiving acknowledgement back from each such processor, the controller grants exclusive
access to the requesting processor.
When another processor tries to read a line that is exclusively granted to some other processor, it will send a miss notification to the controller.
The controller then issues a command to the processor holding that line, requiring that processor to write the line back to main memory.
Directory schemes suffer from the drawbacks of a central bottleneck and the overhead of
communication between the various cache controllers and the central controller.
Snoopy Protocols:
Snoopy protocols distribute the responsibility for maintaining cache coherence among all of
the cache controllers in a multiprocessor system.
A cache must recognize when a line that it holds is shared with other caches.
When an update action is performed on a shared cache line, it must be announced to all other
caches by a broadcast mechanism.
Each cache controller is able to “snoop” on the network to observe these broadcasted notifications
and react accordingly.
Snoopy protocols are ideally suited to a bus-based multiprocessor, because the shared bus
provides a simple means for broadcasting and snooping.
Two basic approaches to the snoopy protocol have been explored: write-invalidate and write-update (write-broadcast).
With a write-invalidate protocol, there can be multiple readers but only one writer at a time.
Initially, a line may be shared among several caches for reading purposes.
When one of the caches wants to perform a write to the line, it first issues a notice that invalidates that line in the other caches, making the line exclusive to the writing cache. Once the line is exclusive, the owning processor can make local writes until some other processor requires the same line.
With a write-update protocol, there can be multiple writers as well as multiple readers. When a processor wishes to update a shared line, the word to be updated is distributed to all others, and caches containing that line can update it.
Cache Performance numerical
Locality of Reference
Locality of reference refers to a phenomenon in which a computer program tends to access same set of
memory locations for a particular time period. In other words, Locality of Reference refers to the
tendency of the computer program to access instructions whose addresses are near one another. The
property of locality of reference is mainly shown by loops and subroutine calls in a program.
1. In the case of loops in a program, the central processing unit repeatedly refers to the set of instructions that constitute the loop.
2. In the case of subroutine calls, the same set of instructions is fetched from memory every time.
3. References to data items also get localized that means same data item is referenced again and
again
In the above figure, you can see that the CPU wants to read or fetch the data or instruction.
First, it will access the cache memory as it is near to it and provides very fast access. If the
required data or instruction is found, it will be fetched. This situation is known as a cache hit.
But if the required data or instruction is not found in the cache memory then this situation is
known as a cache miss. Now the main memory will be searched for the required data or
instruction that was being searched and if found will go through one of the two ways:
1. The first way is that the CPU fetches the required data or instruction and uses it, and that's it; but when the same data or instruction is required again, the CPU has to access the same main memory location for it, and we already know that main memory is the slowest to access.
2. The second way is to store the data or instruction in the cache memory so that if it is needed soon
again in the near future it could be fetched in a much faster way.
Cache Operation:
It is based on the principle of locality of reference. There are two ways with which data or instruction is
fetched from main memory and get stored in cache memory. These two ways are the following:
1. Temporal Locality – Temporal locality means current data or instruction that is being fetched
may be needed soon. So we should store that data or instruction in the cache memory so that we
can avoid again searching in main memory for the same data.
When CPU accesses the current main memory location for reading required data or instruction, it also
gets stored in the cache memory which is based on the fact that same data or instruction may be needed in
near future. This is known as temporal locality. If some data is referenced, then there is a high probability
that it will be referenced again in the near future.
2. Spatial Locality –Spatial locality means instruction or data near to the current memory location
that is being fetched, may be needed soon in the near future. This is slightly different from the
temporal locality. Here we are talking about nearly located memory locations while in temporal
locality we were talking about the actual memory location that was being fetched.
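A small illustration of the idea in Python; note that Python's object model hides most real cache effects, so this only shows the difference in access pattern, which matters in languages with contiguous arrays:

```python
# Row-major traversal visits elements that sit next to each other in memory,
# while column-major traversal jumps across rows (strided accesses).
N = 4
matrix = [[r * N + c for c in range(N)] for r in range(N)]

row_major = [matrix[r][c] for r in range(N) for c in range(N)]     # adjacent accesses
col_major = [matrix[r][c] for c in range(N) for r in range(N)]     # strided accesses

print(row_major[:6])   # [0, 1, 2, 3, 4, 5]
print(col_major[:6])   # [0, 4, 8, 12, 1, 5]
```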