Distributed System
Dr. D.S. Kushwaha
Computer Architectures
Computer architectures consisting of
interconnected, multiple processors are
basically of two types:
Tightly coupled systems
Loosely coupled systems
Tightly coupled systems
There is a single system wide primary
memory (address space) that is shared by all
the processors.
These are also known as parallel processing
systems.
[Figure: a tightly coupled system, in which several CPUs are connected through interconnection hardware to a single system-wide shared memory]
Loosely coupled systems
The processors do not share memory, and
each processor has its own local memory.
These are also known as distributed
computing systems or simply distributed
systems.
[Figure: a loosely coupled system, in which each CPU has its own local memory and the CPUs are connected by a communication network]
Distributed Computing System
A DCS is a collection of independent computers that
appears to its users as a single coherent system,
or
A collection of processors interconnected by a
communication network in which
Each processor has its own local memory and other
peripherals, and
The communication between any two processors of
the system takes place by message passing.
For a particular processor, its own resources are
local, whereas the other processors and their
resources are remote.
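As an illustrative sketch (not part of the text), message passing between two nodes can be modelled with a loopback socket standing in for the communication network; the node roles and message contents here are invented:

```python
# Illustrative sketch: two "nodes" that share no memory and interact
# only by message passing, with a loopback TCP socket standing in for
# the communication network. All names and payloads are invented.
import socket
import threading

def server_node(listener):
    conn, _ = listener.accept()       # wait for a message from the peer
    data = conn.recv(1024)
    conn.sendall(b"ack:" + data)      # reply, again only by messages
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))       # any free local port
listener.listen(1)
port = listener.getsockname()[1]

t = threading.Thread(target=server_node, args=(listener,))
t.start()

client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)             # b"ack:hello"
client.close()
t.join()
listener.close()
```

Each side sees only its own local state; all coordination happens through the messages on the socket.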
Cont…
Together, a processor and its resources are
usually referred to as a node or site or machine
of the distributed computing system.
A distributed system is often organized as a layer of software called middleware, placed between applications and the operating systems of the machines.
Note that the middleware layer extends over multiple
machines.
Distributed System
Distributed Computing System Models
1. Minicomputer Model
2. Workstation model
3. Workstation-server model
4. Processor-pool model
5. Hybrid model
Minicomputer model
Extension of the centralized time-sharing system.
It consists of a few minicomputers interconnected by a
communication network.
Each user is logged on to one specific minicomputer,
with remote access to other minicomputers.
The network allows a user to access remote resources
that are available on machines other than the one
the user is currently logged on to.
This model may be used when resource sharing with
remote users is desired.
Cont…
[Figure: minicomputer model, with several minicomputers, each serving a set of terminals, interconnected by a communication network]
Workstation Model
It consists of several workstations interconnected by a
communication network.
The idea of the workstation model is to interconnect all
workstations by a high-speed LAN so that:
idle workstations may be used to process jobs of users
who are logged onto other workstations, and
users whose own workstations lack sufficient processing
power may send their processes to other workstations for execution.
Cont…
[Figure: workstation model, with several workstations interconnected by a communication network]
Cont… : Issues
1. How does the system find an idle workstation?
2. How is a process transferred from one
workstation to get it executed on another
workstation?
3. What happens to a remote process if a user
logs onto a workstation that was idle until now
and was being used to execute a process of
another workstation?
Cont… : Issues
Three approaches to handle the third issue:
Allow the remote process to share the resources
of the workstation along with its own logged-on
user’s processes.
Kill the remote process.
Migrate the remote process back to its home
workstation, so that its execution can be
continued there.
Workstation Server Model
It consists of a few minicomputers and several
workstations, which may be diskless.
When diskless workstations are used on a network, the
file system to be used by these workstations must be
implemented either by a diskful workstation or by a
minicomputer equipped with a disk for file storage.
Each minicomputer is used as a server machine to
provide one or more types of services.
Cont…
In addition to workstations, there are specialized
machines/workstations for running server processes
(called servers) for managing and providing access to
shared resources.
The user’s processes need not be migrated to the
server machines for getting the work done by those
machines.
For better overall system performance, the local disk of
a diskful workstation is normally used for such purposes
as
Storage of temporary files,
Storage of unshared files,
Storage of shared files that are rarely changed,
Paging activity in virtual-memory management, and
Caching of remotely accessed data.
Cont…
[Figure: workstation-server model, with several workstations and minicomputers (used as a file server, a database server, and a print server) interconnected by a communication network]
Cont…
Advantages of workstation server model over
workstation model:
It is much cheaper to use a few minicomputers equipped
with large, fast disks than many diskful workstations.
System maintenance is easy.
Flexibility to use any workstation and access the files
because all files are managed by file servers.
Request-response protocol is mainly used to access the
services of the server machines.
Both client and server can run on the same computer.
A user has guaranteed response time because
workstations are not used for executing remote
processes.
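The request-response protocol mentioned above can be sketched in miniature; the service names ("read_file", "print") and the in-process "send" below are illustrative stand-ins, not taken from the text:

```python
# Hedged sketch of a request-response protocol between a client and a
# server. A client builds a request naming a service; the server
# dispatches it and returns a response. Service names are invented.
def server_handle(request):
    services = {
        "read_file": lambda arg: "<contents of %s>" % arg,
        "print":     lambda arg: "queued %s for printing" % arg,
    }
    handler = services.get(request["op"])
    if handler is None:
        return {"status": "error", "reason": "unknown service"}
    return {"status": "ok", "result": handler(request["arg"])}

# Client side: build a request, "send" it, inspect the response.
response = server_handle({"op": "read_file", "arg": "notes.txt"})
```

In a real system the request and response would travel over the network, but the pattern (one request message, one response message) is the same.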
Processor Pool Model
Based on the observation that, most of the time, a user
does not need computing power, but occasionally
needs a large amount of it for a short time.
The pool of processors consists of a large number of
microcomputers and minicomputers attached to the
network.
Each processor in the pool has its own memory to load
and run the program.
A user does not log onto a particular machine but to the
system as a whole.
Cont…
[Figure: processor-pool model, with terminals connected via a communication network to a pool of processors, a run server, and a file server]
Cont…
Advantages over workstation server model:
Better utilization
The entire processing power of the system is
available for use by the currently logged-on users.
Greater flexibility
The system’s services can be easily expanded
without the need to install any more computers;
The processors in the pool can be allocated to act
as extra servers to carry any additional load arising
from an increased user population or to provide new
services.
Cont…
Disadvantage
Unsuitable for high-performance interactive
applications.
Because of the slow speed of communication
between the computer on which the application
program of a user is being executed and the
terminal via which the user is interacting with the
system.
Hybrid Model
The workstation-server model is most widely used for small
programs/applications.
Processor-Pool is ideal for massive computation jobs.
The hybrid model combines the above two models:
A basic workstation/server model
Addition of pool of processors
The processors in the pool can be allocated dynamically
for computations that are too large for workstations
Gives guaranteed response to interactive jobs.
More expensive
Major Issues
In contrast to standalone or centralized systems, the following issues pose challenges:
Communication & security
Access control
Privacy constraints
The reliability and performance of a distributed system rest
on the underlying communication network.
Factors that led to the emergence of
distributed computing system
Inherently distributed applications
Organizations once based in a particular location have gone
global.
Information sharing among distributed users
CSCW
Resource sharing
Such as software libraries, databases, and hardware resources.
Factors that led to the emergence of
distributed computing system
Better price-performance ratio
Shorter response times and higher throughput
Higher reliability
Reliability refers to the degree of tolerance against errors
and component failures in a system.
Achieved by multiplicity of resources.
Factors that led to the emergence of
distributed computing system
Extensibility and incremental growth
By adding additional resources to the system as and when
the need arises.
Such systems are termed open distributed systems.
Better flexibility in meeting user’s needs
Distributed Operating system
OS controls the resources of a computer system and
provides its users with an interface or virtual machine
that is more convenient to use than the bare machine.
Primary tasks of an operating system:
To present users with a virtual machine that is easier to
program than the underlying hardware.
To manage the various resources of the system.
Types of operating systems used for distributed
computing systems:
Network operating systems
Distributed operating systems.
Uniprocessor Operating Systems
Separating applications from operating system code
through a microkernel
Multicomputer Operating Systems
Message Transfer
Network Operating System
Distributed System as Middleware
Middleware and Openness
In an open middleware-based distributed system, the protocols
used by each middleware layer should be the same, as well as
the interfaces they offer to applications
Comparison between Systems
                          Distributed OS
Item                      Multiprocessor    Multicomputer        Network OS   Middleware-based OS
Degree of transparency    Very high         High                 Low          High
Same OS on all nodes      Yes               Yes                  No           No
Number of copies of OS    1                 N                    N            N
Basis for communication   Shared memory     Messages             Files        Model specific
Resource management       Global, central   Global, distributed  Per node     Per node
Scalability               No                Moderately           Yes          Varies
Openness                  Closed            Closed               Open         Open
Major Differences
1. System image
In NOS, user perceives the DCS as a group of nodes
connected by a communication N/W. Hence, user is aware of
multiple computers.
DOS hides the existence of multiple computers and provides
a single system image to its users. Hence group of Networked
nodes act as virtual Uniprocessor.
In NOS, by default a user’s job is executed on the machine on
which the user is currently logged on; to use another machine,
the user has to log in to it remotely.
DOS dynamically allocates jobs to the various machines of
the system for processing.
Cont…
2. AUTONOMY
Different nodes in NOS may use different O.S. but communicate
with each other by using a mutually agreed on communication
protocol.
In DOS, there exists single system wide O.S. and each node of
the DCS runs a part of OS (i.e. identical kernels run on all the
nodes of DCS).
This ensures that the same set of system calls is globally valid.
3. FAULT TOLERANCE
Low in Network OS
High in Distribute OS
A DCS using a NOS is generally referred to as a N/W system.
A DCS using a DOS is generally referred to as a distributed system.
Issues
Transparency
Reliability
Flexibility
Performance
Scalability
Heterogeneity
Security
Emulation to existing operating systems
TRANSPARENCY
To make the existence of multiple computers transparent &
Provide a single system image to its users.
The ISO Reference Model for Open Distributed Processing (1992) identifies eight types of transparency:
Access Transparency
Location Transparency
Replication Transparency
Failure Transparency
Migration Transparency
Concurrency Transparency
Performance Transparency
Scaling Transparency
Access Transparency
Allows users to access remote resources in the same
way as local ones; i.e., the user interface, in the form of a
set of system calls, should not distinguish between local
and remote resources.
Location transparency
Aspects of location transparency
Name Transparency
Name of a resource should not reveal any information
about physical location of resource.
Even movable resources such as files must be
allowed to move without having their name changed.
User Mobility
No matter which machine a user is logged onto, he
should be able to access a resource with the same
name.
Replication Transparency
Replicas are used for better performance and reliability.
Replicated resource and the replication activity should be
transparent to user.
Two important issues related to replication transparency are:
Naming of Replicas
Replication Control
There should be a method to map a user-supplied name of the
resource to an appropriate replica of the resource.
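The name-to-replica mapping described above can be sketched as follows; the server names and the random selection policy are illustrative assumptions, not specified by the text:

```python
# Sketch of replica naming: a user-supplied resource name is mapped to
# one of several replicas, keeping replication transparent to the user.
# The file name, server names, and choice policy are all invented.
import random

replica_table = {
    "/shared/report.txt": ["fileserver-1", "fileserver-2", "fileserver-3"],
}

def resolve(name):
    """Map a user-visible name to the address of one replica."""
    return random.choice(replica_table[name])

server = resolve("/shared/report.txt")   # the user never sees the replicas
```

A real replication-control layer would also keep the replicas consistent; only the naming side is shown here.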
Failure Transparency
In case of partial failure, the system continues to function,
perhaps in a degraded form.
Complete failure transparency is not achievable.
Failure of the communication network of a distributed
system normally disrupts the work of its users and is
noticeable by the users.
Migration Transparency
Migration may be done for performance, reliability and
security reasons.
Aim of migration transparency
To ensure that the movement of the object is handled
automatically by the system in a user-transparent manner.
Issues for migration transparency:
Migration decisions
Migration of an object
Migration of an object that is a process
Concurrency transparency
It gives each user the illusion that:
he or she is the sole user of the system, and
other users do not exist in the system.
Properties for providing concurrency transparency:
Event Ordering Property
It ensures that all access requests to various system
resources are properly ordered to provide a consistent view
to all users of the system.
Mutual Exclusion property
At any time, at most one process accesses a shared
resource that must not be used simultaneously.
Cont…
No starvation policy
It ensures that if every process that is granted a
resource, which must not be used simultaneously by
multiple processes, eventually releases it, every request
for that resource is eventually granted.
No deadlock
It ensures that a situation will never occur in which
competing processes prevent their mutual progress even
though no single one requests more resources than
available in the system.
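One common way to realize the event ordering property is a Lamport logical clock; this is an illustrative sketch, not a mechanism prescribed by the text:

```python
# Lamport logical clock sketch: every message carries a timestamp, and
# a receiver advances its own clock past it, so causally related events
# get consistently ordered timestamps at every node.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                   # a local event
        self.time += 1
        return self.time

    def send(self):                   # stamp an outgoing message
        return self.tick()

    def receive(self, msg_time):      # merge the sender's timestamp
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.send()                     # node a stamps the message: 1
t_recv = b.receive(t_send)            # node b: max(0, 1) + 1 = 2
```

Because t_recv > t_send always holds, every node agrees that the receive event follows the send event.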
Performance transparency
The aim of performance transparency is to allow the
system to be automatically reconfigured to improve
performance, as loads vary dynamically in the system.
A situation in which one processor of the system is
overloaded with jobs while another processor is idle
should not be allowed to occur.
Scaling transparency
Allows the system to expand and scale without
disrupting the activities of the users.
Requires open system architecture and use of scalable
algorithms.
RELIABILITY
The distributed systems are required to be
more reliable than centralized systems due to
The multiple instances of resources.
Failures should be handled, and they are of two types:
Fail stop Failure: System stops functioning after
changing to a state in which its failure can be
detected.
Byzantine Failure: System continues to function but
produces wrong results.
Cont…
Methods to handle failure in distributed systems:
Fault Avoidance
Fault Tolerance
Fault Detection and Recovery
Fault Avoidance
It deals with designing the components of the system
in such a way that the occurrence of faults is
minimized.
For this, reliable hardware components and thorough
software testing are used.
Fault Tolerance
Ability of a system to continue functioning in the event
of partial system failure.
Method for tolerating the faults:
Redundancy techniques:
Avoid single point of failure by replicating critical
hardware and software components,
Additional overhead is needed to maintain multiple
copies of a replicated resource and to keep them consistent.
Distributed control
A highly available distributed file system should have
multiple and independent file servers controlling multiple
and independent storage devices.
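The redundancy idea can be sketched with a majority-quorum write; the in-memory "servers", the key/value data, and the simulated crash are illustrative stand-ins for real file servers and failures:

```python
# Illustrative redundancy sketch: a write goes to several independent
# "file servers" and succeeds only if a majority acknowledge, so no
# single server is a single point of failure. All data here is invented.
servers = [{}, {}, {}]                    # three independent replicas

def replicated_write(key, value, failed=frozenset()):
    acks = 0
    for i, store in enumerate(servers):
        if i in failed:                   # simulate a crashed server
            continue
        store[key] = value
        acks += 1
    return acks > len(servers) // 2       # majority quorum reached?

ok = replicated_write("config", "v1", failed={2})   # one server down
```

With one of three servers down, two replicas still acknowledge, so the write succeeds; this is the availability that replication buys at the cost of the consistency bookkeeping noted above.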
Fault detection and recovery
Atomic Transaction
Use of stateless servers
With a stateless server, the history of serviced requests
does not affect the execution of the current request, so
recovery after a crash is simple.
Acknowledgements and timeout-based retransmissions
of messages
For IPC between two processes, the system must have ways
to detect lost messages so that these could be retransmitted.
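The acknowledgement-and-timeout scheme can be sketched as follows; the lossy channel is simulated, and a real system would use sockets and real timers:

```python
# Sketch of timeout-based retransmission: the sender resends a message
# until an acknowledgement arrives or the retry budget is exhausted.
# The "lossy" channel below is a simulation, not a real network.
def lossy_send(msg, drop_first, state):
    """Deliver msg, but 'lose' the first drop_first attempts."""
    state["attempts"] += 1
    return state["attempts"] > drop_first   # True means an ack came back

def send_reliably(msg, retries=5, drop_first=2):
    state = {"attempts": 0}
    for _ in range(retries):
        if lossy_send(msg, drop_first, state):
            return True                      # ack received in time
        # timeout expired with no ack: retransmit
    return False

delivered = send_reliably("request-42")      # succeeds on the 3rd attempt
```

If the retry budget runs out before any ack arrives, the sender reports failure to the caller instead of blocking forever.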
FLEXIBILITY
The design of a distributed operating system should be flexible due
to:
Ease of Modification
Ease of Enhancement
The kernel is the most important influencing design factor; it
operates in a separate address space that is inaccessible to user
processes.
Commonly used models for kernel design in distributed operating
systems:
Microkernel model
Monolithic model
Monolithic kernel model
Most operating system services such as process management,
memory management, device management, file management,
name management, and inter-process communication are
provided by the kernel.
Result: The kernel has a large monolithic structure.
The large size of the kernel reduces the overall flexibility and
configurability of the resulting OS.
A request may be serviced faster, since no message passing
and no context switching are required while the kernel is
performing the job.
Cont…
[Figure: monolithic kernel model; on each of nodes 1 to n, user applications run above a monolithic kernel (which includes most OS services), and the nodes are connected by network hardware]
Microkernel model
Goal is to keep the kernel as small as possible.
Kernel provides only the minimal facilities necessary for
implementing additional operating system services like:
inter-process communication,
low-level device management, &
some memory management.
All other OS services are implemented as user-level server
processes. So it is easy to modify the design or add new
services.
Each server has its own address space and can be
programmed separately.
Cont…
[Figure: microkernel model; on each of nodes 1 to n, user applications run above server/manager modules, which in turn run above a microkernel (which has only minimal facilities), and the nodes are connected by network hardware]
Cont…
Being modular in nature, the OS is easy to design, implement
and install.
For adding or changing a service, there is no need to stop
the system and boot a new kernel, as in the case of
monolithic kernel.
Performance penalty :
Each server module has its own address space; hence
some form of message-based IPC is required while
performing a job.
Message passing between server processes and the
microkernel requires context switches, resulting in additional
performance overhead.
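The message-based interaction with a user-level server can be sketched as follows. A real server would occupy its own address space as a separate process; threads and queues are used here only to keep the sketch short, and the "name server" and its table are invented:

```python
# Minimal sketch of a microkernel-style user-level server reached only
# through message-based IPC. The queues stand in for the microkernel's
# message facility; the service data is illustrative.
import queue
import threading

requests = queue.Queue()

def name_server():
    table = {"printer": "node-3", "files": "node-1"}
    while True:
        req, reply_q = requests.get()     # blocking receive of a message
        if req == "shutdown":
            break
        reply_q.put(table.get(req, "unknown"))

server = threading.Thread(target=name_server)
server.start()

reply_q = queue.Queue()
requests.put(("printer", reply_q))        # a message, not a function call
where = reply_q.get()                     # "node-3"
requests.put(("shutdown", None))
server.join()
```

Every lookup costs one request message and one reply message; that round trip is exactly the overhead the monolithic kernel avoids.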
Cont…
Advantages of microkernel model over monolithic kernel
model:
Flexibility to design, maintenance and portability.
In practice, the performance penalty is not too high, because
the overhead involved in exchanging messages is usually
negligible compared with other factors.
PERFORMANCE
Design principles for better performance are:
Batch if possible
Transfer of data in large chunks is more efficient than in
individual pages.
Caching whenever possible
Saves a large amount of time and network bandwidth.
Minimize copying of data
While a message is transferred from sender to receiver, it
takes the following path:
From the sender’s stack to its message buffer,
From the message buffer in the sender’s address space to
the message buffer in the kernel’s address space,
Cont…
Finally, from the kernel to the network interface card (NIC).
The path is traversed similarly on receipt; hence six copy
operations are required in all.
Minimize Network traffic
Migrate process closer to resource.
Take advantage of fine-grain parallelism for
multiprocessing
Use of threads for structuring server processes.
Fine-grained concurrency control of simultaneous
accesses by multiple processes to a shared resource for
better performance.
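The caching principle above can be sketched in a few lines; remote_read and the round-trip counter are illustrative stand-ins for a real remote access:

```python
# Sketch of the "cache whenever possible" principle: the result of a
# simulated remote read is remembered locally, so repeated accesses to
# the same block avoid further network round trips.
network_trips = 0

def remote_read(block):
    global network_trips
    network_trips += 1            # each call costs one network round trip
    return b"data-%d" % block

cache = {}

def cached_read(block):
    if block not in cache:        # miss: fetch over the network once
        cache[block] = remote_read(block)
    return cache[block]           # hit: served from local memory

for _ in range(100):
    cached_read(7)
# network_trips is 1: 99 of the 100 reads were served from the cache
```

A real distributed cache must also invalidate stale entries when the remote data changes; that consistency problem is not shown here.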
SCALABILITY
Scalability refers to the capability of a system to adapt to
increased service load.
Some principles for designing the scalable distributed
systems are:
Avoid centralized entities
Central file server
Centralized database
Avoid centralized algorithms
Perform most operations on client workstation
Heterogeneity
Dissimilar hardware or software systems.
Some incompatibilities in a heterogeneous distributed
system are:
Internal formatting schemes
Communication protocols and topologies of different networks
Different servers at different nodes
Some form of data translation is necessary for
interaction.
An intermediate standard data format can be used.
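As a sketch of the intermediate standard format idea, JSON is used below as one common choice (the text does not name a specific format); each node translates between its internal form and the common form:

```python
# Heterogeneity sketch: sender and receiver may use different internal
# formatting schemes, but both translate through a common standard
# format (JSON here). The record contents are invented.
import json

record = {"node": "ws-17", "load": 0.42, "jobs": [3, 9]}   # internal form

wire = json.dumps(record)     # sender: internal form -> standard form
received = json.loads(wire)   # receiver: standard form -> internal form
```

The translation cost is paid at the boundary of each node, but no node needs to understand any other node's internal representation.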
Security
More difficult than in a centralized system because of:
The lack of a single point of control &
The use of insecure networks for data communication.
Requirements for security
It should be possible for the sender of a message to know
that the message was received by the intended receiver.
It should be possible for the receiver of a message to know
that the message was sent by the genuine sender.
Cont…
It should be possible for both the sender and receiver of a
message to be guaranteed that the contents of the message
were not changed while it was in transfer.
Cryptography is the solution.
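One hedged illustration of the requirements above: an HMAC over the message lets the receiver check that the contents were not changed in transit and that the sender holds the shared key. The key and message below are invented, and real deployments also need secure key distribution:

```python
# Integrity and authenticity sketch using an HMAC with a shared key.
# A tampered message fails verification; an intact one passes.
import hashlib
import hmac

key = b"shared-secret"        # assumed to be exchanged out of band

def sign(message):
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message, tag):
    return hmac.compare_digest(sign(message), tag)

msg = b"transfer 10 units"
tag = sign(msg)
intact = verify(msg, tag)                       # accepted
tampered = verify(b"transfer 99 units", tag)    # rejected
```

compare_digest is used instead of == so that verification time does not leak information about the tag.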
Challenges
Alignment with the needs of the business / user / non-
computer specialists / community and society
Need to address the scalability issue:
large scale data,
high performance computing,
response time,
rapid prototyping, and rapid time to production.
Need to effectively address
ever shortening cycle of obsolescence,
heterogeneity and
rapid changes in requirements
What about providing all this in a cost-effective
manner?
Enter the cloud
Cloud computing is Internet-based computing,
whereby shared resources, software and
information are provided to computers and other
devices on-demand, like the electricity grid.
Cloud computing is the culmination of
numerous attempts at large-scale computing with
seamless access to virtually limitless resources:
on-demand computing, utility computing,
ubiquitous computing, autonomic computing,
platform computing, edge computing, elastic
computing, grid computing, …
Enabling Technologies
Cloud applications: data-intensive, compute-intensive, storage-intensive
Services interface: web services, SOA, WS standards
Virtualization: bare metal, hypervisor; virtual machines (VM0, VM1, …, VMn)
Storage models: S3, BigTable, BlobStore, …
Hardware: multi-core architectures, 64-bit processors, bandwidth
What is a Cloud?
SLAs
Web Services
Virtualization
Why Cloud Computing?
Cloud computing is an important model for the
distribution of and access to computing resources.
As-needed availability:
Aligns resource expenditure with actual resource
usage thus allowing the organization to pay only
for the resources required, when they are
required.
Fog, Edge & Cloud
Local data processing helps to mitigate the
weaknesses of cloud computing.
In addition, fog computing brings new
advantages, such as greater context
awareness, real-time processing, and lower
bandwidth requirements.
Fog Computing
Fog computing is a standard that defines how edge
computing should work; it facilitates the operation of
compute, storage and networking services
between end devices and cloud computing
data centers.
Additionally, many use fog as a jumping-off
point for edge computing.
Why Fog Computing?
Handles the large data volumes created by
IoT devices.
Reduces latency compared to
traditional IoT networks.
Avoids data bottlenecks at
the cloud.
Dense geographical distribution and
local resource pooling.
Better security and failure resistance.
Backbone bandwidth savings.
Edge Computing
Edge computing brings processing close to the data
source, so data does not need to be sent to a
remote cloud or other centralized system for
processing.
By eliminating the distance and time it takes
to send data to centralized sources, we can
improve the speed and performance of data
transport, as well as devices and applications
on the edge.