UNIT 1
What is a Distributed System:
A distributed system is a collection of autonomous computer systems that are physically
separated but connected by a centralized computer network and equipped with distributed
system software. The autonomous computers communicate with one another to share
resources and files and to perform the tasks assigned to them.
Example of Distributed System:
A social media platform can run a centralized computer network at its headquarters, while the
computer systems through which users access its services act as the autonomous systems in
the distributed system architecture.
• Distributed System Software: This Software enables computers to coordinate
their activities and to share the resources such as Hardware, Software, Data, etc.
• Database: It stores the data processed by each node/system of the distributed
system that is connected to the centralized network.
• Each autonomous system runs a common application and can have its own data, which
is shared through the centralized database system.
• To transfer data to the autonomous systems, the centralized system must provide
a middleware service and be connected to a network.
• Middleware services provide functions that the local systems or the centralized
system lack by default, acting as an interface between the centralized system and
the local systems. Systems communicate and manage data through the middleware
components.
• Data transferred from the database is divided into segments or modules and
shared with the autonomous systems for processing.
• The processed data is then transferred back to the centralized system through the
network and stored in the database.
Characteristics of Distributed System:
• Resource Sharing: It is the ability to use any Hardware, Software, or Data
anywhere in the System.
• Openness: It is concerned with Extensions and improvements in the system (i.e.,
How openly the software is developed and shared with others)
• Concurrency: It is naturally present in distributed systems, because the same
activity or functionality can be performed at the same time by separate users in
remote locations. Every local system has its own independent operating system and
resources.
• Scalability: The system can grow in scale, adding processors to serve more users
while maintaining or improving its responsiveness.
• Fault tolerance: It concerns the reliability of the system: if there is a failure in
hardware or software, the system continues to operate properly without degrading
its performance.
• Transparency: It hides the complexity of the distributed system from users and
application programs, so that the system appears to them as a single system.
Advantages of Distributed System:
• Applications in Distributed Systems are Inherently Distributed Applications.
• Information in Distributed Systems is shared among geographically distributed
users.
• Resource Sharing (Autonomous systems can share resources from remote
locations).
• It has a better price-performance ratio and flexibility.
• It has shorter response time and higher throughput.
• It has higher reliability and availability against component failure.
• It is extensible: the system can be extended to more remote locations and grown
incrementally.
Disadvantages of Distributed System:
• Software for distributed systems is harder to design and develop than software for
centralized systems.
• Security poses a problem, because sharing resources among multiple systems makes
data easier to access.
• Network saturation may hinder data transfer, i.e., if the network lags, users will
have trouble accessing data.
Application Areas of Distributed Systems:
• Finance and Commerce: Amazon, eBay, Online Banking, E-Commerce websites.
• Information Society: Search Engines, Wikipedia, Social Networking, Cloud
Computing.
• Cloud Technologies: AWS, Salesforce, Microsoft Azure, SAP.
• Entertainment: Online Gaming, Music, YouTube.
• Healthcare: Online patient records, Health Informatics.
• Education: E-learning.
• Transport and logistics: GPS, Google Maps.
• Environment Management: Sensor technologies.
Characterization of Distributed Systems:
A distributed system, also referred to as distributed computing, consists of independent
components on different machines that exchange messages to achieve common goals. To the
end user, the distributed system appears as a single interface or computer. Together, the
components can maximize the use of resources and information while preventing failures from
affecting service availability.
1. Distributed Computing System:
This type of distributed system is used for high-performance computing tasks.
• Cluster Computing: A collection of connected computers that work together as
a unit and function as a single system. Clusters are generally connected via fast
local area networks, and each node runs the same operating system.
When input arrives from a client at the main computer, the master node divides the task into
simple jobs and sends them to the slave nodes. When the slave nodes finish their jobs, they
send the results back to the master node, which then presents the result to the main computer.
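The master/slave flow described above can be sketched on a single machine, with threads standing in for the slave nodes. This is only an illustration under that assumption: the class name MasterWorkerSum and the method parallelSum are hypothetical, and a real cluster would ship the jobs to other machines over the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MasterWorkerSum {
    // Master: split the input into chunks, give each chunk to a worker,
    // then combine the partial results the workers send back.
    static long parallelSum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.length);
                // Worker job: sum one slice of the array.
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // wait for each worker's result
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(parallelSum(data, 3));
    }
}
```

The master never touches the individual elements itself; it only partitions the work and aggregates the partial sums.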
Advantages:
• High Performance
• Easy to manage
• Scalable
• Expandability
• Availability
• Flexibility
Disadvantages:
• High cost
• Difficulty in locating faults
• More space is needed
Applications of Cluster Computing:
• Many web application functions, such as security, search engines, database
servers, web servers, proxies, and email.
• Work can be flexibly allocated as small data tasks for processing.
• Solving complex computational problems.
• Weather modelling.
• Earthquake and nuclear simulations, and tornado forecasting.
• Grid computing: In grid computing, the system is built as a federation of
computer systems, where each system can belong to a different administrative
domain and can differ greatly in terms of hardware, software, and network
technology.
Different departments have different computers with different operating systems. To manage
them, a control node is present that helps these computers communicate with each other and
exchange messages to get work done.
Advantages:
• Can solve bigger and more complex problems in a shorter time frame.
• Easier collaboration with other organizations.
• Better use of existing equipment.
Disadvantages:
• Grid software and standards continue to evolve
• Steep learning curve when getting started
• Non-interactive job submission
• You may need a fast connection between computer resources.
• Licensing on many servers can be prohibitive for some applications.
Applications of Grid Computing
• Organizations that develop grid standards and best-practice guidelines.
• Works as a middleware solution for connecting different businesses.
• It is a solution that can meet computing, data, and network needs.
2. Distributed Information System:
• Distributed transaction processing: It works across different servers using
multiple communication models. Transactions have four characteristic properties:
• Atomic: To the outside world, the transaction happens indivisibly.
• Consistent: The transaction leaves the system in a consistent state once it
has completed.
• Isolated: Concurrent transactions must not interfere with one another.
• Durable: Once a transaction commits, its changes are permanent.
Transactions are often constructed as several sub-transactions, jointly
forming a nested transaction.
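To make the atomic, consistent, and isolated properties concrete, here is a minimal in-memory sketch of a funds transfer. The class TransferDemo and its methods are hypothetical names chosen for illustration; durability is not modelled, since nothing is written to stable storage.

```java
import java.util.HashMap;
import java.util.Map;

final class TransferDemo {
    private final Map<String, Long> balances = new HashMap<>();

    TransferDemo(Map<String, Long> initial) {
        balances.putAll(initial);
    }

    // Atomic: all changes are made on a working copy and installed in one step,
    // so an aborted transfer leaves no partial update behind.
    // Isolated: the method is synchronized, so transfers cannot interleave.
    // Consistent: the sum of all balances is the same before and after.
    synchronized boolean transfer(String from, String to, long amount) {
        Map<String, Long> working = new HashMap<>(balances);
        Long source = working.get(from);
        Long target = working.get(to);
        if (source == null || target == null || from.equals(to)
                || amount <= 0 || source < amount) {
            return false; // abort: the visible state is unchanged
        }
        working.put(from, source - amount);
        working.put(to, target + amount);
        balances.clear();
        balances.putAll(working); // commit
        return true;
    }

    synchronized long balance(String account) {
        return balances.get(account);
    }

    synchronized long total() {
        long sum = 0;
        for (long value : balances.values()) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> initial = new HashMap<>();
        initial.put("alice", 100L);
        initial.put("bob", 50L);
        TransferDemo bank = new TransferDemo(initial);
        System.out.println(bank.transfer("alice", "bob", 30));  // committed
        System.out.println(bank.transfer("alice", "bob", 500)); // aborted
        System.out.println(bank.total());
    }
}
```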
Each sub-transaction can run its own query against a different database, so that data retrieved
from two different databases is combined into one single result.
In early enterprise middleware systems, the component that managed distributed (or nested)
transactions formed the core for integrating applications at the server or database level. This
component was referred to as the Transaction Processing Monitor (TP Monitor). Its main task was
to allow an application to access multiple servers/databases by providing a transactional
programming model. When many requests are sent to the databases, the TP Monitor ensures that
each request is executed successfully and that a result is delivered for each request.
• Enterprise application integration: Enterprise Application Integration (EAI) is
the process of bringing different business applications together. Integrating the
databases and workflows associated with business applications ensures that the
business uses information consistently and that changes in data made by one
business application are reflected correctly in the others. Many organizations
collect data from different platforms in their internal systems and then use that
data in the trading system/physical medium.
• RPC: With Remote Procedure Calls (RPC), one software component sends a request
to another software component by making what looks like a local procedure call
and then retrieves the result. The object-oriented counterpart is known as remote
method invocation (RMI). An app can use different databases for managing different
data, and these can communicate with each other across platforms. Suppose you log
in on your Android device and watch a video on YouTube; when you then open YouTube
on your laptop, you can see the same video in your watch list. RPC and RMI have the
disadvantage that sender and receiver must both be running at the time of
communication.
Purposes:
• Extracts the business rules from the applications and implements them in the EAI
system, so that even if one of the line-of-business applications is replaced by the
application of another vendor, the rules do not have to be reimplemented.
• An EAI system can act as a front end for a group of applications, providing a single,
consistent access interface to those applications and sparing users from learning
how to use different software packages.
3. Distributed Pervasive System:
Pervasive computing is also known as ubiquitous computing. It is the next step towards
integrating everyday objects with microprocessors so that these objects can communicate
information. It describes a computer system that is available anywhere in a company, or a
generally available consumer system, that looks the same everywhere with the same
functionality but draws on computing power, storage, and locations across the globe.
• Home system: Nowadays many devices used in the home are digital, so
that we can control them effectively from anywhere.
• Electronic health system: Smart medical wearable devices are now
available through which we can monitor our health regularly.
• Sensor network (IoT devices): Sensor devices send data to the client, which
acts according to the data it receives.
• Previously, sensor devices could only send data to the client; now they can also
store and process the data to manage it efficiently.
Challenges & Examples of Distributed System:
The distributed information system is defined as “a number of interdependent computers
linked by a network for sharing information among them”. A distributed information
system consists of multiple autonomous computers that communicate or exchange
information through a computer network.
Design issues of distributed system –
1. Heterogeneity: Heterogeneity applies to the networks, computer hardware,
operating systems, and implementations by different developers. A key component
of a heterogeneous distributed client-server environment is middleware.
Middleware is a set of services that enables applications and end users to interact
with each other across a heterogeneous distributed system.
2. Openness: The openness of the distributed system is determined primarily by
the degree to which new resource-sharing services can be made available to the
users. Open systems are characterized by the fact that their key interfaces are
published. It is based on a uniform communication mechanism and published
interface for access to shared resources. It can be constructed from heterogeneous
hardware and software.
3. Scalability: The system should remain efficient even with a
significant increase in the number of users and resources connected.
4. Security: The security of an information system has three components: confidentiality,
integrity, and availability. Encryption protects shared resources and keeps sensitive
information secret when it is transmitted.
5. Failure Handling: When faults occur in hardware or software,
programs may produce incorrect results or may stop before they have
completed the intended computation, so corrective measures should be
implemented to handle these cases. Failure handling is difficult in distributed
systems because failures are partial, i.e., some components fail while others
continue to function.
6. Concurrency: There is a possibility that several clients will attempt to access a
shared resource at the same time. Multiple users make requests on the same
resources, i.e. read, write, and update. Each resource must be safe in a concurrent
environment. Any object that represents a shared resource in a distributed system
must ensure that it operates correctly in a concurrent environment.
7. Transparency: Transparency ensures that the distributed system is
perceived as a single entity by users and application programmers rather
than as a collection of cooperating autonomous systems. The user should
be unaware of where the services are located, and the transfer from a local
machine to a remote one should be transparent.
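The concurrency issue in the list above (a shared resource must operate correctly when several clients use it at once) can be sketched with threads standing in for clients. SharedCounter and runClients are hypothetical names; the point is only that the update is guarded so that no increment is lost.

```java
final class SharedCounter {
    private long value;

    // synchronized makes the read-modify-write step indivisible,
    // so concurrent clients cannot overwrite each other's updates.
    synchronized void increment() {
        value++;
    }

    synchronized long get() {
        return value;
    }

    // Starts several client threads that all update the same counter.
    static long runClients(int clients, int incrementsEach) {
        SharedCounter counter = new SharedCounter();
        Thread[] threads = new Thread[clients];
        for (int i = 0; i < clients; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsEach; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runClients(4, 10000));
    }
}
```

Without the synchronized keyword on increment, two threads could read the same old value and one update would be lost.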
Inter-Process Communication:
Inter-Process Communication (IPC) is the exchange of data between two or more
independent processes in a distributed environment.
Inter-process communication on the Internet provides both datagram and stream
communication.
Examples Of Inter-Process Communication:
1. N number of applications can communicate with the X server through network
protocols.
2. Servers like Apache spawn child processes to handle requests.
3. Pipes are a form of IPC: grep foo file | sort
It has two functions:
1. Synchronization:
The exchange of data is done synchronously, which means it is driven by a single clock pulse.
2. Message Passing:
Used when processes wish to exchange information. Message passing takes several
forms, such as pipes, FIFOs, shared memory, and message queues.
Characteristics Of Inter-Process Communication:
The main characteristics of inter-process communication in a distributed
environment/system are as follows.
1. Synchronous System Calls:
In the synchronous system calls both sender and receiver use blocking system
calls to transmit the data which means the sender will wait until the
acknowledgment is received from the receiver and receiver waits until the
message arrives.
2. Asynchronous System Calls:
In asynchronous system calls, both sender and receiver use non-blocking
system calls to transmit the data, which means the sender does not wait for an
acknowledgment from the receiver.
3. Message Destination:
A local port is a message destination within a computer, specified as an integer.
A port has exactly one receiver but many senders. Processes may use multiple
ports from which to receive messages. Any process that knows the number of a
port can send the message to it.
4. Reliability:
Reliable communication is defined in terms of validity and integrity.
5. Integrity:
Messages must arrive at the destination uncorrupted and without duplication.
6. Validity:
A point-to-point message service is valid if messages are
guaranteed to be delivered without being lost.
7. Ordering:
It is the delivery of messages to the receiver in a particular order.
Some applications require messages to be delivered in sender order, i.e., the
order in which they were transmitted by the sender.
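One common way to obtain sender ordering and duplicate suppression on top of an unreliable channel is to number the messages. The sketch below assumes the sender tags each message with a sequence number starting at 0; the class OrderedReceiver is a hypothetical name, and lost messages (validity) would additionally need retransmission, which is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderedReceiver {
    private final Map<Integer, String> holdBack = new HashMap<>();
    private final List<String> delivered = new ArrayList<>();
    private int nextSeq = 0; // sequence number expected next

    // Called whenever a message arrives, in whatever order the network produced.
    void receive(int seq, String payload) {
        if (seq < nextSeq || holdBack.containsKey(seq)) {
            return; // duplicate: already delivered or already buffered
        }
        holdBack.put(seq, payload);
        // Deliver every message that is now in sequence.
        while (holdBack.containsKey(nextSeq)) {
            delivered.add(holdBack.remove(nextSeq));
            nextSeq++;
        }
    }

    List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    public static void main(String[] args) {
        OrderedReceiver receiver = new OrderedReceiver();
        receiver.receive(1, "b"); // arrives early, held back
        receiver.receive(0, "a"); // releases "a" and then "b"
        receiver.receive(1, "b"); // duplicate, dropped
        receiver.receive(2, "c");
        System.out.println(receiver.delivered());
    }
}
```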
Internet Protocol APIs:
http://gvpcew.ac.in/LN-CSE-IT-22-32/CSE-IT/4-Year/42-DisSys/DSys-ChVVDP-UNIT-2.pdf
External Data Representation and Marshalling:
A Distributed system consists of numerous components located on different machines that
communicate and coordinate operations to seem like a single system to the end-user.
External Data Representation:
Data structures are used to represent the information held in running applications. The
information consists of a sequence of bytes in messages that are moving between components
in a distributed system. So, conversion is required from the data structure to a sequence of
bytes before the transmission of data. On the arrival of the message, data should also be able
to be converted back into its original data structure.
Computers handle different types of data, and these types are not represented in the same way
on every machine to which data must be transmitted. Individual primitive data items can be
represented in a variety of ways, and not all computers store primitive values like integers in
the same order. Different architectures also represent floating-point numbers differently.
Integers are ordered in two ways: big-endian order, in which the Most Significant Byte (MSB) is
placed first, and little-endian order, in which the Most Significant Byte (MSB) is placed last,
i.e., the Least Significant Byte (LSB) is placed first. A further issue is the set of codes used
to represent characters. Many applications on UNIX systems use ASCII character coding,
which uses one byte per character, whereas Unicode encodings such as UTF-16 use two or more
bytes per character and allow for the representation of texts in many different languages.
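The two byte orders can be observed directly with java.nio.ByteBuffer, which lets the caller choose the order explicitly. The class EndianDemo and its helper methods are hypothetical names used only for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianDemo {
    // Writes a 32-bit integer as four bytes in the requested byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Reads four bytes back into an integer, assuming the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    static String hex(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        for (byte b : bytes) {
            text.append(String.format("%02X ", b));
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        System.out.println(hex(encode(value, ByteOrder.BIG_ENDIAN)));    // 12 34 56 78
        System.out.println(hex(encode(value, ByteOrder.LITTLE_ENDIAN))); // 78 56 34 12
        // Reading little-endian bytes as big-endian yields a different number.
        byte[] little = encode(value, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Integer.toHexString(decode(little, ByteOrder.BIG_ENDIAN)));
    }
}
```

The last line shows why an agreed external format is needed: the same four bytes decode to 0x78563412 when the receiver assumes the wrong order.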
There should be a means to convert all of this data to a standard format so that it can be
exchanged successfully between computers. One approach is to convert the values to an
agreed-upon external format before transmission and convert them to the local format on
receipt; if the two computers are known to be of the same type, the conversion can be
skipped. Alternatively, values are sent in the sender’s format, along with a description of the
format, and the recipient converts them if necessary. Note, though, that the bytes themselves
are never altered during transmission. To support Remote Procedure Call (RPC) or Remote
Method Invocation (RMI) mechanisms, any data type that can be passed as a parameter or
returned as a result must be convertible in this way, with the individual primitive data values
expressed in an agreed format. An external data representation is thus an agreed standard for
representing data structures and primitive values.
• Marshalling: Marshalling is the process of taking a collection of data
structures and assembling them into an external data representation appropriate
for transmission in a message.
• Unmarshalling: The converse of this process is unmarshalling, which involves
reformatting the transferred data upon arrival to recreate the original data
structures at the destination.
Approaches:
There are three approaches to external data representation and marshalling for exchanging
data between computers.
1. Common Object Request Broker Architecture (CORBA):
CORBA is a specification defined by the Object Management Group (OMG) that has been
widely used as middleware in distributed systems. It allows systems with diverse
architectures, operating systems, programming languages, and computer hardware to work
together, so that software applications and their objects can communicate with one
another. It is a standard for creating and using distributed objects. It is made up of five major
components, whose functions are given below:
• Object Request Broker (ORB): It provides a communication infrastructure for
the objects to communicate across a network.
• Interface Definition Language (IDL): It is a specification language used to
provide an interface in a software component. To exemplify, it allows
communication between software components written in C++ and Java.
• Dynamic Invocation Interface (DII): Using DII, client applications are
permitted to use server objects without even knowing their types at compile time.
Here client obtains an instance of a CORBA object and then invocation requests
can be made dynamically on the corresponding object.
• Interface Repository (IR): As the name implies, interfaces can be added to the
interface repository. The purpose of the IR is to let a client find an
object that is not known at compile time, obtain information about its interface,
and then build a request to be sent to the ORB.
• Object Adapter (OA): It is used to access ORB services like object reference
generation.
Data Representation in CORBA:
Common Data Representation (CDR) is used to represent the structured or primitive data types
that are passed as arguments or results during remote invocations on CORBA distributed
objects. It allows clients and servers written in different programming languages to
communicate with one another. For example, it handles conversion between little-endian and
big-endian byte order.
There are 15 primitive types, including short (16-bit), long (32-bit), unsigned short, unsigned
long, float (32-bit), double (64-bit), char, boolean (TRUE, FALSE), octet (8-bit), and any (which
can represent any basic or constructed type), as well as a variety of composite types.
CORBA CDR Constructed Types:
Let’s have a look at Types with their representation:
• sequence: length (unsigned long) followed by the elements in order
• string: length (unsigned long) followed by the characters in order (can
also have wide characters)
• array: the elements in order; the length is fixed and therefore not specified
• struct: the components in the order of declaration
• enumerated: unsigned long; the values are specified by the order
declared
• union: type tag followed by the selected member
Example:
struct Person {
    string name;
    string place;
    long year;
};
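The layout rules above can be tried out by flattening a Person value by hand. The sketch below is a simplification, not a conforming CDR implementation: it assumes big-endian order, ASCII characters, a string length that counts only the characters, and zero padding up to the next 4-byte boundary. The class CdrSketch and the sample values are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CdrSketch {
    // struct: components are written in the order of declaration.
    static byte[] marshalPerson(String name, String place, int year) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeString(out, place);
        writeInt(out, year); // IDL long is 32 bits, like a Java int
        return out.toByteArray();
    }

    static Object[] unmarshalPerson(byte[] message) {
        ByteBuffer in = ByteBuffer.wrap(message); // ByteBuffer is big-endian by default
        String name = readString(in);
        String place = readString(in);
        int year = in.getInt();
        return new Object[] {name, place, year};
    }

    private static void writeInt(ByteArrayOutputStream out, int value) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        out.write(bytes, 0, bytes.length);
    }

    // string: length, then the characters, then padding to a 4-byte boundary.
    private static void writeString(ByteArrayOutputStream out, String text) {
        byte[] chars = text.getBytes(StandardCharsets.US_ASCII);
        writeInt(out, chars.length);
        out.write(chars, 0, chars.length);
        for (int i = 0; i < padding(chars.length); i++) {
            out.write(0);
        }
    }

    private static String readString(ByteBuffer in) {
        int length = in.getInt();
        byte[] chars = new byte[length];
        in.get(chars);
        in.position(in.position() + padding(length)); // skip the padding bytes
        return new String(chars, StandardCharsets.US_ASCII);
    }

    private static int padding(int length) {
        return (4 - length % 4) % 4;
    }

    public static void main(String[] args) {
        byte[] message = marshalPerson("Smith", "London", 1984);
        System.out.println(message.length + " bytes");
        Object[] person = unmarshalPerson(message);
        System.out.println(person[0] + ", " + person[1] + ", " + person[2]);
    }
}
```

Note that the message carries no type or field names; sender and receiver are assumed to know the struct definition in advance.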
Marshalling CORBA:
Marshalling operations in CORBA can be generated automatically from the specification of
the types of the data items to be transmitted in a message. CORBA IDL describes the
types of data structures and primitive data items and provides a notation for
specifying the types of the arguments and results of RMI methods.
2. Java’s Object Serialization:
Java Remote Method Invocation (RMI) allows you to pass both objects and primitive data
values as arguments and results of method calls. In Java, the term serialization refers to the activity of
putting an object (an instance of a class) or a set of related objects into a serial format suitable
for saving to disk or sending in a message.
Java provides a mechanism called object serialization. This allows an object to be represented
as a sequence of bytes containing information about the object’s data and the type of object
and the type of data stored in the object. After the serialized object is written to the file, it
can be read from the file and deserialized. You can recreate an object in memory with type
information and bytes that represent the object and its data.
Moreover, objects can be serialized on one platform and deserialized on a completely different
platform, because the serialized format is independent of the underlying platform.
For example, the Java class equivalent to the Person struct defined in CORBA IDL might be:
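import java.io.Serializable;

// Serializable is a marker interface: it declares no methods.
public class Person implements Serializable {
    public String name;
    public String place;
    public int year; // corresponds to "long year" (32-bit) in the IDL struct

    public void letter() {
        System.out.println("Issue a letter to " + name + " " + place);
    }
}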
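A round trip through the standard ObjectOutputStream and ObjectInputStream classes shows serialization and deserialization in action. The sketch writes to an in-memory byte array instead of a file; SerializationDemo and its nested Citizen class are hypothetical names chosen so the example stands alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationDemo {
    static final class Citizen implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String place;
        final int year;

        Citizen(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Turns the object (and anything it references) into a byte sequence.
    static byte[] serialize(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object from the bytes; the class must be available locally.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of serialized object not found", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Citizen("Smith", "London", 1984));
        Citizen copy = (Citizen) deserialize(data);
        System.out.println(copy.name + " " + copy.place + " " + copy.year);
    }
}
```

Unlike a hand-written binary layout, the serialized form includes the class name and field types, which is why the receiver can rebuild the object without being told its structure separately.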
3. Extensible Markup Language (XML):
Clients communicate with web services using XML, which is also used to define the
interfaces and other aspects of web services. However, XML is utilized in a variety of
different applications, including archiving and retrieval systems; while an XML archive is
larger than a binary archive, it has the advantage of being readable on any machine. Other
XML applications include the design of user interfaces and the encoding of operating system
configuration files.
In contrast to HTML, which employs a fixed set of tags, XML is extensible in the sense that
users can construct their own tags. If an XML document is meant to be used by several
applications, the tag names must be agreed upon between them.
Clients, for example, typically communicate with web services via SOAP messages. SOAP is an
XML format whose tags are published for use by web services and their clients. Because it is
expected that the client and server sharing a message have prior knowledge of the order and
types of information it contains, some external data representations (such as CORBA CDR)
do not need to be self-describing. On the other hand, XML was designed to be utilized by a
variety of applications for a variety of reasons. This has been made possible by the inclusion
of tags and the usage of namespaces to specify the meaning of the tags. Furthermore, the
usage of tags allows applications to pick only the portions of a document that they need to
process.
Example:
XML definition of the Person struct:
<person id="9865">
    <name>John</name>
    <place>England</place>
    <year>1876</year>
    <!-- comment -->
</person>
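Because the XML form is self-describing, a receiver can pick out fields by tag name. The following sketch parses the person element with the DOM parser that ships with the JDK; the class PersonXmlDemo and its parse method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class PersonXmlDemo {
    static final String XML =
            "<person id=\"9865\">"
            + "<name>John</name>"
            + "<place>England</place>"
            + "<year>1876</year>"
            + "<!-- comment -->"
            + "</person>";

    // Returns id, name, place, and year, looked up by attribute or tag name.
    static String[] parse(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element person = document.getDocumentElement();
            return new String[] {
                person.getAttribute("id"),
                text(person, "name"),
                text(person, "place"),
                text(person, "year")
            };
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", parse(XML)));
    }
}
```

All values arrive as text, so the receiver still has to convert the year to a number if it needs one.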
Usage:
Marshalling is used to create various remote procedure call (RPC) protocols, where separate
processes and threads often have distinct data formats, necessitating the need for marshalling
between them.
To transmit data across COM object boundaries, Microsoft Component Object Model
(COM) interface pointers employ marshalling; DCOM, the Distributed Component Object
Model, is the distributed variant of COM. The same thing happens in the .NET framework when
a common-language-runtime-based type has to communicate with unmanaged types.
Scripts and applications based on the Cross-Platform Component Object Model (XPCOM)
technology are two further examples where marshalling is crucial. The Mozilla Application
Framework makes heavy use of XPCOM, which in turn relies considerably on marshalling.
XML (Extensible Markup Language), in summary, is a text-based format for expressing
structured data. It was designed to represent data sent in messages exchanged by clients and
servers in web services.
The primitive data types are marshalled into a binary form in the first two approaches, CORBA
and Java’s object serialization. The primitive data types are expressed textually in the third
technique (XML). A data value’s textual representation will typically be longer than its
binary representation. The HTTP protocol is another example of the textual approach.
On the other hand, type information is included in both Java serialization and XML, but in
distinct ways. Although Java serializes all of the essential type information, XML documents
can refer to namespaces, which are externally specified groups of names (with types).
Client Server Communications:
Client/server communication involves two components, namely a client and a server. There
are usually multiple clients in communication with a single server. The clients send requests
to the server and the server responds to the client requests.
There are three main methods of client/server communication. These are given as follows −
Sockets
Sockets facilitate communication between two processes on the same machine or different
machines. They are used in a client/server framework and consist of the IP address and port
number. Many application protocols use sockets for data connection and data transfer between
a client and a server.
Socket communication is quite low-level as sockets only transfer an unstructured byte stream
across processes. The structure on the byte stream is imposed by the client and server
applications.
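As a minimal sketch of the socket model, the example below starts a one-shot server on the loopback address and lets a client exchange one line with it. The class EchoOverSocket, the "echo: " reply format, and the line-based framing are assumptions made for this illustration; the framing is exactly the kind of structure the applications must impose on the byte stream themselves.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class EchoOverSocket {
    // Sends one request line to a local server and returns the reply line.
    static String roundTrip(String request) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> serveOnce(server));
            serverThread.start();
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(request);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    // Server side: accept one connection, read one line, answer it.
    private static void serveOnce(ServerSocket server) {
        try (Socket connection = server.accept();
             BufferedReader in = reader(connection);
             PrintWriter out = writer(connection)) {
            out.println("echo: " + in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket socket) throws IOException {
        return new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket socket) throws IOException {
        // autoFlush = true so println sends the line immediately
        return new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```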
[Diagram: sockets]
Remote Procedure Calls
These are inter-process communication techniques that are used for client-server-based
applications. A remote procedure call is also known as a remote subroutine call or a remote function call.
A client has a request that the RPC translates and sends to the server. This request may be a
procedure or a function call to a remote server. When the server receives the request, it sends
the required response back to the client.
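The request/response flow can be sketched with a client stub that marshals a call into a text request and a server-side dispatcher that unmarshals it and runs the real procedure. Everything here is a simplification for illustration: RpcSketch, CalculatorStub, and the "add 2 3" wire format are hypothetical, and a plain function stands in for the network transport.

```java
import java.util.function.Function;

final class RpcSketch {
    // Server side: unmarshal the request, call the local procedure, marshal the result.
    static String dispatch(String request) {
        String[] parts = request.split(" ");
        int a = Integer.parseInt(parts[1]);
        int b = Integer.parseInt(parts[2]);
        switch (parts[0]) {
            case "add":
                return Integer.toString(a + b);
            case "mul":
                return Integer.toString(a * b);
            default:
                throw new IllegalArgumentException("unknown procedure: " + parts[0]);
        }
    }

    // Client side: the stub looks like a local object but forwards every call.
    static final class CalculatorStub {
        private final Function<String, String> transport;

        CalculatorStub(Function<String, String> transport) {
            this.transport = transport;
        }

        int add(int a, int b) {
            return call("add", a, b);
        }

        int mul(int a, int b) {
            return call("mul", a, b);
        }

        private int call(String procedure, int a, int b) {
            String reply = transport.apply(procedure + " " + a + " " + b);
            return Integer.parseInt(reply);
        }
    }

    public static void main(String[] args) {
        // In a real system the transport would send the request over a socket.
        CalculatorStub calculator = new CalculatorStub(RpcSketch::dispatch);
        System.out.println(calculator.add(2, 3));
        System.out.println(calculator.mul(4, 5));
    }
}
```

The caller only sees add and mul; the marshalling and the transport are hidden inside the stub.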
[Diagram: remote procedure calls]
Pipes
These are inter-process communication methods that contain two end points. Data is entered
from one end of the pipe by a process and consumed from the other end by the other process.
The two different types of pipes are ordinary pipes and named pipes. Ordinary pipes only allow
one-way communication. For two-way communication, two pipes are required. Ordinary pipes
require a parent-child relationship between the processes, as the pipes can only be accessed by
processes that created or inherited them.
Named pipes are more powerful than ordinary pipes and allow two-way communication.
These pipes exist even after the processes using them have terminated. They need to be
explicitly deleted when not required anymore.
[Diagram: pipes]
Group communications:
Communication between two processes in a distributed system is required to exchange
various data, such as code or a file, between the processes. When one source process tries to
communicate with multiple processes at once, it is called group communication. A group
is a collection of interconnected processes treated as a single abstraction. This abstraction
hides the message passing so that the communication looks like a normal procedure call. Group
communication also helps processes on different hosts to work together and perform
operations in a synchronized manner, thereby increasing the overall performance of the
system.
Types of Group Communication in a Distributed System:
• Broadcast Communication: The host process communicates with
every process in the distributed system at the same time. Broadcast communication
comes in handy when a common stream of information is to be delivered to each
and every process in the most efficient manner possible. Since the sender does not
need to select recipients, communication is very fast in comparison to other
modes of communication. However, it does not scale to a large number of
processes and cannot treat a specific process individually.
A broadcast communication: process P1 communicating with every process in the system
• Multicast Communication: The host process communicates with a
designated group of processes in the distributed system at the same time. This
technique is mainly used to address the problems of a high workload on the host
system and of redundant information being sent to processes in the system.
Multicasting can significantly decrease the time taken for message handling.
A multicast communication: process P1 communicating with only a group of processes in
the system
• Unicast Communication: The host process communicates with a
single process in the distributed system at a time, although the same
information may be passed to multiple processes. This works best for two
processes communicating, as the sender has to deal with only one specific process.
However, it leads to overhead, as the sender has to find the exact process and then
exchange information/data.
A unicast communication: process P1 communicating with only process P3
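The three modes differ only in which processes receive a message, which a small in-memory model can show. The class GroupComm, its inbox lists, and the process names P1 to P4 are hypothetical; a real system would deliver the messages over a network rather than by appending to lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupComm {
    // One inbox per process, keyed by process name.
    private final Map<String, List<String>> inboxes = new LinkedHashMap<>();

    GroupComm(String... processes) {
        for (String process : processes) {
            inboxes.put(process, new ArrayList<>());
        }
    }

    // Unicast: exactly one receiver.
    void unicast(String to, String message) {
        inboxes.get(to).add(message);
    }

    // Multicast: only the members of the designated group receive the message.
    void multicast(List<String> group, String message) {
        for (String member : group) {
            unicast(member, message);
        }
    }

    // Broadcast: every process except the sender receives the message.
    void broadcast(String sender, String message) {
        for (String process : inboxes.keySet()) {
            if (!process.equals(sender)) {
                unicast(process, message);
            }
        }
    }

    List<String> inbox(String process) {
        return new ArrayList<>(inboxes.get(process));
    }

    public static void main(String[] args) {
        GroupComm system = new GroupComm("P1", "P2", "P3", "P4");
        system.broadcast("P1", "to everyone");
        system.multicast(Arrays.asList("P2", "P3"), "to the group");
        system.unicast("P3", "to P3 only");
        for (String process : Arrays.asList("P1", "P2", "P3", "P4")) {
            System.out.println(process + " " + system.inbox(process));
        }
    }
}
```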
IPC in UNIX:
https://cs.brown.edu/people/slewando/files/IPCWinNTUNIX.pdf
UNIT 2
Distributed Objects and Remote Method Invocation:
Distributed objects are objects that are distributed across different address spaces, either in
multiple computers connected via a network or even in different processes on the same
computer, but which work together by sharing data and invoking methods. This often
involves location transparency, where remote objects appear the same as local objects.
The main method of distributed object communication is with remote method invocation.
Invoking a method on a remote object is known as remote method invocation (generally by
message-passing).
Message-passing: One object sends a message to another object in a remote machine or
process to perform some tasks. The results are sent back to the calling object.
The remote procedure call (RPC) approach extends the common programming abstraction of
the procedure call to distributed environments, allowing a calling process to call a procedure
in a remote node as if it were local. Remote method invocation (RMI) is similar to RPC but for
distributed objects, with the added benefits of using object-oriented programming
concepts in distributed systems, extending the concept of an object reference to the
global distributed environment, and allowing the use of object references as parameters in
remote invocations.
Remote procedure call (RPC): Client calls the procedures in a server program that is running
in a different process.
Remote method invocation (RMI): An object in one process can invoke methods of objects in
another process.
Event notification: Objects receive notification of events at other objects for which they have
registered.
Middleware Roles: Provide high-level abstractions such as RMI, enable location transparency,
and free applications from the specifics of communication protocols, operating systems and
communication hardware.
Communication between distributed objects:
In a distributed computing environment, distributed object communication realizes
communication between distributed objects. The main role is to allow objects to access data
and invoke methods on remote objects (objects residing in non-local memory space). Invoking
a method on a remote object is known as remote method invocation (RMI) or remote
invocation, and is the object-oriented programming analogy of a remote procedure call (RPC).
Stub
The client-side object participating in distributed object communication is known as
a stub or proxy, and is an example of a proxy object.
The stub acts as a gateway for client-side objects: all outgoing requests to server-side
objects are routed through it. The stub wraps client object functionality and, by adding the
network logic, ensures a reliable communication channel between client and server. The stub
can be written manually or generated automatically, depending on the chosen communication
protocol.
The stub is responsible for:
• initiating the communication towards the server skeleton
• translating calls from the caller object
• marshalling of the parameters
• informing the skeleton that the call should be invoked
• passing arguments to the skeleton over the network
• unmarshalling of the response from the skeleton
• informing the caller that the call is complete
Skeleton
The server-side object participating in distributed object communication is known as
a skeleton (or stub; term avoided here).
A skeleton acts as a gateway for server-side objects: all incoming client requests are routed
through it. The skeleton wraps server object functionality and exposes it to the clients and,
by adding the network logic, ensures a reliable communication channel between
clients and server. Skeletons can be written manually or generated automatically, depending
on the chosen communication protocol.
The skeleton is responsible for:
• translating incoming data from the stub to the correct up-calls to server objects
• unmarshalling of the arguments from received data
• passing arguments to server objects
• marshalling of the returned values from server objects
• passing values back to the client stub over the network
Class stubs and skeletons
A widely used approach to implementing the communication channel is to use
stubs and skeletons. They are generated objects whose structure and behaviour depend
on the chosen communication protocol, but in general they provide additional functionality that
ensures reliable communication over the network.
In RMI, the stub (the part on the client) is defined by the programmer as an interface.
The RMIC (RMI Compiler) uses this to create the stub class. The stub performs type checking.
The skeleton side is defined in a class which implements that interface.
When a caller wants to perform a remote call on the called object, it delegates the request to
its stub, which initiates communication with the remote skeleton. The stub then passes the
caller's arguments over the network to the server skeleton. The skeleton passes the received
data to the called object, waits for a response and returns the result to the client stub. Note that
there is no direct communication between the caller and the called object.
In more detail, the communication consists of several steps:
1. caller calls a local procedure implemented by the stub
2. stub marshals call type and the input arguments into a request message
3. client stub sends the message over the network to the server and blocks the
current execution thread
4. server skeleton receives the request message from the network
5. skeleton unpacks call type from the request message and looks up
the procedure on the called object
6. skeleton unmarshals procedure arguments
7. skeleton executes the procedure on the called object
8. called object performs a computation and returns the result
9. skeleton packs the output arguments into a response message
10. skeleton sends the message over the network back to the client
11. client stub receives the response message from the network
12. stub unpacks output arguments from the message
13. stub passes output arguments to the caller, releases execution thread and caller
then continues in execution
The advantage of this architecture is that neither the caller nor the called object has to
implement network-related logic. This functionality, which ensures a reliable communication
channel over the network, has been moved to the stub and skeleton layer.
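The numbered steps above can be traced in a minimal sketch. It assumes JSON as the wire format and replaces the network with a direct function call. The Calculator, Stub and Skeleton classes are hypothetical and are not taken from any RMI library.

```python
import json


class Calculator:
    """The called (server-side) object; it contains no network logic."""

    def add(self, a, b):
        return a + b


class Skeleton:
    """Server side: unmarshals requests and makes up-calls to the object."""

    def __init__(self, target):
        self.target = target

    def handle(self, request_bytes):
        # Steps 4-6: receive the message, unpack call type and arguments.
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self.target, request["method"])
        # Steps 7-8: execute the procedure on the called object.
        result = method(*request["args"])
        # Steps 9-10: pack the result into a response message and return it.
        return json.dumps({"result": result}).encode("utf-8")


class Stub:
    """Client side: offers the same method, but only marshals and forwards."""

    def __init__(self, transport):
        # `transport` is any callable bytes -> bytes (an assumption made for
        # this sketch); a real system would use a socket connection here.
        self.transport = transport

    def add(self, a, b):
        # Step 2: marshal call type and arguments into a request message.
        request = json.dumps({"method": "add", "args": [a, b]}).encode("utf-8")
        # Step 3: send the message and block until the reply arrives.
        response = self.transport(request)
        # Steps 11-13: unpack the result and hand it back to the caller.
        return json.loads(response.decode("utf-8"))["result"]


skeleton = Skeleton(Calculator())
stub = Stub(transport=skeleton.handle)  # in-process stand-in for the network
total = stub.add(2, 3)                  # step 1: caller calls the stub locally
print(total)
```

The caller only ever touches the stub, and the Calculator object only ever sees an ordinary method call, which is the separation the text describes.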
Distributed object model:
https://help.sap.com/saphelp_SCM700_ehp02/helpdata/en/a4/38de11fc7d7b42a18c1a244b973b0e/content.htm?no_cache=true#:~:text=The%20distributed%20applications%20consist%20of,by%20obtaining%20references%20to%20them.
Design issues of RMI:
https://www.studocu.com/in/document/pes-university/distributed-systems/remote-method-invocation/18392713
Implementation of RMI:
The RMI implementation consists of three abstraction layers.
These abstraction layers are:
1. The Stub and Skeleton layer, which intercepts method calls made by the client to the
interface reference variable and redirects these calls to a remote RMI service.
2. The Remote Reference layer understands how to interpret and manage references
made from clients to the remote service objects.
3. The bottom layer is the Transport layer, which is based on TCP/IP connections
between machines in a network. It provides basic connectivity, as well as some
firewall penetration strategies.
Distributed Garbage collection:
Distributed garbage collection (DGC) in computing is a particular case of garbage
collection where a remote client can hold references to an object.
DGC uses some combination of the classical garbage collection (GC) techniques, tracing
and reference counting. It has to cooperate with local garbage collectors in each process in
order to keep global counts, or to globally trace accessibility of data. In general, remote
processors do not have to know about internal counting or tracing in a given process, and the
relevant information is stored in interfaces associated with each process.
DGC is complex and can be costly and slow in freeing memory. As a cheap way of avoiding
DGC algorithms, one can rely on a time lease – set or configured on the remote object; it is
the stub's task to periodically renew the lease on the remote object. If the lease has expired, the
server process (the process owning the remote object) can safely assume that either the client
is no longer interested in the object, or that a network partition or crash obstructed lease
renewal, in which case it is "hard luck" for the client if it is in fact still interested. Hence, if
there is only a single reference to the remote object on the server representing a remote
reference from that client, that reference can be dropped, which will mean that the local garbage
collector on the server will garbage-collect the object at some future point in time.
The RMI subsystem implements reference counting based Distributed Garbage Collection
(DGC) to provide automatic memory management facilities for remote server objects.
When the client creates (unmarshals) a remote reference, it calls dirty() on the server-side
DGC. After the client has finished with the remote reference, it calls the corresponding
clean() method.
A reference to a remote object is leased for a time by the client holding the reference. The lease
period starts when the dirty() call is received. The client must renew the leases by making
additional dirty() calls on the remote references it holds before such leases expire. If the client
does not renew the lease before it expires, the distributed garbage collector assumes that the
remote object is no longer referenced by that client.
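The lease mechanism can be sketched as a small table kept on the server. This is an illustration under assumptions, not the Java RMI implementation: the LeaseTable class and its expire and collectable methods are hypothetical, and time is passed in explicitly so that the behaviour is deterministic.

```python
class LeaseTable:
    """Hypothetical server-side table of leases for one remote object."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.expiry = {}  # client id -> time at which its lease expires

    def dirty(self, client_id, now):
        # Called when a client obtains a reference or renews its lease.
        self.expiry[client_id] = now + self.lease_seconds

    def clean(self, client_id):
        # Called when a client has finished with the reference.
        self.expiry.pop(client_id, None)

    def expire(self, now):
        # Drop the leases that were not renewed in time.
        for client_id in [c for c, t in self.expiry.items() if t <= now]:
            del self.expiry[client_id]

    def collectable(self, now):
        # With no live leases left, the local collector may reclaim the object.
        self.expire(now)
        return not self.expiry


table = LeaseTable(lease_seconds=10)
table.dirty("client-A", now=0)
table.dirty("client-B", now=0)
table.clean("client-B")            # B releases its reference explicitly
table.dirty("client-A", now=8)     # A renews: its lease now runs until t=18
alive_at_15 = not table.collectable(now=15)
collect_at_20 = table.collectable(now=20)
print(alive_at_15, collect_at_20)
```

Client A stops renewing after t=8, so the object is still referenced at t=15 but becomes eligible for local garbage collection by t=20.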
Remote Procedure Call:
Remote Procedure Call (RPC) is a communication technology that lets one program
request a service from another program on a network without needing to
know the network's details. A procedure call is also known as a function call or a
subroutine call.
It is based on the client-server concept. The client is the program that makes the request, and
the server is the program that provides the service. An RPC, like a local procedure call, is a
synchronous operation: the requesting application is suspended until the
remote procedure returns its results. Multiple RPCs can be executed concurrently by utilizing
lightweight processes or threads that share the same address space. RPC
programs often use an Interface Definition Language (IDL), a
specification language for describing a software component's Application
Programming Interface (API). Here, the IDL acts as an interface between
machines at either end of the connection, which may be running different operating systems
and programming languages.
Working Procedure for RPC Model:
• In a conventional (local) procedure call, the caller places the procedure
arguments in a well-known location when the procedure is called.
• Control then passes to the body of the procedure, which is a series of
instructions.
• The procedure body runs in a newly created execution environment that holds
copies of the arguments supplied by the calling instruction.
• When the procedure completes, control returns to the calling point, along
with a result.
• A remote procedure does not lie within the caller's address space: the
caller and callee processes have distinct address spaces, so the remote
procedure has no access to the data and variables of the caller's
environment.
• The caller and callee processes in RPC exchange
information via a message-passing scheme.
• On the server side, when a request message arrives, the first task is to
extract the procedure's parameters, then compute the result, send a
reply message, and finally wait for the next call message.
• In the basic model, only one of the two processes is active at any given point in time.
• The caller is not always required to be blocked.
• An asynchronous mechanism can be employed in RPC that
permits the client to keep working even if the server has not responded yet.
• To handle incoming requests, the server might create a thread for each one,
which frees the server to handle subsequent requests.
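The difference between a blocking call and the asynchronous, threaded variant in the last bullets can be sketched as follows. The function remote_square is a hypothetical stand-in for a remote procedure, and the short sleep imitates network and server delay; no real RPC library is used.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def remote_square(x):
    """Stand-in for a remote procedure; the sleep imitates the round trip."""
    time.sleep(0.05)
    return x * x


# Synchronous call: the caller is blocked until the result comes back.
sync_result = remote_square(4)

# Asynchronous calls: each call runs on its own worker thread, so the
# caller keeps working and collects the replies later.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(remote_square, n) for n in (1, 2, 3)]
    local_work = sum(range(10))              # done while the calls are pending
    async_results = [f.result() for f in futures]

print(sync_result, local_work, async_results)
```

The worker threads play the role of the server threads mentioned above: each pending request has its own thread, so one slow call does not hold up the others.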
Types of RPC:
Call-back RPC: In a call-back RPC, a P2P (Peer-to-Peer) paradigm is adopted between the
participating processes. Each process can act as both client and server, which
is quite helpful. Call-back RPC's features include:
• It addresses the problems encountered with interactive applications that are handled remotely.
• It provides a server for clients to use.
• Due to the call-back mechanism, the client process is delayed.
• Deadlocks need to be managed in call-backs.
• It promotes a Peer-to-Peer (P2P) paradigm among the processes involved.
RPC for Broadcast: A client’s request that is broadcast all through the network and handled
by all servers that possess the method for handling that request is known as a broadcast RPC.
Broadcast RPC’s features include:
• You have an option of selecting whether or not the client’s request message ought
to be broadcast.
• It also gives you the option of declaring broadcast ports.
• It helps in diminishing physical network load.
Batch-mode RPC: Batch-mode RPC enables the client to queue separate RPC requests in
a transmission buffer before sending them to the server in a single batch over the network.
Batch-mode RPC's features include:
• It reduces the overhead of contacting the server by sending the queued requests all at once
over the network.
• It is used for applications that require low call rates.
• It necessitates the use of a reliable transmission protocol.
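A minimal sketch of the batching idea follows. The BatchingClient class, the server function and the procedure names add and neg are all hypothetical. The "network" is a plain function call that counts how many messages were sent.

```python
class BatchingClient:
    """Hypothetical client that buffers calls and ships them in one message."""

    def __init__(self, send):
        self.send = send      # callable: list of calls -> list of results
        self.buffer = []

    def call(self, method, *args):
        # Queue the request locally; nothing is transmitted yet.
        self.buffer.append((method, args))

    def flush(self):
        # One transmission carries every queued request.
        batch, self.buffer = self.buffer, []
        return self.send(batch)


messages_sent = []


def server(batch):
    """Stand-in server: records each message and runs the calls in order."""
    messages_sent.append(batch)
    procedures = {"add": lambda a, b: a + b, "neg": lambda a: -a}
    return [procedures[name](*args) for name, args in batch]


client = BatchingClient(send=server)
client.call("add", 1, 2)
client.call("neg", 5)
client.call("add", 10, 20)
results = client.flush()
print(results, len(messages_sent))
```

Three calls cross the simulated network in a single message. The cost is that no result is available until the batch is flushed.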
Local Procedure Call Vs Remote Procedure Call:
• Remote Procedure Calls have disjoint address space i.e., different address space,
unlike Local Procedure Calls.
• Remote Procedure Calls are more prone to failures due to possible processor
failure or communication issues of a network than Local Procedure Calls.
• Because of the communication network, remote procedure calls take longer than
local procedure calls.
Advantages of Remote Procedure Calls:
• The technique of using procedure calls in RPC permits high-level languages to
provide communication between clients and servers.
• This method is like a local procedure call, with the difference that the called
procedure is executed in another process, possibly on a different computer.
• The thread-oriented model is also supported by RPC in addition to the process
model.
• The RPC mechanism is employed to conceal the core message passing method.
• The amount of time and effort required to rewrite and develop the code is minimal.
• The distributed and local environments can both benefit from remote procedure
calls.
• To increase performance, it omits several of the protocol layers.
• Abstraction is provided via RPC. For example, the user does not need to know the
details of message-passing in network communication.
• RPC empowers the utilization of applications in a distributed environment.
Disadvantages of Remote Procedure Calls:
• In Remote Procedure Calls, parameters are passed only by value; pointer values
are not allowed, since a pointer is meaningless in another address space.
• It involves a communication system with another machine and another process,
so this mechanism is extremely prone to failure.
• The RPC concept can be implemented in a variety of ways, hence there is no
standard.
• Due to the interaction-based nature, there is no flexibility for hardware
architecture in RPC.
• A remote procedure call costs more than a local procedure call.
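The pass-by-value restriction can be illustrated with a short sketch. It assumes pickle as the marshalling format and imitates the remote call in a single process. The functions append_item and remote_call are hypothetical helpers.

```python
import pickle


def append_item(items):
    """Procedure that mutates its argument in place."""
    items.append("added by callee")
    return len(items)


def remote_call(procedure, argument):
    """Imitates RPC parameter passing: the argument is marshalled to bytes
    and rebuilt on the 'server', so the callee works on a copy."""
    copy_on_server = pickle.loads(pickle.dumps(argument))
    return procedure(copy_on_server)


local_list = ["a"]
append_item(local_list)                              # local call: shared list
remote_list = ["a"]
remote_len = remote_call(append_item, remote_list)   # remote-style call: copy
print(local_list, remote_list, remote_len)
```

After the local call the caller's list has grown, but after the remote-style call the caller's list is unchanged. Only the returned value reflects the work done by the callee.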
Sun RPC:
The fundamental building block of all network information systems is a mechanism for
performing remote procedure calls. This mechanism, usually called RPC, allows a program
running on one computer to more-or-less transparently execute a function that is actually
running on another computer.
RPC systems can be categorized as blocking systems, which cause the calling program to
cease execution until a result is returned, or as non-blocking (asynchronous) systems, which
means that the calling program continues running while the remote procedure call is
performed. (The results of a non-blocking RPC, if they are returned, are usually provided
through some type of call-back scheme.)
RPC allows programs to be distributed: a computationally intensive algorithm can be run on a
high-speed computer, a remote sensing device can be run on another computer, and the
results can be compiled on a third. RPC also makes it easy to create network-based
client/server programs: the clients and servers communicate with each other using remote
procedure calls.
One of the first UNIX remote procedure call systems was developed by Sun Microsystems
for use with NIS and NFS. Sun's RPC uses a system called XDR (external data
representation) to represent binary information in a uniform manner and byte
order. XDR allows a program running on a computer with one byte order, such as
a SPARC workstation, to communicate seamlessly with a program running on a computer
with the opposite byte order, such as a workstation with an Intel x86
microprocessor. RPC messages can be sent over either TCP or UDP
(currently, the UDP version is more common). After their creation by
Sun, XDR and RPC were reimplemented by the University of California at Berkeley and are
now freely available.
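The byte-order point can be made concrete with Python's struct module. XDR encodes integers as 4 bytes with the most significant byte first, and pads strings to a multiple of 4 bytes. The helper names xdr_int and xdr_string are hypothetical, and this sketch covers only these two types.

```python
import struct


def xdr_int(value):
    """Encode a signed 32-bit integer as 4 bytes, most significant first."""
    return struct.pack(">i", value)


def xdr_string(text):
    """Encode a string as a 4-byte length followed by the bytes,
    zero-padded to a multiple of 4."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


wire = xdr_int(1)
little_endian = struct.pack("<i", 1)   # how an x86 host stores the same value
decoded = struct.unpack(">i", wire)[0]
encoded_name = xdr_string("nfs")
print(wire.hex(), little_endian.hex(), decoded, encoded_name.hex())
```

Both hosts agree on the wire form 00000001 regardless of how each stores the integer in memory, which is what lets machines with opposite byte orders exchange data.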
Sun's RPC is not unique. A different RPC system is used by the Open Software
Foundation's Distributed Computing Environment (DCE). Yet another RPC system has been
proposed by the Object Management Group. Called CORBA (Common Object Request
Broker Architecture), this system is optimized for RPC between object-oriented programs
written in C++ or Smalltalk.
Java RMI:
Java Remote Method Invocation (Java RMI) enables you to create distributed Java technology-
based applications that can communicate with other such applications. Methods of remote Java
objects can be run from other Java virtual machines (JVMs), possibly on different hosts.
RMI uses object serialization to marshal and unmarshal parameters and does not truncate types,
supporting object-oriented polymorphism. The RMI registry is a naming service in which
servers bind remote objects under names so that clients can look them up.
• The RMI implementation:
Java Remote Method Invocation (RMI) provides a simple mechanism for distributed
Java programming. The RMI implementation consists of three abstraction layers.
• Thread pooling for RMI connection handlers:
When a client connects to the server socket, a new thread is forked to deal with the
incoming call.
• Understanding distributed garbage collection:
The RMI subsystem implements reference counting based Distributed Garbage
Collection (DGC) to provide automatic memory management facilities for remote
server objects.
• Debugging applications involving RMI:
When debugging applications involving RMI you need information on exceptions and
properties settings, solutions to common problems, answers to frequently asked
questions, and useful tools.
UNIT 3
UNIT 4