Final Distributed Systems

The document provides an extensive overview of distributed systems, covering their definitions, key characteristics, types, design goals, and various aspects of transparency. It discusses architectural styles such as Client-Server and Peer-to-Peer, middleware roles, and the importance of reliability, scalability, and fault tolerance. Additionally, it addresses practical examples like the World Wide Web and delves into technical concepts like remote procedure calls, naming systems, and consistency models.


1. What is a distributed system? Explain its key characteristics and discuss the
different types of distributed systems with examples.
2.​ What are the main design goals for building a distributed system? Explain
transparency, scalability, and reliability in detail.
3.​ Explain the different aspects of distribution transparency in brief (e.g.,
access, location, failure transparency).
4.​ Differentiate between a Network Operating System and a Distributed
Operating System.
5.​ How does the World Wide Web serve as a practical example of a distributed
system?
6.​ What is middleware? Explain its role and general organization in a distributed
system.
7.​ What are the different architectural styles of distributed systems? Explain
Client-Server, Peer-to-Peer, and Layered architectures in detail with suitable
diagrams.
8.​ Differentiate between stateful and stateless servers with examples.
9.​ What is virtualization? Explain its role in distributed systems and describe the
different types of virtualization.
10.​ Explain the difference between a thread and a process in the context of a
distributed system.
11.​ How is process resilience achieved through groups in a distributed
system?
12.​ What is code migration? Explain the different models for code migration
(e.g., weak vs. strong mobility).
13.​ Explain Remote Procedure Call (RPC) and its working process with a
suitable diagram. Also, discuss its advantages and disadvantages.
14.​ Differentiate Remote Procedure Call (RPC) from Message-Oriented
Communication, providing an illustrative example for each.
15.​ Differentiate between persistent and transient communication.
16.​ Explain multicast communication and its use cases.
17.​ What is the purpose of a naming system? Explain the different entity
naming schemes (e.g., name, address, identifier).
18.​ Explain structured naming and attribute-based naming with examples.
19.​ Explain the process of name resolution in a distributed system.
20.​ What is a logical clock? Distinguish between Lamport's timestamp and
Vector clock.


21. Why is clock synchronization needed in a distributed system? Differentiate
between the Berkeley algorithm and the Network Time Protocol (NTP) for physical
clock synchronization.
22.​ What is an election algorithm? Explain the Bully algorithm and the Ring
algorithm with a suitable diagram showing the election process.
23.​ What is mutual exclusion and how is it maintained in distributed
environments?
24.​ Explain the concept of gossip-based coordination.
25.​ What is the role of a coordinator in a distributed system?
26.​ What is replication and why is it used? Explain the different data-centric
consistency models (e.g., Strict, Sequential, Causal, Eventual).
27.​ Explain client-centric consistency models and differentiate them from
data-centric models.
28.​ Difference between continuous consistency and sequential consistency.
29.​ What is replica management and why is it important?
30.​ What are the different types of failures that can occur in a distributed
system? Explain the methods used to recover from a crash.
31.​ Define distributed commit. Explain the two-phase commit (2PC)
protocol in detail with a diagram.
32.​ What is a fault and an error? Differentiate between them.
33.​ Explain reliable client-server communication.
34. Explain Authentication and Authorization in a distributed system. Differentiate
between Authentication and Authorization.
35.​ Describe the key security challenges in a distributed system, including
concepts like secure channels and access control.
36.​ Write short notes on the following:
○​ Message Passing Interface (MPI)
○​ Access Control Matrix
○​ Denial of Service (DoS) attacks
○​ Firewall
○​ Secure Naming
○​ Public Key Cryptography (Asymmetric Cryptography)
○​ Idempotent Operations
○​ Distributed Hash Table (DHT)
○​ Content Delivery Network (CDN)
○ Kerberos
○ Location Systems
○​ Distributed Event Matching
○​ Caching and Replication in the Web
1: What is a distributed system? Explain its key characteristics and discuss
the different types of distributed systems with examples.

A distributed system is a collection of independent, autonomous computers that are
connected by a network and equipped with software that enables them to coordinate
their actions and share resources. To its users, a well-designed distributed system
appears as a single, coherent, and powerful computer.

Key Characteristics of a Distributed System:


1.​ Resource Sharing: Users can share hardware (like printers, storage),
software (like databases, files), and data across the network. This is the
primary motivation for building distributed systems.
2.​ Concurrency: Multiple processes or users can operate concurrently on
different computers in the network, accessing shared resources
simultaneously. This requires careful management to avoid conflicts.
3.​ Scalability: The system can easily be expanded by adding more computers to
the network without a significant drop in performance. It can be scaled in three
dimensions: size (more users/resources), geography (across distances), and
administration (across different organizations).
4.​ Fault Tolerance (Reliability): The system can continue to function, perhaps
at a reduced level, even if some of its hardware or software components fail.
This is typically achieved through redundancy (replicating components and
data).
5.​ Openness: The system is built on standard protocols and interfaces, making it
easy to extend and modify. It allows components from different vendors to
interoperate. For example, a client on Windows can communicate with a
server on Linux.
6.​ Transparency: This is a crucial goal where the system hides its distributed
and complex nature from the users. Users interact with it as if it were a single
system. (This is detailed in the next answer).

Types of Distributed Systems


1.​ Client-Server Systems: A centralized model where client processes request
services and resources from a server process, which provides a response.
This architecture follows a simple request-response communication pattern.
2.​ Peer-to-Peer Systems: A decentralized model where all nodes (peers) are
equal and act as both a client and a server, communicating directly with one
another. This design is highly scalable and fault-tolerant as there is no single
point of failure.
3.​ Middleware: A software layer that acts as a bridge to connect different
applications, hiding network complexity and enabling communication between
them. It provides common services like Remote Procedure Call (RPC) to make
programming easier.
4.​ Three-Tier: An architecture that separates an application into three logical
layers: a Presentation Tier (the UI), an Application Tier (business logic), and a
Data Tier (database). This separation allows each layer to be developed and
scaled independently.
5.​ N-Tier: An extension of the three-tier model that splits an application into more
than three layers to handle greater complexity and improve scalability. It is
commonly used for large-scale enterprise systems and microservices.

2: What are the main design goals for building a distributed system? Explain
transparency, scalability, and reliability in detail.
The main design goals for building a distributed system are to make it easy to use,
powerful, and robust. These goals can be summarized as achieving Transparency,
Scalability, and Reliability.
1. Transparency​
Transparency (also called "invisibility") is the concealment of the separation of
components in a distributed system from the user and the application programmer.
The aim is to make the system appear as a single computer.
●​ Access Transparency: Hides differences in data representation and how a
resource is accessed. Example: A user accesses a file without knowing if it's
stored on a Linux (ext4) or Windows (NTFS) file system.
●​ Location Transparency: Hides where a resource is physically located.
Example: A URL like http://www.google.com/index.html does not reveal the
server's actual IP address or location.
●​ Replication Transparency: Hides the fact that a resource is replicated
(copied) across multiple computers. The user interacts with what they believe
is a single copy.
●​ Failure Transparency: Hides the failure and recovery of a resource. If a
server fails, the user's request might be automatically redirected to a backup
server without the user ever knowing.
2. Scalability​
Scalability is the ability of the system to handle a growing amount of work by adding
resources. A scalable system maintains its performance and efficiency even as the
number of users, objects, or computers increases.
●​ Size Scalability: The ability to add more users and resources without
affecting performance. Example: Gmail can support millions of new users
without slowing down for existing ones.
●​ Geographical Scalability: The system remains effective even when users
and resources are far apart. This is challenging due to communication delays
(latency). Example: Content Delivery Networks (CDNs) place data closer to
users worldwide to reduce latency.
●​ Administrative Scalability: The system remains easy to manage even if it
spans multiple independent administrative organizations with different policies.
3. Reliability (and Availability)​
Reliability ensures that the system can be trusted to work correctly and continue
functioning even when faults occur.
●​ Availability: The property of a system being ready for immediate use. It is the
fraction of time the system is operational. A highly available system is
accessible most of the time (e.g., 99.999% uptime).
●​ Fault Tolerance: The ability of a system to provide its services even in the
presence of faults (e.g., server crashes, network disconnections). This is often
achieved through redundancy—having backup components or data.

3: Explain the different aspects of distribution transparency in brief.
Distribution transparency is the property of a distributed system that hides its
distributed nature from users and programmers. The goal is to make the system
look and feel like a single, centralized system.
The key aspects (or types) of transparency are:
1.​ Access Transparency: Hides differences in data representation and the way
resources are accessed. For example, a program should be able to access a
local file and a remote file using the exact same read() or write() operations.
2.​ Location Transparency: Hides the physical location of a resource. Users and
applications do not need to know the IP address or host name where a
resource (like a file or database) resides. They use a logical name, and the
system resolves it.
3.​ Migration (or Mobility) Transparency: Hides the fact that a resource or a
process may move from one physical location to another while it is in use. For
example, a mobile user's session should continue uninterrupted even when
their device switches from a Wi-Fi network to a cellular network.
4.​ Replication Transparency: Hides the existence of multiple copies (replicas)
of a resource. The user interacts with the resource as if it were a single entity,
while the system manages the replicas in the background for performance and
reliability.
5.​ Concurrency Transparency: Hides the fact that a resource may be shared
by several competing users simultaneously. The system ensures that
concurrent access is managed correctly (e.g., through locking mechanisms) so
that the resource remains in a consistent state.
6.​ Failure Transparency: Hides the failure and recovery of a component. If a
server fails, the system might automatically redirect requests to a backup
server without the user noticing anything more than a slight delay.
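Access transparency can be sketched in a few lines of code: a caller reads through one uniform interface while the location of the data is hidden behind it. This is a minimal illustration, not a real distributed file API; the `Resource`, `LocalFile`, and `RemoteFile` names are hypothetical, and the "remote" class simply holds bytes in memory where a real implementation would fetch them over the network.

```python
from abc import ABC, abstractmethod

class Resource(ABC):
    """Uniform interface: callers read data the same way regardless of location."""
    @abstractmethod
    def read(self) -> bytes: ...

class LocalFile(Resource):
    def __init__(self, path: str):
        self.path = path
    def read(self) -> bytes:
        with open(self.path, "rb") as f:
            return f.read()

class RemoteFile(Resource):
    """Hypothetical stand-in for a networked resource; just holds bytes here."""
    def __init__(self, data: bytes):
        self.data = data
    def read(self) -> bytes:
        return self.data  # a real implementation would fetch over the network

def show(resource: Resource) -> bytes:
    # The caller cannot tell (and need not care) where the data lives.
    return resource.read()
```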

4: Differentiate between a Network Operating System and a Distributed
Operating System.
Comparison: Network Operating System (NOS) vs. Distributed Operating System (DOS)
1. Core Concept
   - NOS: A collection of autonomous computers that are aware of each other.
   - DOS: A collection of autonomous computers that act as a single virtual computer.
2. Autonomy & OS
   - NOS: Each computer runs its own local OS and remains highly autonomous.
   - DOS: A single, system-wide operating system manages all computers in the system.
3. Resource Access
   - NOS: Users explicitly access remote resources by logging into the remote machine (e.g., \\server\share). It is not transparent.
   - DOS: Resources are managed globally and accessed transparently. Users don't know or care where a resource is located.
4. User Perception
   - NOS: Users perceive the system as a collection of distinct machines connected by a network.
   - DOS: Users perceive the system as a single, powerful, centralized computer.
5. Fault Tolerance
   - NOS: The failure of one node does not affect others, but services on the failed node become unavailable.
   - DOS: Higher fault tolerance is a key design goal. The system can often automatically migrate tasks from a failed node to a working one.

5: How does the World Wide Web serve as a practical example of a distributed
system?
The World Wide Web (WWW) is arguably the largest and most successful
distributed system in existence. It exhibits all the key characteristics of a distributed
system:
1.​ Vast Resource Sharing: The WWW is built on sharing resources (web pages,
images, videos, services) on a global scale. These resources are stored on
millions of different servers worldwide.
2.​ Client-Server Architecture: It operates on a classic client-server model. Web
browsers (clients) request resources from web servers (servers) using the
HTTP protocol.
3.​ Heterogeneity: The WWW is extremely heterogeneous. Servers run on
different hardware (Intel, ARM) and operating systems (Linux, Windows). Web
pages are created using different technologies. This heterogeneity is managed
through open, standard protocols like HTTP, HTML, and TCP/IP, which act as
a form of middleware.
4.​ Massive Scalability: The Web has scaled to billions of users and servers.
This is managed through the Domain Name System (DNS), a distributed
database that maps human-readable names (like www.google.com) to
machine-readable IP addresses.
5.​ Openness: Its success is due to its open standards. Anyone can create a web
page, set up a web server, or build a web browser as long as they adhere to
the public standards (HTTP, URL, HTML).
6.​ Fault Tolerance (Partial): The Web is resilient to failures. If a single web
server or network path goes down, it does not bring down the entire Web.
Other parts remain accessible. However, it does not provide strong failure
transparency (if a page or server is unavailable, the user sees an error such as "404 Not Found" rather than a transparent recovery).
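The client-server exchange at the heart of the Web can be reproduced in miniature with Python's standard-library HTTP modules. This sketch starts a tiny "web server" in a background thread and has a "browser" fetch a page from it over HTTP; everything runs on localhost and stands in for the real, globally distributed Web.

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Server role: respond to an HTTP GET with an HTML resource.
        body = b"<h1>Hello, Web</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence request logging
        pass

def fetch():
    # Bind to port 0 so the OS picks a free port.
    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Client (browser) role: send an HTTP request and read the reply.
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("GET", "/index.html")
    resp = conn.getresponse()
    data = resp.read()
    server.shutdown()
    return resp.status, data
```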

6: What is middleware? Explain its role and general organization in a
distributed system.
Middleware is a layer of software that sits between the operating system and the
application programs on each computer in a distributed system. Its primary purpose
is to hide the complexity and heterogeneity of the underlying distributed environment
from the application developer.
Role of Middleware
1.​ Hide Heterogeneity: It masks differences between operating systems,
hardware platforms, and programming languages. An application component
written in Java on Linux can communicate seamlessly with a component
written in C# on Windows.
2.​ Provide High-Level Communication Facilities: Instead of forcing
developers to use low-level sockets, middleware offers high-level
communication paradigms like Remote Procedure Calls (RPC), Message
Queues (MQ), and event notification systems.

3.​ Offer Common Services: Middleware often provides essential services that
many distributed applications need, such as:
○​ Naming services: To locate other processes or resources in the
network.
○​ Security services: For authentication, authorization, and encryption.
○​ Transaction services: To ensure operations are completed reliably.
○​ Persistence services: For storing application data.
4.​ Enhance Transparency: Middleware is the key to achieving many forms of
distribution transparency (location, access, failure, etc.), making the distributed
system look like a single machine to the programmer.
General Organization
Middleware is often organized as a layer that extends the services of the local
operating system on each machine, providing a more powerful and uniform interface
for applications.
Example: CORBA (Common Object Request Broker Architecture) and Java RMI
(Remote Method Invocation) are classic examples of middleware.

7: What are the different architectural styles of distributed systems? Explain
Client-Server, Peer-to-Peer, and Layered architectures in detail with suitable
diagrams.
Architectural styles are high-level patterns for organizing the components of a
distributed system.
1. Client-Server Architecture​
This is the most common architectural style. Processes are divided into two roles:
servers, which provide a service, and clients, which use that service.
●​ How it works: A server process waits for incoming requests on a known port.
A client process sends a request to the server, waits for a response, and then
continues its work. Communication is one-to-one between a client and a
server.
Diagram:

●​ Example: Web browsing (browser is the client, web server is the server),
databases.
2. Peer-to-Peer (P2P) Architecture​
In P2P architecture, all processes (or nodes) are considered equal and play the role
of both client and server simultaneously. Each node, called a peer, contributes
resources (processing power, storage, bandwidth) to the network.
●​ How it works: Peers communicate directly with each other without the need
for a central server. They form an overlay network on top of the physical
internet to discover each other and exchange data.
Diagram:

Example: BitTorrent (file sharing), Skype (early versions for voice calls),
Cryptocurrencies (Bitcoin, Ethereum).
3. Layered Architecture​
In this style, components are organized into a series of layers. Each layer provides a
service to the layer above it and uses services from the layer below it. Requests and
responses flow down and up through the layers.
●​ How it works: A request from a top-level layer is passed down through
successive layers, with each layer adding its own logic or data, until it reaches
the bottom. The response travels back up the stack.
Diagram:​

●​ Example: The OSI network model, three-tier web application architecture


(Presentation Layer -> Application Logic Layer -> Data Layer).
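A minimal sketch of the layered style, assuming a hypothetical balance-report application: each tier calls only the tier directly below it, so any tier can be replaced (for example, swapping the in-memory dict for a real database) without touching the others. All names here are illustrative.

```python
# Data tier: the only layer that touches storage (an in-memory dict here).
_DB = {"alice": 120}

def data_tier_get_balance(user: str) -> int:
    return _DB[user]

# Application tier: business rules; talks only to the data tier.
def app_tier_balance_report(user: str) -> dict:
    balance = data_tier_get_balance(user)
    return {"user": user, "balance": balance, "overdrawn": balance < 0}

# Presentation tier: formatting for the user; talks only to the application tier.
def presentation_tier_render(user: str) -> str:
    report = app_tier_balance_report(user)
    return f"{report['user']}: {report['balance']} credits"
```

A request flows down (presentation → application → data) and the response travels back up, exactly as the layered model describes.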

8: Differentiate between stateful and stateless servers with examples.
A key design choice for servers in a distributed system is whether to maintain
information about past interactions with clients. This leads to two types of servers:
stateful and stateless.
Comparison: Stateless Server vs. Stateful Server
1. Client Information
   - Stateless: Does not keep any information (state) about past client requests.
   - Stateful: Maintains a record of client interactions and their current state.
2. Request Handling
   - Stateless: Each request is treated as independent and must contain all information needed to process it.
   - Stateful: Uses stored state information to process new requests from a client.
3. Failure Recovery
   - Stateless: Recovery from a server crash is simple. A new server can immediately take over since no state is lost.
   - Stateful: Recovery is complex. The server's state must be restored before it can resume serving clients.
4. Scalability
   - Stateless: Generally more scalable, as any server can handle any client's request, making load balancing easy.
   - Stateful: Less scalable, as a client may need to be "stuck" to the specific server that holds its state.
5. Example
   - Stateless: An HTTP Web Server. Each HTTP request is independent. The server forgets about the client after sending the response.
   - Stateful: A File Server that allows clients to open a file, read parts of it, and then close it. The server must remember which file is open for which client.
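The file-server contrast above can be sketched as follows. Both handlers serve byte ranges from hypothetical in-memory "files"; the stateless one needs the offset in every request, while the stateful one remembers each client's read position between calls. All names are illustrative.

```python
FILES = {"/notes.txt": b"abcdef"}  # stand-in for files on disk

# Stateless handler: every request carries all it needs
# (path, offset, length); any server instance can serve it.
def stateless_read(request: dict) -> bytes:
    data = FILES[request["path"]]
    return data[request["offset"]:request["offset"] + request["length"]]

# Stateful server: remembers per-client open files and read positions.
class StatefulFileServer:
    def __init__(self):
        self.sessions = {}  # client_id -> (path, position)
    def open(self, client_id: str, path: str) -> None:
        self.sessions[client_id] = (path, 0)
    def read(self, client_id: str, length: int) -> bytes:
        path, pos = self.sessions[client_id]
        chunk = FILES[path][pos:pos + length]
        self.sessions[client_id] = (path, pos + length)  # state advances
        return chunk
```

Note how losing a `StatefulFileServer` instance loses every client's position, while a replacement stateless handler can take over instantly.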

9: What is virtualization? Explain its role in distributed systems and describe
the different types of virtualization.
Virtualization is the technology of creating a virtual (rather than actual) version of
something, such as a hardware platform, operating system, storage device, or
network resource. In the context of servers, it involves using a special type of
software called a hypervisor to create and run multiple isolated virtual machines
(VMs) on a single physical machine.

Role of Virtualization in Distributed Systems


1.​ Portability and Flexibility: A VM encapsulates an entire environment (OS +
application). This VM can be easily moved (migrated) from one physical
machine to another, even while it's running. This is crucial for load balancing
and maintenance.
2.​ Isolation: VMs are strongly isolated from each other. A crash or security
breach in one VM does not affect other VMs running on the same physical
host. This improves the overall reliability and security of the distributed system.
3.​ Hardware Abstraction: It hides the specifics of the underlying hardware from
the applications. An application designed to run on a specific OS can be run
on any hardware that supports the hypervisor.

4.​ Resource Consolidation: Multiple servers can be consolidated onto a single
physical machine, saving power, cooling, and space. This is a key principle of
modern cloud computing data centers.
5.​ Simplified Deployment: New servers (as VMs) can be provisioned and
deployed in minutes by cloning a template, which drastically speeds up the
process of scaling a distributed application.

Types of Virtualization
1.​ Hardware Virtualization (or Platform Virtualization): This is the most
common type. The hypervisor creates a complete virtual hardware platform for
each VM. Each VM can then run its own full-fledged guest operating system.
○​ Type 1 (Bare-Metal) Hypervisor: Runs directly on the physical
hardware. Examples: VMware ESXi, Microsoft Hyper-V.
○​ Type 2 (Hosted) Hypervisor: Runs as an application on top of a
conventional host operating system. Examples: Oracle VirtualBox,
VMware Workstation.
2.​ Operating System-Level Virtualization (Containerization): The host OS
kernel is shared by multiple isolated user-space instances called containers.
Containers are more lightweight and have less overhead than VMs because
they don't need a full guest OS. Example: Docker, Kubernetes, LXC.

10: Explain the difference between a thread and a process in the context of a
distributed system.
Comparison: Process vs. Thread
1. Memory & Resources
   - Process: Has its own private memory address space and dedicated system resources.
   - Thread: Shares the memory address space and resources of its parent process.
2. Isolation & Faults
   - Process: Highly isolated. The crash of one process does not affect other processes.
   - Thread: Not isolated. If one thread crashes, it brings down the entire parent process, including all its other threads.
3. Communication
   - Process: Inter-Process Communication (IPC) is complex and slower (e.g., via sockets).
   - Thread: Inter-Thread Communication is simple and fast, as threads can directly access shared memory.
4. Creation Cost
   - Process: "Heavyweight." Slower and more resource-intensive for the OS to create.
   - Thread: "Lightweight." Faster and cheaper to create.
5. Typical Use Case
   - Process: Used for running separate, isolated services or applications on a server.
   - Thread: Used within a single server to handle multiple client requests concurrently and efficiently.
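The shared-memory row above can be demonstrated directly: several threads increment one shared counter, which works only because threads live in the same address space (separate processes would each see a private copy and need IPC instead). The lock manages the concurrent access. A minimal sketch, not a benchmark; the names are illustrative.

```python
import threading

counter = {"value": 0}
lock = threading.Lock()

def worker(n: int) -> None:
    # All threads mutate the SAME dict because they share
    # their parent process's memory.
    for _ in range(n):
        with lock:  # concurrency transparency: avoid lost updates
            counter["value"] += 1

def run(threads: int = 4, increments: int = 1000) -> int:
    ts = [threading.Thread(target=worker, args=(increments,)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter["value"]
```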

11: How is process resilience achieved through groups in a distributed
system?
Process resilience is the ability of a system to continue functioning correctly even
when some of its processes fail. This is a form of fault tolerance. A key technique to
achieve this is by organizing processes into groups.
How Resilience is Achieved:
1.​ Replication: The service or data is replicated across multiple processes in the
group. All processes in the group are functionally identical (they can perform
the same tasks).
○​ Flat Groups: All processes are peers. Decisions are made collectively.
This is democratic but complex.
○​ Hierarchical Groups: One process acts as a coordinator (or primary),
and others are backups (or secondaries). This is simpler to manage.
2.​ Failure Detection: The group members need a mechanism to detect when
another member has failed. This is often done using "heartbeat" messages,
where each process periodically sends an "I'm alive" message to others. If a
heartbeat is missed for a certain period, the process is assumed to have
crashed.
3.​ Takeover Mechanism: When a failure is detected, the group must react.
○​ In a hierarchical group, if the primary fails, an election algorithm (like
the Bully or Ring algorithm) is run among the backups to choose a new
primary. The new primary then takes over all responsibilities.
○​ In a flat group, the remaining processes may need to re-distribute the
workload of the failed process among themselves.
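The heartbeat-based failure detection described in step 2 can be sketched as a small monitor class. This is an illustrative, single-machine stand-in: `HeartbeatMonitor` and its timeout value are assumptions, and a real detector would receive "I'm alive" messages over the network.

```python
import time

class HeartbeatMonitor:
    """Suspects a group member has failed if no heartbeat
    arrives within `timeout` seconds."""
    def __init__(self, members, timeout: float):
        self.timeout = timeout
        now = time.monotonic()
        self.last_seen = {m: now for m in members}

    def heartbeat(self, member: str) -> None:
        # Called whenever an "I'm alive" message arrives from a member.
        self.last_seen[member] = time.monotonic()

    def suspected_failed(self) -> list:
        # Members whose last heartbeat is older than the timeout
        # are assumed to have crashed; the group would then run
        # an election or redistribute their workload.
        now = time.monotonic()
        return [m for m, t in self.last_seen.items() if now - t > self.timeout]
```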

12: What is code migration? Explain the different models for code migration
(e.g., weak vs. strong mobility).
Code migration is the mechanism of moving a process, or a part of it (like a specific
function or object), from one machine to another in a distributed system. The key
idea is to move the computation closer to the data or resources it needs, rather than
moving large amounts of data to the computation.

Reasons for Code Migration:


●​ Performance: Move computation to the machine where the required data is
stored to reduce network traffic.
●​ Load Balancing: Distribute processes across less-loaded machines to
improve overall system throughput.
●​ Flexibility: Dynamically deploy new software components to remote machines
without re-installation.

Models for Code Migration​


The models are distinguished by what exactly is transferred from the source
machine to the target machine.
1.​ Weak Mobility Model:
○​ What is transferred: Only the code segment of a process is transferred,
along with some initialization data.
○​ Execution State: The execution state (e.g., the current value of
variables, the program counter, the stack) is not transferred.
○​ How it works: When the code arrives at the target machine, it starts
executing from the beginning. It's like running a fresh copy of the
program.
○​ Example: Java applets. The applet's code is downloaded from a web
server to the client's browser and starts running from scratch.
2.​ Strong Mobility Model:
○​ What is transferred: Both the code segment and the execution state
are transferred.
○​ Execution State: The entire state of the running process, including the
program counter, registers, open files, and the contents of the stack and
heap, is captured and moved.
○​ How it works: When the process arrives at the target machine, its
execution state is restored, and it resumes running from the exact point
where it left off on the source machine.
○​ Example: This is much more complex to implement and less common in
practice. It's a key feature of process migration in some
high-performance computing systems.

Difference between Weak and Strong Mobility


Comparison: Weak Mobility vs. Strong Mobility
1. Components Moved
   - Weak: Only the code segment and initialization data.
   - Strong: The code segment AND the entire execution state.
2. Execution Start
   - Weak: Always starts execution from the beginning.
   - Strong: Resumes execution from the exact point it was paused.
3. Complexity
   - Weak: Relatively simple to implement.
   - Strong: Very complex to implement, especially in heterogeneous systems.
4. Autonomy
   - Weak: The migrated code must be able to run independently.
   - Strong: The process continues as if nothing happened.
5. Common Example
   - Weak: Java Applets, Mobile Agents.
   - Strong: Process migration in cluster management systems.
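Weak mobility can be illustrated in a few lines: only source code and initialization data travel, and the target machine executes the code from the beginning. The `run_on_target` function below is a hypothetical stand-in for the receiving machine, and it uses `exec` purely for illustration; a real system would sandbox migrated code.

```python
# The "migrated" unit is just source code plus initialization data;
# no call stack or program counter travels with it.
code_segment = """
def compute(data):
    return sum(data)

result = compute(init_data)
"""

def run_on_target(code: str, init_data) -> object:
    # Stand-in for the target machine: execute the shipped
    # code fresh, starting from its first statement.
    namespace = {"init_data": init_data}
    exec(code, namespace)
    return namespace["result"]
```

Under strong mobility, by contrast, the transferred unit would also carry the paused execution state, so `compute` could resume mid-loop on the target.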

13: Explain Remote Procedure Call (RPC) and its working process with a
suitable diagram. Also, discuss its advantages and disadvantages.
Remote Procedure Call is a communication mechanism that allows a program on
one computer to execute a procedure (or function) on another computer as if it were
a local call. The goal of RPC is to hide the complexities of network communication
(like socket programming, data marshalling) from the programmer, providing a high
degree of access transparency.

Working Process of RPC​


The key mechanism behind RPC is the use of stubs. A client-side stub and a server-side stub
are generated to handle the communication.
1.​ Client Call: The client program makes a normal-looking procedure call to the
client stub.
2.​ Client Stub (Marshalling): The client stub "marshals" the procedure
parameters. This means it packs the parameters into a standardized message
format that can be sent over the network.

3.​ Network Communication: The client's operating system sends this message
to the remote server machine.
4.​ Server Stub (Unmarshalling): The message is received by the server's OS
and passed to the server stub. The server stub "unmarshals" the message,
unpacking the parameters.
5.​ Server Procedure Execution: The server stub calls the actual procedure on
the server, passing it the unpacked parameters.
6.​ Return and Marshalling: When the procedure finishes, it returns the result to
the server stub. The server stub marshals the return value into another
message.
7.​ Network Return: The server's OS sends the result message back to the client
machine.
8.​ Client Unmarshalling & Return: The client's OS gives the message to the
client stub, which unmarshals the return value and passes it back to the
original client program.
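The eight steps above can be reproduced with Python's standard-library `xmlrpc` modules, where the generated proxy plays the role of the client stub (marshalling arguments into an XML message over HTTP) and the server object plays the server stub. A minimal localhost sketch; the function and variable names are illustrative.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    # The actual procedure that lives on the "server machine".
    return a + b

def start_server():
    # Port 0 lets the OS pick a free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def rpc_demo():
    server = start_server()
    # The proxy acts as the client stub: it marshals the arguments,
    # sends the request, and unmarshals the reply.
    client = ServerProxy(f"http://127.0.0.1:{server.server_address[1]}")
    result = client.add(2, 3)  # looks exactly like a local call
    server.shutdown()
    return result
```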

Diagram of RPC Workflow:

Advantages of RPC:
●​ Simplicity and Transparency: Programmers use a familiar procedure call
syntax, hiding the underlying network communication.
●​ Abstraction: It abstracts away the complexities of data representation and
network protocols.
●​ Efficiency: Can be highly optimized for request-reply interactions.

Disadvantages of RPC:
●​ Limited to Request-Reply: Not suitable for all communication patterns (e.g.,
asynchronous messaging, streaming).
●​ Coupling: The client and server are tightly coupled. A change in the server's
procedure signature often requires the client to be recompiled.
●​ Failure Handling: Handling failures (e.g., a crashed server or lost message)
is more complex than with local calls and must be explicitly programmed.

14: Differentiate Remote Procedure Call (RPC) from Message-Oriented
Communication, providing an illustrative example for each.
Comparison: Remote Procedure Call (RPC) vs. Message-Oriented Communication (MOM)
1. Communication Style
   - RPC: Synchronous. The client calls a function and blocks (waits) until it gets a reply.
   - MOM: Asynchronous. The client sends a message and continues its work without waiting.
2. Coupling
   - RPC: Tightly coupled. The client and server must both be active at the same time.
   - MOM: Loosely coupled. A message queue acts as a middleman; client and server can be offline at different times.
3. Addressing
   - RPC: Direct. The client communicates directly with a known server address.
   - MOM: Indirect. The client sends a message to a named queue, not to a specific receiver.
4. Analogy
   - RPC: Like making a direct phone call; you wait for the other person to answer.
   - MOM: Like dropping a letter in a mailbox; you post it and walk away, knowing it will be delivered later.
5. Failure Handling
   - RPC: If the server is down, the call fails immediately.
   - MOM: If the receiver is down, the message is stored reliably in the queue until the receiver is available.
Illustrative Examples:
●​ RPC Example: A banking app directly calls the server's get-balance procedure and waits for the reply before showing the balance.
●​ MOM Example: An e-commerce site sends the order to a queue and instantly shows a confirmation; backend services process the order later.
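The MOM example can be sketched with an in-process queue standing in for a message broker (the order contents and function names are made up):

```python
import queue

# A message queue decouples sender and receiver: the client enqueues an
# order and moves on; a worker drains the queue whenever it happens to run.
order_queue = queue.Queue()

def place_order(order):
    """Client side: fire-and-forget; returns immediately."""
    order_queue.put(order)
    return "order received"          # instant confirmation, no waiting

def process_orders():
    """Backend worker: may run much later, drains whatever is queued."""
    processed = []
    while not order_queue.empty():
        processed.append(order_queue.get())
    return processed

print(place_order({"item": "book", "qty": 1}))  # client continues immediately
print(process_orders())                          # worker handles it later
```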

15: Differentiate between persistent and transient communication.


Comparison of Persistent and Transient Communication:
1. Message Storage: Persistent: the message is stored by the middleware until the receiver can accept it. Transient: the message is discarded if the receiver is not active and ready to accept it at the time of sending.
2. Receiver Availability: Persistent: the receiver can be offline when the message is sent. Transient: the receiver must be online and active for communication to succeed.
3. Reliability: Persistent is more reliable; it guarantees eventual delivery even with temporary failures. Transient is less reliable; messages can be lost if the receiver is temporarily down.
4. Typical Implementation: Persistent: message-oriented middleware (e.g., message queues like RabbitMQ). Transient: TCP-based protocols like RPC and sockets.
5. Analogy: Persistent: sending a letter to a mailbox; the letter waits securely until collected. Transient: making a phone call; if no one answers, the communication attempt fails.
16: Explain multicast communication and its use cases.
Multicast is a one-to-many communication method where a single message is sent
from one source to multiple specific recipients. It’s more efficient than unicast
(one-to-one) and more targeted than broadcast (one-to-all).
How it Works:
●​ Devices join a multicast group using a special IP address.
●​ The sender sends one packet to that group.
●​ Routers forward it only to networks with group members.
Comparison:
●​ Unicast: One-to-one, inefficient for many recipients.
●​ Broadcast: One-to-all, wastes resources.
●​ Multicast: One-to-many, efficient and targeted.
Use Cases in Distributed Systems:
●​ Replicated Database Updates: Primary server multicasts data changes to all
replica servers at once for synchronization.
●​ Service Discovery: New services send multicast messages to announce their
presence to all listening clients.
●​ Financial Data Distribution: Stock exchanges multicast live price updates to
thousands of traders with low latency.
●​ Live Video/Audio Streaming (IPTV): A single multicast stream delivers live
events (e.g., sports) to all subscribed viewers.
●​ Multiplayer Online Gaming: Game servers multicast state updates (like
player movements) to keep all players in sync.

17: What is the purpose of a naming system? Explain the different entity
naming schemes (e.g., name, address, identifier).
In a distributed system, a naming system is used to uniquely identify and locate
entities such as files, users, services, or servers.
Purpose of a Naming System:
1.​ Identification – To uniquely refer to an entity.
2.​ Location – To find and communicate with that entity.
3.​ Sharing – To let multiple users/processes access the same entity via a
common name.
4.​ Abstraction – To hide the physical location; the name remains constant even
if the address changes.
Entity Naming Schemes:
1.​ Name (Human-Readable):
○​ A string that users can easily read and remember.
○​ It doesn't reveal location details.
○​ Example: www.example.com, /home/user/file.txt
2.​ Address (Location-Based):
○​ Contains actual location information to access the entity.
○​ Changes if the entity moves.
○​ Example: 192.168.1.100:8080, MAC address
3.​ Identifier (Unique System-Generated):
○​ A unique, location-independent bit string.
○​ Remains constant throughout the entity’s life.
○​ Example: UUID like 123e4567-e89b-12d3-a456-426614174000,
database user ID
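The three schemes can be tied together in a toy two-level binding (all names and addresses below are illustrative): a human-readable name maps to a stable identifier, and the identifier maps to the entity's current address.

```python
import uuid

# Two-level binding: name -> stable identifier -> current address.
identifier = str(uuid.uuid4())            # location-independent, never changes
name_to_id = {"www.example.com": identifier}
id_to_address = {identifier: "192.168.1.100:8080"}

def resolve(name):
    """Look up the identifier, then the address currently bound to it."""
    return id_to_address[name_to_id[name]]

print(resolve("www.example.com"))          # -> 192.168.1.100:8080

# If the entity moves, only the identifier->address binding changes;
# the name and the identifier both stay stable.
id_to_address[identifier] = "10.0.0.7:8080"
print(resolve("www.example.com"))          # -> 10.0.0.7:8080
```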

18: Explain structured naming and attribute-based naming with examples.
Comparison of Structured and Attribute-Based Naming:
1. Organization: Structured naming organizes names in a rigid, tree-like hierarchy. Attribute-based naming describes entities by an unordered set of attributes (key-value pairs).
2. Naming Scheme: In structured naming, the name itself defines the path to the entity. In attribute-based naming, the "name" is effectively a query that describes the properties of the desired entity.
3. Flexibility: Structured naming is inflexible; to find something, you must know its explicit path. Attribute-based naming is highly flexible; it allows finding entities based on their characteristics without knowing a specific name.
4. Use Case: Structured: locating a known resource whose path is already determined. Attribute-based: searching for a resource that meets certain criteria.
5. Example: Structured: a file system path such as /home/user/document.txt, or a DNS name such as www.google.com. Attribute-based: a printer search such as (Type = "Laser") AND (Location = "2nd Floor"), or a book search such as (Author = "Tanenbaum").

19: Explain the process of name resolution in a distributed system.


Name resolution is the process of translating a human-readable name (e.g.,
www.google.com) into a machine-understandable address (e.g., an IP address).
Because distributed systems are large and spread out, name-to-address mappings
are stored across multiple name servers. The resolution process involves querying
one or more of these servers.
1. Iterative Name Resolution: In iterative resolution, the client does the work of
contacting multiple name servers step by step.
How it Works:
●​ Client asks its local name server.
●​ If the server doesn’t know the answer, it replies with the address of another
name server.
●​ The client then queries the referred server.
●​ This continues until the authoritative name server returns the final address.
Example Flow:
1.​ Client → Local DNS
2.​ Local DNS → Root server
3.​ Root → .com server
4.​ .com → google.com server
5.​ google.com server → IP address
Key Point: Client handles each step by itself.
2. Recursive Name Resolution: In recursive resolution, the client sends one
request and the server resolves everything on its behalf.
How it Works:
●​ Client asks its local DNS for an address (recursively).
●​ The local DNS queries other servers down the line (root → TLD →
authoritative).
●​ The result is passed back up the chain to the client.
Example Flow:
1.​ Client → Local DNS
2.​ Local DNS → Root → .com → google.com
3.​ google.com → IP address
4.​ IP sent back to client
Key Point: The server handles all steps; client only contacts one server.
3. Hybrid Approach (Used in DNS)
●​ Client ↔ Local DNS: Recursive
●​ DNS servers ↔ other DNS servers: Iterative
This hybrid model balances performance and scalability by offloading work from
root servers while keeping the process efficient for end-users.
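Both resolution styles can be sketched over a toy, hard-coded server hierarchy (server names and the returned address are illustrative, not real DNS data):

```python
# Each "server" either knows the final address or refers the query onward.
SERVERS = {
    "root":          {"www.google.com": ("refer", "com-server")},
    "com-server":    {"www.google.com": ("refer", "google-server")},
    "google-server": {"www.google.com": ("addr", "142.250.0.1")},
}

def iterative_resolve(name):
    """Iterative: the CLIENT chases each referral itself."""
    server = "root"
    while True:
        kind, value = SERVERS[server][name]
        if kind == "addr":
            return value
        server = value                  # client contacts the referred server

def recursive_resolve(name, server="root"):
    """Recursive: the SERVER resolves on the client's behalf."""
    kind, value = SERVERS[server][name]
    return value if kind == "addr" else recursive_resolve(name, value)

print(iterative_resolve("www.google.com"))   # same answer either way
print(recursive_resolve("www.google.com"))
```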

20: What is a logical clock? Distinguish between Lamport's timestamp and
Vector clock.
A logical clock is a mechanism for assigning a number (a "timestamp") to an event in
a distributed system without needing a synchronized physical clock. Its goal is not
to measure real time but to capture the chronological and causal ordering of events.
It is based on the happened-before relationship (->):
Comparison of Lamport's Timestamp and Vector Clock:
1. Data Structure: Lamport: a single integer counter per process. Vector clock: a vector (array) of integers per process, of size N.
2. Causality: Lamport: if A -> B, then C(A) < C(B); the reverse is not true. Vector clock: A -> B if and only if VC(A) < VC(B), capturing causality perfectly.
3. Concurrency: Lamport timestamps cannot definitively identify concurrent events. Vector clocks can definitively identify concurrent events.
4. Overhead: Lamport: low; only one integer is sent with each message. Vector clock: higher; an entire vector is sent with each message.
5. Complexity: Lamport clocks are simple to implement. Vector clocks are more complex to implement and manage.
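A minimal sketch of Lamport's scheme (a vector clock would replace the single integer with a list of N counters, one per process):

```python
class LamportClock:
    """One integer counter per process (Lamport's timestamp scheme)."""
    def __init__(self):
        self.time = 0

    def send(self):
        """Tick, then piggyback the timestamp on the outgoing message."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Take the max of local and message time, then tick."""
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
t_send = p1.send()             # p1's clock becomes 1
t_recv = p2.receive(t_send)    # p2's clock becomes max(0, 1) + 1 = 2
# send happened-before receive, and indeed C(send) < C(receive)
print(t_send, t_recv)
```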

21: What is an election algorithm? Explain the Bully algorithm and the Ring
algorithm with a suitable diagram showing the election process.
An election algorithm is a procedure used in a distributed system for processes to
agree on a single process that will act as a coordinator or leader. This process is
necessary when the previous coordinator fails or when the system initializes. The
goal is for all working processes to reach a consensus on the new leader. Any active
process can initiate an election.
1. The Bully Algorithm: The Bully algorithm assumes every process knows the ID
of all other processes. The process with the highest ID is always chosen as the
leader.

Explanation:
1.​ When a process P detects a coordinator failure, it sends an ELECTION
message to all processes with a higher ID.
2.​ If no higher-ID process responds after a timeout, P declares itself the winner
and sends a COORDINATOR message to all other processes.
3.​ If P receives a RESPONSE from a higher-ID process Q, it stops its own
election effort and waits for Q (or another, even higher-ID process) to
eventually announce victory.
4.​ A process that receives an ELECTION message from a lower-ID process
sends a RESPONSE back and then starts its own election, effectively
"bullying" the lower-ID process out of the race.

2. The Ring Algorithm: The Ring algorithm assumes processes are organized in a
logical ring where each process only knows its immediate successor.

Explanation:
1.​ When a process P detects a coordinator failure, it creates an ELECTION
message containing its own ID and sends it to its successor.
2.​ Each subsequent process that receives the message adds its own ID to the
list in the message and forwards it to its successor.
3.​ The message circulates the entire ring until it returns to the process that
started it (P).
4.​ At this point, P has a complete list of all active processes. It elects the process
with the highest ID from this list as the new coordinator.
5.​ P then circulates a final COORDINATOR message around the ring to inform all
other processes who the winner is.
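The Ring election can be compressed into a single-process simulation; a real implementation would pass the ELECTION message between nodes over the network, but the ID bookkeeping is the same:

```python
def ring_election(ring, initiator):
    """ring: list of process IDs in successor order.
    Simulates one circulation of the ELECTION message and
    returns the new coordinator (the highest active ID)."""
    ids = []
    n = len(ring)
    start = ring.index(initiator)
    for step in range(n):                  # message visits every process once
        ids.append(ring[(start + step) % n])
    coordinator = max(ids)                 # highest ID wins
    # a COORDINATOR message would now circulate to announce the winner
    return coordinator

print(ring_election([3, 6, 0, 8, 2], initiator=6))  # -> 8
```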
22: Why is Clock Synchronization Needed in a Distributed System? Differentiate
between the Berkeley Algorithm and the Network Time Protocol (NTP).
Clock Synchronization is crucial in distributed systems to ensure consistency,
coordination, and correctness across multiple nodes. Without synchronized clocks,
systems may face issues like:
1.​ Event Ordering: Logical ordering of events becomes difficult (e.g., in
distributed transactions or logs).
2.​ Consistency: Databases or caches may return inconsistent results due to
timestamp mismatches.
3.​ Timeout Handling: Distributed algorithms (e.g., leader election, failure
detection) rely on accurate timeouts.
4.​ Debugging: Logs from different machines are hard to correlate without a
common time reference.
Comparison of the Berkeley Algorithm and NTP:
1. Primary Goal: Berkeley: internal consistency; makes all clocks in a group agree with each other. NTP: external accuracy; synchronizes clocks to the official world time (UTC).
2. Architecture: Berkeley is centralized; a master polls slaves and computes an average. NTP is hierarchical; servers are organized in layers (strata) based on accuracy.
3. Time Source: Berkeley: none required; the average time of the group becomes the standard. NTP: relies on reference clocks (e.g., atomic, GPS) at the top of the hierarchy.
4. Method: Berkeley: the master sends each machine an adjustment value (e.g., "+5ms") based on the group average. NTP: the client calculates its precise offset and network delay using a four-timestamp exchange.
5. Scalability: Berkeley: low; best for small, local networks (e.g., a single server cluster). NTP: high; designed to scale globally for the entire internet.
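NTP's four-timestamp exchange computes the clock offset as ((t2 - t1) + (t3 - t4)) / 2 and the round-trip delay as (t4 - t1) - (t3 - t2), where t1/t4 are the client's send/receive times and t2/t3 the server's. A sketch with made-up timestamps (client 5 units behind the server, 2-unit one-way delay):

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """t1: client send, t2: server receive, t3: server send, t4: client receive."""
    offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated clock offset
    delay = (t4 - t1) - (t3 - t2)          # round-trip network delay
    return offset, delay

offset, delay = ntp_offset_delay(t1=100, t2=107, t3=108, t4=105)
print(offset, delay)  # offset 5.0 (client is 5 behind), delay 4 (2 each way)
```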

23: What is mutual exclusion and how is it maintained in distributed


environments?
Mutual exclusion is a fundamental property in concurrent programming that ensures
no two concurrent processes can be in their critical section at the same time.
How is it Maintained in Distributed Environments?
Maintaining mutual exclusion in a distributed system is significantly more
challenging than in a centralized one because there is no shared memory or global
clock to coordinate access. There are three main approaches:
1. Centralized Algorithm: This approach uses a single, designated process as a
coordinator to manage access to the critical section.
●​ How it works:
1.​ To enter a critical section, a process sends a REQUEST message to the
coordinator.
2.​ If the critical section is free, the coordinator sends back a GRANT
message, giving permission.
3.​ If the critical section is occupied, the coordinator queues the request and
does not reply until the resource is free.
4.​ When a process exits the critical section, it sends a RELEASE message
to the coordinator, which then grants permission to the next process in
its queue.
●​ Pros/Cons: It's simple but creates a single point of failure and a performance
bottleneck.
2. Distributed (Token-Based) Algorithm: This approach avoids a central
coordinator by passing a special message, called a token, among the processes.
Only the process currently holding the token is allowed to enter the critical section.
●​ How it works (Example: Ring Algorithm):
1.​ Processes are organized in a logical ring.
2.​ A single token is initialized and given to one process.
3.​ The token is continuously passed around the ring from one process to its
successor.
4.​ A process wanting to enter its critical section waits until it receives the
token. It then holds onto the token, enters the critical section, and only
passes the token along after it has exited.
●​ Pros/Cons: It is fair and avoids starvation, but recovering a lost token or
handling a crashed process can be complex.
3. Distributed (Permission-Based) Algorithm: This approach is fully decentralized
and requires a process to get permission from all other processes (or a majority)
before entering its critical section.
●​ How it works (Example: Ricart & Agrawala's Algorithm):
1.​ A process P wanting to enter its critical section sends a timestamped
REQUEST message to all other processes.
2.​ A process Q that receives the request sends back a REPLY immediately
if it is not in or requesting entry to the critical section itself. If it is also
requesting, it compares timestamps to decide who has priority.
3.​ Process P can only enter its critical section after it has received a
REPLY from every other process in the system.
4.​ After exiting, P sends REPLY messages to any requests it had deferred.
●​ Pros/Cons: It is fully decentralized with no single point of failure, but it
generates very high message traffic and is vulnerable to the failure of any
single process.

24: Explain gossip-based coordination.


Gossip-based coordination (also called epidemic protocols) is a decentralized
method for spreading information across nodes in a distributed system, similar to
how rumors or diseases spread in a population.
1.​ Core Mechanism:
○​ Initiation: A node gets new information (e.g., an update).
○​ Periodic Gossiping: It randomly picks other nodes to share the info
with.
○​ Information Exchange: The chosen nodes receive the info and start
spreading it too.
○​ Termination: Gossiping stops once most or all nodes have the
information.
2.​ Key Characteristics:
○​ Scalable: Each node contacts only a few peers, keeping overhead low.
○​ Fault-tolerant: Works even if some nodes fail or messages are lost.
○​ Simple: Nodes follow basic logic—pick a peer, share info.

○​ Eventually Consistent: All nodes will get the data over time, but exact
timing is not guaranteed.
3.​ Use Cases:
○​ Database Replication (e.g., Amazon DynamoDB)
○​ Failure Detection (nodes gossip about which nodes are alive or dead)
○​ Cluster Membership (sharing the list of active nodes)
○​ Data Aggregation (e.g., average load, maximum value)
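The core mechanism above can be shown as a toy round-based push-gossip simulation (the fanout and seed values are arbitrary choices): one node starts with an update, and each round every informed node gossips to a few random peers until everyone has it.

```python
import random

def gossip_rounds(n_nodes, fanout=2, seed=42):
    """Spread one update from node 0. Each round, every informed node
    gossips to `fanout` randomly chosen peers. Returns the number of
    rounds until all nodes are informed (eventual consistency)."""
    random.seed(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        rounds += 1
        for _ in list(informed):                 # each informed node gossips once
            for peer in random.sample(range(n_nodes), fanout):
                informed.add(peer)
    return rounds

print(gossip_rounds(50))  # the update reaches all 50 nodes within a few rounds
```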

25: What is the role of a coordinator in a distributed system?


The coordinator in a distributed system is a designated process that manages or
synchronizes other processes. Key roles include:
●​ Granting Mutual Exclusion: Acting as a gatekeeper to ensure only one
process can access a shared resource at a time.
●​ Driving Transaction Commitment: Deciding whether a distributed
transaction should be globally committed or aborted, as seen in the
Two-Phase Commit protocol.
●​ Leading Process Groups: Serving as the primary point of contact and
assigning tasks within a hierarchical group.
●​ Making Global Decisions: Aggregating information from other processes to
make a system-wide decision.
●​ Acting as a Synchronization Point: Forcing other processes to wait at a
certain point in an algorithm until all have reached it.

26: What is replication and why is it used? Explain the different data-centric
consistency models.
Replication is the process of creating and maintaining multiple copies of data or
resources on different computers in a distributed system. These copies are called
replicas.
Why is Replication Used?
1.​ Increased Reliability and Availability: If one server holding a replica fails,
the system can continue to operate by using other available replicas. This
makes the system more fault-tolerant and highly available.
2.​ Improved Performance: By placing replicas closer to the users who access
them, replication can significantly reduce data access latency. For example, a
user in Asia can access a replica on a server in Asia instead of one in North
America. It also allows for load balancing, as client requests can be distributed
among multiple replica servers.
Data-Centric Consistency Models​
The main challenge with replication is keeping the replicas consistent. If data is
written to one replica, how and when do the other replicas get updated?
Consistency models are contracts between the data store and the application that
define the rules for the consistency of data.
Here are the most common data-centric models, ordered from strongest to weakest:
1.​ Strict Consistency: All reads return the result of the most recent write
instantly, requiring a perfect global clock, making it theoretically strong but
impractical in real-world systems. Used where absolute correctness is critical,
though rarely feasible in practice.
2.​ Sequential Consistency: All processes see operations in the same
sequential order, preserving each process’s program order, though the order
may not reflect actual real-time execution. Provides a balance between
usability and enforceability without needing global time.
3.​ Causal Consistency: Causally related writes (e.g., a reply following a
message) must be seen in the correct order by all processes, while
concurrent, unrelated operations can be observed in different orders. Common
in collaborative applications where operation dependencies matter.
4.​ Eventual Consistency: The weakest model, where all replicas eventually
converge to the same value if no new updates occur; allows temporary
inconsistencies and favors availability over immediate accuracy. Ideal for
large-scale, fault-tolerant systems like DNS or NoSQL databases.

27: Explain client-centric consistency models and differentiate between
Client-Centric and Data-Centric Models.
Client-centric consistency models ensure a consistent view of data for each client,
even when accessed from different devices or locations over time. They focus on
making the user experience smooth by avoiding outdated or missing data during
repeated access.
Common Client-Centric Models:
1.​ Monotonic Reads: Once a client reads a certain version of data, all future
reads by that client will return the same or a newer version, preventing the
client from seeing older, outdated data.
2.​ Monotonic Writes: A client’s write operations on the same data are
guaranteed to be executed in the order they were issued, ensuring updates
don’t get applied out of sequence.
3.​ Read Your Writes: After a client writes (updates) data, any subsequent read
by the same client will always reflect that update, so users see their own
changes immediately.
4.​ Writes Follow Reads: When a client writes data after reading it, the write is
guaranteed to be based on the latest version the client has seen, preserving
the logical flow of operations.
Comparison of Data-Centric and Client-Centric Models:
1. Focus: Data-centric models guarantee consistency of the data store as a whole, for all clients. Client-centric models guarantee consistency from the perspective of a single, individual client.
2. Scope: Data-centric: a global guarantee across multiple, concurrent users. Client-centric: a local guarantee for a specific client's session over time.
3. Main Question: Data-centric: "What is the most recent value any client is allowed to read?" Client-centric: "What is a client guaranteed to see, based on its own past actions?"
4. Use Case: Data-centric: ensuring system-wide data integrity (e.g., in a distributed database). Client-centric: providing a sensible, non-confusing user experience (e.g., in a mobile app).
5. Example Models: Data-centric: Sequential Consistency, Eventual Consistency. Client-centric: Monotonic Reads, Read Your Writes.

28: Difference between continuous consistency and sequential consistency.
Comparison of Sequential and Continuous Consistency:
1. Core Concept: Sequential consistency is a strict ordering model that defines a correct sequence of operations. Continuous consistency is a flexible framework for measuring and bounding the level of data inconsistency.
2. Type of Guarantee: Sequential: all processes see the same single, global order of all operations. Continuous: data will not deviate from a perfectly consistent state beyond a specified tolerance.
3. Measurement: Sequential is binary; a system either is sequentially consistent or it is not. Continuous is quantitative; inconsistency is measured by numerical value, staleness (time), and order.
4. Flexibility: Sequential is rigid; it is an all-or-nothing property with no room for negotiation. Continuous is tunable; developers can specify the exact level of inconsistency their application can tolerate.
5. Typical Use Case: Sequential: systems where a predictable, global order is critical (e.g., simple distributed locks). Continuous: large, highly available systems where some data staleness is acceptable (e.g., online gaming, caching).

29: What is replica management and why is it important?

Replica management is the process of deciding where, when, and how to create
and maintain replicas (copies) of data or services in a distributed system. It involves
a set of policies and mechanisms that govern the lifecycle of replicas.

Why is it Important?​
Effective replica management is crucial for achieving the primary goals of
replication: performance and reliability. Poor management can negate the benefits
and even introduce new problems. Key decisions in replica management include:
1.​ Replica Placement: Deciding where to place the replicas.
○​ Goal: To improve performance by reducing latency for users and to
increase fault tolerance by placing replicas in different physical locations
(different racks, data centers, or continents) to avoid a single point of
failure.
○​ Types of Placement:
■​ Permanent Replicas: The initial, fixed set of replicas of a data
store (e.g., a database cluster with 3 servers).
■​ Server-Initiated Replicas: A server dynamically creates a replica
of a heavily accessed file on another server to balance the load.
■​ Client-Initiated Replicas (Caching): A client creates a temporary,
local copy (a cache) of data for fast, repeated access.
2.​ Content Replication and Update Propagation: Deciding what to replicate
and how to spread updates.
○​ Goal: To keep replicas consistent without overwhelming the network.
○​ Key Decisions:

■​ Push vs. Pull: Should the server push updates to the replicas as
they happen (active), or should replicas periodically pull updates
from the server (passive)?
■​ State vs. Operations: Should the entire modified data item (state)
be transferred, or only the operation that was performed
(operation)? Transferring operations is often more efficient. For
example, instead of sending the entire new bank balance, just
send the operation DEPOSIT($100).
Proper replica management ensures that the system is both fast and resilient. For
instance, placing replicas too close together compromises fault tolerance, while
placing them too far apart can increase the latency of keeping them synchronized.
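The state-versus-operations trade-off from the bank-balance example can be sketched as follows (the message formats are hypothetical):

```python
# A replica holding a bank balance can be updated two ways:
# ship the whole new state, or ship only the operation performed.
primary_balance = 500

# State transfer: the entire updated value travels to the replica.
state_message = {"balance": primary_balance + 100}

# Operation transfer: only the operation travels; the replica re-applies it.
op_message = {"op": "DEPOSIT", "amount": 100}

def apply_state(replica_balance, msg):
    """Replace local state with the shipped state."""
    return msg["balance"]

def apply_operation(replica_balance, msg):
    """Re-execute the shipped operation locally."""
    return replica_balance + msg["amount"] if msg["op"] == "DEPOSIT" else replica_balance

replica_balance = 500
print(apply_state(replica_balance, state_message))      # 600
print(apply_operation(replica_balance, op_message))     # 600, smaller message
```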

30: What are the different types of failures that can occur in a distributed
system? Explain the methods used to recover from a crash.

Types of Failures in a Distributed System


1.​ Crash Failure (Fail-Stop): A process or server halts and stops responding
entirely. It's easy to detect and recover from. Example: A server shuts down
due to power loss and doesn’t respond to any request.
2.​ Omission Failure: A process fails to send or receive messages, often due to
lost packets or buffer overflows. Example: A network packet is dropped,
causing the receiver to miss an update.
3.​ Timing Failure: A response is given but outside the expected time window,
usually caused by delays or overload. Example: A system expected a reply
within 2 seconds but received it after 5 seconds.
4.​ Response Failure (Value Failure): A process replies with incorrect data due
to logic errors or data corruption. Example: A database returns the wrong
balance due to memory corruption.
5.​ Byzantine Failure: The most complex failure where a process behaves
arbitrarily or maliciously—e.g., sending inconsistent messages to peers.
Example: A faulty node sends “yes” to one server and “no” to another for the
same request.

Methods of Recovery from Crash Failures


1.​ Stateless Server Recovery: Simply restart the server. Since no client-specific
data is stored, clients can resend requests without issue. Used in systems like
web servers where each request is independent and retryable.
2.​ Stateful Server Recovery (Checkpointing & Logging):
i.​ Checkpointing: Restore the last saved snapshot of the server’s
state. Captures the system's state at regular intervals for quick
recovery.
ii.​ Logging: Replay recent operations from the log to update the
state fully. Ensures no data is lost between the last checkpoint and
crash.
3.​ Redundancy and Failover: Use backup servers with failure detection (e.g.,
heartbeats). If the primary crashes, a standby server takes over immediately to
maintain availability. Widely used in high-availability systems like banking,
cloud services, and telecom.
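Checkpointing plus logging can be sketched with in-memory stand-ins for what would be durable storage in practice (a toy counter server whose "operations" are integers to add):

```python
# Durable storage stand-ins: last snapshot and the log of ops since it.
checkpoint = {"state": 0}
log = []

def apply(state, op):
    return state + op              # toy operation: add an integer

def execute(state, op):
    log.append(op)                 # record the op alongside applying it
    return apply(state, op)

def take_checkpoint(state):
    checkpoint["state"] = state    # snapshot current state
    log.clear()                    # ops up to here are inside the snapshot

def recover():
    """Restore the last snapshot, then replay the log to rebuild state."""
    state = checkpoint["state"]
    for op in log:
        state = apply(state, op)
    return state

state = execute(0, 5)
take_checkpoint(state)             # snapshot captures state = 5
state = execute(state, 3)          # state = 8; op 3 lives only in the log
# ... crash: the in-memory `state` variable is lost ...
print(recover())                   # -> 8  (5 from checkpoint + replayed 3)
```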

31: Define distributed commit. Explain the two-phase commit (2PC) protocol in
detail with a diagram.

A distributed commit is the process that ensures a transaction spanning multiple
systems is atomic. This means it guarantees that the transaction is either
committed (made permanent) on all participating systems or aborted (undone) on
all of them, preventing an inconsistent state where the transaction succeeds on
some systems and fails on others.

Explain the Two-Phase Commit (2PC) Protocol


2PC is an algorithm that coordinates a distributed commit using a coordinator
process and multiple participant processes. It works in two phases:

Phase 1: The Voting Phase (The "Prepare" Phase)


1.​ Request: The coordinator sends a VOTE-REQUEST message to all
participants.
2.​ Voting: Each participant checks if it can commit.
○​ If yes, it writes a PREPARE record to its log, becomes "prepared," and
sends a VOTE-COMMIT back.
○​ If no, it sends VOTE-ABORT.

Phase 2: The Decision Phase (The "Commit" Phase)


1.​ Decision: The coordinator makes the final decision.
○​ It decides to commit only if it received VOTE-COMMIT from all
participants.
○​ It decides to abort if even one participant sent VOTE-ABORT or timed
out.
2.​ Broadcast: The coordinator sends this final decision (GLOBAL-COMMIT or
GLOBAL-ABORT) to all participants.
3.​ Completion: Participants execute the decision—either making their changes
permanent (commit) or undoing them (abort)—and send an ACK back to the
coordinator.
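The two phases can be condensed into a short sketch (participant names are hypothetical; a real implementation adds logging, timeouts, and acknowledgements):

```python
def two_phase_commit(participants):
    """participants: dict mapping name -> whether it can commit.
    Returns the decision delivered to every participant."""
    # Phase 1 (voting): each participant votes based on its local check.
    votes = {name: ("VOTE-COMMIT" if ok else "VOTE-ABORT")
             for name, ok in participants.items()}
    # Phase 2 (decision): commit only on a unanimous VOTE-COMMIT.
    if all(v == "VOTE-COMMIT" for v in votes.values()):
        decision = "GLOBAL-COMMIT"
    else:
        decision = "GLOBAL-ABORT"
    # Broadcast: every participant applies the same global decision.
    return {name: decision for name in participants}

print(two_phase_commit({"db1": True, "db2": True}))    # everyone commits
print(two_phase_commit({"db1": True, "db2": False}))   # one "no" aborts all
```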

32: What is a fault and an error? Differentiate between them.
Faults and errors are related but distinct concepts in the study of system reliability.
They represent different stages in the chain of events leading to a system failure.
●​ Fault: A fault is the underlying cause of a problem. It is a defect or flaw in a
hardware or software component. It can be a bug in the code, a faulty
hardware component, or a design mistake. A fault is static and may exist in the
system for a long time without causing any issue.
●​ Error: An error is a part of the system's state that is incorrect and may lead to
a failure. An error is the manifestation of a fault. When a fault is activated (e.g.,
a buggy piece of code is executed), it produces an error in the system's state.
●​ Failure: A failure is the observable deviation of a system from its specified
behavior. It is what the user sees when the system does not perform its
function correctly. A failure occurs when an error in the system's state
propagates to the output.
Example Chain:
1.​ Fault: A programmer writes x = y / 0; in the code. This is the defect.
2.​ Error: When this line of code is executed, the system's state becomes
erroneous (e.g., a "division by zero" exception is raised).
3.​ Failure: The program crashes or displays an error message to the user, which
is a deviation from its specified behavior.
Comparison of Fault and Error:
1. Definition: A fault is the root cause of a potential problem (a defect). An error is an incorrect state within the system caused by a fault.
2. Nature: A fault is a static condition or flaw in a component. An error is a dynamic, incorrect state that the system enters.
3. Analogy: A fault is like a disease or virus present in the body; an error is like the symptoms of the disease (e.g., fever).
4. Example: Fault: a bug in the source code, or a loose wire. Error: a variable having a wrong value, or a corrupted packet.
5. Causal Relation: A fault causes an error; an error is caused by a fault and in turn causes a failure.
33: Explain reliable client-server communication.
Reliable client-server communication aims to guarantee that a client's request is
processed by a server exactly once, and a reply is successfully returned, even in the
presence of network failures or server crashes. This is primarily achieved by
systematically handling three main problems.
1. Handling Lost Messages
●​ Problem: A request or reply message can be lost in the network. The client
will wait for a response that never comes.
●​ Solution: The client sets a timeout after sending a request. If no reply arrives
within this period, it assumes the message was lost and retransmits the
original request.
2. Handling Duplicate Requests
●​ Problem: Retransmission can lead to duplicate processing. If the server
successfully processed the first request but its reply was lost, the server will
receive the same request again. For non-idempotent operations (like
add_to_balance(100)), this can lead to incorrect results.
●​ Solution: The server must be able to detect duplicates. This is done by having
the client include a unique request ID with every message. The server
maintains a history of recently processed IDs. If a request arrives with an ID
that has already been processed, the server does not re-execute the
operation; instead, it simply re-sends its stored reply.
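The request-ID scheme can be sketched for a non-idempotent deposit operation (all names and values are illustrative):

```python
# Server-side duplicate detection using client-supplied request IDs.
processed = {}     # request_id -> the reply we already sent
balance = 0

def handle_request(request_id, amount):
    """Execute a deposit at most once per request ID."""
    global balance
    if request_id in processed:        # duplicate: do NOT re-execute,
        return processed[request_id]   # just re-send the stored reply
    balance += amount                  # non-idempotent operation
    processed[request_id] = balance    # remember the reply for retransmissions
    return balance

print(handle_request("req-1", 100))    # 100
print(handle_request("req-1", 100))    # retransmission: still 100, not 200
```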
3. Handling Server Crashes
●​ Problem: A server can crash at any point, leaving the client uncertain about
the state of its request (was it started, partially completed, or finished?).
●​ Solution: The system provides different levels of guarantees, known as
execution semantics, to manage this uncertainty:
○​ At-Least-Once: Retries until success; may run the operation multiple
times. Safe only for idempotent actions.
○​ At-Most-Once: Ensures no duplicate execution; operation might be lost
if failure occurs.
○​ Exactly-Once: Guarantees the operation runs once and only once using
persistent logs or transactions—ideal but complex.

34: Explain Authentication and Authorization in a distributed system.


Difference between Authentication and Authorization
1. Authentication: "Who are you?"​
Authentication is the process of verifying the identity of a user, process, or device. It
answers the question, "Are you really who you claim to be?". The goal is to prevent
unauthorized entities from impersonating legitimate ones.
Common Authentication Methods
●​ Something you know: Passwords or PINs; simple but vulnerable.
●​ Something you have: Devices like tokens or phones for OTPs.
●​ Something you are: Biometrics such as fingerprints or facial scans.
●​ Multi-Factor Authentication (MFA): Combines two or more methods for
enhanced security.
Challenges in Distributed Systems:​
An authentication protocol must be secure against network threats. For example, sending a password in plain text is insecure because an attacker could intercept it on the network, either by passive eavesdropping or an active "man-in-the-middle" attack. Secure protocols like Kerberos, or protocols based on public-key cryptography (e.g., TLS/SSL), perform authentication without exposing credentials.
2. Authorization: "What are you allowed to do?"​
Authorization is the process of determining whether an authenticated user has the
permission to perform a specific action or access a particular resource. It happens
after successful authentication. It answers the question, "Now that I know who you
are, what are your privileges?".
Common Authorization Mechanisms
●​ Access Control Lists (ACLs): Resource-based; each resource stores a list of
users and their allowed actions (e.g., read/write).
●​ Capability Lists: User-based; each user holds a list of access rights (tokens)
to specific resources.
●​ Role-Based Access Control (RBAC): Role-based; users are assigned roles,
and roles have defined permissions—simplifies large-scale permission
management.
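A minimal RBAC check can be sketched as follows; the users, roles, and permission names are invented for illustration:

```python
# Roles map to sets of permitted actions; users map to sets of roles.
ROLE_PERMISSIONS = {
    "admin":  {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}
USER_ROLES = {"alice": {"admin"}, "bob": {"viewer"}}

def is_authorized(user, action):
    """A user may perform an action if any of their roles grants it."""
    return any(action in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))
```

The indirection through roles is what simplifies large-scale management: granting a new hire "editor" updates one mapping instead of every resource's ACL.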
The Relationship:​
Authentication always comes before Authorization. You cannot grant access rights
to someone until you have reliably confirmed their identity.
| Feature | Authentication | Authorization |
|---|---|---|
| 1. Purpose | To verify identity. | To grant or deny permissions. |
| 2. Question | "Who are you?" | "What are you allowed to do?" |
| 3. Process | Validates user credentials (password, token, biometric). | Checks user privileges against an access control policy (ACL, RBAC). |
| 4. Timing | The first step in the security process. | The second step, performed after successful authentication. |
| 5. Output | A decision of "Valid Identity" or "Invalid Identity". | A decision of "Access Granted" or "Access Denied". |
35: Describe the key security challenges in a distributed system, including concepts like secure channels and access control.
Security in distributed systems is challenging because communication occurs over
public networks and there is no central point of trust. This exposes the system to
several key threats.
Key Security Challenges:
●​ Eavesdropping (Confidentiality Attack): An unauthorized party intercepts
messages on the network to read sensitive data.
●​ Tampering (Integrity Attack): An attacker intercepts and modifies messages
in transit. For example, changing the amount in a financial transaction.
●​ Impersonation/Spoofing (Authentication Attack): A malicious process
pretends to be a legitimate client or server to gain unauthorized access or trick
users.
●​ Denial of Service (DoS) (Availability Attack): An attacker floods a server
with requests, overwhelming it and making it unavailable to legitimate users.
To combat these challenges, distributed systems rely on fundamental security
mechanisms like authentication, secure channels, and access control.
Core Security Concepts:
1.​ Authentication: The process of verifying the identity of a user or process. It
answers the question, "Are you who you say you are?" This is the first line of
defense, often done using passwords, digital certificates, or Kerberos.
2.​ Secure Channels: A communication link protected against both
eavesdropping and tampering. A secure channel provides:
○​ Confidentiality: Achieved through encryption (e.g., using SSL/TLS).
Message content is scrambled so only the intended recipient with the
correct key can read it.
○​ Integrity: Achieved using digital signatures or Message
Authentication Codes (MACs). These techniques ensure that
messages have not been altered during transit.
3.​ Access Control (Authorization): The process of determining if an
authenticated user has the permission to perform a specific action on a
resource. It answers the question, "What are you allowed to do?" Common
implementations include:
○​ Access Control Lists (ACLs): A list attached to an object (e.g., a file)
specifying which users have which permissions (read, write, execute).
○​ Capabilities: A token given to a user that acts as unforgeable proof of
their rights to access a specific resource.
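The integrity mechanism of a secure channel can be illustrated with a MAC, here using Python's standard `hmac` module (the shared key and message content are made up):

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> bytes:
    """Sender attaches a MAC computed over the message with a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver recomputes the MAC; any in-transit tampering changes it."""
    return hmac.compare_digest(sign(key, message), tag)
```

An attacker who modifies the message cannot produce a matching tag without the shared key, so tampering is detected on receipt.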
36: Write short notes on:
1. Message Passing Interface (MPI)
●​ MPI is a standardized communication protocol used in parallel and distributed
computing.
●​ It allows multiple processes to communicate with one another by sending and
receiving messages.
●​ Common in high-performance computing (HPC) environments.
●​ Supports functions like point-to-point communication and collective
communication.
●​ Example: Used in supercomputers for large scientific simulations.
2. Access Control Matrix
●​ A security model that defines permissions for each user (subject) on every
resource (object).
●​ Represented as a table: rows are users, columns are resources, and cells
show access rights (read, write, execute).
●​ Helps enforce security policies in operating systems and distributed
environments.
●​ Can be implemented using Access Control Lists (ACLs) or Capability Lists.
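The matrix maps naturally onto a nested dictionary; the subjects, objects, and rights below are hypothetical:

```python
# Rows = subjects, columns = objects, cells = sets of rights.
ACM = {
    "alice": {"report.txt": {"read", "write"}, "payroll.db": {"read"}},
    "bob":   {"report.txt": {"read"}},
}

def has_right(subject, obj, right):
    return right in ACM.get(subject, {}).get(obj, set())

def acl(obj):
    """A column slice of the matrix is the ACL for an object;
    a row slice (ACM[subject]) is that subject's capability list."""
    return {s: rights[obj] for s, rights in ACM.items() if obj in rights}
```

This shows why ACLs and capability lists are two storage strategies for the same matrix: one slices by column, the other by row.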
3. Denial of Service (DoS) Attacks
●​ A type of cyberattack that floods a system or network with excessive traffic to
make it unavailable to users.
●​ Goal is to exhaust resources like CPU, memory, or bandwidth.
●​ Variants include Distributed DoS (DDoS) where multiple machines launch the
attack simultaneously.
●​ Defense: firewalls, rate limiting, intrusion detection systems (IDS).
4. Firewall
●​ A network security device or software that monitors and controls incoming and
outgoing traffic based on security rules.
●​ Acts as a barrier between trusted internal networks and untrusted external
networks (like the internet).
●​ Can block unauthorized access while allowing legitimate communication.
●​ Types: Packet-filtering firewall, stateful firewall, proxy firewall.
5. Secure Naming
●​ Refers to the process of securely mapping names (like URLs or hostnames) to
network addresses.
●​ Prevents attacks like DNS spoofing or man-in-the-middle attacks.
●​ Secure naming systems use cryptographic methods to ensure authenticity and
integrity.
●​ Example: DNSSEC (Domain Name System Security Extensions) adds
security to DNS.
6. Public Key Cryptography (Asymmetric Cryptography)
●​ A type of cryptography using a pair of keys: public key (shared) and private
key (kept secret).
●​ Used for secure communication, encryption, and digital signatures.
●​ Public key is used to encrypt, and only the corresponding private key can
decrypt.
●​ Example algorithms: RSA, ECC.
●​ Widely used in HTTPS, email security, and digital certificates.
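A toy illustration of the public/private key relationship, using the classic small textbook RSA primes (far too small for real security, and real systems use padding schemes on top):

```python
# Key generation with tiny primes, for illustration only.
p, q = 61, 53
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: e*d ≡ 1 (mod phi)

def encrypt(m, public=(e, n)):
    return pow(m, public[0], public[1])    # c = m^e mod n

def decrypt(c, private=(d, n)):
    return pow(c, private[0], private[1])  # m = c^d mod n
```

Anyone holding the public pair `(e, n)` can encrypt, but only the holder of `d` can invert the operation, which is exactly the asymmetry the note describes.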
7. Idempotent Operations
1.​ Definition: An operation is idempotent if repeating it multiple times produces
the same result as performing it just once.
2.​ Core Principle: f(f(x)) = f(x). Applying the function again does not change the
outcome.
3.​ Importance in DS: Crucial for fault tolerance. It allows clients to safely
retransmit requests after a timeout without fear of corrupting data (e.g., if a
server's reply was lost).
4.​ Examples: Setting an absolute value (SET x = 10), reading a piece of data, or
deleting a specific record by its unique ID.
5.​ Non-Examples: Appending data to a log file or incrementing a counter (x = x
+ 1), as these operations change the state with each execution.
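The contrast between the examples and non-examples can be shown in a few lines (the state dictionary is illustrative):

```python
def set_balance(state, value):
    """Idempotent: repeating the call leaves the same state."""
    state["balance"] = value
    return state

def add_to_balance(state, amount):
    """Not idempotent: every retry changes the state again."""
    state["balance"] += amount
    return state
```

This is why a client may blindly retransmit a `set_balance` request after a timeout, but must pair `add_to_balance` with duplicate detection.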
8. Distributed Hash Table (DHT)
1.​ Definition: A decentralized, distributed system that partitions ownership of
(key, value) pairs among a set of participating nodes (peers).
2.​ Functionality: Provides a lookup service similar to a traditional hash table,
allowing efficient storage and retrieval of values based on their keys.
3.​ Mechanism: Uses a consistent hash function to map a key to a peer that is
responsible for it. The system maintains routing information to find the correct
peer.
4.​ Key Properties: Highly scalable, decentralized (no central point of failure),
and self-organizing as peers join and leave the network.
5.​ Use Cases: A foundational technology for many P2P systems, including
BitTorrent (for peer discovery) and distributed databases like Amazon
DynamoDB.
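A minimal consistent-hashing sketch of the key-to-peer mapping (node names are invented; real DHTs such as Chord additionally maintain routing tables and virtual nodes):

```python
import bisect
import hashlib

def h(name: str) -> int:
    """Hash a key or node name onto the ring [0, 2**32)."""
    return int(hashlib.sha256(name.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    """Each key is owned by the first node clockwise from the
    key's position on the ring."""
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def owner(self, key):
        points = [p for p, _ in self.ring]
        i = bisect.bisect_right(points, h(key)) % len(self.ring)
        return self.ring[i][1]
```

Because only the arc adjacent to a joining or leaving node changes ownership, peers can come and go without rehashing every key, which is what makes the scheme self-organizing.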
9. Content Delivery Network (CDN)
1.​ Definition: A geographically distributed network of proxy servers designed to
deliver web content to users with high availability and low latency.
2.​ Mechanism: Caches copies of static content (images, CSS, video) on servers
located in various "Points of Presence" (PoPs) worldwide.
3.​ Core Function: When a user requests content, the CDN redirects the request
to the server in the PoP that is geographically closest to them.
4.​ Primary Benefits:
○​ Reduces latency (improves speed).
○​ Increases availability and fault tolerance.
○​ Balances the load on the origin server.
5.​ Relevance: An essential component of the modern internet, used by nearly all
major websites and streaming services to ensure a fast user experience
globally.
10. Kerberos
1.​ Definition: A network authentication protocol that uses a trusted third party,
the Key Distribution Center (KDC), to provide secure identity verification for
users and services.
2.​ Core Concept: Works on the basis of "tickets" to avoid sending passwords
over the network. It provides a single sign-on (SSO) experience.
3.​ Key Components:
○​ Authentication Server (AS): Verifies the user's identity initially.
○​ Ticket-Granting Server (TGS): Issues temporary tickets for specific
services.
4.​ Workflow: A user gets a main Ticket-Granting Ticket (TGT) once, then uses it
to request temporary service tickets for any resource they need to access,
without re-entering their password.
5.​ Use Case: The standard for authentication in trusted, managed enterprise
networks, such as Windows Active Directory domains.
11. Location Systems
1.​ Definition: A location system is a distributed system designed to track the
physical location of mobile entities, such as users, devices, or assets, in
real-time.
2.​ Goal: To enable location-aware services by providing applications with the
current geographical position of entities. This bridges the gap between the
digital and physical worlds.
3.​ Key Components:
○​ Positioning System: The underlying technology that determines
physical coordinates (e.g., GPS for outdoors, Wi-Fi/Bluetooth
triangulation for indoors).
○​ Tracking Infrastructure: A network of servers that collect, store, and
process location data.
4.​ Architectural Challenge: A major challenge is balancing scalability and
privacy. A centralized server is a bottleneck and a privacy risk, while
decentralized approaches (e.g., peer-to-peer tracking) are more complex.
5.​ Use Cases: "Find my nearest..." services (ATMs, restaurants), asset tracking
in a warehouse, fleet management for delivery trucks, and location-based
advertising.
12. Distributed Event Matching
1.​ Definition: A system where clients (subscribers) express interest in complex
patterns of events, and a network of brokers matches sequences of events
published by other clients (publishers) against these patterns.
2.​ Core Functionality: It goes beyond simple "publish/subscribe." Instead of
subscribing to a single topic (e.g., "stock_prices"), a subscriber might define a
pattern like: "Notify me if Stock A drops 5% AND Stock B rises 3% within 10
seconds."
3.​ Mechanism: It involves a broker network that filters, aggregates, and
correlates events from multiple sources to detect when a composite event
pattern has occurred.
4.​ Challenges: Efficiently matching millions of complex patterns against a
high-volume stream of events in a distributed, low-latency manner.
5.​ Use Cases: Algorithmic trading in finance, real-time network intrusion
detection, supply chain monitoring (e.g., tracking a package through multiple
checkpoints), and IoT sensor data analysis.
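A toy sketch of matching one composite pattern over a sliding time window; the event names and the single hard-coded pattern are invented, and real engines compile many patterns into shared matching structures:

```python
from collections import deque

class CompositeMatcher:
    """Fires when events 'A_drop' and 'B_rise' both occur
    within `window` seconds of each other."""
    def __init__(self, window=10.0):
        self.window = window
        self.events = deque()            # (timestamp, event_name)

    def publish(self, ts, name):
        self.events.append((ts, name))
        # Discard events that have fallen out of the time window.
        while self.events and ts - self.events[0][0] > self.window:
            self.events.popleft()
        names = {n for _, n in self.events}
        return {"A_drop", "B_rise"} <= names   # composite event detected?
```

A broker network distributes this matching: edge brokers filter single events, and downstream brokers correlate them into composite detections.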
13. Caching and Replication in the Web
1.​ Definition: The specific application of caching and replication strategies to
enhance the performance, scalability, and availability of the World Wide Web.
2.​ Goal: To reduce user-perceived latency by placing content closer to the user
and to decrease the load on origin web servers.
3.​ Key Technologies:
○​ Browser Cache: Stores copies of web assets (images, CSS) on the
user's local machine.
○​ Web Proxies: Intermediary servers (at an ISP or company) that cache
popular content for a group of users.
○​ Content Delivery Networks (CDNs): A global network of replica
servers that cache content near major population centers.
4.​ Consistency Management: Caches must be kept fresh. This is managed
using HTTP headers like Cache-Control (which defines a Time-To-Live or TTL)
and ETag (which allows a cache to validate if its version is still current without
re-downloading the whole file).
5.​ Impact: This is a fundamental reason the modern web is fast and can handle
billions of users. It is a massive, practical implementation of distributed
caching.
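The TTL-plus-ETag freshness logic from point 4 can be sketched as follows; `origin` is a stand-in for the origin server, and the 60-second TTL is an assumed Cache-Control value:

```python
import time

class WebCache:
    """Sketch of a browser/proxy cache. `origin(url, etag)` simulates
    an HTTP fetch: it returns (status, etag, body), where status 304
    means 'not modified' (the cached copy is still current)."""
    TTL = 60.0   # assumed Cache-Control max-age, in seconds

    def __init__(self, origin):
        self.origin = origin
        self.store = {}   # url -> (body, etag, expires_at)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        if url in self.store:
            body, etag, expires = self.store[url]
            if now < expires:
                return body                    # fresh: serve without network
            # Stale: revalidate with the stored ETag (If-None-Match).
            status, new_etag, new_body = self.origin(url, etag)
            if status == 304:                  # still current: keep cached body
                self.store[url] = (body, etag, now + self.TTL)
                return body
            self.store[url] = (new_body, new_etag, now + self.TTL)
            return new_body
        status, etag, body = self.origin(url, None)
        self.store[url] = (body, etag, now + self.TTL)
        return body
```

A 304 revalidation transfers only headers, which is why ETags save bandwidth even after the TTL expires.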