Ds Pyqs
1 (a, b, c, d)
(a) What is Distributed System?
A distributed system is a collection of independent computers that appears to users as a single
coherent system. These systems work together to achieve a common goal by communicating
over a network.
(d) Explain the Remote Procedure Call (RPC) mechanism with various functional
components.
RPC is a communication mechanism that allows a program to execute a procedure on a remote
system as if it were local.
1. Client Stub:
– The client-side representation of the procedure that marshals arguments into a
message to send to the server.
2. Server Stub:
– Receives the request on the server-side, unpacks the arguments, and invokes the
appropriate procedure.
3. Communication Module:
– Handles message transmission between the client and server.
4. Dispatcher:
– Maps the received request to the correct procedure on the server.
5. Binding:
– Establishes the connection between the client and server before communication.
1. The client invokes the remote procedure via the client stub.
2. The client stub marshals the request into a message and sends it to the server.
3. The server stub unpacks the message and invokes the requested procedure.
4. Results are marshaled back into a message and sent to the client.
5. The client stub unpacks the result and provides it to the client.
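The five steps above can be sketched in a single process. This is a minimal illustration, not a real RPC library: the names client_stub, server_stub, transport, and PROCEDURES are hypothetical, JSON stands in for the marshaling format, and a plain function call stands in for the network.

```python
import json


# Hypothetical remote procedure that lives on the "server".
def add(a, b):
    return a + b


# Dispatcher table (assumed name): maps a procedure name to the procedure.
PROCEDURES = {"add": add}


def server_stub(request_bytes):
    """Unmarshal the request, dispatch to the procedure, marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    procedure = PROCEDURES[request["proc"]]
    result = procedure(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def transport(request_bytes):
    """Stand-in for the communication module; a real system would use a socket."""
    return server_stub(request_bytes)


def client_stub(proc, *args):
    """Marshal the call, send it, and unmarshal the reply."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = transport(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]


# The caller sees an ordinary function call.
total = client_stub("add", 2, 3)
```

The caller never touches bytes or the dispatcher table, which is the "as if it were local" property described above.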
1. Transparency: Hides details like file location, replication, and access mechanisms from
users.
2. Scalability: Can scale by adding more nodes and storage to handle growing data and
users.
3. Fault Tolerance: Provides mechanisms like data replication and redundancy to ensure
availability even in case of failures.
4. Consistency: Ensures that changes to files are synchronized across all replicas.
5. Performance: Distributes load and uses caching to improve access speeds.
• Thread: A thread is the smallest unit of execution within a process. Multiple threads
can exist within a single process, sharing the same memory space but able to
execute independently. Threads are often used to perform tasks concurrently within
a process.
(c) Differentiate between Network OS and Distributed OS.
Aspect | Network OS | Distributed OS
Definition | A network operating system allows multiple computers to communicate and share resources, but each system operates independently. | A distributed operating system integrates multiple computers into a unified system where users and applications interact as if they are running on a single system.
Resource Management | Each machine manages its resources independently, and the OS provides mechanisms for inter-machine communication. | Resources (CPU, memory, etc.) are managed globally across all nodes, making them appear as a single system.
Transparency | Limited transparency, as each machine's resources are not fully abstracted. | High transparency, making it seem like a single system to the user, even if resources are distributed.
Fault Tolerance | Limited, relies on network protocols and redundancy at the application level. | High fault tolerance, with distributed management and replication mechanisms to handle failures.
Examples | Windows Server, Novell NetWare, UNIX-based networking systems. | Amoeba, LOCUS, Plan 9 from Bell Labs.
(d) What are the requirements of a Distributed File System? Give the design issues
of Distributed File System.
Requirements of a Distributed File System (DFS):
1. Transparency: The system should hide the distribution and replication details from the
user. It should provide access transparency, location transparency, and replication
transparency.
2. Fault Tolerance: The system should handle failures and provide recovery mechanisms,
ensuring data is always accessible even if parts of the system fail.
3. Concurrency Control: Multiple clients may access the same file simultaneously. The DFS
should ensure consistent access to avoid conflicts.
4. Security: Access control should be implemented to ensure that unauthorized users do
not access sensitive data.
5. Scalability: The system should be scalable to accommodate increasing data and user
requests without significant degradation in performance.
6. Performance: Optimizing data access speed and minimizing latency, through
mechanisms like caching and load balancing.
7. Consistency: The system must maintain consistency across replicas and ensure that
updates to files are synchronized properly across different nodes.
1. File Replication: Deciding how to replicate files across nodes to ensure fault tolerance
while minimizing overhead.
2. Consistency Models: Choosing the right consistency model (e.g., strong consistency vs.
eventual consistency) depending on the application requirements.
3. Distributed File Access: Ensuring that multiple clients can access and modify files
concurrently without causing conflicts.
4. Fault Tolerance and Recovery: Implementing mechanisms like checkpoints, logs, and
replica consistency to handle node failures.
5. Performance Optimization: Balancing load between nodes, minimizing network traffic,
and using caching effectively.
6. Security: Implementing proper authentication and access control to secure data in a
distributed environment.
1. Layered Architecture: The system can be divided into layers where each layer is
responsible for a particular functionality. For example:
– Hardware Layer: Includes the physical machines.
– Operating System Layer: Manages resources like CPU, memory, and I/O.
– Middleware Layer: Provides services like communication, synchronization, and
file management.
2. Client-Server Model: Distributed OS can implement a client-server architecture where
client processes request services from servers.
3. Multitier Architecture: Distributed OS may have multiple tiers for different
functionalities like presentation, application, and data storage.
1. Transparency: Hides the complexity of the underlying distributed system from users.
– Access Transparency: Local and remote resources are accessed using identical
operations.
– Location Transparency: Users do not need to know the location of resources.
– Failure Transparency: The system should mask faults and continue to provide
services even in the event of failures.
2. Concurrency: Supports concurrent access to resources by multiple processes, ensuring
safe and synchronized access.
3. Scalability: The system should scale seamlessly as new nodes are added, maintaining
performance and fault tolerance.
4. Fault Tolerance: It must recover from hardware or software failures and continue
providing services with minimal disruption.
5. Resource Management: Efficient allocation and management of resources across
distributed nodes, often with the help of scheduling and load balancing.
• Content Delivery Networks (CDNs): Use overlays to deliver content more efficiently by
choosing optimal paths.
• Peer-to-Peer Networks: Use overlays for routing between peers based on logical
connections, independent of the underlying physical network.
Advantages:
Disadvantages:
• Trace Collection: Collecting logs or traces from multiple nodes for post-mortem
analysis.
• Snapshot Debugging: Taking consistent snapshots of the system at various points to
analyze the state.
• Remote Debugging: Allowing the debugging of processes running on remote nodes via a
debugger interface.
Advantages:
1. Flexibility: Unstructured overlays do not have strict routing protocols, which makes
them easier to design and adapt to changing network conditions.
2. Scalability: These networks can scale easily, as they do not require centralized control or
predefined structures.
3. Fault Tolerance: If a node fails, other nodes can continue to operate without significant
disruption.
4. Simplicity: Implementation is relatively simple compared to structured overlays like
DHTs (Distributed Hash Tables).
5. Decentralization: There is no central point of control, reducing the risk of a single point
of failure.
Examples:
• Gnutella network.
• Napster is sometimes listed here, but it relied on a central index server for search;
only the file transfers were peer-to-peer.
(d) What is Distributed Mutual Exclusion? What are the three basic approaches for
implementing Distributed Mutual Exclusion?
Distributed Mutual Exclusion ensures that when multiple processes or nodes in a distributed
system need to access a shared resource, only one process can access it at a time. It prevents
race conditions and ensures data consistency across the system.
1. Centralized Approach:
– A single node (usually a coordinator) is responsible for granting permission to
access the shared resource.
– Advantages: Simple to implement.
– Disadvantages: The coordinator can become a bottleneck and a single point of
failure.
2. Token-based Approach:
– A unique token circulates among the nodes. Only the node that holds the token
can access the shared resource.
– Advantages: No need for a central coordinator; reduces the message complexity.
– Disadvantages: If the token is lost, the system can become deadlocked until
recovery.
3. Quorum-based Approach:
– A process must obtain permission from a quorum (a subset of nodes) before
accessing the shared resource. Any two quorums share at least one node, so two
processes cannot both be granted access at once. A simple majority is one such
quorum.
– Advantages: Can be more fault-tolerant as multiple nodes can grant access.
– Disadvantages: Requires more communication overhead, especially in large
systems.
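As an illustration of the centralized approach, the sketch below models the coordinator as an in-memory object. The class name Coordinator and its method names are assumptions made for this example, and real message passing, timeouts, and coordinator failure are left out.

```python
from collections import deque


class Coordinator:
    """Hypothetical coordinator that grants one process at a time."""

    def __init__(self):
        self.holder = None       # process currently allowed to use the resource
        self.waiting = deque()   # queued requests, in arrival order

    def request(self, pid):
        """Grant immediately if the resource is free; otherwise queue the request."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Release the resource and hand it to the next waiting process, if any."""
        if self.holder != pid:
            raise ValueError(f"{pid} does not hold the resource")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Every request and release passes through this one object, which is why the coordinator is both the bottleneck and the single point of failure noted above.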
OR: What are the differences between the Agreement Problem and
the Consensus Problem?
Agreement Problem and Consensus Problem are related to decision-making in distributed
systems, where multiple nodes need to agree on a single value.
1. Agreement Problem:
– Also known as the Byzantine agreement problem. A single source process holds an
initial value and sends it to all other processes.
– Agreement: All non-faulty processes must decide on the same value.
– Validity: If the source is non-faulty, the value decided must be the source's initial
value.
– If the source itself is faulty, the non-faulty processes may agree on any common
value.
2. Consensus Problem:
– In the consensus problem, every process starts with its own initial value, and all
non-faulty processes must agree on a single common value, even in the presence
of faulty or crashed processes.
– It ensures that:
• Agreement: All non-faulty processes decide on the same value.
• Validity: The value agreed upon is one that was proposed by some
process.
• Termination: Every non-faulty process eventually decides.
– Fault Tolerance: A solution can handle only up to a certain number of faulty
processes while still reaching consensus.
Key Differences:
Q.4 (a) What is distributed deadlock?
Distributed deadlock occurs in a distributed system when two or more processes wait
indefinitely for resources that are held by each other, creating a circular wait condition. Since the
system is distributed, detecting and resolving such deadlocks is more complex than in
centralized systems.
• Example Scenario:
Process A holds resource R1 and requests R2, while Process B holds resource R2
and requests R1. This creates a cycle of dependency that leads to deadlock.
• Challenges:
a. No central control to detect the deadlock.
• Characteristics:
a. Executes on multiple systems.
Each type of ordering impacts the complexity and performance of message delivery systems in
distributed environments.
These mechanisms collectively ensure that distributed transactions maintain their atomicity
despite failures or interruptions.
Q.5 (a) Describe data replication in distributed systems.
Data replication is the process of storing copies of data on multiple nodes in a distributed
system to improve reliability, fault tolerance, and performance.
• Key Benefits:
a. Fault Tolerance: If one node fails, the replicated data on another node ensures
continuity.
• Key Concepts:
a. Multicast Communication: Messages are sent to all members of a group.
b. Write-Update Protocol: Updates all copies with the new value when a write
occurs.
• Example: In a distributed shared memory system, if Process A updates a variable,
other processes accessing the same variable must see the updated value to maintain
coherence.
• Key Features:
a. Transparency: Applications can use shared memory without being aware of the
underlying network communication.
(A diagram can be drawn showing processes on different nodes connected to a shared memory
space.)
• Example Use Case: Parallel computing applications where processes on multiple nodes
share data, such as scientific simulations.
OR: Explain various memory consistency models. Also give an example
of an application for which causal consistency is the most suitable
consistency model.
Memory consistency models define the rules for the visibility of memory updates in a
distributed system, ensuring correct behavior in concurrent operations.
– Use Case: Collaborative applications like Google Docs, where edits by one user
depend on another user's previous changes.
4. Eventual Consistency:
– Ensures that all replicas will eventually converge to the same value if no further
updates are made.
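Causal consistency needs a way to tell whether one update depends on another or whether the two are independent. One common technique, not named in these notes, is the vector clock. The sketch below is a minimal comparison of two vector timestamps; the function names and the two-user example are assumptions for illustration.

```python
def happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Two users editing a shared document; position i counts the edits by user i.
edit_1 = (1, 0)   # user 0 makes an edit
edit_2 = (1, 1)   # user 1 edits after seeing edit_1
edit_3 = (2, 0)   # user 0 edits again without having seen edit_2
```

Here edit_1 must be applied before edit_2 on every replica, while edit_2 and edit_3 are concurrent and may be seen in different orders.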
Q.1 (a)
i. Which of the following is NOT a characteristic of a distributed system?
Answer: C) Centralized processing
• Distributed systems do not rely on centralized processing; instead, they are
characterized by decentralization, scalability, and fault tolerance.
• In the client-server model, clients request services, and a central server processes and
provides those services.
• Key Types:
a. Horizontal Scalability: Adding more nodes or machines to the system.
b. Vertical Scalability: Increasing the resources (e.g., CPU, RAM) of existing nodes.
Q.1 (c) What are the primary characteristics and goals of a distributed
system?
Characteristics:
1. Transparency: Hides the complexity of the distributed system from users. Types include
access, location, replication, and failure transparency.
Goals:
1. Improved Performance: Using multiple nodes for parallel processing.
• Advantages:
– Simplifies debugging and testing.
– High modularity.
2. Object-Based Architecture:
• Definition: Based on the concept of objects that encapsulate data and behavior.
Communication occurs via method calls.
• Advantages:
– Encourages reusability and scalability.
• Advantages:
– High interoperability.
1. Design Challenges:
– Requires synchronization mechanisms (e.g., locks, semaphores) to avoid race
conditions and inconsistencies.
– Negative: Can cause bottlenecks due to contention for shared resources, leading
to potential deadlocks or delays.
To handle concurrency effectively, distributed systems must implement mechanisms like thread
management, distributed locking, and fault-tolerant protocols.
Q.2 (a)
i. In distributed systems, a Remote Procedure Call (RPC) enables:
Correct Answer:
B) A program to call a procedure on a remote server
4. Resource Sharing: Enables multiple nodes to share resources like data, hardware, or
software.
Q.2 (b) Define Synchronization in a Distributed System.
Synchronization in distributed systems refers to the coordination of processes and resources to
maintain consistency and correctness.
• Purpose:
a. Ensures that operations occur in the correct order.
• Uses:
a. Data Lookup: DHT enables efficient searching and retrieval of data in a peer-to-
peer network.
d. Applications: Used in systems like BitTorrent (trackerless peer discovery) and
IPFS (locating content by its hash).
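A DHT lookup can be sketched with consistent hashing, one common way to assign keys to nodes. This is a single-process illustration under stated assumptions: the HashRing class, the ring_position helper, and the node names are hypothetical, and real DHTs such as Chord or Kademlia add routing tables so that no node needs the full membership list.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string to an integer position on the hash ring."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes):
        # Sort nodes by their ring position so lookups can use binary search.
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        index = bisect.bisect_left(self.positions, ring_position(key))
        return self.ring[index % len(self.ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("song.mp3")
```

Because the owner is computed from the key's hash alone, any peer that knows the node set resolves the same key to the same node.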
• Functions:
a. Manages communication between distributed components.
• Physical Clocks:
– Represent real-world time (e.g., UTC).
Key Difference:
• Physical clocks are synchronized to real-world time, while logical clocks order events
relative to each other.
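Logical clocks can be illustrated with Lamport's scheme, one well-known example. The sketch below is a minimal version; the class and method names are chosen for this example rather than taken from the notes.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """Jump past the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, message_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                 # local event on p
stamp = p.send()         # p sends a message
arrival = q.receive(stamp)
```

The counter values carry no real-world meaning; they only guarantee that a receive is numbered after its matching send.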
Q.3 (a)
i. Which architecture relies on every node acting as both a client and a server?
Answer: B) Peer-to-peer (P2P) architecture
• In P2P architecture, every node can act as both a client and a server, enabling direct
sharing of resources without relying on a central server. Examples include BitTorrent and
blockchain networks.
ii. Which of the following is a desirable feature in a Distributed File System?
Answer: B) High availability, fault tolerance, and location transparency
– Location transparency: Users can access files without knowing their physical
location.
• Improved Performance: Decreases the load on servers and network by serving cached
data.
• Cache Management Overhead: Requires replacement policies (e.g., Least Recently
Used) and write policies (e.g., Write-Through) to manage the cache effectively.
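The Least Recently Used policy mentioned above can be sketched with Python's OrderedDict. This is an illustrative client-side cache only: the LRUCache class and the block names are assumptions, and fetching from the file server on a miss is left to the caller.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self.blocks:
            return None               # miss: the caller would fetch from the server
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key, value):
        self.blocks[key] = value
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("block-a", b"A")
cache.put("block-b", b"B")
cache.get("block-a")          # block-a is now the most recently used
cache.put("block-c", b"C")    # evicts block-b
```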
1. Transparency:
– Access Transparency: Users should be able to access local and remote files with
the same operations.
OR: Explain Distributed Shared Memory (DSM) and its Benefits in Distributed
Systems.
Distributed Shared Memory (DSM) is an abstraction that provides the illusion of shared
memory to processes across multiple nodes in a distributed system.
• Key Features:
a. Global Address Space: Processes access shared data using a global address
space.
Q.4 (a)
(i) The full names of CORBA and RMI:
1. CORBA: Common Object Request Broker Architecture
2. RMI: Remote Method Invocation
• Explanation: In the primary-backup replication strategy, one node acts as the primary to
handle all updates, and changes are propagated to backup nodes to maintain
consistency. This is widely used for fault tolerance.
• Purpose:
a. Prevents race conditions and data inconsistency.
b. Edge Chasing: Probes are sent across the system to detect circular wait
conditions.
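The edge-chasing idea can be sketched as a single-process simulation. This is a simplification under stated assumptions: the wait_for dictionary stands in for knowledge that is really spread across nodes, a list stands in for probe messages in flight, and the function name detect_deadlock is hypothetical.

```python
def detect_deadlock(wait_for, initiator):
    """Forward probes along wait-for edges; report deadlock if one returns home.

    wait_for maps each process to the list of processes it is waiting on.
    """
    # Each probe is (initiator, sender, receiver).
    pending = [(initiator, initiator, nxt) for nxt in wait_for.get(initiator, [])]
    forwarded = set()  # processes that have already forwarded this probe
    while pending:
        origin, _sender, receiver = pending.pop()
        if receiver == origin:
            return True            # the probe came back: circular wait
        if receiver in forwarded:
            continue               # do not forward the same probe twice
        forwarded.add(receiver)
        for nxt in wait_for.get(receiver, []):
            pending.append((origin, receiver, nxt))
    return False


# Process A waits on B and B waits on A, so A's probe returns to A.
cycle = {"A": ["B"], "B": ["A"]}
chain = {"A": ["B"], "B": ["C"], "C": []}
```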
Steps:
1. Request Phase:
– A process sends a "REQUEST" message to all other processes, including its
timestamp.
• If busy: Queue the request and send a "REPLY" after exiting the critical
section.
2. Execution Phase:
– A process enters the critical section only after receiving "REPLY" messages from
all other processes.
3. Release Phase:
– After exiting the critical section, the process sends pending "REPLY" messages to
waiting requests.
Features:
• Message Complexity: Requires 2(N - 1) messages per critical section entry (N = number
of processes).
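The steps above match the Ricart-Agrawala algorithm as it is usually described. The sketch below covers only the decision a process makes when a REQUEST arrives, plus the message count. The state names and the tie-break on process ID are assumptions of this example, not details taken from the notes.

```python
def should_defer(my_state, my_request, incoming_request):
    """Return True if the REPLY to an incoming REQUEST should be queued.

    Requests are (timestamp, process_id) tuples; the smaller tuple has priority,
    so equal timestamps are resolved by process ID (an assumed tie-break).
    my_state is "RELEASED", "WANTED", or "HELD".
    """
    if my_state == "HELD":
        return True   # busy in the critical section: reply after exiting
    if my_state == "WANTED" and my_request < incoming_request:
        return True   # own request is older, so it goes first
    return False      # otherwise send the REPLY immediately


def messages_per_entry(n):
    """(N - 1) REQUEST messages plus (N - 1) REPLY messages."""
    return 2 * (n - 1)
```

For example, with N = 5 processes each critical section entry costs 2(5 - 1) = 8 messages.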
Q.5 (a)
(i) In optimistic concurrency control, a transaction is allowed to:
Answer: C) Execute in isolation and validate before committing
• All operations in a transaction must either be fully completed or fully rolled back in case
of a failure.
Phases:
1. Phase 1: Prepare (Voting Phase)
– The coordinator sends a "PREPARE" message to all participating nodes.
– Each participant responds with "YES" (ready to commit) or "NO" (cannot commit).
2. Phase 2: Commit (or Abort)
– If all participants vote "YES", the coordinator sends a "COMMIT" message, and all
participants commit the transaction.
– If any participant votes "NO", the coordinator sends an "ABORT" message, and all
participants roll back the transaction.
Advantages:
• Ensures atomicity in distributed systems.
Disadvantages:
• Vulnerable to blocking if the coordinator fails.
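The two phases can be sketched as a function that collects votes and then decides. This is a minimal illustration: the participant names and vote functions are hypothetical, and the sketch omits logging, timeouts, and the coordinator failure that causes blocking.

```python
def two_phase_commit(participants):
    """Run both phases and return the decision together with the votes.

    participants maps a node name to a function returning True (YES) or False (NO).
    """
    # Phase 1: the coordinator sends PREPARE and collects every vote.
    votes = {name: vote() for name, vote in participants.items()}
    # Phase 2: COMMIT only if all participants voted YES; otherwise ABORT.
    decision = "COMMIT" if all(votes.values()) else "ABORT"
    return decision, votes


all_ready = {"node-1": lambda: True, "node-2": lambda: True}
one_refuses = {"node-1": lambda: True, "node-2": lambda: False}
```

A single NO vote is enough to abort the whole transaction, which is how the protocol preserves atomicity.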
• Purpose:
a. Prevents conflicts by ensuring that only one transaction accesses a resource at a
time.
b. Exclusive Lock (Write Lock): Allows only one transaction to read and modify
data.
• Example:
a. Transaction A holds Lock 1 and requests Lock 2.
Q.5 (d) What are nested transactions, and how do they enhance the
management of complex operations in database systems?
Nested Transactions are hierarchical transactions where a parent transaction contains multiple
child transactions.
Features:
1. Isolation: Each child transaction operates independently.
2. Commit Dependency: A child's commit is provisional; its effects become permanent
only when its parent (and ultimately the top-level transaction) commits.
3. Partial Rollback: If a child transaction fails, only that transaction is rolled back, not the
entire parent transaction.
Benefits:
1. Modularity: Complex tasks can be broken into smaller sub-transactions.
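Partial rollback can be sketched with an in-memory snapshot taken before each child runs. This is an illustration only: the NestedTransaction class, the balance example, and the use of a dictionary copy as a savepoint are assumptions, and a real database would use its log instead.

```python
class NestedTransaction:
    def __init__(self, data):
        self.data = dict(data)   # working copy owned by the parent transaction

    def run_child(self, operation):
        """Run a child transaction; undo only its own changes if it fails."""
        snapshot = dict(self.data)   # savepoint taken before the child starts
        try:
            operation(self.data)
            return True
        except Exception:
            self.data = snapshot     # partial rollback: earlier children survive
            return False


def debit_30(account):
    account["balance"] -= 30


def failing_debit(account):
    account["balance"] -= 1000
    raise RuntimeError("insufficient funds")


parent = NestedTransaction({"balance": 100})
first_ok = parent.run_child(debit_30)
second_ok = parent.run_child(failing_debit)
```

The failed child leaves no trace, while the work of the first child is kept for the parent to commit or abort as a whole.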
OR: What are the challenges in transaction recovery for distributed systems, and
how do they differ from centralized systems?
2. Failure Detection: Identifying which node or link has failed can be difficult.
3. Logging Overhead: Each node maintains its logs, requiring coordination to replay logs.
Example:
In distributed systems, recovery must be coordinated across nodes using atomic commit
protocols such as Two-Phase Commit or Three-Phase Commit in addition to each node's
local log, whereas a centralized system can usually recover from its own write-ahead
log alone.