UNIT 1: INTRODUCTION TO DISTRIBUTED DATABASES
Hrs: 8 | Marks: 8
Topics & Subtopics
● Distributed Data Processing
○ Centralized vs Distributed Systems
○ Client-Server, Peer-to-Peer, Multi-tier architecture
○ Goals of distributed data processing
● What is a Distributed Database System (DDBS)?
○ Definition, Components of DDBS
○ Distributed DBMS vs Centralized DBMS
● Advantages and Disadvantages of DDBS
○ Advantages: Reliability, Availability, Scalability, Autonomy
○ Disadvantages: Complexity, Overhead, Security, Data integrity
● Problem Areas in DDBS
○ Data distribution
○ Query processing
○ Concurrency and Recovery
○ Security & Integrity
● Overview of Database & Computer Network Concepts
○ Basics of Relational Databases (Normalization, ER to Relational)
○ Basics of Networking (LAN, WAN, TCP/IP, OSI model)
○ Role of communication in distributed DBMS
UNIT 2: DDBMS ARCHITECTURE, DESIGN & QUERY PROCESSING
Hrs: 11
Distributed DBMS Architecture
● Types of Transparencies
○ Distribution, Transaction, Performance, Fault Tolerance, Replication
● Architectures
○ Client-server
○ Peer-to-peer
○ Federated systems
● Global Directory Issues
○ Centralized vs Distributed Directory
○ Directory management and consistency
Distributed Database Design
● Alternative Design Strategies
○ Top-down, Bottom-up, Mixed approach
● Fragmentation
○ Horizontal, Vertical, Mixed
○ Correctness criteria: Completeness, Reconstruction, Disjointness
● Data Allocation
○ Centralized, Replicated, Partitioned
○ Allocation strategies (Optimal, Heuristic)
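Illustrative note: the three fragmentation correctness criteria listed above can be checked mechanically for a horizontal fragmentation. The sketch below is a minimal example, not a prescribed method. It assumes a relation is a list of tuples and each fragment is a list of rows; the function name `check_horizontal_fragmentation` and the sample `employees` data are hypothetical.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Return (complete, reconstructible, disjoint) for row-based fragments."""
    rel = set(relation)
    frag_sets = [set(f) for f in fragments]
    union = set().union(*frag_sets) if frag_sets else set()
    # Completeness: every tuple of the relation appears in some fragment.
    complete = rel <= union
    # Reconstruction: the union of the fragments yields exactly the relation.
    reconstructible = union == rel
    # Disjointness: no tuple appears in more than one fragment.
    disjoint = sum(len(f) for f in frag_sets) == len(union)
    return complete, reconstructible, disjoint


# Hypothetical sample data: (id, name, city), fragmented by city.
employees = [(1, "Asha", "Pune"), (2, "Ben", "Delhi"), (3, "Chen", "Pune")]
pune = [t for t in employees if t[2] == "Pune"]
delhi = [t for t in employees if t[2] == "Delhi"]

result = check_horizontal_fragmentation(employees, [pune, delhi])
print(result)
```

For vertical fragmentation the reconstruction operator would be a join on the key rather than a union, so this check applies only to the horizontal case.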
Semantics & Data Control
● View Management
○ Virtual views in distributed systems
● Data Security
○ Access control, Encryption, Secure channels
● Semantic Integrity Control
○ Constraints (PK, FK, Check), Global constraints, Distributed triggers
Query Processing Issues
● Objectives
○ Minimize response time, Reduce communication cost
● Characterization
○ Query processor components (parsers, optimizers, executors)
● Layers of Query Processing
● Query Decomposition
○ Query parsing, rewriting, and fragment identification
● Localization of Distributed Data
○ Data localization using fragments
UNIT 3: QUERY OPTIMIZATION, TRANSACTIONS & CONCURRENCY CONTROL
Hrs: 11
Distributed Query Optimization
● Factors Governing Optimization
○ Communication cost, Processing cost, Local site constraints
● Centralized Query Optimization
○ Cost-based, Rule-based approaches
● Ordering of Fragment Queries
○ Semi-join strategy, Join ordering
● Distributed Optimization Algorithms
○ Dynamic programming, Greedy heuristics
○ Algorithm to study: Selinger's Cost-Based Optimization
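Illustrative note: the semi-join strategy listed above reduces communication cost by shipping only the join-attribute values of one relation, filtering the other relation at its own site, and then shipping the reduced relation for the final join. The sketch below simulates the three steps in a single process. Rows are assumed to be dictionaries; the function name `semi_join_strategy` and the sample `orders` and `customers` data are hypothetical.

```python
def semi_join_strategy(r_rows, s_rows, r_key, s_key):
    """Simulate R joined with S using a semi-join reduction of R."""
    # Step 1 (at the site of S): project S on the join attribute.
    # Only this set of values would be shipped to the site of R.
    s_keys = {row[s_key] for row in s_rows}
    # Step 2 (at the site of R): keep only the R rows that have a join partner.
    r_reduced = [row for row in r_rows if row[r_key] in s_keys]
    # Step 3 (at the site of S): ship reduced R and compute the final join.
    joined = [{**r, **s} for r in r_reduced for s in s_rows if r[r_key] == s[s_key]]
    return r_reduced, joined


# Hypothetical sample data held at two different sites.
orders = [
    {"order": 10, "cust_id": 1},
    {"order": 11, "cust_id": 2},
    {"order": 12, "cust_id": 9},  # no matching customer, never shipped
]
customers = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ben"}]

reduced, joined = semi_join_strategy(orders, customers, "cust_id", "id")
print(len(orders), "rows before reduction,", len(reduced), "rows shipped")
```

The reduction pays off only when the projected keys plus the reduced relation are smaller than the full relation, which is the trade-off a cost-based optimizer would weigh.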
Transaction Management
● Transaction Concept
○ ACID properties
● Goals
○ Atomicity, Consistency, Isolation, Durability
● Characteristics
○ Read/write sets, Conflicts, Schedules
● Taxonomy of Transaction Models
○ Flat, Nested, Distributed, Compensating transactions
Concurrency Control
● In Centralized Systems
○ Lock-based protocols, Timestamp ordering, MVCC
● In Distributed Systems
○ Distributed 2PL, Timestamp ordering
● Distributed Algorithms
○ Two-Phase Locking (2PL)
○ Wound-Wait & Wait-Die
● Deadlock Management
○ Prevention, Detection, Resolution
○ Global Wait-For Graph
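Illustrative note: the sketch below shows the decision rules of Wait-Die and Wound-Wait and a cycle check on a global wait-for graph built by merging local graphs. It is a simplified model under stated assumptions: a smaller timestamp means an older transaction, each local graph is a dictionary mapping a transaction to the set of transactions it waits for, and all function names are hypothetical.

```python
def wait_die(requester_ts, holder_ts):
    # Older requester (smaller timestamp) waits; younger requester dies.
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts, holder_ts):
    # Older requester wounds (aborts) the holder; younger requester waits.
    return "abort holder" if requester_ts < holder_ts else "wait"


def build_global_wfg(local_graphs):
    """Merge per-site wait-for graphs into one global graph."""
    global_wfg = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            global_wfg.setdefault(txn, set()).update(waits_for)
    return global_wfg


def has_cycle(graph):
    """Depth-first search; a back edge to a node on the stack is a deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


# Hypothetical scenario: neither site sees a cycle locally, the global graph does.
site_a = {"T1": {"T2"}}  # at site A, T1 waits for T2
site_b = {"T2": {"T1"}}  # at site B, T2 waits for T1
global_wfg = build_global_wfg([site_a, site_b])
print(has_cycle(site_a), has_cycle(site_b), has_cycle(global_wfg))
```

The example shows why local detection alone is not enough in a distributed system: the deadlock exists only in the merged graph.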
UNIT 4: RELIABILITY AND RECOVERY
Hrs: 8
Reliability Issues
● Reliability vs Availability
Types of Failures
● Transaction failures
● System (site) failures
● Communication failures
Reliability Techniques
● Redundancy, Checkpointing, Logging
Commit Protocols
● Two-Phase Commit (2PC)
● Three-Phase Commit (3PC)
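Illustrative note: the core message flow of Two-Phase Commit can be sketched as a coordinator that collects votes and then broadcasts one global decision. This is a minimal in-memory model only. It omits logging, timeouts, and failure handling, which are essential in a real protocol, and the `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    """A site taking part in the transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INITIAL"

    def prepare(self):
        # Phase 1 reply: vote yes (READY) or no (ABORTED).
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator asks every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every vote was yes.
    decision = "COMMIT" if all(votes) else "ABORT"
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision


ok_sites = [Participant("S1"), Participant("S2")]
mixed_sites = [Participant("S1"), Participant("S2", can_commit=False)]
print(two_phase_commit(ok_sites), two_phase_commit(mixed_sites))
```

A single "no" vote forces a global abort, which is how the protocol preserves atomicity across sites.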
Recovery Protocols
● Distributed recovery
● Log-based recovery
● Cascading rollback, Checkpointing
UNIT 5: PARALLEL DATABASE SYSTEMS
Hrs: 6
Parallel Architectures
● Shared Memory, Shared Disk, Shared Nothing
● Interconnection networks (Bus, Mesh, Hypercube)
Parallel Query Processing
● Intra-query and Inter-query parallelism
● Pipelining vs Partitioning
● Granularity of parallelism
UNIT 6: ADVANCED TOPICS
(May vary by instructor or exam pattern)
Mobile Databases
● Characteristics: limited bandwidth, mobility
● Data synchronization, Caching, Disconnection management
Distributed Object Management
● Object-oriented DBMS concepts
● CORBA, Distributed object references
Multi-databases (Federated DBs)
● Local autonomy
● Schema integration
● Query processing in multi-DBMS
🔑 Summary of Important Algorithms
Topic                   Algorithms
Fragmentation           Fragmentation correctness checks
Query Optimization      Selinger's Algorithm, Greedy Heuristic
Concurrency Control     2PL, Timestamp Ordering, Wait-Die, Wound-Wait
Deadlock Handling       Global Wait-For Graph Analysis
Commit Protocols        2PC, 3PC
Recovery                Log-based Recovery (ARIES if covered), Checkpointing