Parallel and Distributed Computing (PDC) - Notes
1. Introduction to Parallel and Distributed Computing
Parallel Computing: The simultaneous use of multiple compute resources to solve a
computational problem. Tasks are divided into smaller sub-tasks that run concurrently on
multiple processors.
Distributed Computing: A model in which the components of a software system run on multiple
networked computers, which communicate and coordinate their actions by passing messages.
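Key difference: Parallel computing typically uses tightly coupled processors (often sharing
memory within a single system) to speed up one computation, while distributed computing uses
multiple independent, loosely coupled computers, each with its own memory, connected by a
network.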
2. Types of Parallelism
- Data Parallelism: Distributing subsets of the data across multiple processors, each of which
performs the same operation on its own subset.
- Task Parallelism: Different processors execute different tasks simultaneously.
- Pipeline Parallelism: A task is divided into a sequence of stages, each assigned to a
different processor; like an assembly line, different stages process different data items at
the same time.
3. Models of Parallel Computing
- Shared Memory Model: All processors access the same memory space and communicate through
shared variables, which requires synchronization (e.g., Pthreads, OpenMP).
- Distributed Memory Model: Each processor has its own private memory; processors
communicate by explicit message passing, e.g. with MPI (Message Passing Interface).
- Hybrid Model: Combines shared and distributed memory approaches, e.g. MPI between the nodes
of a cluster and threads within each node.
4. Threads and Pthreads
Thread: The smallest unit of execution that the operating system can schedule. Threads of the
same process share its memory and open file handles, but each thread has its own stack and
registers.
POSIX Threads (Pthreads): A standard API for creating and managing threads in C/C++.
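Key functions: pthread_create() (start a thread), pthread_join() (wait for a thread to finish),
pthread_exit() (terminate the calling thread), pthread_mutex_lock() and pthread_mutex_unlock()
(enter and leave a critical section protected by a mutex).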
5. Parallel Sorting Algorithms
- Parallel Merge Sort: Splits data among processors, sorts locally, then merges.
- Parallel Quick Sort: Partitions the data around a pivot, then sorts the resulting
sub-partitions in parallel.
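- Bitonic Sort: A comparison-based sorting network that repeatedly builds and merges bitonic
sequences; its fixed, data-independent pattern of compare-exchange operations suits hardware
and GPU implementations. The standard form assumes the input size is a power of two.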
Efficiency depends on balancing work across processors and minimizing inter-processor
communication, especially during the merge or exchange phases.
6. Distributed Computing Basics
Key Characteristics:
- Multiple independent computers (nodes)
- Communication via message passing
- Transparency: details of access, location, replication, and concurrency are hidden, so the
system appears to users as a single coherent system
Examples: Client-server systems, peer-to-peer networks, cloud computing.
7. Sockets
Socket: Endpoint for sending or receiving data across a computer network.
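- Types: stream sockets (SOCK_STREAM, usually TCP: connection-oriented, reliable, ordered) and
datagram sockets (SOCK_DGRAM, usually UDP: connectionless, no delivery guarantee).
- Common functions: socket(), bind(), listen(), accept() on the server side; connect() on the
client side; send() and recv() on both.
Sockets are the low-level building block for communication between nodes in distributed systems.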
8. Cloud-Based Parallel Computing
Cloud platforms (e.g., AWS, Azure, GCP) provide infrastructure for parallel and distributed
applications.
- On-demand scalability
- Pay-as-you-go pricing
- Services (AWS names shown): Compute (EC2), Storage (S3), Databases (RDS), and parallel data
processing frameworks such as Hadoop and Spark (e.g., via Amazon EMR).
- Security, monitoring, and DevOps integration are crucial when deploying parallel and
distributed workloads in the cloud.