Parallel and Distributed Computing

| Feature | Task Parallelism | Data Parallelism |
| --- | --- | --- |
| Type of Work | Different tasks/functions | Same task across data elements |
| Data Involved | May be the same or different | Typically large data sets |
| Hardware Mapping | Works well on multicore CPUs | Best on GPUs, SIMD, or vector processors |
| Synchronization Need | May be high (shared resources) | Often lower (independent data slices) |
| Scalability | Moderate (depends on task count) | High (depends on data volume) |
| Example | Parallel modules in a compiler | Parallel pixel processing in an image |

✅ Task Parallelism (Functional Parallelism)

Definition:
Task parallelism is a parallel computing model where different tasks (functions or threads) are executed
simultaneously on different processors or cores.

• Each task may perform a different operation on the same or different data.
• Used when different parts of a program can run concurrently.

Example Use Cases:

• A web server handling requests: one thread handles database queries, another handles authentication, and another serves HTML.
• Video editing software: separate threads for reading frames, applying effects, and encoding.
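
To make this concrete, here is a minimal Python sketch of task parallelism, loosely modeled on the video-editing example (the three functions are illustrative placeholders):

```python
# Task parallelism: three *different* functions run concurrently.
# Threads suffice for this sketch; CPU-bound tasks in Python would
# typically use processes instead, because of the GIL.
import threading

def read_frames():
    print("reading frames...")

def apply_effects():
    print("applying effects...")

def encode_output():
    print("encoding output...")

threads = [threading.Thread(target=f)
           for f in (read_frames, apply_effects, encode_output)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```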

✅ Data Parallelism

Definition:
Data parallelism involves performing the same operation on different pieces of data simultaneously.

• Common in scientific computing, machine learning, and image processing.
• Emphasizes splitting data across multiple cores or processing units.

Example Use Cases:

• Applying a filter to every pixel of an image (same operation on different data).
• Performing matrix multiplication across blocks of the matrix in parallel.
• Training neural networks using mini-batches.
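
A matching data-parallel sketch in Python, echoing the pixel-filter example (the pixel values and the brighten function are illustrative):

```python
# Data parallelism: the *same* operation applied to every data element,
# with the data split across a pool of worker processes.
from multiprocessing import Pool

def brighten(pixel):
    return min(pixel + 40, 255)  # identical operation per element

if __name__ == "__main__":
    pixels = list(range(0, 256, 8))          # stand-in for image data
    with Pool() as pool:                     # one worker per core by default
        result = pool.map(brighten, pixels)  # the pool splits the data
    print(result)
```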

| Feature | Static Load Balancing | Dynamic Load Balancing |
| --- | --- | --- |
| Assignment Time | Before execution | During execution |
| Overhead | Low | High (monitoring + scheduling) |
| Adaptability | Low | High |
| Complexity | Simple | Complex |
| Use Case | Predictable workloads | Variable/unpredictable tasks |
| Example | Parallel loops in OpenMP | Job queues in MPI or cloud apps |

✅ What is Load Balancing?


Load balancing is the technique of distributing work evenly across multiple processors or computing resources to
maximize performance, avoid overload, and ensure efficient resource utilization in parallel or distributed
systems.

✅ 1. Static Load Balancing

Definition:
In static load balancing, the workload is divided among processors before execution begins, based on prior
knowledge of the tasks and system.

Key Features:

• Task allocation is fixed at compile time or system startup.
• No runtime adaptation to load imbalance.
• Works well when the workload and system characteristics are predictable.

Advantages:

• Simpler to implement.
• Low overhead (no runtime decisions).

Disadvantages:

• Can lead to inefficiency if task times vary or some processors finish early.
• Not suitable for dynamic or unpredictable workloads.

Example Use Case:

Matrix multiplication with equal-sized chunks assigned to each processor in advance.
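
A minimal sketch of static partitioning in Python, assuming equally sized chunks and a uniform cost per element (the work function and sizes are illustrative):

```python
# Static load balancing: work is partitioned *before* execution and the
# assignment never changes, which is fine when chunks cost equal time.
from multiprocessing import Pool

def process_chunk(chunk):
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        print(sum(pool.map(process_chunk, chunks)))
```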

✅ 2. Dynamic Load Balancing

Definition:
In dynamic load balancing, the workload is distributed during runtime based on the current state of the system
(e.g., processor loads, task completion status).

Key Features:

• Processors request more work when they become idle.
• The system adapts to varying workloads and resource availability.

Advantages:

• Better suited for irregular or unpredictable workloads.
• More responsive to runtime performance fluctuations.

Disadvantages:

• More complex.
• Overhead due to decision-making and task migration.

Example Use Case:

Web servers dynamically assigning incoming requests to the least busy server.
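
A minimal dynamic load balancing sketch in Python: idle workers pull tasks from a shared queue at runtime, so variable task times do not strand work on one processor (task counts and timings are illustrative):

```python
# Dynamic load balancing: workers request more work whenever they go idle.
import queue
import random
import threading
import time

tasks = queue.Queue()
for i in range(20):
    tasks.put(i)

def worker(name):
    while True:
        try:
            task = tasks.get_nowait()          # pull work on demand
        except queue.Empty:
            return                             # no work left
        time.sleep(random.uniform(0.0, 0.05))  # task times vary
        print(f"{name} finished task {task}")

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```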

🔶 19. Fault Tolerance in Parallel Systems

Fault tolerance in parallel computing ensures that a system continues to function correctly even when one or more
components fail. It is crucial for high-performance and long-running computations.

✅ 1. Redundancy

Definition:
Adding extra components (hardware/software) that can take over if the primary one fails.

Types:

• Hardware Redundancy: Multiple processors, power supplies, etc.
• Software Redundancy: Multiple copies of code or processes.
• Information Redundancy: Adding parity bits or checksums to detect/correct errors in data.

Example:
Using two processors to execute the same task so that if one fails, the other continues.

✅ 2. Checkpointing

Definition:
Saving the state of a process or program at intervals so it can be restored from that point in case of failure.

How It Works:

• Periodically saves the memory, register states, and process info.
• On failure, the system rolls back to the last checkpoint instead of restarting from scratch.

Types:

• Coordinated: All processes save state at the same time.
• Uncoordinated: Each process checkpoints independently (may need recovery protocols).
• Incremental: Only changes since the last checkpoint are saved.

Use Case:
Supercomputers and long-running simulations.
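
A minimal application-level checkpointing sketch in Python, assuming the program state is small enough to pickle; the checkpoint file name is hypothetical:

```python
# Checkpointing: save progress periodically; after a crash, the program
# rolls back to the last saved state instead of restarting from scratch.
import os
import pickle

CKPT = "sim.ckpt"  # hypothetical checkpoint file

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "state": 0.0}  # fresh start

def save_checkpoint(snapshot):
    with open(CKPT, "wb") as f:
        pickle.dump(snapshot, f)

snap = load_checkpoint()               # resume from last checkpoint, if any
for step in range(snap["step"], 1_000):
    snap["state"] += 0.001             # one unit of simulated work
    if (step + 1) % 100 == 0:          # periodic checkpoint
        snap["step"] = step + 1
        save_checkpoint(snap)
```
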
✅ 3. Replication

Definition:
Running multiple instances of a computation or component in parallel to detect and recover from errors.

Forms:

• Active Replication: All replicas run simultaneously and their outputs are voted on.
• Passive Replication: A backup is ready to take over if the primary fails.

Purpose:
Improves both availability and reliability.

Example:
Multiple copies of a distributed database running across data centers.
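
A toy sketch of active replication with output voting; the three replicas and the injected fault are purely illustrative:

```python
# Active replication: run the same computation on several replicas and
# vote on the outputs, so one faulty replica cannot corrupt the result.
from collections import Counter

def replica_a(x): return x * x
def replica_b(x): return x * x
def replica_c(x): return x * x + 1    # simulated faulty replica

def voted_result(x):
    outputs = [r(x) for r in (replica_a, replica_b, replica_c)]
    value, _votes = Counter(outputs).most_common(1)[0]
    return value

print(voted_result(6))  # 36, despite the bad replica
```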

✅ 4. Error Detection

Definition:
Techniques used to identify faults or corruptions in data, memory, or execution.

Methods:

• Parity Checks
• Checksums
• Watchdog Timers
• Assertions and Exception Handling
• Heartbeat signals in distributed systems

Importance:
Detecting errors early helps trigger recovery (e.g., rollback or failover).
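
As a small illustration of one method above, a heartbeat-based failure detector in Python (node names and the timeout are illustrative):

```python
# Heartbeat detection: a node is flagged as failed when no heartbeat
# has been received within the timeout window.
import time

TIMEOUT = 2.0  # seconds of silence before a node is considered failed
last_beat = {"node1": time.time(), "node2": time.time()}

def record_heartbeat(node):
    last_beat[node] = time.time()

def node_status():
    now = time.time()
    return {n: "OK" if now - t < TIMEOUT else "FAILED"
            for n, t in last_beat.items()}

time.sleep(2.5)              # node2 stays silent past the timeout...
record_heartbeat("node1")    # ...while node1 keeps beating
print(node_status())         # {'node1': 'OK', 'node2': 'FAILED'}
```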

✅ 1. Memory Hierarchy Overview

Definition:
The memory hierarchy is a structure that uses multiple types of memory with different speeds
and sizes to provide a balance between cost, speed, and capacity.

Typical Layers (from fastest to slowest):

• Registers (smallest and fastest)
• L1 Cache
• L2/L3 Cache
• Main Memory (RAM)
• Secondary Storage (e.g., SSD/HDD)

Purpose in Parallel Computing:

• Improve performance by reducing access latency.
• Keep frequently accessed data close to the CPU.
• Exploit locality of reference (temporal and spatial).

✅ 2. Cache Coherence Protocols

Problem:
In multiprocessor systems with private caches, one processor’s update to a shared variable may
not be visible to others—leading to inconsistency.

Solution:
Cache coherence protocols ensure that all processors have a consistent view of memory.

Common Protocols:

| Protocol | Mechanism | Notes |
| --- | --- | --- |
| MESI (Modified, Exclusive, Shared, Invalid) | States control sharing & updates | Most common in modern CPUs |
| MOESI | Adds “Owned” state to MESI | Reduces memory write-back overhead |
| Directory-Based Protocols | Uses a central directory to track the state of each memory block | Scalable in large systems |
| Snooping Protocols | Caches monitor (snoop) bus transactions | Fast but less scalable |

Example Issue:
Two CPUs caching the same memory block—one writes to it. Without coherence, the second
CPU might read stale data.

✅ 3. NUMA Awareness (Non-Uniform Memory Access)

Definition:
In NUMA architectures, memory is divided into local (close to the processor) and remote (far
from it) regions. Accessing local memory is faster than remote memory.

Key Concepts:

• Each processor has its own memory bank.
• All memory is accessible to all processors, but access time varies.
• Optimizing software to access local memory boosts performance.

Importance in Parallel Computing:

• Poor NUMA placement can degrade performance.
• Threads should be bound to CPUs and memory regions carefully.

NUMA-Aware Programming:

• Allocate memory local to the thread/core.
• Use tools like numactl, or memory allocators that support NUMA.
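
A Linux-only sketch of NUMA-aware placement from Python, assuming a hypothetical topology in which cores 0–7 belong to NUMA node 0; with the kernel's default first-touch policy, memory the pinned process allocates afterwards tends to stay on the local node:

```python
# Pin the current process to the cores of one NUMA node (Linux only).
import os

NODE0_CORES = set(range(8))            # assumption: cores 0-7 sit on node 0

os.sched_setaffinity(0, NODE0_CORES)   # 0 = the current process
print("Now restricted to cores:", os.sched_getaffinity(0))

# Memory allocated after this point is first touched by a process bound
# to node 0, so the default first-touch policy keeps it local.
buffer = bytearray(64 * 1024 * 1024)
```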

🔹 1. AWS Overview and Infrastructure

• AWS (Amazon Web Services): A leading cloud provider offering scalable computing power, storage, and services on demand.
• Evolution: Launched in 2006, AWS began with services like EC2 and S3 and now includes 200+ services.
• Global Infrastructure:
  o Regions: Geographical areas (e.g., US-East-1).
  o Availability Zones (AZs): Isolated locations within a region.
  o Edge Locations: Used for content delivery (via CloudFront).

🔹 2. Core AWS Services

• Compute Services: EC2, Lambda (serverless), Auto Scaling, Elastic Beanstalk.
• Storage Services: S3 (object), EBS (block), Glacier (archival).
• Database Services: RDS (SQL), DynamoDB (NoSQL), Aurora.
• Networking & Content Delivery: VPC, CloudFront, Route 53, API Gateway.

🔹 3. AWS Security and Compliance

• Shared Responsibility Model:
  o AWS manages infrastructure.
  o Customers manage data, access, apps.
• Key Security Features: IAM, encryption, security groups.
• Compliance Programs: HIPAA, ISO, GDPR, etc.

🔹 4. AWS Pricing and Billing

• On-Demand: Pay per use.
• Reserved Instances: Long-term, cheaper.
• Spot Instances: Unused capacity at discounts.
• Free Tier: Limited services free for 12 months or always.

🔹 5. AWS Management and Monitoring

• AWS Console and CLI: Interfaces to manage services.
• CloudFormation: Infrastructure as Code (IaC).
• CloudWatch: Monitoring and logging.
• AWS Config: Resource compliance tracking.
• Trusted Advisor: Best practices and security checks.

🔹 6. DevOps on AWS

• CI/CD Services: CodePipeline, CodeBuild, CodeDeploy.
• IaC: CloudFormation, Terraform (3rd-party).
• Automation: AWS Systems Manager, Lambda, OpsWorks.

🔹 7. Machine Learning and AI

• Amazon SageMaker: Build, train, deploy ML models.
• Rekognition: Image and video analysis.
• Comprehend, Lex, Polly: NLP, chatbots, text-to-speech.

🔹 8. Big Data and Analytics

• EMR: Hadoop/Spark clusters.
• Glue: ETL service.
• Kinesis: Real-time data streaming.
• Redshift: Data warehouse.
• Athena: SQL on S3 data.

🔹 9. Application Integration

• SQS: Message queues.
• SNS: Push notifications.
• Step Functions: Workflow orchestration.
• EventBridge: Serverless event bus.

🔹 10. Internet of Things (IoT)

• IoT Core: Secure device connection.
• Greengrass: Local compute on edge devices.
• IoT Analytics: Analysis of IoT data.
• FreeRTOS: OS for microcontrollers.

🔹 11. Migration and Transfer

• Migration Hub: Tracks migration progress.
• DMS: Database migration.
• Snow Family: Physical data transfer (Snowcone, Snowball, Snowmobile).
• Transfer Family: SFTP, FTPS, FTP data transfer.

🔹 12. Hybrid Cloud and Edge Services

• AWS Outposts: Run AWS services on-premises.
• Wavelength: Edge computing with 5G.
• Snowcone: Portable edge device.
• Local Zones: Low-latency AWS infrastructure near users.

🔹 13. Use Cases & Real-World Applications

• Startups: Rapid prototyping (e.g., Airbnb).
• Enterprises: Scale apps and storage.
• Government/Healthcare: Secure workloads (HIPAA).
• Media/Gaming: Streaming and rendering workloads.

🔹 14. AWS Popular Customers

• Netflix, NASA, Airbnb, Samsung, McDonald’s, Capital One.

🔹 15. Comparison with Other Platforms


| Feature | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Market Share | Largest | Growing fast | Strong AI |
| Strengths | Ecosystem | Hybrid Cloud | Big Data/AI |
| Billing | Flexible | Pay-as-you-go | Pay-as-you-go |

🔹 16. Training and Certification

• Certification Levels:
  o Foundational: Cloud Practitioner
  o Associate: Developer, Solutions Architect
  o Professional: Architect Pro, DevOps Pro
  o Specialty: Security, ML, Big Data
• Learning Platforms: AWS Skill Builder, Coursera, Udemy.

🔹 17. Future Directions for AWS

• Serverless Growth: Lambda, Fargate.
• Sustainable Cloud: Carbon footprint awareness.
• AI/ML Evolution: Enhanced SageMaker, custom silicon.
• Hybrid & Quantum Computing: Braket for quantum workloads.

🌐 1. AWS Global Infrastructure

✅ Regions

• Geographic areas around the world (e.g., US-East, Asia-Pacific).
• Each region contains multiple isolated locations called Availability Zones (AZs).

✅ Availability Zones (AZs)

• Physically separated data centers within a region.
• Allow fault-tolerant and highly available architecture.

✅ Edge Locations

• Used by Amazon CloudFront for content delivery.
• Located in cities to reduce latency for end users.

⚙️ 2. Core AWS Services

📦 Compute Services

• EC2: Virtual servers for compute capacity.
• Lambda: Serverless computing — run code in response to events.
• ECS/EKS: Container services for Docker/Kubernetes.

📦 Storage Services

• S3: Scalable object storage (sketch below).
• EBS: Block storage for EC2 instances.
• Glacier: Low-cost archival storage.
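
A minimal boto3 sketch of S3 object storage; it assumes AWS credentials are already configured, and the bucket and key names are hypothetical:

```python
# Upload a file to S3, then read part of it back.
import boto3

s3 = boto3.client("s3")

s3.upload_file("report.csv", "my-example-bucket", "reports/report.csv")

obj = s3.get_object(Bucket="my-example-bucket", Key="reports/report.csv")
print(obj["Body"].read(100))  # first 100 bytes of the object
```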

📦 Database Services

• RDS: Managed relational databases (MySQL, PostgreSQL).
• DynamoDB: NoSQL database.
• Redshift: Data warehouse for analytics.

🌐 Networking

• VPC: Isolated cloud network.
• Route 53: DNS service.
• CloudFront: CDN for fast delivery.
• Elastic Load Balancer: Distributes traffic across EC2 instances.

🔐 3. Security and Compliance

✅ Shared Responsibility Model

• AWS manages security "of" the cloud (hardware, software).
• You manage security "in" the cloud (data, identity, access).

✅ Key Security Features

• IAM: Control access to services/resources.
• KMS: Encryption key management.
• Shield & WAF: DDoS protection and web application firewall.

✅ Compliance Programs

• AWS complies with GDPR, HIPAA, ISO, etc.


💰 4. AWS Pricing Models

💵 On-Demand

• Pay for compute/storage by the hour/second.
• No long-term commitments.

💵 Reserved Instances

• Pay upfront for 1–3 years.
• Significant savings (up to 75%) for predictable workloads.

💵 Spot Instances

• Use spare AWS capacity at reduced prices (up to 90% off).
• May be terminated when AWS needs the capacity.

🛠 5. Management and Monitoring

• AWS Console: Web UI for managing services.
• AWS CLI: Command-line interface.
• CloudWatch: Monitoring and logging for resources.
• Config: Track configuration changes.
• Trusted Advisor: Provides best practice recommendations.

🧪 6. DevOps on AWS

🛠 CI/CD Tools

• CodePipeline: Automates build, test, deploy.
• CodeCommit: Git-based source repo.
• CodeDeploy: Deployment automation.

💻 Infrastructure as Code

• CloudFormation: Declarative templates to define infrastructure.
• Terraform (non-AWS tool, often used with AWS).

🤖 7. Machine Learning and AI

• Amazon SageMaker: Train/deploy machine learning models.
• Rekognition: Image and video analysis.
• Comprehend: NLP (Natural Language Processing).
• Polly: Text-to-speech.
• Lex: Chatbots (used in Alexa).

📈 8. Big Data and Analytics

• EMR: Managed Hadoop for big data.
• Athena: Run SQL queries on S3 data.
• Glue: ETL (Extract, Transform, Load) service.
• Redshift: Scalable data warehouse.

🔄 9. Application Integration

• SQS: Queue-based messaging (sketch below).
• SNS: Publish-subscribe messaging.
• Step Functions: Serverless workflows.
• EventBridge: Event-driven application integration.
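
A minimal boto3 sketch of queue-based messaging with SQS; credentials are assumed to be configured, and the queue name is hypothetical:

```python
# Send a message through SQS, then receive and delete it.
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="demo-queue")["QueueUrl"]

sqs.send_message(QueueUrl=queue_url, MessageBody="hello from producer")

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    print(msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```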

🌍 10. IoT and Edge Computing

• AWS IoT Core: Connect IoT devices to AWS.
• Greengrass: Run local compute, messaging, ML on devices.
• FreeRTOS: Real-time OS for microcontrollers.

🚚 11. Migration and Hybrid Cloud

• Migration Hub: Track and manage cloud migrations.
• DMS: Database Migration Service.
• Snow Family: Physical devices for offline data transfer (Snowcone, Snowball).
• Outposts: AWS services in your on-prem data center.
• Wavelength: Edge compute with 5G.
• Local Zones: Run latency-sensitive apps closer to users.

🧠 12. Use Cases

• Startups: Build MVPs quickly at low cost.
• Enterprises: Migrate large-scale workloads.
• Government/Healthcare: Secure and compliant solutions.
• Media/Gaming: Global content delivery, game server hosting.

🔍 13. Popular Customers

• Netflix, NASA, Airbnb, LinkedIn, Samsung, Pfizer, Capital One, etc.

📚 14. Training & Certification

• Foundational: Cloud Practitioner
• Associate: Solutions Architect, Developer
• Professional: Advanced Architect, DevOps Engineer
• Specialty: Security, ML, Data Analytics, etc.

🚀 15. Future of AWS and Cloud Parallelism

• Growth in serverless (Lambda, Step Functions).
• Emphasis on sustainability (carbon-neutral infrastructure).
• Advancement in AI/ML, quantum computing, and hybrid cloud.

1–2. Introduction & Definition of Distributed Computing

Distributed Computing is a field of computer science where a group of independent computers (nodes) work together over a network to achieve a common goal. These systems appear to users as a single coherent system.

🔑 Key Characteristics:

• Transparency: Users don't see the complexity behind the scenes.
• Concurrency: Multiple processes run in parallel.
• Scalability: Systems can grow by adding more nodes.
• Fault Tolerance: System continues working despite failures.
• Resource Sharing: Devices and data can be shared.
3. Architecture of Distributed Systems

Client-Server Architecture

• Central server provides services.
• Multiple clients request/use these services.
• E.g., Email services.

🔁 Peer-to-Peer (P2P) Architecture

• All nodes act as both clients and servers.
• No central coordinator.
• E.g., BitTorrent, blockchain.

🧱 Multi-Tier Architecture

• Divides system into layers: presentation, logic, and data.
• Common in web applications.

🧩 4. Models of Distributed Computing

✉️ Message Passing Model

• Nodes communicate by sending/receiving messages.
• Used in MPI, socket programming.
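
A minimal message-passing sketch using mpi4py (one concrete MPI binding, chosen here as an assumption since the notes name MPI generally). Run under an MPI launcher, e.g. mpiexec -n 2 python script.py:

```python
# Message passing: rank 0 sends a task to rank 1 and awaits the result.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({"op": "square", "value": 7}, dest=1, tag=0)
    print("result:", comm.recv(source=1, tag=1))
elif rank == 1:
    msg = comm.recv(source=0, tag=0)
    comm.send(msg["value"] ** 2, dest=0, tag=1)
```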

🧠 Shared Memory Model

• Nodes access common memory space.
• Easier to program but harder to scale.
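
A single-machine sketch of the shared-memory model using Python's multiprocessing; real shared-memory systems range from threads to NUMA hardware, so this is only illustrative:

```python
# Shared memory: several workers update one shared counter, guarded by
# a lock because concurrent access must be synchronized.
from multiprocessing import Process, Value

def deposit(balance, amount):
    with balance.get_lock():       # synchronize access to shared state
        balance.value += amount

if __name__ == "__main__":
    balance = Value("i", 0)        # shared integer visible to all workers
    workers = [Process(target=deposit, args=(balance, 10)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(balance.value)           # 40
```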

🛠 5. Components of a Distributed System

• Nodes: Independent computers.
• Network: Communication medium (LAN, WAN, Internet).
• Middleware: Software that connects distributed components (e.g., CORBA, RPC, gRPC).
• Resource Manager: Allocates and manages system resources.

✅ 6. Advantages of Distributed Computing

| Benefit | Description |
| --- | --- |
| Scalability | Add more nodes to increase capacity. |
| Fault Tolerance | If one node fails, others continue working. |
| Resource Sharing | Use resources across systems efficiently. |
| Cost-Effectiveness | Use commodity hardware instead of expensive supercomputers. |
| Parallelism | Execute tasks in parallel across machines for faster results. |

⚠️ 7. Challenges in Distributed Computing

| Challenge | Description |
| --- | --- |
| Network Latency | Delays in communication can slow the system. |
| Synchronization | Ensuring processes across nodes stay in sync (e.g., clocks, operations). |
| Fault Detection | Detecting failed nodes or dropped messages is hard. |
| Security | Protecting data and systems from attacks. |
| Data Consistency | Ensuring all nodes have a consistent view of shared data. |

🧱 8. Distributed Computing Models & Frameworks

• MapReduce: Programming model for processing large data sets across clusters (e.g., Hadoop); a toy sketch follows this list.
• Apache Spark: Faster alternative to MapReduce with in-memory processing.
• MPI (Message Passing Interface): Standard for message-passing in distributed memory systems, used in HPC.
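
As promised above, a toy word count in the MapReduce style, run locally with a process pool rather than a real Hadoop cluster (the input lines are illustrative):

```python
# MapReduce sketch: map each line to partial word counts, then reduce
# the partial counts into a final tally.
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_phase(line):
    return Counter(line.split())       # word -> count for one line

def reduce_phase(acc, partial):
    acc.update(partial)                # merge partial counts
    return acc

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the end"]
    with Pool() as pool:
        partials = pool.map(map_phase, lines)
    print(reduce(reduce_phase, partials, Counter()))
```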

🚀 9. Applications of Distributed Computing

| Area | Example Use Case |
| --- | --- |
| Cloud Computing | AWS, Azure — host apps and services in distributed fashion |
| Big Data Analytics | Hadoop/Spark analyze petabytes of data |
| Scientific Research | Protein folding, climate modeling |
| Blockchain | Decentralized ledgers for cryptocurrencies like Bitcoin |

🔄 10. Distributed vs. Parallel Computing

| Feature | Distributed Computing | Parallel Computing |
| --- | --- | --- |
| Nodes | Multiple independent computers | Multiple processors/cores in one system |
| Memory | Usually distributed memory | Usually shared memory |
| Communication | Over network (slower) | Via shared memory (faster) |
| Example | Google Search, Bitcoin | GPU computations, matrix multiplication |

🔐 11. Security in Distributed Systems

✅ Authentication & Authorization

• Verifying identities (who are you?) and granting appropriate permissions (what can you do?).

🔒 Encryption

• Protecting data during transmission (TLS/SSL) and at rest (AES, RSA).

💪 Fault Tolerance

• Security techniques also help maintain availability, such as replication, failover systems, and distributed backups.
