Feature | Task Parallelism | Data Parallelism
Type of Work | Different tasks/functions | Same task across data elements
Data Involved | May be the same or different | Typically large, homogeneous data sets
Hardware Mapping | Works well on multicore CPUs | Best on GPUs, SIMD, or vector processors
Synchronization Need | May be high (shared resources) | Often lower (independent data slices)
Scalability | Moderate (depends on task count) | High (depends on data volume)
Example | Parallel modules in a compiler | Parallel pixel processing in an image
✅ Task Parallelism (Functional Parallelism)
Definition:
Task parallelism is a parallel computing model where different tasks (functions or threads) are executed
simultaneously on different processors or cores.
Each task may perform a different operation, on the same data or on different data.
Used when different parts of a program can run concurrently.
Example Use Cases:
A web server handling requests: one thread handles database queries, another handles authentication, and
another serves HTML.
Video editing software: separate threads for reading frames, applying effects, and encoding.
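A minimal sketch of task parallelism in Python threads: each thread runs a different function concurrently. The function names (fetch_data, render_page, write_log) are hypothetical placeholders for independent stages of a program.

```python
import threading

# Three *different* functions, each standing in for an independent task
# (hypothetical examples: database I/O, rendering, logging).
def fetch_data():
    print("fetching data from the database")

def render_page():
    print("rendering the HTML page")

def write_log():
    print("writing the access log")

# Task parallelism: each thread executes a different function at the same time.
threads = [threading.Thread(target=f) for f in (fetch_data, render_page, write_log)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```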
✅ Data Parallelism
Definition:
Data parallelism involves performing the same operation on different pieces of data simultaneously.
Common in scientific computing, machine learning, and image processing.
Emphasizes splitting data across multiple cores or processing units.
Example Use Cases:
Applying a filter to every pixel of an image (same operation on different data).
Performing matrix multiplication across blocks of the matrix in parallel.
Training neural networks using mini-batches.
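A minimal data-parallel sketch: the same operation (a toy brightness adjustment) is applied to every element of a flat "image", with the work split across worker processes. The chunk size and function name are illustrative, not prescriptive.

```python
from multiprocessing import Pool

def brighten(pixel):
    # The same operation is applied to every data element.
    return min(int(pixel * 1.2), 255)

if __name__ == "__main__":
    pixels = list(range(256)) * 1000          # a fake "image" as a flat list
    with Pool(processes=4) as pool:
        # Data parallelism: the pool maps one function over slices of the data.
        result = pool.map(brighten, pixels, chunksize=1024)
    print(result[:10])
```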
Feature | Static Load Balancing | Dynamic Load Balancing
Assignment Time | Before execution | During execution
Overhead | Low | High (monitoring + scheduling)
Adaptability | Low | High
Complexity | Simple | Complex
Use Case | Predictable workloads | Variable/unpredictable tasks
Example | Parallel loops in OpenMP | Job queues in MPI or cloud apps
✅ What is Load Balancing?
Load balancing is the technique of distributing work evenly across multiple processors or computing resources to
maximize performance, avoid overload, and ensure efficient resource utilization in parallel or distributed
systems.
✅ 1. Static Load Balancing
Definition:
In static load balancing, the workload is divided among processors before execution begins, based on prior
knowledge of the tasks and system.
Key Features:
Task allocation is fixed at compile time or system startup.
No runtime adaptation to load imbalance.
Works well when the workload and system characteristics are predictable.
Advantages:
Simpler to implement.
Low overhead (no runtime decisions).
Disadvantages:
Can lead to inefficiency if task times vary or some processors finish early.
Not suitable for dynamic or unpredictable workloads.
Example Use Case:
Matrix multiplication with equal-sized chunks assigned to each processor in advance.
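A sketch of static load balancing: the input is split into equal-sized chunks before any work starts, one chunk per worker. The chunking and the work done per chunk are illustrative placeholders.

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Placeholder for real work, e.g. multiplying one block of a matrix.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # Static partitioning: chunk boundaries are fixed before execution begins.
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(process_chunk, chunks)
    print(sum(partials))
```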
✅ 2. Dynamic Load Balancing
Definition:
In dynamic load balancing, the workload is distributed during runtime based on the current state of the system
(e.g., processor loads, task completion status).
Key Features:
Processors request more work when they become idle.
The system adapts to varying workloads and resource availability.
Advantages:
Better suited for irregular or unpredictable workloads.
More responsive to runtime performance fluctuations.
Disadvantages:
More complex.
Overhead due to decision-making and task migration.
Example Use Case:
Web servers dynamically assigning incoming requests to the least busy server.
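A sketch of dynamic load balancing using a shared work queue: idle workers pull the next task at runtime, so faster workers naturally take on more tasks. The task durations are randomized purely to simulate uneven work.

```python
import queue
import random
import threading
import time

tasks = queue.Queue()
for i in range(20):
    tasks.put(i)                         # 20 tasks of unpredictable duration

def worker(name):
    while True:
        try:
            task = tasks.get_nowait()    # pull work only when idle
        except queue.Empty:
            return
        time.sleep(random.uniform(0.01, 0.1))   # simulated uneven workload
        print(f"{name} finished task {task}")
        tasks.task_done()

threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```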
🔶 19. Fault Tolerance in Parallel Systems
Fault tolerance in parallel computing ensures that a system continues to function correctly even when one or more
components fail. It is crucial for high-performance and long-running computations.
✅ 1. Redundancy
Definition:
Adding extra components (hardware/software) that can take over if the primary one fails.
Types:
Hardware Redundancy: Multiple processors, power supplies, etc.
Software Redundancy: Multiple copies of code or processes.
Information Redundancy: Adding parity bits or checksums to detect/correct errors in data.
Example:
Using two processors to execute the same task so that if one fails, the other continues.
✅ 2. Checkpointing
Definition:
Saving the state of a process or program at intervals so it can be restored from that point in case of failure.
How It Works:
Periodically saves the memory, register states, and process info.
On failure, the system rolls back to the last checkpoint instead of restarting from scratch.
Types:
Coordinated: All processes save state at the same time.
Uncoordinated: Each process checkpoints independently (may need recovery protocols).
Incremental: Only changes since the last checkpoint are saved.
Use Case:
Supercomputers and long-running simulations.
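A toy checkpointing sketch: the loop state is pickled to disk every few iterations, and on restart the program resumes from the last saved checkpoint instead of from iteration 0. The file name and checkpoint interval are arbitrary choices for illustration.

```python
import os
import pickle

CHECKPOINT = "state.ckpt"   # hypothetical checkpoint file

def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)           # roll back to the last checkpoint
    return {"iteration": 0, "total": 0}     # otherwise start from scratch

state = load_state()
for i in range(state["iteration"], 1000):
    state["total"] += i
    state["iteration"] = i + 1
    if (i + 1) % 100 == 0:                  # periodic checkpoint
        with open(CHECKPOINT, "wb") as f:
            pickle.dump(state, f)

print(state["total"])
```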
✅ 3. Replication
Definition:
Running multiple instances of a computation or component in parallel to detect and recover from errors.
Forms:
Active Replication: All replicas run simultaneously and their outputs are compared (e.g., by majority vote).
Passive Replication: A backup is ready to take over if the primary fails.
Purpose:
Improves both availability and reliability.
Example:
Multiple copies of a distributed database running across data centers.
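A minimal sketch of active replication with majority voting: the same computation runs on several "replicas" and the most common answer is accepted, masking a single faulty result. The fault injection here is artificial.

```python
from collections import Counter

def replica(x, faulty=False):
    # Every replica performs the same computation; one is deliberately wrong.
    return x * x + (1 if faulty else 0)

def vote(results):
    # Accept the value produced by the majority of replicas.
    value, _count = Counter(results).most_common(1)[0]
    return value

results = [replica(7), replica(7, faulty=True), replica(7)]
print(vote(results))   # 49: the faulty replica's output is out-voted
```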
✅ 4. Error Detection
Definition:
Techniques used to identify faults or corruptions in data, memory, or execution.
Methods:
Parity Checks
Checksums
Watchdog Timers
Assertions and Exception Handling
Heartbeat signals in distributed systems
Importance:
Detecting errors early helps trigger recovery (e.g., rollback or failover).
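A small error-detection sketch using an SHA-256 checksum: the checksum computed when data is produced is compared with one computed when the data is read back, so corruption is caught before the data is used.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b"result block from worker 3"
stored_sum = checksum(original)

# ... data is transmitted or written to disk; later it is read back ...
received = b"result block from worker 3"   # possibly corrupted copy

if checksum(received) != stored_sum:
    print("corruption detected: trigger rollback or failover")
else:
    print("data verified")
```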
✅ 1. Memory Hierarchy Overview
Definition:
The memory hierarchy is a structure that uses multiple types of memory with different speeds
and sizes to provide a balance between cost, speed, and capacity.
Typical Layers (from fastest to slowest):
Registers (smallest and fastest)
L1 Cache
L2/L3 Cache
Main Memory (RAM)
Secondary Storage (e.g., SSD/HDD)
Purpose in Parallel Computing:
Improve performance by reducing access latency.
Keep frequently accessed data close to the CPU.
Exploit locality of reference (temporal and spatial).
✅ 2. Cache Coherence Protocols
Problem:
In multiprocessor systems with private caches, one processor’s update to a shared variable may
not be visible to others—leading to inconsistency.
Solution:
Cache coherence protocols ensure that all processors have a consistent view of memory.
Common Protocols:
Protocol | Mechanism | Notes
MESI (Modified, Exclusive, Shared, Invalid) | States control sharing & updates | Most common in modern CPUs
MOESI | Adds “Owned” state to MESI | Reduces memory write-back overhead
Directory-Based Protocols | Uses a central directory to track the state of each memory block | Scalable in large systems
Snooping Protocols | Caches monitor (snoop) bus transactions | Fast but less scalable
Example Issue:
Two CPUs caching the same memory block—one writes to it. Without coherence, the second
CPU might read stale data.
✅ 3. NUMA Awareness (Non-Uniform Memory Access)
Definition:
In NUMA architectures, memory is divided into local (close to the processor) and remote (far
from it) regions. Accessing local memory is faster than remote memory.
Key Concepts:
Each processor has its own memory bank.
All memory is accessible to all processors, but access time varies.
Optimizing software to access local memory boosts performance.
Importance in Parallel Computing:
Poor NUMA placement can degrade performance.
Threads should be bound to CPUs and memory regions carefully.
NUMA-Aware Programming:
Allocate memory local to the thread/core.
Use tools like numactl, or memory allocators that support NUMA.
🔹 1. AWS Overview and Infrastructure
AWS (Amazon Web Services): A leading cloud provider offering scalable computing
power, storage, and services on demand.
Evolution: Launched in 2006, AWS began with services like EC2 and S3 and now
includes 200+ services.
Global Infrastructure:
o Regions: Geographical areas (e.g., US-East-1).
o Availability Zones (AZs): Isolated locations within a region.
o Edge Locations: Used for content delivery (via CloudFront).
🔹 2. Core AWS Services
Compute Services: EC2, Lambda (serverless), Auto Scaling, Elastic Beanstalk.
Storage Services: S3 (object), EBS (block), Glacier (archival).
Database Services: RDS (SQL), DynamoDB (NoSQL), Aurora.
Networking & Content Delivery: VPC, CloudFront, Route 53, API Gateway.
🔹 3. AWS Security and Compliance
Shared Responsibility Model:
o AWS manages infrastructure.
o Customers manage data, access, apps.
Key Security Features: IAM, encryption, security groups.
Compliance Programs: HIPAA, ISO, GDPR, etc.
🔹 4. AWS Pricing and Billing
On-Demand: Pay per use.
Reserved Instances: Long-term, cheaper.
Spot Instances: Unused capacity at discounts.
Free Tier: Limited services free for 12 months or always.
🔹 5. AWS Management and Monitoring
AWS Console and CLI: Interfaces to manage services.
CloudFormation: Infrastructure as Code (IaC).
CloudWatch: Monitoring and logging.
AWS Config: Resource compliance tracking.
Trusted Advisor: Best practices and security checks.
🔹 6. DevOps on AWS
CI/CD Services: CodePipeline, CodeBuild, CodeDeploy.
IaC: CloudFormation, Terraform (3rd-party).
Automation: AWS Systems Manager, Lambda, OpsWorks.
🔹 7. Machine Learning and AI
Amazon SageMaker: Build, train, deploy ML models.
Rekognition: Image and video analysis.
Comprehend, Lex, Polly: NLP, chatbots, text-to-speech.
🔹 8. Big Data and Analytics
EMR: Hadoop/Spark cluster.
Glue: ETL service.
Kinesis: Real-time data streaming.
Redshift: Data warehouse.
Athena: SQL on S3 data.
🔹 9. Application Integration
SQS: Message queues.
SNS: Push notifications.
Step Functions: Workflow orchestration.
EventBridge: Serverless event bus.
🔹 10. Internet of Things (IoT)
IoT Core: Secure device connection.
Greengrass: Local compute on edge devices.
IoT Analytics: Analysis of IoT data.
FreeRTOS: OS for microcontrollers.
🔹 11. Migration and Transfer
Migration Hub: Tracks migration progress.
DMS: Database migration.
Snow Family: Physical data transfer (Snowcone, Snowball, Snowmobile).
Transfer Family: SFTP, FTPS, FTP data transfer.
🔹 12. Hybrid Cloud and Edge Services
AWS Outposts: Run AWS services on-premises.
Wavelength: Edge computing with 5G.
Snowcone: Portable edge device.
Local Zones: Low-latency AWS infrastructure near users.
🔹 13. Use Cases & Real-World Applications
Startups: Rapid prototyping (e.g., Airbnb).
Enterprises: Scale apps and storage.
Government/Healthcare: Secure workloads (HIPAA).
Media/Gaming: Streaming and rendering workloads.
🔹 14. AWS Popular Customers
Netflix, NASA, Airbnb, Samsung, McDonald’s, Capital One.
🔹 15. Comparison with Other Platforms
Feature | AWS | Azure | Google Cloud
Market Share | Largest | Growing fast | Strong AI
Strengths | Ecosystem | Hybrid Cloud | Big Data/AI
Billing | Flexible | Pay-as-you-go | Pay-as-you-go
🔹 16. Training and Certification
Certification Levels:
o Foundational: Cloud Practitioner
o Associate: Developer, Solutions Architect
o Professional: Architect Pro, DevOps Pro
o Specialty: Security, ML, Big Data
Learning Platforms: AWS Skill Builder, Coursera, Udemy.
🔹 17. Future Directions for AWS
Serverless Growth: Lambda, Fargate.
Sustainable Cloud: Carbon footprint awareness.
AI/ML Evolution: Enhanced SageMaker, custom silicon.
Hybrid & Quantum Computing: Braket for quantum workloads.
🌐 1. AWS Global Infrastructure
✅ Regions
Geographic areas around the world (e.g., US-East, Asia-Pacific).
Each region contains multiple isolated locations called Availability Zones (AZs).
✅ Availability Zones (AZs)
Physically separated data centers within a region.
Allow fault-tolerant and highly available architecture.
✅ Edge Locations
Used by Amazon CloudFront for content delivery.
Located in cities to reduce latency for end users.
⚙️ 2. Core AWS Services
📦 Compute Services
EC2: Virtual servers for compute capacity.
Lambda: Serverless computing; run code in response to events (see the handler sketch after this list).
ECS/EKS: Container services for Docker/Kubernetes.
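A minimal AWS Lambda handler in Python, using the standard handler signature Lambda invokes; the event contents and response shape shown are only an example.

```python
import json

def lambda_handler(event, context):
    # Lambda passes the triggering event (e.g. an API Gateway request) and a
    # runtime context object; the function returns a response.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"})
    }
```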
📦 Storage Services
S3: Scalable object storage (see the boto3 sketch after this list).
EBS: Block storage for EC2 instances.
Glacier: Low-cost archival storage.
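A short sketch using the boto3 SDK to store and read back an S3 object. It assumes boto3 is installed and AWS credentials are configured; the bucket name, file name, and key are hypothetical placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Upload a local file as an object (bucket name and key are hypothetical).
s3.upload_file(Filename="report.csv",
               Bucket="my-example-bucket",
               Key="reports/report.csv")

# Read the object back.
obj = s3.get_object(Bucket="my-example-bucket", Key="reports/report.csv")
print(obj["Body"].read()[:100])
```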
📦 Database Services
RDS: Managed relational databases (MySQL, PostgreSQL).
DynamoDB: NoSQL database.
Redshift: Data warehouse for analytics.
🌐 Networking
VPC: Isolated cloud network.
Route 53: DNS service.
CloudFront: CDN for fast delivery.
Elastic Load Balancer: Distributes traffic across EC2 instances.
🔐 3. Security and Compliance
✅ Shared Responsibility Model
AWS manages security "of" the cloud (hardware, software).
You manage security "in" the cloud (data, identity, access).
✅ Key Security Features
IAM: Control access to services/resources.
KMS: Encryption key management.
Shield & WAF: DDoS protection and web application firewall.
✅ Compliance Programs
AWS complies with GDPR, HIPAA, ISO, etc.
💰 4. AWS Pricing Models
💵 On-Demand
Pay for compute/storage by the hour/second.
No long-term commitments.
💵 Reserved Instances
Pay upfront for 1–3 years.
Significant savings (up to 75%) for predictable workloads.
💵 Spot Instances
Use spare AWS capacity at reduced prices (up to 90% off).
May be terminated when AWS needs the capacity.
🛠 5. Management and Monitoring
AWS Console: Web UI for managing services.
AWS CLI: Command-line interface.
CloudWatch: Monitoring and logging for resources.
Config: Track configuration changes.
Trusted Advisor: Provides best practice recommendations.
🧪 6. DevOps on AWS
🛠 CI/CD Tools
CodePipeline: Automates build, test, deploy.
CodeCommit: Git-based source repo.
CodeDeploy: Deployment automation.
💻 Infrastructure as Code
CloudFormation: Declarative templates to define infrastructure.
Terraform (non-AWS tool, often used with AWS).
🤖 7. Machine Learning and AI
Amazon SageMaker: Train/deploy machine learning models.
Rekognition: Image and video analysis.
Comprehend: NLP (Natural Language Processing).
Polly: Text-to-speech.
Lex: Chatbots (used in Alexa).
📈 8. Big Data and Analytics
EMR: Managed Hadoop for big data.
Athena: Run SQL queries on S3 data.
Glue: ETL (Extract, Transform, Load) service.
Redshift: Scalable data warehouse.
🔄 9. Application Integration
SQS: Queue-based messaging.
SNS: Publish-subscribe messaging.
Step Functions: Serverless workflows.
EventBridge: Event-driven application integration.
🌍 10. IoT and Edge Computing
AWS IoT Core: Connect IoT devices to AWS.
Greengrass: Run local compute, messaging, ML on devices.
FreeRTOS: Real-time OS for microcontrollers.
🚚 11. Migration and Hybrid Cloud
Migration Hub: Track and manage cloud migrations.
DMS: Database Migration Service.
Snow Family: Physical devices for offline data transfer (Snowcone, Snowball).
Outposts: AWS services in your on-prem data center.
Wavelength: Edge compute with 5G.
Local Zones: Run latency-sensitive apps closer to users.
🧠 12. Use Cases
Startups: Build MVPs quickly with low cost.
Enterprises: Migrate large-scale workloads.
Government/Healthcare: Secure and compliant solutions.
Media/Gaming: Global content delivery, game server hosting.
🔍 13. Popular Customers
Netflix, NASA, Airbnb, LinkedIn, Samsung, Pfizer, Capital One, etc.
📚 14. Training & Certification
Foundational: Cloud Practitioner
Associate: Solutions Architect, Developer
Professional: Advanced Architect, DevOps Engineer
Specialty: Security, ML, Data Analytics, etc.
🚀 15. Future of AWS and Cloud Parallelism
Growth in serverless (Lambda, Step Functions).
Emphasis on sustainability (carbon-neutral infrastructure).
Advancement in AI/ML, quantum computing, and hybrid cloud.
1–2. Introduction & Definition of Distributed Computing
Distributed Computing is a field of computer science where a group of independent
computers (nodes) work together over a network to achieve a common goal. These systems
appear to users as a single coherent system.
🔑 Key Characteristics:
Transparency: Users don't see the complexity behind the scenes.
Concurrency: Multiple processes run in parallel.
Scalability: Systems can grow by adding more nodes.
Fault Tolerance: System continues working despite failures.
Resource Sharing: Devices and data can be shared.
3. Architecture of Distributed Systems
Client-Server Architecture
Central server provides services.
Multiple clients request/use these services.
E.g., Email services.
🔁 Peer-to-Peer (P2P) Architecture
All nodes act as both clients and servers.
No central coordinator.
E.g., BitTorrent, blockchain.
🧱 Multi-Tier Architecture
Divides system into layers: presentation, logic, and data.
Common in web applications.
🧩 4. Models of Distributed Computing
✉️ Message Passing Model
Nodes communicate by sending/receiving messages.
Used in MPI, socket programming.
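A minimal message-passing sketch using a multiprocessing Pipe: two processes exchange explicit messages rather than sharing memory. In a real distributed system the pipe would be replaced by sockets or MPI.

```python
from multiprocessing import Pipe, Process

def worker(conn):
    msg = conn.recv()                  # receive a message from the other "node"
    conn.send(f"processed: {msg}")     # reply with a message of its own
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send("task #1")        # explicit send, no shared state
    print(parent_conn.recv())
    p.join()
```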
🧠 Shared Memory Model
Nodes access common memory space.
Easier to program but harder to scale.
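A contrasting shared-memory sketch: several processes update one counter that lives in shared memory, guarded by a lock to keep the updates consistent. The counter and iteration count are illustrative.

```python
from multiprocessing import Lock, Process, Value

def add(counter, lock, n):
    for _ in range(n):
        with lock:                 # synchronization is the hard part here
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)        # an integer living in shared memory
    lock = Lock()
    procs = [Process(target=add, args=(counter, lock, 10_000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)           # 40000
```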
🛠 5. Components of a Distributed System
Nodes: Independent computers.
Network: Communication medium (LAN, WAN, Internet).
Middleware: Software that connects distributed components (e.g., CORBA, RPC,
gRPC).
Resource Manager: Allocates and manages system resources.
✅ 6. Advantages of Distributed Computing
Benefit | Description
Scalability | Add more nodes to increase capacity.
Fault Tolerance | If one node fails, others continue working.
Resource Sharing | Use resources across systems efficiently.
Cost-Effectiveness | Use commodity hardware instead of expensive supercomputers.
Parallelism | Execute tasks in parallel across machines for faster results.
⚠️ 7. Challenges in Distributed Computing
Challenge | Description
Network Latency | Delays in communication can slow the system.
Synchronization | Ensuring processes across nodes stay in sync (e.g., clocks, operations).
Fault Detection | Detecting failed nodes or dropped messages is hard.
Security | Protecting data and systems from attacks.
Data Consistency | Ensuring all nodes have a consistent view of shared data.
🧱 8. Distributed Computing Models & Frameworks
MapReduce: Programming model for processing large data sets across clusters (e.g., Hadoop); a toy word-count sketch follows this list.
Apache Spark: Faster alternative to MapReduce with in-memory processing.
MPI (Message Passing Interface): Standard for message-passing in distributed memory
systems, used in HPC.
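A toy, single-machine illustration of the MapReduce model mentioned above: a map step emits (word, 1) pairs, a shuffle groups them by key, and a reduce step sums the counts. Real frameworks such as Hadoop run these phases across a cluster; the documents here are made up.

```python
from collections import defaultdict

documents = ["the cat sat", "the dog sat", "the cat ran"]

# Map: emit (word, 1) for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: sum the counts for each word.
counts = {word: sum(values) for word, values in groups.items()}
print(counts)   # {'the': 3, 'cat': 2, 'sat': 2, 'dog': 1, 'ran': 1}
```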
🚀 9. Applications of Distributed Computing
Area | Example Use Case
Cloud Computing | AWS, Azure: host apps and services in a distributed fashion
Big Data Analytics | Hadoop/Spark analyze petabytes of data
Scientific Research | Protein folding, climate modeling
Blockchain | Decentralized ledgers for cryptocurrencies like Bitcoin
🔄 10. Distributed vs. Parallel Computing
Feature | Distributed Computing | Parallel Computing
Nodes | Multiple independent computers | Multiple processors/cores in one system
Memory | Usually distributed memory | Usually shared memory
Communication | Over network (slower) | Via shared memory (faster)
Example | Google Search, Bitcoin | GPU computations, matrix multiplication
🔐 11. Security in Distributed Systems
✅ Authentication & Authorization
Verifying identities (who are you?) and granting appropriate permissions (what can you
do?).
🔒 Encryption
Protecting data during transmission (TLS/SSL) and at rest (AES, RSA).
💪 Fault Tolerance
Security techniques also help maintain availability, such as replication, failover
systems, and distributed backups.