

Big Data Analytics


Unit-1 (Chapter - 3: Big Data Processing)
• Parallel Data Processing,
• Distributed Data Processing,
• Hadoop,
• MapReduce

Dear students, in today’s world we deal with a huge amount of data. This is not something new, but
the way we handle large amounts of data has changed over time.

Today, the common approach is to break large data into smaller parts.


Imagine you have a big book that needs to be read quickly. Instead of one person reading the whole
book, you divide it into chapters and give them to different people to read at the same time. This
makes the process much faster.

Similarly, in data processing:


❖ A data warehouse is like a library storing a huge collection of books (large data).
❖ A data mart is like a smaller bookshelf that contains books on a specific subject (a smaller
part of the large data).
❖ Dividing data into smaller parts helps in processing it faster.

Traditional databases store all the data in one central place and process it there (like a single computer
doing all the work). But in Big Data, information is stored in different locations, and multiple computers
process it at the same time. This is called distributed processing and makes handling huge amounts of
data much faster.

Dear students, there are two types of data processing in Big Data: Batch Processing and Real-time
Processing.
1. Batch Processing – Data is collected over a period of time and processed together as a group at a later time.
2. Real-time Processing – Data keeps arriving continuously and needs to be processed
immediately.
For real-time data, computers use in-memory storage (storing data in an in-memory data grid [IMDG],
i.e., in RAM instead of on a hard disk) to process data quickly. This helps in making instant decisions,
such as detecting fraud in online transactions or predicting weather changes.

Big Data processing is guided by principles such as speed, consistency, and volume.

To further the discussion of Big Data processing, each of the following concepts will be examined in
turn:
• Parallel data processing,
• Distributed data processing,
• Hadoop,
• MapReduce.


Parallel Data Processing:


Parallel Data Processing is a computing method where multiple processors work on different
parts of a large dataset simultaneously to complete a task faster and more efficiently.
For example: In general, imagine you have 1000 exam papers to check. If only one teacher checks all
the papers, it will take a long time. But if 10 teachers check 100 papers each at the same time, the work
will be finished much faster.
This is exactly how Parallel Data Processing works: instead of one processor doing all the
work, multiple processors handle different parts of the data at the same time. In other words, a large
task is split into smaller parts that are worked on simultaneously by multiple processors, which speeds
up the overall process and makes it more efficient.

Figure: a task divided into 3 sub-tasks that are executed in parallel on 3 different processors within
the same machine.

❖ Parallel processing works on the principle of the divide-and-conquer algorithm.


❖ The Divide and Conquer algorithm is a problem-solving technique that solves a problem by
dividing the main problem into subproblems, solving them individually, and then merging the
results to obtain the solution to the original problem. Divide and conquer is mainly useful when
the problem can be divided into independent subproblems.
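
As a small illustration (a minimal sketch; the function name and the choice of summing a list are only for demonstration), the Python snippet below splits a list into two halves, solves each half independently, and merges the partial results:

def dc_sum(numbers):
    # Base case: a trivially small subproblem is solved directly
    if len(numbers) <= 1:
        return numbers[0] if numbers else 0
    mid = len(numbers) // 2
    left = dc_sum(numbers[:mid])    # divide: solve the left half independently
    right = dc_sum(numbers[mid:])   # divide: solve the right half independently
    return left + right             # conquer: merge the partial results

print(dc_sum([4, 8, 15, 16, 23, 42]))   # prints 108

Because the two halves are independent subproblems, they could just as easily be handed to two different processors, which is exactly what parallel processing does.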


Working of Parallel Processing:


1. Task Division:
A large problem is broken down into smaller tasks that can be executed simultaneously.
2. Assigning Tasks to Processors:
These smaller tasks are distributed among multiple processors. Each processor may have its
own memory or share a common memory.
3. Simultaneous Execution:
All processors execute their assigned tasks at the same time. Depending on the type of parallel
processing, they may follow the same or different instructions.
4. Data Exchange & Synchronization:
If needed, processors communicate and share intermediate results. Synchronization ensures
that tasks complete in the correct order.
5. Combining Results:
Once all tasks are finished, the outputs from different processors are combined to get the final
result.
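
The five steps above can be sketched on a single machine using Python's multiprocessing module (a minimal sketch; the chunk size of 250, the 4 worker processes, and the summing task are illustrative assumptions):

from multiprocessing import Pool

def process_chunk(chunk):
    # Step 3: each processor executes its assigned sub-task at the same time
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1, 1001))                                    # the large problem
    chunks = [data[i:i + 250] for i in range(0, len(data), 250)]   # Step 1: task division
    with Pool(processes=4) as pool:                                # Step 2: assign tasks to processors
        partial_results = pool.map(process_chunk, chunks)          # Steps 3-4: execute and collect results
    print(sum(partial_results))                                    # Step 5: combine results (prints 500500)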

Benefits of Parallel Processing:


1. Faster Execution:
Multiple processors work together, reducing the time needed to complete tasks.
2. Efficient Resource Utilization:
Uses multiple processors effectively, preventing them from sitting idle.
3. Handles Large Data Sets:
Useful for processing massive amounts of data, like big data analytics and AI training.
4. Improved Performance in Complex Tasks:
Ideal for simulations, scientific computations, and real-time applications.

Types of parallel processing:


There are four types of parallel processing:
1. Single instruction single data (SISD)
2. Single instruction multiple data (SIMD)
3. Multiple instruction single data (MISD)
4. Multiple instruction multiple data (MIMD)

1. Single instruction single data (SISD):


❖ It is a uniprocessor machine that executes a single instruction operating on a single data
stream.
❖ In this type, instructions are processed sequentially, so these computers are also called
sequential computers.


2. Single instruction multiple data (SIMD):


❖ It is a multi-processor machine capable of executing the same instruction on all of its
CPUs.
❖ Each CPU is given a different set of data, but all of the CPUs are operated by the same single
instruction.
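
NumPy's vectorized operations give a convenient feel for the SIMD idea in Python (a rough analogy rather than a hardware-level demonstration; the array contents and the "multiply by 2" instruction are illustrative): a single instruction is applied to many data elements at once.

import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
doubled = data * 2    # one instruction ("multiply by 2") applied to every element of the data
print(doubled)        # [ 2  4  6  8 10 12 14 16]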

3. Multiple instruction single data (MISD):


It is a multiprocessor system capable of executing different instructions on different CPUs, with all of
them operating on the same data stream.

4. Multiple instruction multiple data (MIMD):


❖ It is a multi-processor system capable of executing multiple instructions on multiple data
streams (datasets).
❖ Each processor in the MIMD model has its own separate instructions and its own separate data
input.
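
A rough MIMD-style sketch in Python (the two worker functions and their inputs are purely illustrative assumptions): two workers run different instructions on different data at the same time.

from concurrent.futures import ProcessPoolExecutor

def total_sales(amounts):
    # one instruction stream working on its own data
    return sum(amounts)

def longest_name(names):
    # a completely different instruction stream working on different data
    return max(names, key=len)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(total_sales, [120, 340, 95])
        f2 = pool.submit(longest_name, ["Asha", "Prasad", "Lee"])
        print(f1.result(), f2.result())   # 555 Prasad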


Distributed Data Processing:


Distributed Processing refers to a computing approach where a task is divided into smaller sub-tasks,
which are executed across multiple physically separate machines (nodes) connected via a network.
These machines work together to complete the task efficiently.
Distributed data processing is closely related to parallel data processing in that the same principle of
“divide-and-conquer” is applied. However, distributed data processing is always achieved through
physically separate machines that are networked together as a cluster. In the figure below, a task is
divided into 3 sub-tasks that are then executed on 3 different machines sharing one physical switch.

Key Characteristics of Distributed Processing:


❖ Scalability: Distributed systems can easily scale by adding more nodes to handle increased
workloads, accommodating growth without performance degradation.
❖ Multiple Machines (Nodes): Unlike parallel processing (which uses multiple cores within a
single machine), distributed processing involves multiple computers working together.
❖ Fault Tolerance: The system can continue operating even if one or more nodes fail, with
redundancy and data replication ensuring that a failure doesn't impact overall functionality.
❖ Concurrency: Multiple nodes can operate simultaneously, performing tasks in parallel,
improving system efficiency and speed.
❖ Resource Sharing: Nodes can share resources such as processing power, storage, and data,
allowing for efficient utilization of resources across the system.
❖ Transparency: Distributed systems aim to hide the complexity of multiple machines from the
end-user, providing a unified interface as if the system were running on a single computer.
❖ Heterogeneity: Distributed systems can consist of different types of machines, operating
systems, or networks, requiring seamless handling of this diversity.
❖ Openness: The system should be designed to be easily extended and improved, with software
developed and shared openly.
❖ Reliability: Failure of a single node doesn't halt the whole system, which can reassign tasks to
other nodes.


Example of Distributed Processing: Imagine processing a large dataset in a big data environment
❖ A large dataset is divided into smaller chunks.
❖ Each chunk is processed by a separate machine.
❖ The results from all machines are combined to get the final output.

Types of Distributed Data Processing


Distributed data processing can be categorized based on how data is processed and the nature of
tasks performed. Here are the main types:
1. Batch Processing
2. Stream processing (Real-time processing)

1. Batch Processing
Definition: Batch processing is a type of distributed data processing where large volumes of data
are collected, stored & processed in groups (batches) at scheduled intervals, rather than in real-
time. This method is efficient for handling vast amounts of data that do not require immediate
responses.
Example Technologies: Hadoop (MapReduce), Apache Spark (Batch Mode)
Applications:
❖ Data Warehousing
❖ Periodic Reports (e.g., daily sales reports)
❖ Log Processing (e.g., analyzing server logs)
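
As a rough illustration of batch processing with one of the technologies listed above, here is a minimal PySpark sketch that builds a daily sales report from a file collected during the day (the file paths, column names, and application name are assumptions made for this example):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("DailySalesReport").getOrCreate()

# Read the whole batch of records collected during the day (path is illustrative)
sales = spark.read.csv("hdfs:///data/sales/2024-01-15.csv", header=True, inferSchema=True)

# Process the batch in one go: total sales per store
report = sales.groupBy("store_id").agg(F.sum("amount").alias("total_sales"))

# Write the periodic report back to distributed storage
report.write.mode("overwrite").parquet("hdfs:///reports/daily_sales/2024-01-15")
spark.stop()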

2. Stream Processing (Real-Time Processing)


Definition: Stream processing, also known as real-time processing, is a distributed data processing
technique where continuous data streams are processed as they arrive. Unlike batch processing,
which handles data in fixed intervals, stream processing processes data in real time, making it
ideal for applications that require immediate insights and actions.
Example Technologies: Apache Kafka, Apache Flink, Apache Spark Streaming, Google Dataflow
Applications:
❖ Fraud Detection in Banking
❖ Real-Time Analytics (e.g., website traffic monitoring)
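
For contrast, the widely used Spark Structured Streaming word-count sketch below processes lines as they arrive on a network socket instead of waiting for a complete batch (the host, port, and console output are illustrative choices):

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Data keeps arriving continuously on this socket (host/port are illustrative)
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Results are updated continuously as new data arrives, instead of once at the end
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()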

Working of Distributed Data Processing


1. Data Partitioning
❖ Large datasets are divided into smaller chunks (partitions).
❖ These partitions are distributed across multiple nodes.
2. Parallel Processing
❖ Each node processes its assigned data independently.
❖ This parallelism improves performance and reduces processing time.
3. Distributed Storage
❖ Data is stored across multiple nodes using distributed file systems like HDFS.
❖ Ensures fault tolerance and scalability.
4. Task Scheduling & Execution
❖ A central coordinator assigns tasks to worker nodes.
❖ Worker nodes execute tasks and return results.
5. Aggregation & Final Output
❖ Partial results from worker nodes are combined to produce the final result.
❖ This is done using reducers in Hadoop or actions in Spark.
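
These steps can be imitated on a single machine as a simplified stand-in for real worker nodes (the sample log lines and helper names below are assumptions of this sketch): the data is partitioned, each worker counts log levels in its own partition, and the partial counts are aggregated into the final result.

from concurrent.futures import ProcessPoolExecutor
from collections import Counter

def count_levels(partition):
    # Each "worker node" processes only its own partition of the log
    return Counter(line.split()[0] for line in partition)

if __name__ == "__main__":
    logs = ["INFO user logged in", "ERROR disk full", "INFO job finished",
            "WARN slow response", "ERROR timeout", "INFO user logged out"]
    partitions = [logs[0:2], logs[2:4], logs[4:6]]                    # 1. data partitioning
    with ProcessPoolExecutor() as scheduler:                          # 4. tasks assigned to workers
        partials = list(scheduler.map(count_levels, partitions))      # 2. parallel processing
    final = sum(partials, Counter())                                  # 5. aggregation of partial results
    print(dict(final))   # e.g. {'INFO': 3, 'ERROR': 2, 'WARN': 1}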


Hadoop
Apache Hadoop is an open-source framework for distributed data storage and processing. It is
designed to handle large-scale datasets across a cluster of machines using a parallel computing
model. Hadoop enables efficient batch processing of big data, making it a fundamental technology in
data engineering and analytics.

Core Components of Hadoop:


1. Hadoop Distributed File System (HDFS)
❖ A fault-tolerant, distributed storage system that splits large files into smaller blocks
and distributes them across multiple nodes.
❖ Ensures data reliability through replication (typically three copies per block).
2. MapReduce
❖ A programming model for parallel data processing.
❖ Map Phase: Splits and processes data in parallel across multiple nodes.
❖ Reduce Phase: Aggregates results from the Map phase to produce the final output.
3. Yet Another Resource Negotiator (YARN)
❖ Manages cluster resources and job scheduling.
❖ Allocates CPU, memory, and processing power to tasks dynamically.
4. Hadoop Common
❖ A set of shared utilities and libraries that support other Hadoop components.

Working of Hadoop
1. Data Storage: Large files are divided into blocks (e.g., 128 MB or 256 MB) and stored across
HDFS nodes.
2. Job Submission: A user submits a job using MapReduce or another framework (e.g., Spark).
3. Processing:
❖ The Map task processes chunks of data in parallel.
❖ The Reduce task aggregates and processes the results.
4. Output Generation: The final results are stored back in HDFS or another storage system.

Key Features of Hadoop:


❖ Scalability – Can handle petabytes of data by adding more nodes.
❖ Fault Tolerance – Data replication ensures reliability even if nodes fail.
❖ Cost-Effective – Uses commodity hardware to store and process data.
❖ Parallel Processing – Distributes tasks across multiple nodes for efficiency.
❖ Support for Multiple Processing Frameworks – Works with Apache Spark, Hive, and Pig for
flexible data processing.

Applications of Hadoop:
❖ Big Data Analytics – Processing large datasets for insights (e.g., customer behavior
analysis).
❖ Data Warehousing – Storing and managing structured and unstructured data.
❖ Log Analysis – Processing server logs for system monitoring.
❖ ETL Pipelines – Extracting, transforming, and loading large datasets for business
intelligence.


MapReduce:
MapReduce is a programming model used for processing and generating large datasets in a
distributed and parallel manner.
❖ MapReduce is a component of Hadoop.
❖ MapReduce is used to process very large datasets.
❖ MapReduce has 2 functions: a Map() function and a Reduce() function.
❖ The Map() function processes input data and generates intermediate key-value pairs.
❖ The Reduce() function processes the intermediate results to produce the final output.
❖ MapReduce is used in big data processing, log analysis, machine learning, etc.

How MapReduce Works


Map Stage
➢ Input data is divided into smaller chunks or blocks.
➢ Several worker nodes work in parallel to process each chunk independently.
➢ A "Map" function is applied to each data chunk, generating intermediate key-value pairs.
➢ The Map function's goal is to extract relevant information from the input data and prepare it
for further processing.
Reduce Stage
➢ After the Map stage, the intermediate key-value pairs are grouped by key.
➢ The grouped key-value pairs are then shuffled and sorted based on their keys.
➢ The purpose of the shuffle and sort phase is to bring together all the intermediate values
associated with the same key & make them available to the corresponding Reduce function.
➢ Once the shuffling and sorting are complete, a "Reduce" function is applied to perform
aggregation, analysis, or other computations on the grouped data.
➢ The output is a set of final key-value pairs from the computation.


MapReduce – Word Count Example
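
The original figure for this example is not reproduced here, so below is a minimal Python sketch of the same word-count job with the Map, shuffle-and-sort, and Reduce stages written out explicitly (the three input lines are the usual illustrative sample):

from itertools import groupby

lines = ["deer bear river", "car car river", "deer car bear"]

# Map stage: each line is processed independently, emitting (word, 1) key-value pairs
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle & sort: group all intermediate pairs that share the same key
mapped.sort(key=lambda pair: pair[0])
grouped = {key: [v for _, v in group]
           for key, group in groupby(mapped, key=lambda pair: pair[0])}

# Reduce stage: aggregate the grouped values for each key into the final output
word_counts = {word: sum(values) for word, values in grouped.items()}
print(word_counts)   # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}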

Prof. Prasad Patil,


Department of Computer Applications,
KLE Tech University, Belagavi.
