BIG DATA
MANAGEMENT
Mohammed Shatnawi
Background Information
The amount of data produced by mankind is growing rapidly due to the advent of new technologies, devices, and means of communication.
• The amount of data produced by us from the beginning of time till 2003 was 5 billion GB.
• The same amount was created every two days in 2011,
• and every ten minutes in 2013.
What is Big Data?
Big data is a collection of large datasets
that cannot be processed using
traditional computing techniques.
It is not a single technique or a tool.
What Comes Under Big Data?
Big data involves the data produced by different devices and
applications:
Black Box Data − It is a component of helicopters, airplanes, jets, etc. that records flight information.
Social Media Data − Social media such as Facebook and Twitter hold
information and the views posted by millions of people across the globe.
Stock Exchange Data − The stock exchange data holds information about
the ‘buy’ and ‘sell’ decisions.
Power Grid Data − The power grid data holds information about the power consumed by a
particular node with respect to a base station.
Transport Data − Transport data includes model, capacity, distance and
availability of a vehicle.
Search Engine Data − Search engines retrieve lots of data from different
databases.
Types of Data
Structured data − Relational data.
Semi-structured data − XML data.
Unstructured data − Word, PDF, Text,
Media Logs.
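As a hypothetical illustration, the same fact can take all three forms − a row in a relational table (structured), a tagged XML fragment (semi-structured), and free text in a document (unstructured):

Structured (relational row):
id | name  | city
17 | Alice | London

Semi-structured (XML):
<customer id="17">
  <name>Alice</name>
  <city>London</city>
</customer>

Unstructured (free text):
"Alice, one of our customers in London, emailed us yesterday."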
Benefits of Big Data
• Marketing agencies learn about the response to their campaigns, promotions, and other advertising media.
• Better production planning.
• Better and quicker service.
Big Data Technologies
Operational Big Data: This includes systems like MongoDB that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored (see the sketch below).
Analytical Big Data: This includes systems like Massively Parallel Processing (MPP) database systems and MapReduce that provide analytical capabilities for retrospective, complex analysis.
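Below is a minimal sketch of such an operational workload − one record captured and looked up interactively − assuming the MongoDB Java driver (mongodb-driver-sync) and a local mongod instance; the database and collection names here are hypothetical:

// Operational workload: real-time capture and selective lookup of a single record.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;

public class OperationalExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> users =
                client.getDatabase("shop").getCollection("users");

            // Interactive write: capture one record as it arrives.
            users.insertOne(new Document("user", "alice").append("lastLogin", "2013-06-01"));

            // Interactive read: selective, low-latency lookup of that record.
            Document found = users.find(eq("user", "alice")).first();
            System.out.println(found != null ? found.toJson() : "not found");
        }
    }
}

An analytical system, by contrast, would scan most or all of the data at once, as the MapReduce example later in this deck does.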
Operational vs. Analytical Systems
In short, operational systems serve real-time, interactive reads and writes on individual records, while analytical systems run retrospective, complex queries that may touch most or all of the data.
Big Data Challenges
The major challenges associated with big data are as follows −
Capturing data
Curation
Storage
Searching
Sharing
Transfer
Analysis
Presentation
Traditional Approach
Limitation of Traditional Approach
Works fine with applications that process less voluminous data, i.e., data within the limit of the single processor that handles it.
When the data grows beyond that limit, one machine becomes a bottleneck; a sketch of this single-machine approach follows.
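A minimal sketch of the traditional approach − one Java process reads a whole file and counts words in memory; the file name input.txt is hypothetical. Both the data and the work are confined to a single machine and a single processor:

// Traditional, single-machine word count: the entire file and all
// the counts must fit on one machine, and only one CPU does the work.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class SingleMachineWordCount {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        // Read the whole input into memory on this one machine.
        for (String line : Files.readAllLines(Paths.get("input.txt"))) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}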
Google’s Solution
Google solved this problem using an algorithm called MapReduce: it divides a task into small parts, assigns them to many computers, and integrates the collected results into the final dataset.
Hadoop
Using the solution provided by
Google, Doug Cutting and his team
developed an Open Source Project
called HADOOP.
Hadoop runs applications using the MapReduce algorithm, where data is processed in parallel across the nodes of a cluster.
In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data.
Hadoop Architecture
At its core, Hadoop has two major layers, namely −
• Processing/Computation layer (MapReduce), and
• Storage layer (Hadoop Distributed File System, HDFS).
MapReduce
MapReduce is a parallel programming model for writing distributed applications.
It was devised at Google for efficient processing of large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
MapReduce programs run on Hadoop, an Apache open-source framework.
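The classic illustration is word counting. The sketch below follows the standard WordCount example from the Hadoop documentation, assuming the Hadoop client libraries are on the classpath; input and output HDFS paths are taken as command-line arguments. The map phase emits (word, 1) pairs in parallel across input splits, and the reduce phase sums the counts per word:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in this node's input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Unlike the single-machine version shown earlier, the same logic now runs in parallel on as many nodes as the cluster provides, with Hadoop handling scheduling, data movement, and fault tolerance.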
Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and provides a distributed file system that is designed to run on commodity hardware.
It has many similarities with existing distributed file
systems.
However, the differences from other distributed file
systems are significant. It is highly fault-tolerant
and is designed to be deployed on low-cost
hardware.
It provides high-throughput access to application data and is suitable for applications with large datasets.
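A minimal sketch of talking to HDFS from Java, assuming a running cluster configured through the usual core-site.xml on the classpath; the paths used here are hypothetical. The client copies a local file into HDFS, which stores it as replicated blocks across commodity nodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS and related settings from core-site.xml.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into the distributed file system; HDFS
        // transparently splits it into blocks replicated across nodes.
        fs.copyFromLocalFile(new Path("input.txt"), new Path("/user/demo/input.txt"));

        // List the directory to confirm the upload.
        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
        fs.close();
    }
}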
Hadoop Modules
The Hadoop framework also includes the following two modules −
Hadoop Common − These are Java libraries and utilities required by other Hadoop modules.
Hadoop YARN − This is a framework for job scheduling and cluster resource management.