BA Intro Bigdata Analytics

Uploaded by

Abhiram Anand

BIG DATA ANALYTICS

Contents
• What is Big Data
• Characteristics of Big Data
• Structure of Big Data
• Big Data approaches
• Issues in legacy systems
• Hadoop
• Big Data analytics
• Types of Big Data Analytics
• Analytics life cycle
• Analytics tools
What is Big Data?
“Big Data” is similar to “small data”, but bigger in size. Because the data is
bigger, it requires different approaches, techniques, tools, frameworks
and architectures.
It is the technology that deals with large and complex datasets which
vary in data format and structure and do not fit into memory.
Big Data generates value from the storage and processing of very large
quantities of digital information that cannot be analyzed with
traditional computing techniques. It finds new insights in
existing data and provides guidelines for capturing and analyzing future data.
Every minute we send 204 million emails, generate 1.8 million Facebook
likes, send 278 thousand Tweets, and upload 200,000 photos to
Facebook.
What is Big Data?
Old Model: A few companies generate data; all others
consume data

New Model: All of us generate data, and all of us consume
data
Characteristics of Big Data 5Vs

• Volume: Data at scale
• Velocity: Data in motion
• Variety: Data in many forms
• Veracity: Data uncertainty
• Value: Data in map reduce
Characteristics of Big Data 5Vs
1st Volume
Volume refers to the vast amounts of data generated every second. We are not
talking terabytes but zettabytes.

If we take all the data generated in the world between the beginning of
time and 2000, the same amount of data will soon be generated every
minute.
A 44x increase from 2009 to 2020: from
0.8 zettabytes to 35 zettabytes.
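The "44x" figure can be checked against the two endpoints quoted on the slide. The sketch below is only arithmetic on those two numbers; the variable names are illustrative, and the compound annual growth rate assumes smooth year-over-year growth, which the slide does not claim.

```python
# Figures taken from the slide: 0.8 ZB in 2009 and 35 ZB in 2020.
start_zb = 0.8
end_zb = 35.0
years = 2020 - 2009

# Overall multiple between the two endpoints (43.75, which the slide rounds to 44x).
growth_factor = end_zb / start_zb

# Implied compound annual growth rate, assuming steady growth each year.
annual_rate = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.2f}x over {years} years")
print(f"implied annual growth: {annual_rate:.1%}")
```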
Characteristics of Big Data 5Vs
2nd Velocity
Velocity refers to the speed at which new data is generated and the speed at which data
moves around.
Just think of:
• Social media messages going viral in seconds
• High-frequency stock trading algorithms reflecting market changes
within microseconds
• Machine-to-machine processes exchanging data between billions of
devices
• Infrastructure and sensors generating massive log data in
real time
• Online gaming systems supporting millions of concurrent users, each
producing multiple inputs per second
Characteristics of Big Data 5Vs
3rd Variety
Variety refers to the different types of data we can now use. In the past we
focused only on structured data that fitted neatly into tables or
relational databases, such as financial data. In fact, 80% of the world’s
data is unstructured (text, images, video, voice, etc.).
Characteristics of Big Data 5Vs
4th Veracity
Veracity refers to the messiness or trustworthiness of the data. With many
forms of big data, quality and accuracy are less controllable (just think
of Twitter posts with hashtags, abbreviations, typos and colloquial
speech, as well as the reliability and accuracy of the content).
Characteristics of Big Data 5Vs
5th Value
Value refers to how useful the data is for us. We have access to big data, but
unless we can turn it into value it is useless. It can easily be argued
that ‘value’ is the most important V of Big Data.
Map Reduce
Map Reduce is a processing technique and programming model for distributed
computing; the Hadoop implementation is written in Java.
Hadoop is the most popular implementation of Map Reduce
because it is readily available as an entirely open-source
platform for handling Big Data.
Structure of Big Data
Structured:
• Most traditional data sources

Semi-structured:
• Many sources of big data

Unstructured:
• Video data, audio data
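To make the three categories concrete, here is a small sketch that handles one sample of each kind. All sample values (account rows, user records, the message text) are made up for illustration and do not come from the slides.

```python
import csv
import io
import json

# Structured: fixed columns, fits directly into a relational table.
structured_src = "account_id,amount\n1001,250.00\n1002,99.50\n"
rows = list(csv.DictReader(io.StringIO(structured_src)))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: self-describing records whose fields can differ.
semi_src = '[{"user": "a", "tags": ["x", "y"]}, {"user": "b", "location": "Oslo"}]'
records = json.loads(semi_src)
keys_per_record = [sorted(rec) for rec in records]

# Unstructured: free text with no schema; structure has to be extracted.
unstructured = "Loved the new phone, battery life is gr8!! #happy"
hashtags = [w for w in unstructured.split() if w.startswith("#")]

print(total)            # every row has the same columns, so aggregation is direct
print(keys_per_record)  # each record may carry a different set of keys
print(hashtags)         # anything useful must be pulled out of the raw text
```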
Big Data approach (platform)
• Process any type of data
(Structured, semi-structured or unstructured)
• Purpose-built engines
(Designed to handle different requirements)
• Manage and govern Data in the ecosystem
• Enterprise Data integration
• Grow and evolve on current infrastructure
Issues in legacy systems
• Limited Storage Capacity
• Limited Processing Capacity
• No Scalability
• Single point of Failure
• Sequential Processing
• RDBMSs can handle only structured data
• Requires preprocessing of data
• Information is collected according to
current business needs
Hadoop says he has a solution to our BIG problem!

Apache Hadoop is a framework that allows for the distributed processing of large
datasets across clusters of commodity computers using a simple programming model.
Hadoop Approach
HDFS(Hadoop Distributed File System)
• Highly fault-tolerant, distributed,
reliable, scalable file system for
data storage
• Stores multiple copies of data on
different nodes
• A file is split up into blocks and
stored on multiple machines
• A Hadoop cluster typically has a
single NameNode and a number of
DataNodes
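The block-and-replica idea in the bullets above can be sketched as a toy simulation. This is not the HDFS API: the function names, the round-robin placement policy, the tiny block size and the node names are all assumptions made for illustration (real HDFS uses rack-aware placement and much larger blocks). The replication factor of 3 mirrors the usual HDFS default.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = 3) -> dict[int, list[str]]:
    """Toy round-robin placement: each block goes to `replication` distinct nodes."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    n = len(datanodes)
    return {
        block: [datanodes[(block + r) % n] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> list[int]:
    """A block stays readable while at least one replica sits on a live node."""
    return [b for b, nodes in placement.items()
            if any(node not in failed for node in nodes)]


# The NameNode's role here is played by the `placement` mapping (metadata only).
blocks = split_into_blocks(b"x" * 1000, block_size=256)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
survivors = readable_blocks(placement, failed={"dn1", "dn2"})

print(len(blocks), placement[0], survivors)
```

With four nodes and three replicas per block, losing any two nodes still leaves every block with at least one live copy, which is the fault tolerance the slide refers to.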
Hadoop Approach
MAP REDUCE
Map Reduce is a programming model that processes and analyzes
huge data sets in parallel by splitting the work across a cluster. The Map step
turns input records into key-value pairs, the framework sorts and groups them
by key, and the Reduce step aggregates each group, discarding ‘bad’ data
and retaining the necessary information.
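The classic illustration of this model is a word count. The sketch below runs the three stages (map, group by key, reduce) in a single Python process; it is not Hadoop code, and the function names are chosen for illustration. On a real cluster the map and reduce calls would run on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))


counts = word_count(["big data is big", "data is value"])
print(counts)
```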
ANALYTICS IS IN YOUR BLOOD

Do you realize that you do analytics every day?

I need to get to campus faster!
Hmm.. Looking at the sky today, I think it’ll rain.
Based on my midterm and assignment scores, I need to get at least 80
in my final exam to pass this course.
I stalked her social media. I think she is single because most of her
posts are only about food :p
Big Data Analytics
• Examining large amounts of data
• Appropriate information
• Identification of hidden patterns, unknown correlations
• Competitive advantage
• Better business decisions: strategic and operational
• Effective marketing, customer satisfaction, increased
revenue

Application areas: Smarter Healthcare, Traffic Control, Manufacturing, Search Quality
Types of Big Data Analytics
Descriptive statistics is the term given to the analysis of data that helps
describe, show or summarize data in a meaningful way such that, for
example, patterns might emerge from the data.
e.g.: Fulan only posts his activity on Facebook at weekends
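As a minimal sketch of descriptive analysis, the snippet below summarizes a made-up posting log in the spirit of the Fulan example. The data values and variable names are hypothetical; only Python's standard library is used.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical log: the weekday of each post made by one user.
post_days = ["Sat", "Sun", "Sat", "Sat", "Sun", "Sun", "Sat"]
# Hypothetical counts of posts in five consecutive weeks.
posts_per_week = [2, 3, 2, 4, 3]

# Describe: how are posts distributed over the days of the week?
by_day = Counter(post_days)
weekend_share = (by_day["Sat"] + by_day["Sun"]) / len(post_days)

# Summarize: typical weekly activity.
summary = {
    "mean": mean(posts_per_week),
    "median": median(posts_per_week),
    "min": min(posts_per_week),
    "max": max(posts_per_week),
}

print(by_day.most_common())
print(f"share of posts made at the weekend: {weekend_share:.0%}")
print(summary)
```

Nothing is predicted here; the output only describes what already happened, which is the pattern "posts only at weekends".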

Predictive analytics is the branch of data mining concerned with the
prediction of future probabilities and trends.
e.g.: Fulan probably has a job, because he always leaves home at 7 in the
morning and gets back at 6 in the evening
Types of predictive analytics:
• Supervised analytics is when we know the truth about something in the
past (the outcomes are labeled)
• Unsupervised analytics is when we don’t know the truth about something
in the past. The result is a set of segments that we need to interpret
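The difference between the two types can be sketched on the same made-up data: the hour at which people leave home. The supervised part uses a 1-nearest-neighbor rule and the unsupervised part a two-cluster k-means on one dimension; both techniques, the data and the labels are choices made here for illustration, not methods named in the slides.

```python
def nearest_neighbor_predict(train, x):
    """Supervised: `train` holds (feature, label) pairs whose truth is known."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two segments without any labels."""
    if min(values) == max(values):
        raise ValueError("need at least two distinct values")
    c0, c1 = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1


# Hypothetical history: departure hour and the known employment status.
labeled = [(6.5, "employed"), (7.0, "employed"), (7.5, "employed"),
           (10.0, "not employed"), (11.0, "not employed")]
prediction = nearest_neighbor_predict(labeled, 7.2)

# Same hours, but with the truth withheld: we only get segments back.
early, late = two_means([hour for hour, _ in labeled])

print(prediction)
print(early, late)
```

The supervised call returns a label directly, while the unsupervised call returns two unnamed groups; deciding that one group means "people with a job" is the interpretation step the slide mentions.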
Analytics Life Cycle
Analytics Tools
Thank You
