Module 5
Data Analytics for IOT
Syllabus
Introduction
Apache Hadoop
Using Hadoop Map Reduce for Batch Data Analysis
Apache Oozie
Apache Spark
Apache Storm
Using Apache Storm for Real-time Data Analysis
Structural Health Monitoring Case Study.
Apache Hadoop
It is a open source framework for distributed batch
processing of big data.
MapReduce is a parallel programming model suitable
for analysis of big data
◦ Map Reducing Programming Model
◦ Hadoop MapReduce Job Execution.
◦ MapReduce Job Execution Overflow
Hadoop Cluster Setup
Hadoop filesystem HDFS is highly fault
tolerant.
Preferred OS- LINUX
OTHER OS_ Cygwin Environment.
STEPS INVOVLVED IN SETTTING UP A
HADOOP CLUSTER:
Install Java.
Install Hadoop.
Configure Network.
Configure Hadoop
Starting and Stopping Hadoop Cluster
Install Java
Hadoop requires Java 6 or later version.
Commands for installing java:
#set the properties
sudo apt-get –q –y install python-software-properties
sudo add-apt-repository –y ppa:webupd8team/java
sudo apt-get –q –y update
#state that you accepted the license
echo debconf shared/accepted-oracle-license-v1-1 select true |
sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true |
sudo debconf-set-selections
Using Hadoop Map Reduce for Batch
Data Analysis
Apache Oozie
Apache Oozie is a scheduler system to manage &
execute Hadoop jobs in a distributed environment.
Apache Oozie is a workflow scheduler for Hadoop. It
is a system which runs the workflow of dependent jobs.
Here, users are permitted to create Directed Acyclic
Graphs of workflows, which can be run in parallel and
sequentially in Hadoop.
Apache Spark
Apache Storm
Using Apache Storm for Real-time Data
Analysis
Structural Health Monitoring Case
Study.