Big Data Framework

BDA syllabus

Uploaded by

lekha.cce

SEMESTER - I

24PBDPC103   BIG DATA FRAMEWORKS AND TECHNOLOGIES   L T P C
SDG NO. 4                                           3 0 0 3

OBJECTIVES:
• To understand the need for a framework to store and process big data.
• To gain knowledge of the Big Data technologies for processing different types of data.
• To understand advanced frameworks for faster access to and processing of Big Data.
• To integrate a wide range of data processing and analysis tools.
UNIT I BIG DATA 9
Understanding Big Data: Concepts and terminology, Big Data
Characteristics, Different types of Data, Identifying Data Characteristics -
Need for big data frameworks - Big Data Architecture - Big Data Storage: File
System and Distributed File System, NoSQL Databases, Sharding, Replication,
and combined Sharding and Replication.
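
As a small illustration of the sharding and replication topics in this unit, the sketch below assigns a record key to a shard with a stable hash and then lists the nodes that would hold copies of that shard. It is not taken from the course texts: the function names, the use of MD5 as the hash, and the "next consecutive nodes" placement rule are all assumptions chosen for illustration.

```python
import hashlib
from typing import List, Tuple


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replica_nodes(shard: int, num_nodes: int, replication_factor: int) -> List[int]:
    """Return the nodes holding copies of a shard.

    Hypothetical placement rule: the shard's own node plus the next
    consecutive nodes, wrapping around the end of the cluster.
    """
    return [(shard + i) % num_nodes for i in range(replication_factor)]


def place(key: str, num_shards: int = 4, num_nodes: int = 4,
          replication_factor: int = 2) -> Tuple[int, List[int]]:
    """Combine both steps: pick the shard, then pick its replica nodes."""
    shard = shard_for(key, num_shards)
    return shard, replica_nodes(shard, num_nodes, replication_factor)
```

Calling `place("customer-42")` returns the shard index and the list of nodes storing that shard. Sharding spreads keys across nodes for scale, while replication keeps more than one copy of each shard for fault tolerance.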

UNIT II HADOOP FRAMEWORK 9


Hadoop Architecture – Hadoop Distributed File System (HDFS) – YARN –
Hadoop I/O – MapReduce: Developing a MapReduce application –
MapReduce working procedure – Types and Formats – Features of MapReduce:
sorting and joins – Pipelining MapReduce jobs.
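
The MapReduce working procedure named in this unit can be sketched without a cluster. The example below is a plain-Python imitation of the map, shuffle/sort, and reduce phases for a word count. It does not use the Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the grouped values for one key into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

For instance, `word_count(["big data", "Big frameworks"])` counts "big" twice and each other word once. In Hadoop the same three phases run in parallel across nodes, with the framework handling the shuffle between mappers and reducers.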

UNIT III HADOOP TECHNOLOGIES : PIG 9


Introduction, Parallel processing using Pig, Pig Architecture, Grunt, Pig Data
Model: scalar and complex types. Pig Latin: Input and output, Relational
operators, User-defined functions - Working with scripts - Hadoop
Operations.

UNIT IV HIVE AND SPARK 9


Introduction - Hive modules, Data types and file formats, HiveQL: Data
Definition and Data Manipulation - HiveQL queries, HiveQL views to reduce
query complexity. Hive scripts. HiveQL indexes - Aggregate functions -
Bucketing vs. Partitioning. Overview of Spark – Hadoop vs. Spark –
Cluster Design – Cluster Management – Performance,
Application Programming Interface (API): Spark Context, Resilient
Distributed Datasets, Creating RDD, RDD Operations, and Saving RDD - Lazy
Operation – Spark Jobs - Spark Programming in Scala, Python, R, Java.

UNIT V IMPALA 9
Introducing Cloudera Impala - Architecture of Impala - Components of
Impala: The Impala Daemon, The Impala Statestore, The Impala Catalog
Service - Query Processing Interfaces - Impala Shell Command Reference -
Impala Data Types - Creating and deleting databases and tables - Inserting
and overwriting table data - Record fetching and ordering - Grouping
records - Working of Impala with Hive.
TOTAL: 45 PERIODS
TEXT BOOKS:
1. Thomas Erl, Wajid Khattak, and Paul Buhler, Big Data Fundamentals:
Concepts, Drivers & Techniques, Pearson India Education Services Pvt. Ltd.,
First Edition, 2016.
2. Tom White, Hadoop: The Definitive Guide, O’Reilly Media, Inc., Fourth
Edition, 2015.
REFERENCES:
1. Alan Gates, Programming Pig: Dataflow Scripting with Hadoop, O’Reilly
Media, Inc., 2011.
2. Jason Rutherglen, Dean Wampler, Edward Capriolo, Programming Hive,
O’Reilly Media, Inc., 2012.
3. Mike Frampton, Mastering Apache Spark, Packt Publishing, 2015.
4. John Russell, Getting Started with Impala, O’Reilly Media, Inc.,
September 2014.

WEB REFERENCES:
1. https://www.bigdataframework.org/an-overview-of-the-big-data-framework/
2. https://techreviewer.co/blog/the-most-popular-big-data-frameworks
3. https://www.javatpoint.com/java-big-data-frameworks
OUTCOMES:
Upon completion of the course, the student should be able to
1. Understand the need for a new framework to deal with huge amounts of
data.
2. Demonstrate the Hadoop framework, including the Hadoop Distributed File
System and MapReduce.
3. Demonstrate the Pig architecture and the evaluation of Pig scripts.
4. Describe the Hive architecture and execute Hive queries on sample data
sets.
5. Demonstrate Spark programming with different programming languages
and graph algorithms, and execute Impala scripts.
