BTCS606 Big Data Architecture and Programming
L T P C
3 0 2 4
Total Credits: 4 Total Hours in semester: 45 Total Marks: 150
1 Course Pre-requisites: Data Structure
2 Course Category:
3 Course Revision/ Approval Date:
4 Course Objectives :
4.1 To understand the need for Big Data, its challenges, and different analytical architectures.
4.2 To install Hadoop and understand its architecture and ecosystem.
4.3 To process Big Data with advanced architectures like Spark.
4.4 To describe graphs and streaming data in Spark.
4.5 To realistically assess the application of big data technologies for different usage scenarios.
Course Content (Weightage, Contact hours, Pedagogy)
Unit 1: Introduction (Weightage: 20%; Contact hours: 9; Pedagogy: Chalk – Talk, Presentation)
What is big data, why big data, convergence of key trends, unstructured
data, industry examples of big data, web analytics, big data and marketing,
fraud and big data, risk and big data, credit risk management, big data and
algorithmic trading, big data and healthcare, big data in medicine,
advertising and big data, big data technologies, introduction to Hadoop,
open source technologies, cloud and big data, mobile business intelligence,
crowdsourcing analytics, inter- and trans-firewall analytics.
Unit 2: NoSQL (Weightage: 20%; Contact hours: 9; Pedagogy: Chalk – Talk, Presentation)
Introduction to NoSQL, aggregate data models, aggregates, key-value and
document data models, relationships, graph databases, schemaless databases,
materialized views, distribution models, sharding, master-slave replication,
peer-to-peer replication, sharding and replication, consistency, relaxing
consistency, version stamps, map-reduce, partitioning and combining,
composing map-reduce calculations.
Unit 3: Hadoop (Weightage: 20%; Contact hours: 9; Pedagogy: Chalk – Talk, Presentation)
Data format, analyzing data with Hadoop, scaling out, Hadoop Streaming,
Hadoop Pipes, design of Hadoop distributed file system (HDFS), HDFS
concepts, Java interface, data flow, Hadoop I/O, data integrity,
compression, serialization, Avro, file-based data structures.
Unit 4: MapReduce (Weightage: 20%; Contact hours: 9; Pedagogy: Computer based learning, Chalk – Talk, Presentation)
MapReduce workflows, unit tests with MRUnit, test data and local tests,
anatomy of MapReduce job run, classic MapReduce, YARN, failures in classic
MapReduce and YARN, job scheduling, shuffle and sort, task execution,
MapReduce types, input formats, output formats.
Unit 5: Advanced Big Data Tools (Weightage: 20%; Contact hours: 9; Pedagogy: Computer based learning, Chalk – Talk, Presentation)
HBase, data model and implementations, HBase clients, HBase examples,
praxis. Cassandra, Cassandra data model, Cassandra examples, Cassandra
clients, Hadoop integration. Pig, Grunt, Pig data model, Pig Latin,
developing and testing Pig Latin scripts. Hive, data types and file
formats, HiveQL data definition, HiveQL data manipulation, HiveQL queries.
Learning Resources
1. Textbooks:
1. Kevin Knight and Elaine Rich, Nair B., “Artificial Intelligence (SIE)”,
McGraw-Hill, 2008.
2. Dan W. Patterson, “Introduction to AI and ES”, Pearson Education, 2007.
2. Reference Books:
1. P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the
Emerging World of Polyglot Persistence", Addison-Wesley Professional, 2012.
2. Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly,
2012.
3. Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
4. E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive",
O'Reilly, 2012.
5. Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
6. Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010.
7. Alan Gates, "Programming Pig", O'Reilly, 2011.
3. Journals & Periodicals:
4. http://nptel.ac.in
5. Other Electronic Resources:
Evaluation Scheme Total Marks
Mid semester Marks 30
End Semester Marks 50
Continuous Evaluation Marks:
  Attendance 5 marks
  Quiz 5 marks
  Skill enhancement activities / case study 5 marks
  Presentation / miscellaneous activities 5 marks
Course Outcomes
1. Explain the motivation for big data systems and identify the main
sources of Big Data in the real world.
2. Demonstrate an ability to use frameworks like Hadoop and NoSQL to
efficiently store, retrieve and process Big Data for analytics.
3. Implement several data-intensive tasks using the MapReduce paradigm.
4. Apply several newer algorithms for clustering, classifying and finding
associations in Big Data.
5. Design algorithms to analyze Big Data such as streams, Web graphs and
social media data.