
De Theory

VNR VIGNANA JYOTHI INSTITUTE OF ENGINEERING AND TECHNOLOGY

B.Tech. VI Semester

(22PC1AM202) DATA ENGINEERING

TEACHING SCHEME        EVALUATION SCHEME
L    T/P    C          SE    CA    ELA    SEE    TOTAL
3    0      3          30    5     5      60     100

COURSE OBJECTIVES:
To explore data preprocessing techniques
To learn the techniques related to feature engineering
To apply statistics to data engineering
To explore the Hadoop environment and its framework activities
To understand and explore various databases for big data analytics

COURSE OUTCOMES: After completion of the course, the student should be able to
CO-1: Perform data preprocessing and apply the related techniques
CO-2: Demonstrate feature engineering techniques and handle the issues related to
high dimensionality
CO-3: Analyse the statistics required for data engineering
CO-4: Illustrate Hadoop and its framework activities for data processing and
analytics
CO-5: Work with databases for big data analytics

COURSE ARTICULATION MATRIX:


(Correlation of Course Outcomes with Program Outcomes and Program Specific Outcomes using
mapping levels 1 = Slight, 2 = Moderate and 3 = Substantial)
       PROGRAM OUTCOMES (PO)                                                               PROGRAM SPECIFIC OUTCOMES (PSO)
CO     PO-1   PO-2   PO-3   PO-4   PO-5   PO-6   PO-7   PO-8   PO-9   PO-10  PO-11  PO-12  PSO-1  PSO-2  PSO-3
CO-1   3      3      2      3      1      1      -      -      -      -      -      1      3      1      3
CO-2   3      3      2      2      2      1      -      -      -      -      -      1      3      1      3
CO-3   3      3      3      3      3      1      -      -      -      -      -      1      3      1      3
CO-4   3      3      3      3      3      1      -      -      -      -      -      1      3      1      3
CO-5   3      3      3      3      3      1      -      -      -      -      -      1      3      1      3

UNIT-I:
Data Pre-processing: Types of data, exploring structure of data: Exploring and Plotting
numerical data, categorical data and relationship between variables, data quality
and remediation, data pre-processing: Dimensionality reduction and feature
selection.
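
As an illustration of the "data quality and remediation" topic, the following is a minimal Python sketch, not prescribed course material. It assumes two common remediation steps: filling missing values with the median and flagging outliers with the 1.5 x IQR rule. The sample data and the function names impute_missing and iqr_outliers are hypothetical. It needs Python 3.8 or later for statistics.quantiles.

```python
from statistics import median, quantiles

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: ages with two missing entries and one suspicious value.
ages = [23, 25, None, 24, 26, 95, None, 22]
clean = impute_missing(ages)
suspects = iqr_outliers(clean)
print(clean)
print(suspects)
```

The median is used for imputation here because, unlike the mean, it is not pulled towards the suspicious value; other remediation choices (dropping records, model-based imputation) are equally valid depending on the data.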

UNIT-II:
Feature Engineering: Feature, feature engineering, feature transformation: feature
construction, feature extraction, feature subset selection: issues in high-dimensional
data, key drivers of feature selection (feature relevance and redundancy), measures
of feature relevance and redundancy, overall feature selection process, feature
selection approaches.
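
As an illustration of feature relevance and redundancy, the following is a minimal Python sketch of one possible filter-style selection approach. It assumes Pearson correlation as the measure for both relevance (feature vs. target) and redundancy (feature vs. feature); the syllabus does not prescribe a specific measure, and the data, the threshold of 0.9, and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(features, target, redundancy_limit=0.9):
    """Greedy filter: rank by relevance, skip features redundant with kept ones."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    selected = []
    for name in ranked:
        redundant = any(
            abs(pearson(features[name], features[kept])) >= redundancy_limit
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected

# Hypothetical data: two height columns carry the same information.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [7, 6, 9, 8, 10],
}
target = [50, 58, 69, 77, 90]  # hypothetical weight in kg
selected = select_features(features, target)
print(selected)
```

Only one of the two height columns survives, because the second is almost perfectly correlated with the first and adds no new information.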

UNIT-III:
Statistics for Data Engineering: Importance of statistical tools for handling data,
concept of probability: frequentist and Bayesian interpretations, review of probability
theory, random variables: discrete random variables, continuous random variables,
common discrete distributions: Bernoulli distribution, binomial distribution, multinomial
and multinoulli distributions, Poisson distribution, common continuous distributions:
uniform distribution, Gaussian distribution, Laplace distribution. Multiple random
variables: bivariate random variables, joint distribution functions, joint probability mass
functions, joint probability density functions, conditional distributions, covariance and
correlation, central limit theorem. Sampling distributions: sampling with replacement,
sampling without replacement, mean and variance of a sample, hypothesis testing,
Monte Carlo approximation.
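
As an illustration of Monte Carlo approximation and the central limit theorem, the following is a minimal Python sketch. It assumes X ~ Uniform(0, 1), for which E[X^2] = 1/3 and Var(X) = 1/12 are known exactly, so the estimates can be compared with theory. The sample sizes, the seed, and the name monte_carlo_mean are arbitrary choices.

```python
import random
from statistics import mean, pvariance

def monte_carlo_mean(f, sampler, n_samples):
    """Approximate E[f(X)] by averaging f over n_samples draws from sampler."""
    return sum(f(sampler()) for _ in range(n_samples)) / n_samples

rng = random.Random(0)  # fixed seed so the run is repeatable

# Monte Carlo approximation of E[X^2] for X ~ Uniform(0, 1); exact value is 1/3.
estimate = monte_carlo_mean(lambda x: x * x, rng.random, 100_000)

# Central limit theorem: means of samples of size 30 cluster around 0.5
# with variance close to Var(X) / 30 = (1/12) / 30.
sample_size = 30
sample_means = [
    mean([rng.random() for _ in range(sample_size)]) for _ in range(2000)
]
mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)

print(estimate, mean_of_means, var_of_means)
```

The error of the Monte Carlo estimate shrinks roughly in proportion to 1 / sqrt(n_samples), which is why a large number of draws is used.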

UNIT-IV:
Types of Digital Data, Introduction to Big Data: Characteristics of Data, Evolution of Big
Data and Challenges with Big Data, Big Data, Terminologies used in Big Data
Environment.
Introduction to Hadoop: Features of Hadoop, Why Hadoop, RDBMS vs Hadoop,
Hadoop Overview, HDFS, Processing Data with Hadoop.
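
As an illustration of "Processing Data with Hadoop", the following is a minimal in-process Python sketch of the MapReduce programming model (map, shuffle, reduce) applied to word counting. It does not use any Hadoop API and runs on a single machine; the function names and input lines are hypothetical, and a real Hadoop job would distribute each phase across the cluster and read its input from HDFS.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in sequence on a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input split into two "records".
lines = ["big data needs big storage", "Hadoop stores big data"]
counts = word_count(lines)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can run in parallel on different nodes, which is the property Hadoop exploits.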

UNIT-V:
NoSQL: Basics of NoSQL - uses, Types of NoSQL databases, significance of NoSQL.
Advantages of NoSQL, SQL vs NoSQL.
MongoDB: uses and need of MongoDB, MongoDB Query Language: Insert, Save,
Update, Remove, Find methods, Dealing with NULL values, Count, Limit, Sort and Skip,
Arrays, Aggregate Functions and MapReduce Functions.

TEXT BOOKS:
1. Machine Learning, Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das,
Pearson India
2. Big Data and Analytics, Seema Acharya, Subhashini Chellappan, Wiley
3. Machine Learning, Tom M. Mitchell, McGraw-Hill Education

REFERENCES:
1. Introduction to Data Mining, Pang-Ning Tan, Vipin Kumar, Michael Steinbach,
Pearson
2. Hadoop: The Definitive Guide, Tom White, 3rd Edition, O'Reilly Media, 2012
