Machine
Learning
S. Sridhar and M. Vijayalakshmi
Chapter 2
Understanding of Data
Bivariate Data
INVOLVES TWO VARIABLES
Bivariate Data Visualization
Scatter Plot, Line Plot
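A minimal matplotlib sketch of both plot types, using made-up (x, y) values for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic bivariate data: two related variables (illustrative values)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([35, 42, 50, 55, 63, 70, 74, 82])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y)            # scatter plot: shows the relationship point by point
ax1.set_title("Scatter Plot")
ax2.plot(x, y, marker="o")   # line plot: shows the trend across ordered x values
ax2.set_title("Line Plot")
plt.show()
```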
Bivariate Data – Covariance
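For reference, the covariance of X and Y over N paired observations is defined as

COV(X, Y) = (1/N) Σ (xi − E[X])(yi − E[Y])

A positive covariance means the two variables tend to increase together; a negative one means they move in opposite directions.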
Bivariate Data – Correlation
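The Pearson correlation coefficient normalizes covariance by the standard deviations:

r(X, Y) = COV(X, Y) / (σX σY), with −1 ≤ r ≤ +1

A quick numpy check (np.cov and np.corrcoef compute these directly); the data values are illustrative:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 6], dtype=float)

cov_xy = np.cov(x, y, bias=True)[0, 1]   # population covariance
r_xy = np.corrcoef(x, y)[0, 1]           # Pearson correlation
print(cov_xy, r_xy)
assert np.isclose(r_xy, cov_xy / (x.std() * y.std()))   # r = cov / (sx * sy)
```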
Multivariate Data Visualization
HeatMap
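A common multivariate heatmap shows the pairwise correlation matrix; a minimal sketch assuming seaborn and synthetic data:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic multivariate data with 4 variables
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["A", "B", "C", "D"])
df["B"] = df["A"] * 0.8 + df["B"] * 0.2   # make B correlated with A

# Heatmap of the pairwise correlation matrix
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation Heatmap")
plt.show()
```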
Multivariate Essential Mathematics
1. LINEAR SYSTEM and GAUSSIAN ELIMINATION
In mathematics, the Gaussian elimination method, also known as row reduction, is an algorithm for solving systems of linear equations.
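A minimal numpy sketch of row reduction with partial pivoting; the function name and the 3 × 3 example system are illustrative, not from the slides:

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b by row reduction with partial pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: reduce A to upper-triangular form
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))        # pivot row (partial pivoting)
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back substitution on the triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gaussian_elimination(A, b))   # [ 2.  3. -1.], matches np.linalg.solve(A, b)
```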
Multivariate Essential Mathematics
2. MATRIX DECOMPOSITION
Matrix decomposition expresses a matrix as a product of simpler factor matrices, e.g., the eigen decomposition A = Q Λ Q^(−1), where the columns of Q are the eigenvectors of A and Λ is the diagonal matrix of its eigenvalues.
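A minimal numpy sketch of the eigen decomposition named above (the 2 × 2 matrix is illustrative):

```python
import numpy as np

# Eigen decomposition: A = Q @ diag(vals) @ Q^-1
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
vals, Q = np.linalg.eig(A)                       # eigenvalues; eigenvectors as columns of Q
A_rebuilt = Q @ np.diag(vals) @ np.linalg.inv(Q)
print(np.allclose(A, A_rebuilt))                 # True: the factors reproduce A
```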
Multivariate Essential Mathematics
3. PROBABILITY DISTRIBUTIONS
Any data is assumed to be generated by a probability distribution
Probability distributions are of two types:
1. Discrete Probability Distribution
2. Continuous Probability Distribution
Discrete Probability Distribution:
• Binomial Distribution
• Poisson Distribution
• Bernoulli Distribution
Continuous Probability Distribution:
• Normal Distribution
• Rectangular Distribution
• Exponential Distribution
Multivariate Essential Mathematics
The relationship between the events for a continuous random variable and their
probabilities is called a continuous probability distribution.
• It is summarized by a Probability Density Function (PDF)
• The PDF gives the density (relative likelihood) of observing an instance; probabilities are obtained by integrating it over an interval
• The plot of the PDF shows the shape of the distribution
1. NORMAL DISTRIBUTION
Also known as the Bell Curve or Gaussian Distribution
Multivariate Essential Mathematics
Most babies are likely to weigh around 7.5 pounds, with a few weighing less than 7 pounds and a few weighing more than 8 pounds.
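A sketch of this example with scipy.stats; the mean of 7.5 pounds comes from the slide, while σ = 0.5 pounds is an assumed value for illustration:

```python
from scipy.stats import norm

mu, sigma = 7.5, 0.5          # sigma is assumed for illustration
# Probability that a baby weighs between 7 and 8 pounds
p = norm.cdf(8, mu, sigma) - norm.cdf(7, mu, sigma)
print(round(p, 3))            # ~0.683: most weights fall near 7.5 pounds
```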
Multivariate Essential Mathematics
2. RECTANGULAR DISTRIBUTION
It has equal probability for all values in the interval [a, b]: f(x) = 1/(b − a) for a ≤ x ≤ b.
Also known as the Uniform (Continuous Uniform) Distribution
3. EXPONENTIAL DISTRIBUTION
This distribution is helpful in modeling the time until an event occurs: f(x) = λ e^(−λx) for x ≥ 0.
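A minimal scipy.stats sketch of both PDFs; the interval [2, 6] and the rate λ = 0.5 are assumed for illustration:

```python
from scipy.stats import uniform, expon

# Rectangular (uniform) on [a, b]: scipy parameterizes it as loc=a, scale=b-a
a, b = 2.0, 6.0
print(uniform.pdf(3.0, loc=a, scale=b - a))   # 1/(b-a) = 0.25 anywhere in [2, 6]

# Exponential with rate lam: scipy parameterizes it as scale = 1/lam
lam = 0.5                                      # assumed rate of events
print(expon.pdf(1.0, scale=1 / lam))           # lam * exp(-lam * x) at x = 1
```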
Multivariate Essential Mathematics
1. BINOMIAL DISTRIBUTION
The objective of this distribution is to find the probability of getting k successes out of n trials. Each trial has only 2 outcomes, success or failure, and
P(X = k) = C(n, k) p^k (1 − p)^(n − k), where p is the probability of success.
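A scipy.stats check of this PMF, with assumed n = 10 trials and p = 0.5:

```python
from scipy.stats import binom

n, p = 10, 0.5                 # assumed: 10 trials, fair success probability
print(binom.pmf(3, n, p))      # P(exactly 3 successes) = C(10,3)/2**10 ≈ 0.117
```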
Multivariate Essential Mathematics
2. POISSON DISTRIBUTION
Given an interval of time, this distribution is used to model the probability of a given number of events k:
P(X = k) = λ^k e^(−λ) / k!, where λ is the average number of events in the interval.
Ex: number of e-mails received, number of customers visiting a shop.
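A scipy.stats sketch with an assumed rate of λ = 4 e-mails per hour:

```python
from scipy.stats import poisson

lam = 4                        # assumed: 4 e-mails per hour on average
print(poisson.pmf(6, lam))     # P(exactly 6 e-mails in an hour) ≈ 0.104
```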
Multivariate Essential Mathematics
3. BERNOULLI DISTRIBUTION
A single trial with only two outcomes: P(X = 1) = p (success) and P(X = 0) = 1 − p (failure). It is the binomial distribution with n = 1.
Density Estimation
• Let there be a set of observed values x1, x2, …, xn from a larger set of data whose distribution is not known.
• Density estimation is the problem of estimating the density function from the observed (sample) data.
• The estimated density function, denoted p(x), can be evaluated directly for any unknown point xt as p(xt).
• If p(xt) is at or above a certain threshold, then xt is not an outlier or anomaly; otherwise it is categorized as anomaly data.
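A minimal sketch of this thresholding rule, assuming a simple Gaussian density estimate and an illustrative threshold value:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(5.0, 1.0, size=200)      # observed sample

mu, sigma = x.mean(), x.std()           # simple density estimate p(x)
threshold = 0.01                        # assumed; tuned in practice

for xt in [5.2, 9.5]:
    label = "normal" if norm.pdf(xt, mu, sigma) >= threshold else "anomaly"
    print(xt, label)                    # 5.2 -> normal, 9.5 -> anomaly
```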
Density Estimation
Two types of Density Estimation
Parametric Density Estimation
• Maximum Likelihood Estimate
• Gaussian Mixture Model and Expectation Maximization (EM) Algorithm
Non-Parametric Density Estimation
• Parzen Window
• KNN Estimation
Density Estimation
• Maximum likelihood estimation is a method that determines values for the
parameters of a model.
• The parameter values are found such that they maximise the likelihood that the
process described by the model produced the data that were actually observed.
• Let's suppose we have observed 10 data points from some process.
• For these data we'll assume that the data-generation process can be adequately described by a Gaussian (normal) distribution. A Gaussian is plausible here because most of the 10 points are clustered in the middle, with a few points scattered to the left and the right.
Density Estimation
• A Gaussian distribution has 2 parameters: the mean, μ, and the standard deviation, σ.
• Different values of these parameters result in different curves, so we want to know which curve was most likely responsible for creating the data points that we observed.
• Maximum likelihood estimation finds the values of μ and σ that result in the curve that best fits the data. (In this example, the best-fitting curve has mean = 10 and standard deviation = 2.25.)
The probability density of observing a single data point x generated from a Gaussian distribution is given by:
P(x; μ, σ) = (1 / (σ √(2π))) exp(−(x − μ)^2 / (2σ^2))
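For a Gaussian, the MLE has a closed form: μ̂ is the sample mean and σ̂ the population-form sample standard deviation. A numpy sketch with synthetic data on the scale of the example above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.25, size=10)     # 10 points; true mu=10, sigma=2.25

mu_hat = x.mean()                       # MLE of mu
sigma_hat = x.std(ddof=0)               # MLE of sigma (population form)

# Log-likelihood of the observed data under the fitted curve
log_lik = norm.logpdf(x, mu_hat, sigma_hat).sum()
print(mu_hat, sigma_hat, log_lik)
```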
Density Estimation
Gaussian Mixture Model and EM Algorithm
• In machine learning, clustering is an important task.
• The MLE framework is quite useful for designing model-based methods for clustering data.
• A model here is a statistical assumption: the data are assumed to be generated by a distribution with its parameters.
• There may be many distributions involved, hence the name mixture model.
• Gaussian components are normally assumed, hence the name Gaussian Mixture Model (GMM).
Density Estimation
The Expectation Maximization (EM) algorithm is commonly used to compute the maximum likelihood estimates of the Gaussian Mixture Model parameters.
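A minimal sketch using scikit-learn's GaussianMixture, which fits the GMM parameters with EM; the two synthetic clusters are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two synthetic Gaussian clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

# GaussianMixture fits component means/covariances/weights via EM
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_)           # estimated component means, near (0,0) and (5,5)
print(gmm.predict(X[:5]))   # cluster assignments for the first few points
```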
Density Estimation
Parzen Window
The Parzen-window estimate of the density at a point x from n samples is
p(x) = (1/n) Σ (1/h^d) φ((x − xi)/h)
where φ is a window (kernel) function, h is the window width, and d is the dimension.
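A minimal numpy sketch of a one-dimensional Parzen estimate with a Gaussian window; the width h = 0.3 is an assumed choice:

```python
import numpy as np

def parzen_density(x, samples, h):
    """1-D Parzen-window estimate with a Gaussian kernel of width h."""
    u = (x - samples) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian window function
    return k.sum() / (len(samples) * h)            # (1/n) * sum of (1/h) * phi(u)

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=500)
print(parzen_density(0.0, samples, h=0.3))   # ~0.40, near the true N(0,1) peak
```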
Density Estimation
KNN Estimation
The density at x is estimated as p(x) = k / (nV), where V is the volume of the smallest region around x that contains its k nearest neighbors.
Feature Engineering
• Feature Engineering deals with 2 problems:
• FEATURE TRANSFORMATION
• FEATURE SELECTION
Feature Engineering
• FEATURE TRANSFORMATION
- Extraction of features and creation of new features that may be helpful in increasing performance.
• Ex: Height and Weight may give a new attribute, BMI (see the sketch below)
• FEATURE SELECTION
- Feature subset selection by removing irrelevant features
- "Curse of Dimensionality": as the number of dimensions increases, time complexity increases
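A minimal pandas sketch of the Height/Weight-to-BMI transformation (illustrative values, heights in metres):

```python
import pandas as pd

# Feature transformation: derive BMI from Height (m) and Weight (kg)
df = pd.DataFrame({"Height": [1.60, 1.75, 1.82], "Weight": [58, 72, 95]})
df["BMI"] = df["Weight"] / df["Height"] ** 2   # new, more informative feature
print(df)
```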
Characteristics of Good Features
• Features are removed based on relevancy
• Features are removed based on redundancy
Procedure:
1. Generate all possible subsets
2. Evaluate the subsets and model performance
3. Evaluate the results for optimal feature selection
Feature Selection
Filter-Based Selection Methods
- No learning algorithm is used
- Use statistical measures such as correlation, mutual information, entropy, etc., for feature selection
Wrapper-Based Methods
- Use classifiers to identify the best features
- Features are selected and evaluated by the learning algorithm
- Computationally intensive, but superior performance
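A minimal filter-method sketch using scikit-learn's SelectKBest with mutual information; the Iris data and k = 2 are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Filter method: rank features by mutual information, keep the best 2
selector = SelectKBest(mutual_info_classif, k=2).fit(X, y)
print(selector.get_support())        # boolean mask of the selected features
X_reduced = selector.transform(X)    # dataset with only the chosen columns
```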
FEATURE SELECTION
FORWARD SELECTION: start with an empty feature set and greedily add the feature that most improves model performance.
BACKWARD SELECTION: start with all features and repeatedly remove the feature whose removal hurts performance least.
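A minimal wrapper-method sketch using scikit-learn's SequentialFeatureSelector (available in scikit-learn 0.24+); the classifier and the target of 2 features are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Wrapper method: greedily add the feature that most improves the classifier
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=2,
                                direction="forward")   # or "backward"
sfs.fit(X, y)
print(sfs.get_support())   # mask of the features chosen by forward selection
```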
Principal Component Analysis
• Principal Component Analysis (PCA) is a dimensionality reduction technique
• Transforms a given set of features to a new set of features so that the new features exhibit high information-packing properties
• This leads to a reduced and compact set of features
• PCA extracts the most important information. This in turn leads to compression, since the less important information is discarded.
Principal Component Analysis
Center the data by subtracting the mean vector m, then compute the covariance matrix as
C = (1/N) Σ (xi − m)(xi − m)^T
Compute the eigenvalues and eigenvectors of C, and form the matrix A whose rows are the eigenvectors ordered by decreasing eigenvalue.
Compute the PCA transform as
y = A(x − m)
The original data can be recovered as
x = A^T y + m
PCA Algorithm
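A minimal numpy sketch of the steps above; the synthetic data and the choice of k = 2 components are illustrative:

```python
import numpy as np

def pca(X, k):
    """PCA per the steps above; rows of X are samples."""
    m = X.mean(axis=0)                            # mean vector
    C = np.cov(X - m, rowvar=False, bias=True)    # covariance matrix
    vals, vecs = np.linalg.eigh(C)                # eigen decomposition (C symmetric)
    A = vecs[:, np.argsort(vals)[::-1][:k]].T     # top-k eigenvectors as rows
    Y = (X - m) @ A.T                             # transform: y = A(x - m)
    return Y, A, m

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.1])  # one near-flat direction
Y, A, m = pca(X, k=2)
X_rec = Y @ A + m                                 # recover: x ≈ A^T y + m
print(np.abs(X - X_rec).max())                    # small reconstruction error
```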
Linear Discriminant Analysis
• Linear discriminant analysis (LDA), also known as Fisher's linear discriminant, is a method used in statistics and other fields to find a linear combination of features that characterizes or separates two or more classes.
• LDA is a supervised learning method that seeks to find a linear combination of
features, forming a decision boundary that effectively separates two or more
classes in a dataset.
• It involves two primary steps: dimensionality reduction and linear classification.
• By projecting high-dimensional data onto a lower-dimensional space, LDA
maximizes the separation between classes while minimizing variance within each
class.
Linear Discriminant Analysis
• LDA is also closely related to principal component analysis (PCA)
• They both look for linear combinations of variables which best explain the
data.
• LDA explicitly attempts to model the difference between the classes of data.
• PCA, in contrast, does not take into account any difference in class, but only
variance in the data
• The axes created by LDA (LD1, LD2, etc.) prioritize class separation, whereas
PCA’s axes (PC1, PC2, etc.) prioritize variance.
Linear Discriminant Analysis
LDA Algorithm
LDA finds the projection V that maximizes the Fisher criterion
J(V) = (V^T σb V) / (V^T σw V)
where V is the linear projection and σb and σw are the between-class scatter matrix and the within-class scatter matrix, respectively. For a two-class problem, these matrices are given as
σb = (m1 − m2)(m1 − m2)^T
σw = Σ over class 1 of (x − m1)(x − m1)^T + Σ over class 2 of (x − m2)(x − m2)^T
and the optimal projection is V = σw^(−1) (m1 − m2).
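A minimal numpy sketch of the two-class formulas above; the synthetic clusters are illustrative:

```python
import numpy as np

# Two synthetic classes in 2-D
rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 1.0, (50, 2))     # class 1 samples
X2 = rng.normal([3, 3], 1.0, (50, 2))     # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter
V = np.linalg.solve(Sw, m1 - m2)          # optimal projection V = Sw^-1 (m1 - m2)

# Projected class means are well separated along V
print((X1 @ V).mean(), (X2 @ V).mean())
```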
Singular Value Decomposition
Any m × n matrix A can be factorized as A = U S V^T, where the columns of U (left singular vectors) are the eigenvectors of AA^T, the columns of V (right singular vectors) are the eigenvectors of A^T A, and S is a diagonal matrix of singular values, the square roots of the eigenvalues of A^T A.
SVD Algorithm
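A minimal numpy sketch using np.linalg.svd, including a rank-1 approximation; the matrix A is illustrative:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
print(np.allclose(A, U @ np.diag(s) @ Vt))         # True: factors reproduce A

# Rank-1 approximation keeps only the largest singular value
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(A1)                                          # best rank-1 reconstruction of A
```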
Summary
This chapter covered bivariate and multivariate data and their visualization; essential mathematics for multivariate data (linear systems and Gaussian elimination, matrix decomposition, probability distributions); density estimation, both parametric (MLE, GMM with EM) and non-parametric (Parzen window, KNN); feature engineering and feature selection; and the dimensionality reduction techniques PCA, LDA, and SVD.