
Machine Learning
S. Sridhar and M. Vijayalakshmi

Chapter 2: Understanding of Data
Bivariate Data
Bivariate data involves two variables, observed as pairs (x, y).

Bivariate Data Visualization
Two common plots for bivariate data are the scatter plot and the line plot; see the sketch below.
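A minimal Matplotlib sketch of both plot types, on small made-up (x, y) data since the slides' figures are not reproduced here:

```python
import matplotlib.pyplot as plt

# Hypothetical bivariate data: hours studied (x) vs. exam score (y)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [35, 45, 50, 58, 62, 70, 74, 81]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: each (x, y) pair drawn as an individual point
ax1.scatter(x, y)
ax1.set(title="Scatter Plot", xlabel="x", ylabel="y")

# Line plot: the same pairs joined in order of x
ax2.plot(x, y, marker="o")
ax2.set(title="Line Plot", xlabel="x", ylabel="y")

plt.tight_layout()
plt.show()
```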
Bivariate Data – Covariance
Covariance measures how two variables vary together:
Cov(X, Y) = (1/N) Σ (xi − E[X]) (yi − E[Y])
A positive covariance means the variables tend to increase together; a negative covariance means one tends to decrease as the other increases.

Bivariate Data – Correlation
Correlation is covariance normalized by the two standard deviations, giving a scale-free measure in [−1, +1]:
r = Cov(X, Y) / (σx σy)
A short NumPy check of both quantities follows.
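A small NumPy verification of these two formulas, reusing the hypothetical data from the previous sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([35, 45, 50, 58, 62, 70, 74, 81], dtype=float)

# Covariance: average product of deviations from the two means
cov = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation: covariance normalized by both standard deviations
r = cov / (x.std() * y.std())

print(cov, r)                    # manual values
print(np.corrcoef(x, y)[0, 1])   # NumPy's built-in correlation, for comparison
```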
Multivariate Data Visualization
Multivariate data involves three or more variables.

Heat Map
A heat map displays a matrix of values (for example, pairwise correlations between features) as a grid of colored cells, making patterns across many variables visible at a glance; see the sketch below.
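A minimal heat-map sketch over a hypothetical correlation matrix (seaborn's heatmap would also work; plain Matplotlib keeps the dependencies small):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical dataset with 4 features, 100 samples
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))
features = ["f1", "f2", "f3", "f4"]

# Pairwise correlation matrix (rowvar=False: columns are the variables)
corr = np.corrcoef(data, rowvar=False)

fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(4))
ax.set_xticklabels(features)
ax.set_yticks(range(4))
ax.set_yticklabels(features)
fig.colorbar(im, label="correlation")
ax.set_title("Heat Map of Pairwise Correlations")
plt.show()
```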
Multivariate Essential Mathematics
1. LINEAR SYSTEMS and GAUSSIAN ELIMINATION
Gaussian elimination, also known as row reduction, is an algorithm for solving systems of linear equations. It reduces the augmented matrix [A | b] to row-echelon form using elementary row operations, then recovers the solution by back substitution; see the sketch below.
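A compact NumPy sketch of Gaussian elimination with partial pivoting, solving Ax = b on a small illustrative system (in practice numpy.linalg.solve does the same job):

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b by row reduction with partial pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)

    # Forward elimination: zero out the entries below each pivot
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))        # partial pivoting
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]  # swap rows k and p
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]

    # Back substitution: solve the resulting upper-triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gaussian_elimination(A, b))   # [ 2.  3. -1.]
print(np.linalg.solve(A, b))        # cross-check against NumPy
```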
Multivariate Essential Mathematics
2. MATRIX DECOMPOSITION
Matrix decomposition factorizes a matrix into a product of simpler (e.g., triangular or orthogonal) matrices. The LU decomposition, A = LU with L lower triangular and U upper triangular, is the matrix form of Gaussian elimination; a short illustration follows.
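A brief illustration with scipy.linalg.lu, which returns a permutation matrix P alongside L and U (P accounts for the row pivoting):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])

P, L, U = lu(A)                     # factorization A = P @ L @ U
print(np.allclose(A, P @ L @ U))    # True: the factors reproduce A
```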
Multivariate Essential Mathematics
3. PROBABILITY DISTRIBUTIONS

Any data is assumed to be generated by a probability distribution.

Probability distributions are of two types:
1. Discrete probability distributions
2. Continuous probability distributions

Discrete Probability Distributions
• Binomial Distribution
• Poisson Distribution
• Bernoulli Distribution

Continuous Probability Distributions
• Normal Distribution
• Rectangular (Uniform) Distribution
• Exponential Distribution
Multivariate Essential Mathematics
The relationship between the values of a continuous random variable and their probabilities is called a continuous probability distribution.
• It is summarized by a Probability Density Function (PDF).
• The PDF gives the relative likelihood (density) of observing an instance; probabilities are obtained by integrating the PDF over an interval.
• The plot of the PDF shows the shape of the distribution.

1. NORMAL DISTRIBUTION
Also called the bell curve or Gaussian distribution. Its PDF is
p(x) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))
where μ is the mean and σ is the standard deviation.
Multivariate Essential Mathematics
Example: birth weights are roughly normally distributed. Most babies weigh around 7.5 pounds, with few weighing less than 7 pounds and few weighing more than 8 pounds.
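A quick sketch of that bell curve using the PDF above; the mean of 7.5 lb matches the slide, while the standard deviation of 0.5 lb is an assumed value chosen so that most of the mass falls between 7 and 8 pounds:

```python
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 7.5, 0.5   # assumed parameters for baby weights (pounds)

x = np.linspace(5.5, 9.5, 200)
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

plt.plot(x, pdf)
plt.xlabel("weight (pounds)")
plt.ylabel("density")
plt.title("Normal PDF for baby weights (mu = 7.5, sigma = 0.5)")
plt.show()
```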
Multivariate Essential Mathematics
2. RECTANGULAR DISTRIBUTION
It has equal probability density for all values in the range [a, b]: p(x) = 1 / (b − a) for a ≤ x ≤ b. Also known as the (continuous) uniform distribution.

3. EXPONENTIAL DISTRIBUTION
This distribution is helpful for modeling the time until an event occurs. Its PDF is p(x) = λ e^(−λx) for x ≥ 0, where λ is the rate parameter.
Multivariate Essential Mathematics
1. BINOMIAL DISTRIBUTION
The objective of this distribution is to find the probability of getting k successes out of n trials, where each trial has only 2 outcomes – success or failure:
P(X = k) = C(n, k) p^k (1 − p)^(n − k)

Multivariate Essential Mathematics
2. POISSON DISTRIBUTION
Given an interval of time, this distribution models the probability of observing a given number of events k, e.g., the number of e-mails received or the number of customers visiting a shop:
P(X = k) = (λ^k e^(−λ)) / k!

Multivariate Essential Mathematics
3. BERNOULLI DISTRIBUTION
A single trial with two outcomes: P(X = 1) = p and P(X = 0) = 1 − p. The binomial distribution is the sum of n independent Bernoulli trials.
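A small scipy.stats sketch evaluating these three PMFs at illustrative, made-up parameter values:

```python
from scipy import stats

# Binomial: probability of k = 3 successes in n = 10 trials with p = 0.5
print(stats.binom.pmf(3, n=10, p=0.5))

# Poisson: probability of k = 2 events when the average rate is 4 per interval
print(stats.poisson.pmf(2, mu=4))

# Bernoulli: probability of success (k = 1) with p = 0.3
print(stats.bernoulli.pmf(1, p=0.3))
```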
Density Estimation
• Let there be a set of observed values x1, x2, …, xn from a larger set of data whose distribution is not known.

• Density estimation is the problem of estimating the density function from the observed (sample) data.

• The estimated density function, denoted p(x), can then be evaluated directly for any unknown instance xt as p(xt).

• If p(xt) is less than a certain threshold, xt is categorized as an outlier or anomaly; otherwise it is treated as normal data.
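A minimal sketch of that thresholding rule, assuming a density already fitted to the sample (here a normal density with parameters estimated from hypothetical data; the threshold value is illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)   # observed data

# Fit a normal density to the sample (a simple parametric estimate)
mu, sigma = sample.mean(), sample.std()

def is_anomaly(xt, threshold=0.01):
    # Flag xt as an anomaly when its estimated density falls below the threshold
    return stats.norm.pdf(xt, loc=mu, scale=sigma) < threshold

print(is_anomaly(0.2))   # False: near the bulk of the data
print(is_anomaly(5.0))   # True: far in the tail, very low density
```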
Density Estimation
Two types of density estimation:

Parametric Density Estimation
• Maximum Likelihood Estimate
• Gaussian Mixture Model and Expectation Maximization (EM) Algorithm

Non-Parametric Density Estimation
• Parzen Window
• KNN Estimation
Density Estimation
• Maximum likelihood estimation (MLE) is a method that determines values for the parameters of a model.
• The parameter values are found such that they maximize the likelihood that the process described by the model produced the data that were actually observed.
• Suppose we have observed 10 data points from some process. We assume the data-generation process can be adequately described by a Gaussian (normal) distribution; such a choice is plausible when most of the points are clustered in the middle with a few scattered to the left and the right.
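For a Gaussian, the MLE has a closed form: the sample mean and the (biased) sample standard deviation. A small sketch on 10 hypothetical points:

```python
import numpy as np

# 10 hypothetical observations
x = np.array([8.1, 9.5, 9.8, 10.0, 10.1, 10.3, 10.4, 10.9, 11.6, 12.3])

# Closed-form maximum likelihood estimates for a Gaussian
mu_hat = x.mean()           # MLE of the mean
sigma_hat = x.std(ddof=0)   # MLE of the std (divides by n, not n - 1)

print(mu_hat, sigma_hat)
```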
Density Estimation
• A Gaussian distribution has 2 parameters: the mean, μ, and the standard deviation, σ.
• Different values of these parameters result in different curves. We want to know which curve was most likely responsible for creating the data points that we observed. (In the slides' figure, curve f1 best fits the data, with mean = 10 and standard deviation = 2.25.)
• Maximum likelihood estimation finds the values of μ and σ that result in the curve that best fits the data.

The probability density of observing a single data point x generated from a Gaussian distribution is given by:
P(x; μ, σ) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))
The likelihood of the whole dataset is the product of these densities over all observed points, and MLE maximizes this product (in practice, its logarithm).
Density Estimation
Gaussian Mixture Model and EM Algorithm
• In machine learning, clustering is an important task.
• The MLE framework is quite useful for designing model-based methods for clustering data.
• A model is a statistical method, and the data are assumed to be generated by a distribution model with its parameters.
• There may be many distributions involved, hence the name mixture model.
• Gaussians are normally assumed, hence the name Gaussian Mixture Model (GMM).

Density Estimation
For a GMM, the likelihood cannot be maximized in closed form, so the Expectation Maximization (EM) algorithm is commonly used to compute the maximum likelihood estimates of its parameters.
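A minimal sketch with scikit-learn's GaussianMixture, which runs EM internally; the two-cluster data here is made up for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 1-D data drawn from two Gaussian clusters
rng = np.random.default_rng(2)
data = np.vstack([
    rng.normal(loc=-3.0, scale=1.0, size=(200, 1)),
    rng.normal(loc=+4.0, scale=1.5, size=(200, 1)),
])

# Fit a 2-component GMM; EM alternates E-steps and M-steps until convergence
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

print(gmm.means_.ravel())     # estimated component means (near -3 and 4)
print(gmm.weights_)           # estimated mixing proportions (near 0.5 each)
print(gmm.predict([[0.0]]))   # cluster assignment for a new point
```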
Density Estimation
Parzen Window
The Parzen window (kernel density estimation) is a non-parametric method: it places a kernel function K (often a Gaussian) of width h on each observed sample and averages them,
p(x) = (1/n) Σ (1/h) K((x − xi) / h)
where the window width (bandwidth) h controls the smoothness of the estimate; see the sketch below.
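A minimal NumPy sketch of a 1-D Parzen window estimate with a Gaussian kernel; the bandwidth h = 0.5 is an arbitrary illustrative choice:

```python
import numpy as np

def parzen_density(x, samples, h=0.5):
    """Estimate p(x) by averaging a Gaussian kernel of bandwidth h on each sample."""
    u = (x - samples) / h
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel K(u)
    return kernels.mean() / h                            # (1/n) sum, scaled by 1/h

rng = np.random.default_rng(3)
samples = rng.normal(loc=0.0, scale=1.0, size=500)

print(parzen_density(0.0, samples))   # high density near the data's center
print(parzen_density(4.0, samples))   # low density out in the tail
```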
Feature Engineering
Feature engineering deals with 2 problems:
• FEATURE TRANSFORMATION
• FEATURE SELECTION

Feature Engineering
• FEATURE TRANSFORMATION
  - Extraction of features and creation of new features that may be helpful in increasing performance.
  - Ex: Height and Weight may give a new attribute, BMI; see the sketch after this list.

• FEATURE SELECTION
  - Feature subset selection by removing irrelevant features.
  - "Curse of Dimensionality": as the number of dimensions increases, time complexity increases (and the data becomes sparser), so reducing the feature set matters.
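A tiny pandas sketch of that BMI transformation, on made-up height/weight records:

```python
import pandas as pd

# Hypothetical raw features
df = pd.DataFrame({
    "height_m": [1.60, 1.75, 1.82],
    "weight_kg": [55.0, 72.0, 95.0],
})

# Feature transformation: derive BMI = weight / height^2 as a new feature
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

print(df)
```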
Characteristics of Good Features
• Irrelevant features are removed (relevancy check).
• Redundant features are removed (redundancy check).

Procedure:
1. Generate all possible feature subsets.
2. Evaluate each subset via the model's performance.
3. Compare the results and pick the optimal feature subset.
Feature Selection
Filter-Based Selection Methods
- No learning algorithm is used.
- Use statistical measures such as correlation, mutual information, and entropy for feature selection.

Wrapper-Based Methods
- Use classifiers to identify the best features.
- Features are selected and evaluated by the learning algorithm itself.
- Computationally intensive, but superior performance.

Feature Selection
FORWARD SELECTION – start from an empty set and greedily add the feature that most improves performance.

BACKWARD SELECTION – start from the full set and greedily remove the feature whose removal hurts performance the least.

A sketch of both directions follows.
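A brief scikit-learn sketch of wrapper-style forward and backward selection via SequentialFeatureSelector; the iris data and the logistic-regression wrapper are illustrative choices, not anything prescribed by the slides:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Forward selection: grow the subset one best feature at a time
fwd = SequentialFeatureSelector(clf, n_features_to_select=2,
                                direction="forward").fit(X, y)
print("forward keeps:", fwd.get_support())

# Backward selection: shrink from all features, dropping the least useful
bwd = SequentialFeatureSelector(clf, n_features_to_select=2,
                                direction="backward").fit(X, y)
print("backward keeps:", bwd.get_support())
```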
Principal Component Analysis
• Principal Component Analysis (PCA) is a dimensionality reduction technique.

• It transforms a given set of features into a new set of features that exhibits high information-packing properties.

• This leads to a reduced and compact set of features.

• PCA extracts the most important information, which in turn leads to compression, since the less important information is discarded.
Principal Component Analysis
Let the data be a set of vectors x with mean m = E[x].

Compute the covariance matrix as
C = E[(x − m)(x − m)^T]

Compute the eigenvalues and eigenvectors of C, and form the matrix A whose rows are the eigenvectors, ordered by decreasing eigenvalue.

Compute the PCA transform as
y = A (x − m)

The original data can be recovered as
x = A^T y + m
(exactly when all eigenvectors are kept, and approximately when only the top components are retained).
PCA Algorithm
(The slides' worked numerical example and its verification are not reproduced here; a code sketch of the algorithm follows.)
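A minimal NumPy sketch of the algorithm above, reducing hypothetical 3-D data to 2 principal components and then reconstructing it:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))        # hypothetical data, one row per sample

# Step 1: center the data around its mean m
m = X.mean(axis=0)
Xc = X - m

# Step 2: covariance matrix C of the centered data
C = np.cov(Xc, rowvar=False)

# Step 3: eigen-decomposition; sort eigenvectors by decreasing eigenvalue
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]
A = vecs[:, order].T                 # rows of A are the principal axes

# Step 4: project onto the top k components (y = A (x - m))
k = 2
Y = Xc @ A[:k].T

# Step 5: approximate reconstruction (x ≈ A^T y + m)
X_rec = Y @ A[:k] + m
print("reconstruction error:", np.mean((X - X_rec) ** 2))
```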
Linear Discriminant Analysis
• Linear discriminant analysis (LDA), also known as Fisher's linear discriminant, is a method used in statistics and other fields to find a linear combination of features that characterizes or separates two or more classes.

• LDA is a supervised learning method that seeks a linear combination of features forming a decision boundary that effectively separates two or more classes in a dataset.

• It involves two primary steps: dimensionality reduction and linear classification.

• By projecting high-dimensional data onto a lower-dimensional space, LDA maximizes the separation between classes while minimizing the variance within each class.
Linear Discriminant Analysis
• LDA is closely related to principal component analysis (PCA).

• Both look for linear combinations of variables that best explain the data.

• LDA explicitly attempts to model the difference between the classes of data.

• PCA, in contrast, does not take into account any difference in class, but only the variance in the data.

• The axes created by LDA (LD1, LD2, etc.) prioritize class separation, whereas PCA's axes (PC1, PC2, etc.) prioritize variance.
Linear Discriminant Analysis
LDA Algorithm
LDA finds the projection V that maximizes the Fisher criterion
J(V) = (V^T σb V) / (V^T σw V)
where V is the linear projection and σb and σw are the between-class scatter matrix and within-class scatter matrix, respectively. For a two-class problem, these matrices are given as
σb = (m1 − m2)(m1 − m2)^T
σw = S1 + S2
where m1, m2 are the class means and S1, S2 are the per-class scatter matrices. For two classes, the optimal projection is V = σw⁻¹ (m1 − m2).
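A small NumPy sketch of the two-class case under those definitions, on made-up Gaussian clusters:

```python
import numpy as np

rng = np.random.default_rng(5)
X1 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))   # class 1 samples
X2 = rng.normal(loc=[3, 2], scale=1.0, size=(100, 2))   # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the per-class scatter matrices
S1 = (X1 - m1).T @ (X1 - m1)
S2 = (X2 - m2).T @ (X2 - m2)
Sw = S1 + S2

# Optimal two-class projection: V = Sw^{-1} (m1 - m2)
V = np.linalg.solve(Sw, m1 - m2)

# The projected class means separate cleanly along V
print((X1 @ V).mean(), (X2 @ V).mean())
```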
Singular Value Decomposition
Singular value decomposition (SVD) factorizes any m × n matrix A as
A = U S V^T
where U (m × m) and V (n × n) are orthogonal matrices whose columns are the left and right singular vectors, and S is a diagonal matrix of non-negative singular values. Keeping only the largest singular values gives the best low-rank approximation of A, which is the basis of SVD-based dimensionality reduction and compression; see the sketch below.

SVD Algorithm
(The slides' worked example is not reproduced here.)
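A short NumPy sketch of the factorization and a rank-1 approximation of a small hypothetical matrix:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [1.0, 1.0]])

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True

# Rank-1 approximation: keep only the largest singular value
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(A1)   # best rank-1 approximation of A
```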
Summary