Module 2- Machine Learning (BCS602)
Module 2
Understanding Data
Bivariate and Multivariate data, Multivariate statistics, Essential mathematics for Multivariate data,
Overview of hypothesis, Feature engineering and dimensionality reduction techniques. Basics of Learning
Theory: Introduction to learning and its types, Introduction to computational learning theory, Design of a
learning system, Introduction to concept learning. Similarity-based learning: Introduction to similarity or
instance-based learning, Nearest-Neighbour learning, weighted k-Nearest-Neighbour algorithm.
CHAPTER 2
2.6 BIVARIATE DATA AND MULTIVARIATE DATA
Bivariate data involves two variables, and bivariate analysis deals with causes and relationships between
them. The aim is to find relationships among the data. Consider Table 2.3, which records the temperature
in a shop and the corresponding sales of sweaters.
Here, the aim of bivariate analysis is to find relationships among variables. The relationships can then be
used in comparisons, finding causes, and in further explorations. To do that, a graphical display of the data is
necessary. One such graphical method is called a scatter plot.
A scatter plot is used to visualize bivariate data. It is useful for plotting two variables, with or without nominal
variables, to illustrate trends and to show differences. It is a plot between the explanatory and
response variables, i.e., a 2D graph showing the relationship between two variables. Line graphs are
similar to scatter plots. The line chart for the sales data is shown in Figure 2.12.
2.6.1 Bivariate Statistics
Covariance and Correlation are examples of bivariate statistics. Covariance is a measure of the joint
variability of two random variables, say X and Y. Generally, random variables are represented in capital
letters. It is defined
as covariance(X, Y) or COV(X, Y) and is used to measure the variance between two dimensions. The
formula for finding the covariance for specific values of x and y is:
COV(X, Y) = (1/N) Σ (xi − E(X)) (yi − E(Y))
Here, xi and yi are data values from X and Y, E(X) and E(Y) are the mean values of xi and yi, and N is the
number of given data points. Also, COV(X, Y) is the same as COV(Y, X).
If the given attributes are X = (x1, x2, … , xN) and Y = (y1, y2, … , yN), then the Pearson correlation
coefficient, denoted as r, is given as:
r = COV(X, Y) / (σX σY)
where σX and σY are the standard deviations of X and Y.
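A quick way to check these two formulas is with NumPy; the temperature and sales values below are illustrative only and are not the actual Table 2.3 data:

import numpy as np

# Temperature in the shop (X) and sweater sales (Y) -- made-up illustrative values.
X = np.array([25, 27, 31, 33, 35], dtype=float)
Y = np.array([92, 83, 80, 75, 69], dtype=float)

N = len(X)
mean_x, mean_y = X.mean(), Y.mean()

# COV(X, Y) = (1/N) * sum((xi - E(X)) * (yi - E(Y)))
cov_xy = np.sum((X - mean_x) * (Y - mean_y)) / N

# Pearson correlation r = COV(X, Y) / (sigma_X * sigma_Y)
r = cov_xy / (X.std() * Y.std())   # np.std uses the population form (divide by N)

print("COV(X, Y) =", cov_xy)
print("Pearson r =", r)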
2.7 MULTIVARIATE STATISTICS
In machine learning, almost all datasets are multivariate. Multivariate analysis is the analysis of more than
two observable variables, and often thousands of measurements need to be collected for one or
more subjects. Multivariate data has three or more variables. The aims of multivariate analysis are broader
than those of bivariate analysis; they include regression analysis, factor analysis, and multivariate analysis of
variance (MANOVA).
Heatmap
A heat map is a graphical representation of data where individual values are represented by colors. Heat
maps are often used in data analysis and visualization to show patterns, density, or intensity of data points
in a two-dimensional grid.
Example: Consider a heat map that displays the average temperatures (in °C) across different regions of a
country over a week. Each cell in the heat map represents the temperature for a specific region on a
specific day. This is useful for quickly identifying trends, such as higher temperatures in certain regions or
specific days with unusual weather patterns. The color gradient (from blue to red) indicates the
temperature range: cooler colors represent lower temperatures, while warmer colors represent higher
temperatures.
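A minimal sketch of such a heat map using matplotlib, with made-up region names and randomly generated temperatures (not real data):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical average temperatures (degrees C) for 4 regions over 7 days.
regions = ["North", "South", "East", "West"]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
temps = np.random.uniform(15, 40, size=(len(regions), len(days)))

fig, ax = plt.subplots()
im = ax.imshow(temps, cmap="coolwarm")     # blue (cool) to red (warm) gradient
ax.set_xticks(range(len(days)))
ax.set_xticklabels(days)
ax.set_yticks(range(len(regions)))
ax.set_yticklabels(regions)
fig.colorbar(im, label="Temperature (°C)")
ax.set_title("Average temperature per region per day")
plt.show()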
Pairplot
A pairplot or scatter matrix is a data visualization technique for multivariate data. A scatter matrix consists of
several pair-wise scatter plots of the variables of the multivariate data. A random matrix of three columns is
chosen, and the relationships of the columns are plotted as a pairplot (or scatter matrix) as shown in
Figure 2.14.
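A small sketch using pandas' scatter_matrix on a random three-column matrix, in the spirit of Figure 2.14 (the data here is random, not the figure's data):

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

# Random matrix of three columns; the column names are arbitrary.
data = pd.DataFrame(np.random.randn(100, 3), columns=["col1", "col2", "col3"])

# Pairwise scatter plots of every column against every other column.
scatter_matrix(data, diagonal="hist", figsize=(6, 6))
plt.show()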
2.8 ESSENTIAL MATHEMATICS FOR MULTIVARIATE DATA
Machine learning involves many mathematical concepts from the domain of Linear algebra, Statistics,
Probability and Information theory. The subsequent sections discuss important aspects of linear algebra
and probability.
2.8.1 Linear Systems and Gaussian Elimination for Multivariate Data
A linear system of equations is a group of equations with unknown variables. Let Ax = y; then the solution is
x = A^-1 y, provided A is non-singular (invertible). The logic can be extended to a
system of n equations with n unknown variables: if A is the n × n coefficient matrix and y = (y1, y2, …, yn)^T,
then the unknown vector x can be computed as x = A^-1 y.
If there is a unique solution, then the system is called consistent independent. If there are multiple
solutions, then the system is called consistent dependent. If there are no solutions and the equations
are contradictory, then the system is called inconsistent.
For solving a large system of equations, Gaussian elimination can be used. The
procedure for applying Gaussian elimination is given as follows:
1. Write the given matrix A.
2. Append the vector y to the matrix A. This matrix is called the augmented matrix.
3. Keep the element a11 as the pivot and eliminate a21 in the second row using the row operation
R2 ← R2 − (a21/a11) R1, where R2 is the 2nd row and (a21/a11) is called the multiplier.
The same logic is used to eliminate the entries below the pivot in all the other rows.
4. Repeat the same logic for the remaining pivots and reduce the matrix to row echelon form. Then, using
the entries of the reduced augmented matrix, the last unknown variable is obtained as:
xn = yn / ann
5. Then, the remaining unknown variables can be found by back-substitution as:
xi = (yi − Σ aij xj, for j = i+1 to n) / aii, for i = n−1, …, 1
To facilitate the application of the Gaussian elimination method, the following row operations are
applied:
1. Swapping the rows
2. Multiplying or dividing a row by a constant
3. Replacing a row by adding or subtracting a multiple of another row to it
These concepts are illustrated in Example 2.8.
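The following is a minimal sketch of forward elimination and back-substitution in Python, assuming non-zero pivots (i.e., no row swaps are needed); the example system is made up for illustration:

import numpy as np

def gaussian_elimination(A, y):
    """Solve Ax = y by forward elimination on the augmented matrix [A | y],
    followed by back-substitution. Assumes non-zero pivots (no row swaps)."""
    aug = np.hstack([A.astype(float), y.reshape(-1, 1).astype(float)])
    n = len(y)

    # Forward elimination: zero out the entries below each pivot.
    for i in range(n):
        for j in range(i + 1, n):
            multiplier = aug[j, i] / aug[i, i]
            aug[j, :] -= multiplier * aug[i, :]

    # Back-substitution from the last row upwards.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i, -1] - aug[i, i + 1:n] @ x[i + 1:n]) / aug[i, i]
    return x

A = np.array([[2.0, 1.0, -1.0],
              [-3.0, -1.0, 2.0],
              [-2.0, 1.0, 2.0]])
y = np.array([8.0, -11.0, -3.0])
print(gaussian_elimination(A, y))   # should match np.linalg.solve(A, y)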
2.8.2 Matrix Decomposition
It is often necessary to reduce a matrix to its constituent parts so that complex matrix operations can be
performed more easily. The most common form is the eigen decomposition: if A is a square symmetric
matrix, then A can be decomposed as:
A = Q Λ Q^T
where Q is the matrix of eigenvectors, Λ is the diagonal matrix of eigenvalues, and Q^T is the transpose of matrix Q.
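This decomposition can be verified numerically; the sketch below assumes a small symmetric matrix so that Q is orthogonal:

import numpy as np

# A small symmetric matrix (eigen decomposition A = Q Λ Q^T holds for symmetric A).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, Q = np.linalg.eigh(A)   # Q holds the eigenvectors as columns
Lam = np.diag(eigvals)           # Λ: diagonal matrix of eigenvalues

# Reconstruct A from its constituent parts.
print(np.allclose(Q @ Lam @ Q.T, A))   # True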
LU Decomposition
One of the simplest matrix decompositions is LU decomposition, where the matrix A can be decomposed into
two matrices: A = LU. Here, L is a lower triangular matrix and U is an upper triangular matrix. The
decomposition can be done using the Gaussian elimination method discussed in the previous section. First,
an identity matrix is augmented to the given matrix. Then, row operations and Gaussian elimination are
applied to reduce the given matrix and obtain the matrices L and U. Example 2.9 illustrates the application of
Gaussian elimination to get L and U.
Now, it can be observed that the first matrix is L, the lower triangular matrix, whose entries are the
multipliers used in the reduction of the equations above (such as 3, 3 and 2/3).
The second matrix is U, the upper triangular matrix, whose entries are the values of the matrix reduced
by Gaussian elimination.
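For practical work, the decomposition can be obtained with SciPy's lu routine; the matrix below is illustrative, and SciPy additionally returns a permutation matrix P to account for any row swaps:

import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])

# scipy returns P, L, U with A = P @ L @ U.
P, L, U = lu(A)
print("L =\n", L)
print("U =\n", U)
print(np.allclose(P @ L @ U, A))   # True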
Introduction to Machine Learning and Probability/Statistics
Importance: Machine learning relies heavily on statistics and probability to make
predictions and analyze data.
Statistics in ML: Key for understanding data patterns, measuring relationships, and
quantifying uncertainties.
Probability Distributions
Definition: A probability distribution describes the likelihood of various outcomes for a variable X.
Types:
o Discrete Probability Distributions: For countable events (e.g., binomial, Poisson).
o Continuous Probability Distributions: For measurable events on a continuum (e.g., normal,
exponential).
Continuous Probability Distributions
1. Normal Distribution (Gaussian Distribution)
Shape: Bell curve, symmetric around the mean.
Characteristics: Defined by mean μ and standard deviation σ.
Probability Density Function (PDF): f(x) = (1 / (σ √(2π))) e^(−(x − μ)² / (2σ²))
Applications: Common in natural data (e.g., heights, exam scores).
Z-score: Standardizes data points: Z = (X − μ) / σ
2. Uniform Distribution (Rectangular Distribution)
Definition: Equal probability for all outcomes within range [a,b].
PDF: f(x) = 1 / (b − a) for a ≤ x ≤ b, and 0 otherwise
3. Exponential Distribution
Definition: Models the time between events in a Poisson process. PDF: f(x) = λ e^(−λx) for x ≥ 0
Discrete Probability Distributions
1 Binomial Distribution
Definition: For trials with two outcomes (success/failure).
Formula for the probability of k successes in n trials: P(X = k) = C(n, k) p^k (1 − p)^(n − k), where C(n, k) = n! / (k!(n − k)!)
2 Poisson Distribution
Definition: Models the number of events in a fixed interval of time.
PMF: P(X = k) = (λ^k e^(−λ)) / k!
3 Bernoulli Distribution
Definition: Models a single trial with two outcomes (success/failure).
Probability Mass Function (PMF): P(X = x) = p^x (1 − p)^(1 − x), x ∈ {0, 1}
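The following sketch evaluates these standard distributions with scipy.stats; the parameter values are arbitrary examples:

from scipy import stats

# Normal: P(X <= 1.96) for X ~ N(0, 1), and the PDF at x = 0.
print(stats.norm.cdf(1.96, loc=0, scale=1))   # ~0.975
print(stats.norm.pdf(0, loc=0, scale=1))      # ~0.399

# Binomial: probability of k = 3 successes in n = 10 trials with p = 0.5.
print(stats.binom.pmf(3, n=10, p=0.5))

# Poisson: probability of k = 2 events when the rate lambda = 4.
print(stats.poisson.pmf(2, mu=4))

# Bernoulli: single trial with success probability p = 0.3.
print(stats.bernoulli.pmf(1, p=0.3))          # 0.3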
Density Estimation
Goal: Estimate the probability density function (PDF) of data.
Types:
o Parametric Density Estimation: Assumes a known distribution (e.g., Gaussian)
and estimates parameters.
o Non-Parametric Density Estimation: Does not assume a fixed distribution (e.g.,
Parzen window, k-Nearest Neighbors)
Parametric Density Estimation
1 Maximum Likelihood Estimation (MLE)
Definition: A method for estimating the parameters of a distribution by maximizing the
likelihood function.
Likelihood Function: L(θ) = ∏ p(xi | θ); the estimate maximizes L(θ) (or, equivalently, log L(θ)) with respect to the parameter θ.
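For a Gaussian, maximizing the likelihood has a closed-form solution (the sample mean and the population standard deviation); a minimal sketch with synthetic data:

import numpy as np

# Hypothetical samples assumed to come from a Gaussian with unknown mu and sigma.
samples = np.random.normal(loc=5.0, scale=2.0, size=1000)

# Closed-form maximizers of the Gaussian likelihood L(theta):
mu_mle = samples.mean()                                   # MLE of the mean
sigma_mle = np.sqrt(np.mean((samples - mu_mle) ** 2))     # MLE of sigma (divide by N, not N-1)

print("MLE mean  :", mu_mle)
print("MLE sigma :", sigma_mle)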
Gaussian Mixture Model (GMM) and Expectation-Maximization (EM) Algorithm
GMM: A probabilistic model assuming data is generated from a mixture of Gaussian
distributions.
EM Algorithm:
o E-Step: Estimate the expected values of the latent variables (the responsibility of each Gaussian
component for each data point) given the current parameters.
o M-Step: Re-estimate the parameters using MLE, based on the E-step values.
Iteration: Repeat both steps until convergence.
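A minimal sketch using scikit-learn's GaussianMixture, which runs EM internally; the two clusters below are synthetic:

import numpy as np
from sklearn.mixture import GaussianMixture

# Data drawn from two hypothetical Gaussian clusters.
X = np.concatenate([np.random.normal(0, 1, 300),
                    np.random.normal(6, 1.5, 300)]).reshape(-1, 1)

# GaussianMixture alternates E-steps and M-steps until convergence.
gmm = GaussianMixture(n_components=2, max_iter=100).fit(X)

print("Means     :", gmm.means_.ravel())
print("Variances :", gmm.covariances_.ravel())
print("Weights   :", gmm.weights_)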
Non-Parametric Density Estimation Methods
1 Parzen Window
Definition: A non-parametric technique that estimates the PDF based on local samples.
Example: Uses a kernel function like Gaussian around each data point.
2 k-Nearest Neighbors (KNN)
Definition: Estimates density by considering the k closest neighbors.
Application: Frequently used in classification tasks.
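A minimal Parzen-window-style sketch using scikit-learn's KernelDensity with a Gaussian kernel; the sample and the bandwidth are arbitrary:

import numpy as np
from sklearn.neighbors import KernelDensity

# One-dimensional sample whose density we want to estimate.
X = np.random.normal(0, 1, 500).reshape(-1, 1)

# Parzen-window style estimate: a Gaussian kernel placed around each data point.
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(X)

query = np.array([[0.0], [1.0], [2.0]])
density = np.exp(kde.score_samples(query))   # score_samples returns log-density
print(density)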
2.10 FEATURE ENGINEERING AND DIMENSIONALITY REDUCTION TECHNIQUES
Features are attributes. Feature engineering is about determining the subset of features that forms
an important part of the input and improves the performance of a model, be it classification or any other
model in machine learning.
Feature engineering deals with two problems – Feature Transformation and Feature Selection.
Feature transformation is the extraction of features and the creation of new features that may be helpful in
increasing performance. For example, height and weight may give a new attribute called Body Mass Index (BMI).
Feature subset selection is another important aspect of feature engineering that focuses on selecting
features to reduce training time, but not at the cost of reliability.
The features can be removed based on two aspects:
1. Feature relevancy – Some features contribute more to classification than others. For
example, a mole on the face can help in face detection more than common features like the nose. In simple
words, the features should be relevant.
2. Feature redundancy – Some features are redundant. For example, when a database table has a field called
Date of Birth, then an Age field is redundant, as age can be computed easily from the date of birth.
So, the procedure is:
1. Generate all possible subsets
2. Evaluate the subsets and model performance
3. Evaluate the results for optimal feature selection
Filter-based selection uses statistical measures for assessing features. In this approach, no learning algorithm
is used. Correlation and information gain measures like mutual information and entropy are all examples
of this approach.
Wrapper-based methods use classifiers to identify the best features. These are selected and evaluated by the
learning algorithms. This procedure is computationally intensive but has superior performance.
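A small filter-based sketch that scores features by mutual information with the class label, using the Iris dataset purely as an example; no classifier is trained:

from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

iris = load_iris()
X, y = iris.data, iris.target

# Filter approach: rank each feature by its mutual information with the label.
scores = mutual_info_classif(X, y)
for name, score in zip(iris.feature_names, scores):
    print(f"{name}: {score:.3f}")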
2.10.1 Stepwise Forward Selection
This procedure starts with an empty set of attributes. At each step, the attribute with the best quality
(statistical significance) is identified and added to the reduced set. This process continues until a good
reduced set of attributes is obtained.
2.10.2 Stepwise Backward Elimination
This procedure starts with a complete set of attributes. At every stage, the procedure removes the worst
attribute from the set, leading to the reduced set.
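Both stepwise procedures can be sketched with scikit-learn's SequentialFeatureSelector, a wrapper-style selector; the k-NN classifier and the Iris data here are arbitrary choices for illustration:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Stepwise forward selection: start empty, add the best attribute at each step.
forward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                    direction="forward").fit(X, y)

# Stepwise backward elimination: start with all attributes, drop the worst at each step.
backward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                     direction="backward").fit(X, y)

print("Forward keeps :", forward.get_support())
print("Backward keeps:", backward.get_support())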
2.10.3 Principal Component Analysis
The idea of principal component analysis (PCA), or the KL transform, is to transform a given set of
measurements to a new set of features so that the features exhibit high information-packing properties.
This leads to a reduced and compact set of features. Consider a group of random vectors of the form:
x = (x1, x2, … , xn)^T
The mean vector of the set of random vectors is defined as:
m = E[x]
The operator E refers to the expected value of the population. This is calculated theoretically using the
probability density functions (PDFs) of the elements xi and the joint probability density functions between
the elements xi and xj. From this, the covariance matrix can be calculated as:
C = E[(x − m)(x − m)^T]
The mapping of the vectors x to y using the transformation can now be described as:
y = A(x − m)
where A is the matrix whose rows are the eigenvectors of C, arranged in descending order of eigenvalue.
This transform is also called the Karhunen-Loeve or Hotelling transform. The original vector x
can now be reconstructed as follows:
x = A^T y + m
If only the K largest eigenvalues (and their eigenvectors, forming A_K) are used, the recovered information
would be:
x̂ = A_K^T y + m
The PCA algorithm is as follows:
1. The target dataset x is obtained
2. The mean is subtracted from the dataset. Let the mean be m. Thus, the adjusted dataset is X – m.
The objective of this process is to transform the dataset with zero mean.
3. The covariance of dataset x is obtained. Let it be C.
4. Eigen values and eigen vectors of the covariance matrix are calculated.
5. The eigen vector of the highest eigen value is the principal component of the dataset. The eigen
values are arranged in a descending order. The feature vector is formed with these eigen vectors in
its columns.
Feature vector = {eigen vector1, eigen vector2, … , eigen vectorn}
6. Obtain the transpose of feature vector. Let it be A.
7. PCA transform is y = A × (x – m), where x is the input dataset, m is the mean, and A is the
transpose of the feature vector.
The original data can be retrieved using the formula given below:
Original data (approximately) = (A^T × y) + m
The new data y is a dimensionally reduced matrix that represents the original data.
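A minimal NumPy sketch of steps 1–7 on synthetic data; the function and variable names are illustrative:

import numpy as np

def pca_transform(x, k):
    """Zero-mean the data, eigen-decompose the covariance, keep the k eigenvectors
    with the largest eigenvalues, and project (steps 1-7 above)."""
    m = x.mean(axis=0)                      # step 2: mean vector
    centered = x - m                        # adjusted dataset x - m
    C = np.cov(centered, rowvar=False)      # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # step 4: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]       # step 5: descending eigenvalues
    A = eigvecs[:, order[:k]].T             # step 6: transpose of the feature vector
    y = (A @ centered.T).T                  # step 7: y = A (x - m)
    return y, A, m

x = np.random.randn(200, 5)
y, A, m = pca_transform(x, k=2)

# Retrieving an approximation of the original data: x ≈ A^T y + m
x_back = y @ A + m
print(x.shape, "->", y.shape)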
Figure 2.15 shows the scree plot, which indicates that only 6 out of 246 attributes are important.
From Figure 2.15, one can infer the relevance of the attributes. The scree plot indicates that
the first attribute is more important than all the other attributes.
2.10.4 Linear Discriminant Analysis
Linear Discriminant Analysis (LDA) is also a feature reduction technique like PCA. The focus of LDA
is to project higher-dimensional data onto a line (lower-dimensional data). LDA is also used to classify the
data. Let there be two classes, c1 and c2, and let m1 and m2 be the means of the patterns of the two classes.
The means of classes c1 and c2 can be computed as:
m1 = (1/N1) Σ x over x in c1,  m2 = (1/N2) Σ x over x in c2
where N1 and N2 are the numbers of samples in c1 and c2.
The aim of LDA is to find a projection vector w that optimizes the Fisher criterion, i.e., the ratio of
between-class scatter to within-class scatter:
J(w) = (w^T S_B w) / (w^T S_W w)
where S_B is the between-class scatter matrix and S_W is the within-class scatter matrix.
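In practice, the projection onto a single discriminant direction can be sketched with scikit-learn's LinearDiscriminantAnalysis; the two classes below are synthetic:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two hypothetical 2-D classes c1 and c2.
c1 = np.random.randn(50, 2) + [0, 0]
c2 = np.random.randn(50, 2) + [4, 4]
X = np.vstack([c1, c2])
y = np.array([0] * 50 + [1] * 50)

# Project the 2-D data onto a single discriminant direction (a line).
lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)
print(X.shape, "->", X_1d.shape)   # (100, 2) -> (100, 1)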
2.10.5 Singular Value Decomposition
Singular Value Decomposition (SVD) is another useful decomposition technique. Let A be a given
matrix; then the matrix A can be decomposed as:
A = U S V^T
Here, A is the given matrix of dimension m × n, U is an orthogonal matrix of dimension m × n, S is
the diagonal matrix of dimension n × n, and V is an orthogonal matrix of dimension n × n. The procedure for
finding the decomposition matrices is given as follows:
1. For the given matrix, find AA^T.
2. Find the eigenvalues and eigenvectors of AA^T.
3. Sort the eigenvalues in descending order. Pack the corresponding eigenvectors as a matrix U.
4. Arrange the square roots of the eigenvalues on the diagonal. This diagonal matrix is S.
5. Find the eigenvalues and eigenvectors of A^T A and pack the eigenvectors as a
matrix called V.
Thus, A = USV^T. Here, U and V are orthogonal matrices. The columns of U and V are the left and right
singular vectors, respectively. SVD is useful in compression, as one can decide to retain only a
certain number of components instead of the original matrix A as:
A ≈ Σ si ui vi^T, for i = 1 to k
where ui and vi are the i-th columns of U and V, and si are the singular values.
Based on the choice of retention k, the compression can be controlled.
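A short NumPy sketch of the decomposition and of rank-k compression; the matrix and the choice k = 2 are arbitrary:

import numpy as np

A = np.random.randn(6, 4)

# Full decomposition A = U S V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U @ np.diag(s) @ Vt, A))   # True

# Compression: retain only the k largest singular values and vectors.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print("Rank-k approximation error:", np.linalg.norm(A - A_k))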
CHAPTER 3 - BASICS OF LEARNING THEORY
3.3 DESIGN OF A LEARNING SYSTEM
3.4 INTRODUCTION TO CONCEPT LEARNING
Concept learning is a learning strategy that involves acquiring abstract knowledge or inferring a general
concept based on the given training samples. It aims to derive a category or classification from the data,
facilitating abstraction and generalization. In machine learning, concept learning is about finding a
function that categorizes or labels instances correctly based on the observed features.
3.4.1 Representation of a Hypothesis
A hypothesis, denoted by h, is an approximation of the target function f. It represents the
relationship between independent attributes (input features) and the dependent attribute (output
or label) of the training instances. The hypothesis acts as the predicted model that maps inputs to
outputs effectively. In concept learning, each hypothesis is represented as a conjunction (AND
combination) of attribute conditions in the antecedent part, defining specific constraints on
attributes to classify instances accurately.
3.4.2 Hypothesis Space
Hypothesis space is the set of all possible hypotheses that approximate the target function f.
The subset of the hypothesis space that is consistent with all observed training instances is
called the Version Space.
3.4.3 Heuristic Space Search
Heuristic search is a search strategy that finds an optimized hypothesis/solution to a
problem by iteratively improving the hypothesis/solution based on a given heuristic
function or a cost measure.
3.4.4 Generalization and Specialization
Searching the Hypothesis Space
There are two ways of learning a hypothesis that is consistent with all training instances from
the large hypothesis space:
1. Specialization – General to Specific learning
2. Generalization – Specific to General learning
Generalization – Specific to General Learning
This learning methodology searches through the hypothesis space for an approximate hypothesis
by generalizing the most specific hypothesis.
Specialization – General to Specific Learning
This learning methodology searches through the hypothesis space for an approximate hypothesis
by specializing the most general hypothesis.
3.4.5 Hypothesis Space Search by Find-S Algorithm
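The following is a highly simplified, hypothetical sketch of the specific-to-general idea behind Find-S: a hypothesis is kept as a conjunction of attribute constraints (a tuple), and each positive example replaces any violated constraint with '?'; the attribute values are made up for illustration and this is not the textbook's full algorithm:

def generalize(hypothesis, positive_example):
    """Replace any constraint that the positive example violates with '?'."""
    return tuple(h if h == v else "?" for h, v in zip(hypothesis, positive_example))

# Start from a specific hypothesis taken from the first positive example.
h = ("Sunny", "Warm", "Normal")

for example in [("Sunny", "Warm", "High"), ("Sunny", "Warm", "Normal")]:
    h = generalize(h, example)

print(h)   # ('Sunny', 'Warm', '?')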
Limitations of Find-S Algorithm
3.4.6 Version Spaces
List-Then-Eliminate Algorithm
Candidate Elimination Algorithm
The diagrammatic representation of deriving the version space is shown in the figure "Deriving the Version Space".