Regression Analysis in Machine Learning
Context:
In order to understand the motivation behind regression, let's consider
the following simple example. The scatter plot below shows the
number of college graduates in the US from the year 2001 to 2012.
Now, based on the available data, what if someone asks you how many
college graduates with master's degrees there will be in the year 2018?
It can be seen that the number of college graduates with master’s
degrees increases almost linearly with the year. So by simple visual
analysis, we can get a rough estimate of that number: somewhere between
2.0 and 2.1 million. Let's look at the actual numbers. The graph below plots
the same variable from the year 2001 to the year 2018. It can be seen
that our predicted number was in the ballpark of the actual value.
Since this was a simple problem (fitting a line to data), our minds were
able to do it easily. This process of fitting a function to a set of data
points is known as regression analysis.
What is Regression Analysis?
Regression analysis is the process of estimating the relationship
between a dependent variable and independent variables. In simpler
words, it means fitting a function from a selected family of functions to
the sampled data under some error function. Regression analysis is one
of the most basic tools in the area of machine learning used for
prediction. Using regression, you fit a function to the available data
and try to predict the outcome for future or held-out data points.
Fitting this function serves two purposes:
1. You can estimate missing data within your data range
(Interpolation)
2. You can estimate future data outside your data range
(Extrapolation)
Some real-world examples for regression analysis include predicting
the price of a house given house features, predicting the impact of
SAT/GRE scores on college admissions, predicting sales based on
input parameters, predicting the weather, etc.
Let's consider the previous example of college graduates.
1. Interpolation: Let's assume we have access to somewhat sparse
data where we know the number of college graduates every 4 years,
as shown in the scatter plot below.
We want to estimate the number of college graduates for all the
missing years in between. We can do this by fitting a line to the limited
available data points. This process is called interpolation.
2. Extrapolation: Let's assume we have access to limited data from the
year 2001 to the year 2012, and we want to predict the number of
college graduates from the year 2013 to 2018.
It can be seen that the number of college graduates with master’s
degrees increases almost linearly with the year. Hence, it makes sense
to fit a line to the dataset. Using the 12 available points to fit a line,
and then testing that line's predictions on the 6 future points, we can see
that the predictions are very close to the actual values.
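To make the interpolation and extrapolation ideas concrete, here is a minimal sketch using NumPy. The graduate counts below are synthetic placeholders (the article's actual figures are not reproduced here), so treat the numbers purely as an illustration.

```python
# Fit a line to yearly data, then read it off at in-range years (interpolation)
# and at future years (extrapolation). The counts are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2001, 2013)                                   # 2001 .. 2012
grads = 0.046 * years - 90.8 + rng.normal(0, 0.01, years.size)  # millions, synthetic

# Interpolation: fit a line on sparse data (every 4th year) and estimate the rest.
sparse = slice(0, None, 4)                                      # 2001, 2005, 2009
m, c = np.polyfit(years[sparse], grads[sparse], deg=1)          # returns [slope, intercept]
print("interpolated 2003:", round(c + m * 2003, 3))

# Extrapolation: fit a line on all 12 points and predict a future year.
m, c = np.polyfit(years, grads, deg=1)
print("extrapolated 2018:", round(c + m * 2018, 3))
```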
Mathematically speaking
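The original post presents the general formulation as an equation image, which is not reproduced here. Reconstructing it in the notation used below (family of functions f_beta, loss function l, and data points (x_i, y_i) for i = 1, ..., P), regression finds

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{P} l\big(f_{\beta}(x_i),\, y_i\big)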
Types of regression analysis
Now let’s talk about different ways in which we can carry out
regression. Based on the family of functions f_beta and the loss function l
used, we can categorize regression into the following types.
1. Linear Regression
In linear regression, the objective is to fit a hyperplane (a line for 2D
data points) by minimizing the sum of squared errors over the data points.
Mathematically speaking, linear regression solves the following
problem
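Reconstructing the missing equation for the one-dimensional case implied by the two parameters mentioned below:

\min_{\beta_0,\, \beta_1} \sum_{i=1}^{P} \big(y_i - \beta_0 - \beta_1 x_i\big)^2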
Hence we need to find 2 variables denoted by beta that parameterize
the linear function f(.). An example of linear regression can be seen in
figure 4 above, where P = 5. The figure also shows the fitted linear
function with beta_0 = -90.798 and beta_1 = 0.046.
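As a quick sketch of how this looks in code, the snippet below solves the least-squares problem above on synthetic data shaped to resemble the example (the coefficients quoted above came from the article's own data, which is not reproduced here):

```python
# Solve the least-squares problem above directly with a linear solver.
# The data here is synthetic and only illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(2001, 2013).astype(float)
y = 0.046 * x - 90.798 + rng.normal(0, 0.01, x.size)    # millions, synthetic

A = np.column_stack([np.ones_like(x), x])                # design matrix [1, x]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)             # [beta_0, beta_1]
print(beta)                                              # close to (-90.798, 0.046)
```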
2. Polynomial Regression
Linear regression assumes that the relationship between the dependent (y)
and independent (x) variables is linear. It fails to fit the
data points when the relationship between them is not linear.
Polynomial regression expands the fitting capabilities of linear
regression by fitting a polynomial of degree m to the data points
instead. The richer the function under consideration, the better (in
general) its fitting capabilities. Mathematically speaking, polynomial
regression solves the following problem.
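Reconstructing the missing equation from the description below (a polynomial of degree m with m+1 parameters beta_0, ..., beta_m):

\min_{\beta_0, \ldots, \beta_m} \sum_{i=1}^{P} \Big(y_i - \sum_{j=0}^{m} \beta_j x_i^{\,j}\Big)^2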
Hence we need to find (m+1) variables denoted by beta_0, …,beta_m.
It can be seen that linear regression is a special case of polynomial
regression with degree m = 1.
Consider the following set of data points plotted as a scatter plot. If we
use linear regression, we get a fit that clearly fails to capture the data
points. But if we use polynomial regression with degree 6, we get a much
better fit, as shown below.
[Left] Scatter plot of data — [Center] Linear regression on data — [Right] Polynomial regression of
degree 6
Since the data points did not have a linear relationship between
the dependent and independent variables, linear regression failed to
estimate a good fitting function. On the other hand, polynomial
regression was able to capture the non-linear relationship.
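A minimal sketch of this comparison, assuming synthetic non-linear data in place of the scatter plot above and using scikit-learn's polynomial feature map:

```python
# Compare a plain linear fit with a degree-6 polynomial fit on non-linear data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.1, x.shape[0])   # clearly non-linear relationship

linear_fit = LinearRegression().fit(x, y)
poly_fit = make_pipeline(PolynomialFeatures(degree=6), LinearRegression()).fit(x, y)

# The polynomial model tracks the curvature far better than the line.
print("linear R^2:  ", round(linear_fit.score(x, y), 3))
print("degree-6 R^2:", round(poly_fit.score(x, y), 3))
```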
3. Ridge Regression
Ridge regression addresses the issue of overfitting in regression
analysis. To understand this, consider the same example as above. When a
polynomial of degree 25 is fit to the data with 10 training points, it fits
the red data points perfectly (center figure below). But in doing so, it
compromises the points in between (note the spike between the last two data
points). Ridge regression tries to address this issue: it tries to minimize
the generalization error by compromising the fit on the training points.
[Left] Scatter plot of data — [Center] Polynomial regression of degree 25 — [Right] Polynomial Ridge
regression of degree 25
Mathematically speaking, ridge regression solves the following
problem by modifying the loss function.
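The equation is not reproduced here; a standard formulation consistent with the description that follows is

\min_{\beta} \sum_{i=1}^{P} \big(y_i - f_{\beta}(x_i)\big)^2 + \alpha \lVert \beta \rVert_2^2, \qquad \alpha > 0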
The function f(x) can either be linear or polynomial. Without ridge
regularization, when the function overfits the data points, the learned
weights tend to be quite large. Ridge regression avoids overfitting by
limiting the norm of the learned weights, adding the scaled L2 norm of the
weights (beta) to the loss function. Hence the trained model trades off
between fitting the data points perfectly (a large norm of the learned
weights) and keeping the norm of the weights small. The
scaling constant alpha > 0 is used to control this trade-off. A small value
of alpha will result in higher norm weights and overfitting the training
data points. On the other hand, a large alpha value will result in a
function with a poor fit to the training data points but a very small
norm of the weights. Choosing the value of alpha carefully will yield the
best trade-off.
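A minimal sketch of this trade-off, assuming synthetic data with 10 training points and a degree-25 polynomial feature map (the alpha value is illustrative):

```python
# Compare an unregularized degree-25 polynomial fit with a ridge-regularized one.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10).reshape(-1, 1)                  # only 10 training points
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.1, 10)

degree = 25
plain = make_pipeline(PolynomialFeatures(degree), StandardScaler(), LinearRegression()).fit(x, y)
ridge = make_pipeline(PolynomialFeatures(degree), StandardScaler(), Ridge(alpha=0.1)).fit(x, y)

# Ridge trades a slightly looser fit on the training points for a smaller weight norm.
print("weight norm, no ridge:", float(np.linalg.norm(plain[-1].coef_)))
print("weight norm, ridge:   ", float(np.linalg.norm(ridge[-1].coef_)))
print("train R^2, no ridge:", plain.score(x, y), " ridge:", ridge.score(x, y))
```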
4. LASSO regression
LASSO regression is similar to Ridge regression as both of them are
used as regularizers against overfitting on the training data points. But
LASSO comes with an additional benefit. It enforces sparsity on the
learned weights.
Ridge regression enforces the norm of the learned weights to be small
yielding a set of weights where the total norm is reduced. Most of the
weights (if not all) will be non-zero. LASSO on the other hand tries to
find a set of weights by driving most of them to zero or very close to zero.
This yields a sparse set of weights whose implementation can be much more
energy-efficient than a non-sparse one, while maintaining similar accuracy
in terms of fitting the data points.
The figure below tries to visualize this idea on the same example as
above. The data points are fit using both Ridge and LASSO regression, and
the corresponding fits and weights are plotted, with the weights sorted in
ascending order. It can be seen that most of the weights in the LASSO
regression are very close to zero.
Mathematically speaking, LASSO regression solves the following
problem by modifying the loss function.
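Reconstructing the missing equation, with the L1 norm in place of ridge regression's L2 norm:

\min_{\beta} \sum_{i=1}^{P} \big(y_i - f_{\beta}(x_i)\big)^2 + \alpha \lVert \beta \rVert_1, \qquad \alpha > 0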
The difference between LASSO and Ridge regression is that LASSO
uses the L1 norm of the weights instead of the L2 norm. This L1 norm
in the loss function tends to increase sparsity in the learned weights.
The constant alpha>0 is used to control the tradeoff between the fit
and the sparsity in the learned weights. A large value of alpha results in
poor fit but a sparser learned set of weights. On the other hand, a small
value of alpha results in a tight fit on training data points (might lead
to over-fitting) but with a less sparse set of weights.
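A minimal sketch of the sparsity effect, assuming synthetic data and illustrative alpha values (scikit-learn's Lasso scales its penalty slightly differently, but the qualitative behavior is the same):

```python
# Compare how many weights Ridge vs LASSO leave non-zero on the same features.
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.1, 30)

features = make_pipeline(PolynomialFeatures(degree=15, include_bias=False), StandardScaler())
X = features.fit_transform(x)

ridge = Ridge(alpha=0.1).fit(X, y)
lasso = Lasso(alpha=0.01, max_iter=50_000).fit(X, y)

# LASSO typically drives many coefficients exactly to zero; Ridge does not.
print("non-zero ridge weights:", int(np.sum(np.abs(ridge.coef_) > 1e-6)))
print("non-zero lasso weights:", int(np.sum(np.abs(lasso.coef_) > 1e-6)))
```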
5. ElasticNet Regression
ElasticNet regression is a combination of Ridge and LASSO regression.
The loss term includes both the L1 and L2 norm of the weights with
their respective scaling constants. It is often used to address the
limitations of LASSO regression, such as its penalty not being strictly
convex. ElasticNet adds a quadratic penalty on the weights, which makes the
overall loss strictly convex and hence gives it a unique minimum.
Mathematically speaking, ElasticNet regression solves the following
problem by modifying the loss function.
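Reconstructing the missing equation, with alpha_1 and alpha_2 denoting the two scaling constants mentioned above:

\min_{\beta} \sum_{i=1}^{P} \big(y_i - f_{\beta}(x_i)\big)^2 + \alpha_1 \lVert \beta \rVert_1 + \alpha_2 \lVert \beta \rVert_2^2, \qquad \alpha_1, \alpha_2 > 0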
6. Bayesian Regression
For the types of regression discussed above (the frequentist approach), the
goal is to find a set of deterministic values of the weights (beta) that
explain the data. In Bayesian regression, rather than finding one value for
each weight, we try to find the distribution of these weights, assuming a
prior.
So we start off with an initial distribution of the weights and, based on
the data, nudge the distribution in the right direction by making use of
Bayes' theorem, which relates the prior distribution to the posterior
distribution through the likelihood and the evidence.
When we have infinitely many data points, the posterior distribution of the
weights becomes an impulse at the ordinary least squares solution, i.e., its
variance approaches zero.
Finding the distribution of the weights instead of a single set of
deterministic values serves two purposes:
1. It naturally guards against the issue of overfitting, hence acting as a
regularizer.
2. It provides a confidence measure and a range for the weights, which makes
more logical sense than just returning one value.
Let us mathematically formulate the problem and state its solution. Let us
assume a Gaussian prior on the weights with mean μ and covariance Σ, i.e.,
β ∼ N(μ, Σ).
Based on the available data D, we update this distribution. For the
problem at hand, the posterior will be a Gaussian distribution with the
following parameters.
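The parameters from the original post are not reproduced here. Assuming a linear model y = Xβ + ε with Gaussian noise ε ∼ N(0, σ²I) (the noise variance σ² is an assumption not stated explicitly above) and the prior β ∼ N(μ, Σ), the standard conjugate-Gaussian result is

\Sigma_{\text{post}} = \Big(\Sigma^{-1} + \tfrac{1}{\sigma^2} X^{\top} X\Big)^{-1}, \qquad \mu_{\text{post}} = \Sigma_{\text{post}} \Big(\Sigma^{-1}\mu + \tfrac{1}{\sigma^2} X^{\top} y\Big)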
7. Logistic Regression
Logistic regression comes in handy in classification tasks where the output
needs to be the conditional probability of the output class given the input.
Mathematically speaking, logistic regression solves the following
problem.
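The equation is missing here; a standard formulation consistent with the sigmoid-on-top-of-f_beta description below is the cross-entropy minimization

\min_{\beta} \sum_{i=1}^{P} \Big[-y_i \log \sigma\big(f_{\beta}(x_i)\big) - (1 - y_i) \log\big(1 - \sigma(f_{\beta}(x_i))\big)\Big], \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

where y_i ∈ {0, 1} and σ(f_β(x_i)) models the conditional probability that y_i = 1 given x_i.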
Consider the following example where the data points belong to one of
the two categories: {0 (red), 1 (yellow)} as shown in the scatter plot
below.
[Left] Scatter plot of data points — [Right] Logistic regression trained on
the data points, plotted in blue
Logistic regression uses a sigmoid function at the output of the linear or
polynomial function to map the output from (−∞, ∞) to (0, 1). A threshold
(usually 0.5) is then used to categorize the test data into one of the two
categories.
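A minimal sketch on synthetic one-dimensional data with labels {0, 1}, standing in for the red/yellow points above:

```python
# Fit logistic regression and inspect the sigmoid output and thresholded class.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)]).reshape(-1, 1)
y = np.concatenate([np.zeros(50), np.ones(50)])     # 0 = red, 1 = yellow

clf = LogisticRegression().fit(X, y)

# predict_proba applies the sigmoid to the linear score, mapping it into (0, 1);
# predict then applies the usual 0.5 threshold.
print(clf.predict_proba([[0.2]]))
print(clf.predict([[0.2]]))
```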
It may seem that logistic regression is not regression but a classification
algorithm; however, that is not the case. You can find more about it in
Adrian's post.
https://towardsdatascience.com/a-beginners-guide-to-regression-analysis-in-machine-learning-8a828b491bbf