Regression and Classification Problems
The term multivariate analysis describes analyses of data that are multivariate in the sense that several observations or variables are obtained for each individual or unit studied.
Regression Problems:
• Supervised learning problems where the output is a continuous value are called regression problems.
• Regression techniques are used to predict a continuous value.
• For example, predicting the price of a house from its characteristics, or estimating the CO2 emissions of a car's engine.
Regression Analysis
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables.
Regression analysis is a predictive modelling technique: it estimates the relationship between the input variables (x) and the output variable (y). Regression is the problem of predicting the value Y (the response) given the values of the input variables x1, x2, ..., xp (the predictors).
• In linear regression, we assume that the function f(X) in the relationship Y = f(x1, x2, ..., xp) is linear.
• The task is to find the coefficients of the linear model (parameter estimation).
There are two types of Linear Regression models:
Simple Linear Regression:
• When there is a single input variable (x), the method is referred to as simple
linear regression.
• Predict Co2emission using EngineSize of all cars
• Independent variable (x): EngineSize
• Dependent variable (y): Co2emission
Multiple Linear Regression:
• When there are multiple input variables, literature from statistics often refers to
the method as multiple linear regression.
• Predict Co2emission using EngineSize and Cylinders of all cars
• Independent variables (x): EngineSize, Cylinders
• Dependent variable (y): Co2emission
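As a quick illustration of both variants, here is a minimal sketch using scikit-learn's LinearRegression; the engine-size, cylinder, and CO2 values below are made up purely for demonstration, not taken from a real dataset:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical illustration data: engine size (L), cylinders, CO2 emission (g/km).
    engine_size = np.array([1.6, 2.0, 2.4, 3.0, 3.5, 4.4])
    cylinders = np.array([4, 4, 4, 6, 6, 8])
    co2 = np.array([150, 180, 200, 240, 260, 310])

    # Simple linear regression: a single predictor (EngineSize).
    simple = LinearRegression().fit(engine_size.reshape(-1, 1), co2)

    # Multiple linear regression: two predictors (EngineSize, Cylinders).
    X = np.column_stack([engine_size, cylinders])
    multiple = LinearRegression().fit(X, co2)

    print(simple.intercept_, simple.coef_)      # beta0 and beta1
    print(multiple.intercept_, multiple.coef_)  # beta0, beta1, beta2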
Simple Linear Regression
• The simplest mathematical relationship between two variables x and y is a linear
relationship:
y = β0 + β1x
• x: the input, or independent, or predictor, or explanatory variable (usually
known).
• y: the output, or dependent, or response, or study variable.
• Objective: to estimate the parameters β0 and β1.
• The points (x1, y1), …, (xn, yn) resulting from n independent observations will then be scattered about the true regression line.
• The simple linear regression model is:
y = β0 + β1x + ε
where:
β0 and β1 are the parameters of the model,
ε is a random variable called the error term.
Evaluation Metrics in Regression Models:
Evaluation metrics are used to quantify the performance of a model. Basically, we compare the actual values with the predicted values to measure the accuracy of a regression model.
A residual is a measure of how far away a point is from the regression line. Simply, it is
the error between a predicted value and the observed actual value.
Mean Squared Error (MSE) is the mean of the squared errors. It is more popular than mean absolute error because it focuses more on large errors: the squared term amplifies large errors much more strongly than small ones.
Root Mean Squared Error (RMSE) is the square root of the mean squared error. It is one of the most popular evaluation metrics because it is expressed in the same units as the response variable y, which makes it easy to interpret.
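The sketch below computes residuals, MSE, and RMSE directly from their definitions; the observed and predicted values are made up for illustration:

    import numpy as np

    y_true = np.array([150.0, 180.0, 200.0, 240.0])  # observed values (made up)
    y_pred = np.array([155.0, 175.0, 210.0, 230.0])  # model predictions (made up)

    residuals = y_true - y_pred     # error of each prediction
    mse = np.mean(residuals ** 2)   # Mean Squared Error
    rmse = np.sqrt(mse)             # Root Mean Squared Error, in the units of y

    print(mse, rmse)                # 62.5, ~7.9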
Estimation of Parameters in Simple Linear Regression using Ordinary Least
Squares:
Ordinary Least Squares (OLS) works by minimizing the sum of the squares of the
differences between the observed dependent variable in the given dataset and those
predicted by the linear function.
This method finds estimators β̂0 and β̂1 for the parameters β0 and β1 that minimize the sum of squared errors ε(β0, β1) over the n observed experiments.
In other words, we minimize the function
ε(β0, β1) = Σi (yi − β0 − β1xi)², i = 1, ..., n,
and find the arguments minimizing this function.
To solve the minimization problem, we can use the following theorem.
Theorem: The minimum of the function ε(β0, β1) is unique and is attained when
β̂1 = Σi (xi − X̄)(yi − Ȳ) / Σi (xi − X̄)² and β̂0 = Ȳ − β̂1X̄,
where X̄ is the mean of the x values, and Ȳ is the mean of the y values.
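These closed-form estimates translate directly into code. Here is a minimal sketch; the x and y values are made up for illustration:

    import numpy as np

    def ols_simple(x, y):
        """Closed-form OLS estimates for the model y = beta0 + beta1 * x."""
        x_bar, y_bar = x.mean(), y.mean()
        beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
        beta0 = y_bar - beta1 * x_bar
        return beta0, beta1

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up predictor values
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # made-up responses
    print(ols_simple(x, y))                  # (0.14, 1.96)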
Example – Dataset of patients' ages and their blood pressures
Our aim is to find the regression line. From the data (n = 10):
X̄ = 491/10 = 49.1, and Ȳ = 1410/10 = 141
The slope (β1) is calculated as: β1 = Σ(xi − X̄)(yi − Ȳ) / Σ(xi − X̄)² = 2335/2048.9 ≈ 1.14
The intercept (β0) is calculated as: β0 = Ȳ − β1X̄ = 141 − 1.14 × 49.1 = 85.026
• Now substitute the regression coefficients into the regression equation.
• Estimated blood pressure:
Ŷ = 85.026 + 1.14 × age
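A quick arithmetic check of this example, using only the summary statistics quoted above (the raw 10-patient table is not reproduced here):

    # Summary statistics from the blood-pressure example.
    x_bar = 491 / 10           # mean age = 49.1
    y_bar = 1410 / 10          # mean blood pressure = 141
    sxy, sxx = 2335, 2048.9    # sum of cross-deviations and of squared x-deviations

    beta1 = sxy / sxx          # slope, about 1.14
    beta0 = y_bar - beta1 * x_bar
    print(beta0, beta1)        # about 85.0 (85.026 if beta1 is rounded to 1.14 first)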
Classification Problems:
• Problems where the output is a discrete value are called classification problems.
• Classification is the process of predicting a discrete class label, or categories.
• For example, if a cell is benign or malignant, if an email is spam or not.
• A classification problem does not necessarily have only two outcomes; it is not limited to two classes. For example, handwritten digit recognition (which is a classification problem) has ten possible outcomes.
Logistic Regression
Logistic regression is a classification algorithm designed to predict categorical target labels based on historical feature data. Given an input and a model, it predicts the probability that the dependent variable takes a particular class. Logistic regression can be used for both binary classification and multi-class classification.
Sigmoid Function
Logistic Regression uses the sigmoid function, also known as the logistic function, to perform classification. The sigmoid function takes any value and maps it to a value between 0 and 1. The key thing to notice is that no matter what value you put into the sigmoid function, you always get a value between 0 and 1. This means we can take our linear regression solution and place it inside the sigmoid function:
P = 1 / (1 + e^(−z)), where z = β0 + β1x1 + β2x2 + ... + βpxp
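A minimal sketch of the sigmoid itself, showing how extreme inputs are squeezed toward 0 and 1:

    import numpy as np

    def sigmoid(z):
        """Logistic (sigmoid) function: maps any real number into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(-10), sigmoid(0), sigmoid(10))  # ~0.000045, 0.5, ~0.99995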
• We can formulate the algorithm for predicting the class of a new object x with the predictors (x1, x2, ..., xp) once the coefficients β0, β1, ..., βp are found (a code sketch follows below):
1. Calculate the value z = β0 + β1x1 + β2x2 + ⋯ + βpxp.
2. Calculate the probability P = 1 / (1 + e^(−z)).
3. If P ≥ 0.5, the object x falls into class 1; otherwise, into class 0.
(In practice, the choice of a probability cut-off is up to the researcher.)
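The three steps map directly onto a small helper function. This is a minimal sketch; the coefficient and predictor values in the usage line are hypothetical:

    import numpy as np

    def predict_class(x, beta0, beta, cutoff=0.5):
        """Steps 1-3 of the prediction algorithm for one object x."""
        z = beta0 + np.dot(beta, x)       # step 1: linear combination
        p = 1.0 / (1.0 + np.exp(-z))      # step 2: probability via the sigmoid
        label = 1 if p >= cutoff else 0   # step 3: apply the probability cut-off
        return label, p

    # Hypothetical coefficients and predictors, for illustration only.
    print(predict_class(np.array([1.5, -0.5]), 0.2, np.array([0.8, 1.1])))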
Let’s apply the logistic regression algorithm to specific data.
• Our data is football statistics. It has three predictors: shots on target (X1), possession (X2), and shots (X3).
• The response Y takes only two values. The value 1 corresponds to a win (class 1), and the value 0 to a loss or draw (class 0).
• The training data provides the following values of the model parameters:
β0= −0.046, β1=0.541, β2= −0.014, β3= −0.132.
• We classify the new object z:
z = (1, 40, 3)
• It is a team that had 1 shot on target, 40 percent possession, and 3 shots.
According to the described algorithm, the probability that the team wins equals:
P = 1 / (1 + e^(−(β0 + β1x1 + β2x2 + β3x3)))
  = 1 / (1 + e^(−(−0.046 + 0.541×1 − 0.014×40 − 0.132×3)))
  ≈ 0.39
• Since P < 0.5, the team will likely lose or draw (class 0).
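A short check of this computation in code; the coefficients and the object z = (1, 40, 3) are exactly the ones given above:

    import math

    beta0, beta = -0.046, [0.541, -0.014, -0.132]
    x = [1, 40, 3]  # shots on target, possession (%), shots

    z = beta0 + sum(b * xi for b, xi in zip(beta, x))  # = -0.461
    p_win = 1.0 / (1.0 + math.exp(-z))                 # ~0.387
    print(round(p_win, 3))  # below the 0.5 cut-off, so class 0 (loss or draw)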