Artificial Intelligence
(CSE3007)
Unit – 04 (Part-I)
Machine Learning Algorithms
Dr. Susant Kumar Panigrahi
Assistant Professor
School of Electrical & Electronics Engineering
What is Regression?
• The main goal of regression is to build an efficient model that predicts the dependent attribute from a set of attribute variables. A regression problem is one where the output variable is a real or continuous value, e.g. salary, weight, or area.
• We can also define regression as a statistical method, used in applications like housing and investing, to estimate the relationship between a dependent variable and a set of independent variables.
Applications of Regression
1. Evaluating Trends and Sales Estimates
• Linear regressions can be used in business to
evaluate trends and make estimates or forecasts.
• For example, if a company’s sales have increased steadily every month for the past few years, conducting a linear analysis on the sales data, with monthly sales on the y-axis and time on the x-axis, would produce a line that depicts the upward trend in sales. After creating the trend line, the company could use the slope of the line to forecast sales in future months.
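The trend-line idea above can be sketched in a few lines of Python. The monthly sales figures here are invented for illustration; `np.polyfit` with degree 1 fits the least-squares trend line, whose slope is then used to forecast a future month.

```python
import numpy as np

months = np.arange(1, 13)  # time on the x-axis (months 1..12)
# Hypothetical monthly sales on the y-axis, rising steadily
sales = np.array([102, 108, 113, 121, 124, 131, 137, 141, 149, 153, 160, 164])

# Degree-1 polynomial fit = the least-squares trend line
slope, intercept = np.polyfit(months, sales, 1)

# Use the slope of the trend line to forecast sales in month 15
forecast = slope * 15 + intercept
print(slope, forecast)
```

Here the positive slope quantifies the monthly growth, and extending the line past the observed months gives the forecast.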
2. Analyzing the Impact of Price Changes
• Linear regression can also be used to analyze the
effect of pricing on consumer behavior.
• For example, if a company changes the price of a certain product several times, it can record the quantity it sells at each price level and then perform a linear regression with quantity sold as the dependent variable and price as the explanatory variable. The result would be a line depicting the extent to which consumers reduce their consumption of the product as prices increase, which could help guide future pricing decisions.
3. Assessing Risk
Simple Linear Regression
• One of the most interesting and common regression techniques is simple linear regression. In this, we predict the outcome of a dependent variable based on the independent variables, and the relationship between the variables is linear; hence the name linear regression.
• Simple linear regression is a regression technique in which the independent variable has a linear relationship with the dependent variable. The straight line in the diagram is the best-fit line.
• The main goal of simple linear regression is to take the given data points and plot the best-fit line through them.
The Main Idea of Least Square and Linear Regression
[Figure: data points of some observations, with the dependent variable on the y-axis and the independent variable on the x-axis, and several candidate lines. But which among these lines best fits the data for future prediction?]
The Main Idea of Least Square and Linear Regression
Let’s measure how well this line fits the data, starting with a worst-case scenario: a horizontal line. For each data point, measure the vertical distance (residual) between the point and the line.
Finally, to make the cost positive and more mathematically meaningful, each difference term is squared and the squares are added together to measure the fit:

sum of squared residuals = 24.62

This measure indicates how well the line fits the data.
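A minimal sketch of this fit measure, on invented data points (the slide's own values such as 24.62 come from its data set, which is not reproduced here):

```python
def sum_squared_residuals(points, m, c):
    """Sum of squared vertical distances between each point and the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in points)

data = [(1, 2.0), (2, 2.4), (3, 3.5), (4, 3.8), (5, 5.1)]  # invented points

worst = sum_squared_residuals(data, 0.0, 3.36)    # horizontal line through the mean of y
better = sum_squared_residuals(data, 0.76, 1.08)  # a rotated candidate line
print(worst, better)
```

The smaller the sum, the better the candidate line fits the data, exactly as in the rotation experiment on the slides.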
Rotate the line a little bit and check how well it fits: sum of squared residuals = 18.72.
Rotate the line a little bit more: sum of squared residuals = 14.05.
Rotate the line a whole lot: sum of squared residuals = 31.71.
There is a sweet spot between the horizontal line and the last, heavily rotated line, at which we get the optimal value of the fit.

The generic line equation for the above linear regression is:

y = m·x + c

where m is the slope and c is the y-intercept. We need to find the optimal values of m and c that minimize the sum of squared residuals:

Sum of squared residuals = Σᵢ (yᵢ − (m·xᵢ + c))²

Because we look for the values of m and c that give the smallest sum of squared residuals, the method is called “Least Squares”.
How do we find the optimal rotation: “We take the derivative of this function.”
The derivative tells us the slope of the function at every point…
Notice: the slope at the best point (the “Least Squares” solution) is zero.
Different rotations are the different values of slope m and y-intercept c.
The big concepts…!!!!!
• We want to minimize the squares of the distances between the observed values and the line.
• We do this by taking the derivative and finding the values of slope and y-intercept where it equals zero.
• The final line minimizes the sum of squares (“least squares”) between it and the real data.
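These concepts can be checked numerically: the closed-form least-squares slope and intercept (obtained by setting the derivatives to zero) give a smaller sum of squared residuals than any nearby line. The data points below are hypothetical.

```python
def ssr(points, m, c):
    """Sum of squared residuals for the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in points)

data = [(1, 2.0), (2, 2.4), (3, 3.5), (4, 3.8), (5, 5.1)]  # invented points

xs = [x for x, _ in data]
ys = [y for _, y in data]
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)

# Setting d(SSR)/dm = 0 and d(SSR)/dc = 0 yields the closed-form solution
m = sum((x - xbar) * (y - ybar) for x, y in data) / sum((x - xbar) ** 2 for x in xs)
c = ybar - m * xbar

best = ssr(data, m, c)
# Nudging slope or intercept in any direction can only increase the SSR
assert all(ssr(data, m + dm, c + dc) >= best
           for dm in (-0.1, 0.0, 0.1) for dc in (-0.1, 0.0, 0.1))
print(m, c)
```

The inner assertion is the “slope of the cost function is zero at the best point” idea made concrete: the fitted (m, c) sits at the bottom of the bowl.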
Understanding Linear Regression Algorithm
x̄ = mean of x
ȳ = mean of y
Centroid: (x̄, ȳ)
The best-fit regression line must pass through the centroid.
So we need to find the equation of the line that passes through the centroid point, using the least-squares approach.
Finding the equation of line …..
The generic line equation for the above linear regression is y = m·x + c, with

m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 4/10 = 0.4
c = ȳ − m·x̄ = 3.6 − 0.4 × 3 = 2.4
The Predicted Line…..
Goodness of fit…. – R2
WHAT IS R-SQUARED?
R-squared is a statistical measure of how close the data are to the fitted
regression line.
It is also known as the coefficient of determination, or the coefficient of
multiple determination for multiple regression.
The definition of R-squared is fairly straightforward; it is the percentage of the response-variable variation that is explained by a linear model.
R-squared = Explained variation / Total variation
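A short sketch of the R² formula above, using invented actual/predicted values:

```python
# Hypothetical observed values and the predictions of a fitted line
actual = [2.0, 2.4, 3.5, 3.8, 5.1]
predicted = [1.84, 2.60, 3.36, 4.12, 4.88]

mean_y = sum(actual) / len(actual)
total_variation = sum((y - mean_y) ** 2 for y in actual)
unexplained_variation = sum((y - p) ** 2 for y, p in zip(actual, predicted))

# R-squared = explained variation / total variation
r_squared = 1 - unexplained_variation / total_variation
print(r_squared)
```

A value near 1 means most of the variation in the actual values is explained by the line; a value near 0 means almost none is.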
Calculation of R²

R² ≈ 0.3
Interpretation of values of R2
R² = 1: the regression line is a perfect fit to the actual values.
R² = 0: there is a large distance between the actual and predicted values; the line explains none of the variation.
Advantages And Disadvantages
Advantages:
• Linear regression performs exceptionally well for linearly separable data
• It is easy to implement and interpret, and efficient to train
• It handles overfitting fairly well using dimensionality-reduction techniques, regularization, and cross-validation
• It allows extrapolation beyond a specific data set

Disadvantages:
• It assumes linearity between the dependent and independent variables
• It is often quite prone to noise and overfitting
• It is quite sensitive to outliers
• It is prone to multicollinearity
Solve it
• Use least-squares regression to fit a straight line to
• Also find the goodness of fit. Analyze the result.
Logistic Regression
What is Regression?
• Regression analysis is a powerful statistical analysis technique. The values of independent variables in a data set are used to predict a dependent variable of interest.
• We come across regression in an intuitive way all the time. Like predicting the
weather using the data-set of the weather conditions in the past.
• It uses many techniques to analyze and predict the outcome, but the emphasis is mainly on the relationship between a dependent variable and one or more independent variables.
• Logistic regression analysis predicts the outcome in a binary variable which
has only two possible outcomes.
What Is Logistic Regression?
• Logistic regression is a classification algorithm, used when the
value of the target variable is categorical in nature. Logistic
regression is most commonly used when the data in question
has binary output, so when it belongs to one class or another, or
is either a 0 or 1.
• Remember that classification tasks have discrete categories, unlike regression tasks.
• Logistic Regression is a Machine Learning algorithm used for classification problems; it is a predictive-analysis algorithm based on the concept of probability.
Logistic Regression
• It is a technique to analyze a data-set which has a dependent
variable and one or more independent variables to predict
the outcome in a binary variable, meaning it will have only
two outcomes.
• The dependent variable is categorical in nature. The dependent variable is also referred to as the target variable, and the independent variables are called the predictors.
• Logistic regression can be viewed as an extension of linear regression in which we predict the outcome as a categorical variable. It predicts the probability of the event by modeling the log-odds as a linear function of the predictors.
• We use the Sigmoid function/curve to predict the
categorical value. The threshold value decides the
outcome(win/lose).
• We can call Logistic Regression a Linear Regression model, but Logistic Regression uses a more complex cost function. This function can be defined as the ‘Sigmoid function’, also known as the ‘logistic function’, instead of a linear function.
• The hypothesis of logistic regression requires the output to lie between 0 and 1. Linear functions fail to represent this, as they can take values greater than 1 or less than 0, which is not possible as per the hypothesis of logistic regression.
What is the Sigmoid Function?
• In order to map predicted values to probabilities, we use the
Sigmoid function. The function maps any real value into
another value between 0 and 1. In machine learning, we use
sigmoid to map predictions to probabilities.
• The sigmoid function/logistic function is a function that
resembles an “S” shaped curve when plotted on a graph. It
takes values between 0 and 1 and “squishes” them towards
the margins at the top and bottom, labeling them as 0 or 1.
• The equation for the Sigmoid function is:

S(y) = 1 / (1 + e^(−y))
• What is the variable e in this instance? It is the exponential constant (Euler's number), with a value of approximately 2.71828.
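The sigmoid can be written directly from its equation; a quick check confirms it squishes any real input into (0, 1):

```python
import math

def sigmoid(y):
    """Logistic function: maps any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

print(sigmoid(0))    # midpoint of the S-curve: 0.5
print(sigmoid(6))    # large inputs squish towards 1
print(sigmoid(-6))   # very negative inputs squish towards 0
```

Applying a threshold (commonly 0.5) to the sigmoid output turns the probability into a 0/1 class label.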
Example:
People in different age groups who either bought insurance or did not.
Have_insurance = 1 (bought insurance)
Have_insurance = 0 (no insurance)
Applying Linear Regression
Applying Linear Regression Thresholding
[Figure: values above the threshold line are labeled “likely to buy insurance”.]
Applying Linear Regression Thresholding
[Let’s assume we have another extreme value. The fitted line tilts, points near the threshold are now labeled “unlikely to buy insurance”, and the new predictions are more erroneous.]
Sigmoid or Logit Function
S(y) = 1 / (1 + e^(−y))
Linear Regression and Logistic Regression Relationship

1. Definition: linear regression predicts a continuous dependent variable from the values of the independent variables; logistic regression predicts a categorical dependent variable from the values of the independent variables.
2. Variable type: continuous dependent variable vs. categorical dependent variable.
3. Estimation method: least-squares estimation vs. maximum-likelihood estimation.
4. Equation: y = a₀ + a₁x vs. log(p/(1−p)) = a₀ + a₁x₁ + a₂x₂ + … + aₙxₙ.
5. Best-fit line: straight line vs. curve.
6. Relationship between dependent and independent variables: linear vs. non-linear.
7. Output: predicted continuous value vs. predicted binary value (0/1).
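As a sketch of the table's contrast, the age-vs-insurance idea from earlier can be fit with logistic regression. The ages, labels, learning rate, and iteration count below are all invented; stochastic gradient ascent on the log-likelihood stands in for the maximum-likelihood estimation the table mentions.

```python
import math

# Hypothetical data: ages and whether each person bought insurance (0/1)
ages   = [22, 25, 28, 30, 35, 40, 45, 50, 55, 60]
bought = [ 0,  0,  0,  0,  1,  0,  1,  1,  1,  1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

a0, a1 = 0.0, 0.0   # intercept and slope of the log-odds line
lr = 0.001          # learning rate (arbitrary choice)
for _ in range(20000):
    for x, y in zip(ages, bought):
        p = sigmoid(a0 + a1 * x)   # predicted probability of buying
        a0 += lr * (y - p)         # gradient of the log-likelihood w.r.t. a0
        a1 += lr * (y - p) * x     # gradient of the log-likelihood w.r.t. a1

def predict(age):
    return sigmoid(a0 + a1 * age)

print(predict(25), predict(55))   # low probability young, high probability old
```

Note how the output is an S-shaped probability curve rather than a straight line: thresholding it at 0.5 yields the binary 0/1 prediction from row 7 of the table.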
Types Of Logistic Regression
• Binary logistic regression – It has
only two possible outcomes.
Example- yes or no
• Multinomial logistic regression – It
has three or more nominal
categories. Example- cat, dog,
elephant.
• Ordinal logistic regression – It has three or more ordinal categories, ordinal meaning that the categories are ordered. Example- user ratings (1–5).