Linear Regression

Linear Regression
• Introduction
• In the simplest terms, Linear Regression is a supervised Machine
Learning model in which the model finds the best-fit linear line between
the independent and dependent variables,
• i.e. it finds the linear relationship between the dependent and
independent variables.
• Linear Regression is a linear model that assumes a linear relationship
between the input variables (independent variables ‘x’) and the output
variable (dependent variable ‘y’), such that ‘y’ can be calculated from a
linear combination of the input variables (x).
• For a single input variable, the method is referred to as Simple Linear
Regression,
• whereas for multiple input variables it is referred to as Multiple Linear
Regression.
Linear Regression
• Equation of Simple Linear Regression, where b0 is the intercept, b1 is
the coefficient or slope, x is the independent variable and y is the
dependent variable:

    y = b0 + b1*x

• Equation of Multiple Linear Regression, where b0 is the intercept,
b1, b2, b3, ..., bn are the coefficients or slopes of the independent
variables x1, x2, x3, ..., xn and y is the dependent variable:

    y = b0 + b1*x1 + b2*x2 + b3*x3 + ... + bn*xn
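As an illustration of these equations (a minimal sketch, not from the original slides; the toy data below is assumed), scikit-learn's LinearRegression can estimate b0 and the other coefficients directly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one input variable (assumed toy data).
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])
simple = LinearRegression().fit(x, y)
print("b0 (intercept):", simple.intercept_)
print("b1 (slope):", simple.coef_[0])

# Multiple linear regression: two input variables x1, x2 (assumed toy data).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y2 = np.array([5.0, 4.5, 10.2, 9.8, 14.9])
multi = LinearRegression().fit(X, y2)
print("b0:", multi.intercept_, "b1, b2:", multi.coef_)
```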
Linear Regression
• A Linear Regression model’s main aim is to find the best-fit linear
line and the optimal values of the intercept and coefficients such that
the error is minimized.
Error is the difference between the actual value and the predicted
value, and the goal is to reduce this difference.
• Let’s understand this with the help of a diagram.
Linear Regression
• In the diagram:
• x is our independent variable, which is plotted on the x-axis, and y is
the dependent variable, which is plotted on the y-axis.
• Black dots are the data points, i.e. the actual values.
• b0 is the intercept, which here is 10, and b1 is the slope of the x variable.
• The blue line is the best-fit line predicted by the model, i.e. the
predicted values lie on the blue line.
• The vertical distance between a data point and the regression line is
known as the error or residual. Each data point has one residual, and
the sum of all these differences is known as the Sum of
Residuals/Errors.
Linear Regression
• Mathematical Approach:
• Residual/Error = Actual value – Predicted value
• Sum of Residuals/Errors = Sum(Actual – Predicted values)
• Sum of Squared Residuals/Errors = Sum((Actual – Predicted values)^2)
• i.e. SSE = Σ (y_i – ŷ_i)^2, summed over all n data points.
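A rough sketch of these quantities in NumPy (the actual and predicted values below are made-up examples):

```python
import numpy as np

# Assumed example values; 'predicted' would come from some fitted line y = b0 + b1*x.
actual = np.array([3.0, 5.0, 4.0, 6.0])
predicted = np.array([2.8, 4.9, 4.5, 6.1])

residuals = actual - predicted            # Residual = Actual - Predicted
sum_of_residuals = residuals.sum()        # Sum of Residuals/Errors
sse = np.sum(residuals ** 2)              # Sum of Squared Residuals (SSE)
print(residuals, sum_of_residuals, sse)
```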
Assumptions of Linear Regression
• The basic assumptions of Linear Regression are as follows:
• 1. Linearity: The dependent variable Y should be linearly related to
the independent variables. This assumption can be checked by plotting a
scatter plot between the two variables, as in the sketch below.
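One way to produce such a scatter plot in Python (a sketch with assumed toy data; any real x and Y columns would be used instead):

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed toy data: y is roughly linear in x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 5.0 + rng.normal(scale=2.0, size=100)

plt.scatter(x, y, alpha=0.6)
plt.xlabel("x (independent variable)")
plt.ylabel("Y (dependent variable)")
plt.title("Scatter plot: check whether the relationship looks linear")
plt.show()
```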
Assumptions of Linear Regression
• 2. Normality: The X and Y variables should be normally distributed.
Histograms, KDE plots and Q-Q plots can be used to check the Normality
assumption, as sketched below.
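A possible way to draw these normality checks (a sketch; the sample values are simulated, and seaborn/SciPy are used here purely for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

values = np.random.normal(loc=50, scale=10, size=200)  # assumed sample data

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.histplot(values, kde=True, ax=axes[0])          # histogram with KDE overlay
axes[0].set_title("Histogram / KDE")
stats.probplot(values, dist="norm", plot=axes[1])   # Q-Q plot against a normal
axes[1].set_title("Q-Q plot")
plt.show()
```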
Assumptions of Linear Regression
• 3. Homoscedasticity: The variance of the error terms should be
constant, i.e. the spread of the residuals should be constant for all
values of X.
• This assumption can be checked by plotting a residual plot (see the
sketch below).
• If the assumption is violated, the points will form a funnel shape;
otherwise the spread will be roughly constant.
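A sketch of such a residual plot (the predicted values and residuals below are simulated; in practice they would come from a fitted model):

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed: fitted values and residuals from some regression model.
rng = np.random.default_rng(0)
y_pred = rng.uniform(10, 100, size=150)
residuals = rng.normal(0, 5, size=150)     # constant spread -> homoscedastic

plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("Residual plot: look for a funnel shape")
plt.show()
```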
Assumptions of Linear Regression
• 4. Independence / No Multicollinearity: The independent variables
should be independent of each other, i.e. there should be no correlation
between the independent variables.
• To check this assumption, we can use a correlation matrix or the VIF
score. If the VIF score is greater than 5, then the variables are highly
correlated.
• In the example below, a high correlation is present between the x5 and
x6 variables.
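A sketch of the VIF check using statsmodels (the toy columns x5, x6, x7 are made up, with x6 deliberately built to be almost collinear with x5):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Assumed toy data: x6 is almost a copy of x5, so it should show a high VIF.
rng = np.random.default_rng(1)
x5 = rng.normal(size=100)
x6 = 0.95 * x5 + rng.normal(scale=0.1, size=100)   # nearly collinear with x5
x7 = rng.normal(size=100)
X = pd.DataFrame({"x5": x5, "x6": x6, "x7": x7})

vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)  # VIF above ~5 indicates strong multicollinearity
```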
Assumptions of Linear Regression
• 5. The error terms should be normally
distributed. Q-Q plots and Histograms can be
used to check the distribution of error terms.
Assumptions of Linear Regression
• 6. No Autocorrelation: The error terms should be
independent of each other.
• Autocorrelation can be tested using the Durbin Watson test.
• The null hypothesis assumes that there is no autocorrelation.
• The value of the test lies between 0 and 4. A value close to 2
indicates that there is no autocorrelation.
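A minimal sketch of the Durbin–Watson test with statsmodels (the residuals here are simulated; in practice they come from a fitted model):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Assumed residuals from some fitted regression model.
rng = np.random.default_rng(2)
residuals = rng.normal(size=100)

dw = durbin_watson(residuals)
print(dw)  # ~2 suggests no autocorrelation; values near 0 or 4 suggest strong autocorrelation
```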
How to deal with the Violation of any of
the Assumptions
• The violation of these assumptions leads to a decrease in the accuracy
of the model; the predictions become less accurate and the error is
high.
For example, if the Independence assumption is violated, then the
relationship between the independent and dependent variables cannot be
determined precisely.
How to deal with the Violation of any of
the Assumptions
• There are various methods or techniques available to deal with the
violation of the assumptions. Let’s discuss some of them below.
• Violation of Normality assumption of variables or error terms
• To treat this problem, we can transform the variables to the normal
distribution using various transformation functions such as log
transformation, Reciprocal, or Box-Cox Transformation.
• Violation of the Multicollinearity Assumption
• It can be dealt with by:
• Doing nothing (if there is no major difference in the accuracy)
• Removing some of the highly correlated independent variables.
• Deriving a new feature by linearly combining the independent variables,
such as adding them together or performing some mathematical operation.
• Performing an analysis designed for highly correlated variables, such as
principal components analysis.
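A short sketch of the transformations mentioned for the normality violation (the skewed variable below is simulated, and Box-Cox assumes strictly positive values):

```python
import numpy as np
from scipy import stats

# Assumed: a strictly positive, right-skewed variable.
rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)

x_log = np.log(x)                     # log transformation
x_recip = 1.0 / x                     # reciprocal transformation
x_boxcox, lam = stats.boxcox(x)       # Box-Cox estimates the best lambda itself
print("Box-Cox lambda:", lam)
```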
Evaluation Metrics for Regression Analysis
• To understand the performance of a Regression model, model evaluation
is necessary. Some of the evaluation metrics used for Regression
analysis are:
• 1. R squared or Coefficient of Determination: The most commonly used
metric for model evaluation in regression analysis is R squared. It can
be defined as the ratio of the variation explained by the model to the
total variation. The value of R squared lies between 0 and 1; the closer
the value is to 1, the better the model.

    R^2 = 1 – (SSRES / SSTOT)

where SSRES is the Residual Sum of Squares and SSTOT is the Total Sum of
Squares.
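A small sketch computing R squared both from the formula above and with scikit-learn (the actual and predicted values are assumed):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 4.0, 6.0, 7.0])   # assumed actual values
y_pred = np.array([2.8, 4.9, 4.5, 6.1, 6.7])   # assumed predicted values

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))       # both values should match
```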
Evaluation Metrics for Regression Analysis

• 2. Adjusted R squared: It is an improvement on R squared.

    Adjusted R^2 = 1 – [(1 – R^2)(n – 1) / (n – k – 1)]

where n is the number of observations and k is the number of independent
variables.
The problem/drawback with R^2 is that as more features are added, the
value of R^2 also increases, which gives the illusion of a good model.
Adjusted R^2 solves this drawback: it increases only when a newly added
feature genuinely improves the model, and so shows the real improvement
of the model.
Adjusted R^2 is always lower than R^2.
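scikit-learn does not expose Adjusted R squared directly, so a small helper can compute it from R squared; this is a sketch, with n as the number of observations and k the number of independent variables:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example: R^2 = 0.85 with 100 observations and 5 features.
print(adjusted_r2(0.85, n=100, k=5))  # slightly lower than 0.85
```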
Evaluation Metrics for Regression Analysis

• 3. Mean Squared Error (MSE): Another common metric for evaluation is
the Mean Squared Error, which is the mean of the squared differences
between the actual and predicted values:

    MSE = (1/n) * Sum((Actual – Predicted)^2)
Evaluation Metrics for Regression Analysis

• 4. Root Mean Squared Error (RMSE): It is the square root of MSE, i.e.
the root of the mean squared difference between the actual and predicted
values:

    RMSE = sqrt(MSE) = sqrt((1/n) * Sum((Actual – Predicted)^2))

Like MSE, RMSE penalizes large errors heavily, but because it is a square
root it is expressed in the same units as the target variable, which
makes it easier to interpret than MSE.
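A short sketch computing MSE and RMSE (values assumed), either by hand or with scikit-learn's mean_squared_error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 4.0, 6.0, 7.0])   # assumed actual values
y_pred = np.array([2.8, 4.9, 4.5, 6.1, 6.7])   # assumed predicted values

mse = mean_squared_error(y_true, y_pred)        # mean of squared differences
rmse = np.sqrt(mse)                             # same units as the target
print(mse, rmse)
```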
Linear Regression
• Linear Regression Model Representation:
• In a Simple Linear Regression Model with a single x and y, the form of
the model would be:

    y = b0 + b1*x
Linear Regression
• In higher dimensions, when we have more than one input variable, the
line is replaced by a plane or hyperplane.
Linear Regression
• Violations of linearity or additivity are extremely serious: if you fit
a linear model to data which are non-linearly or non-additively related,
your predictions are likely to be seriously in error, especially when
you extrapolate beyond the range of the sample data.
• Violations of independence are potentially very serious in time series
regression models: serial correlation in the errors (i.e., correlation
between consecutive errors or errors separated by some other number of
periods) means that there is room for improvement in the model, and
extreme serial correlation is often a symptom of a badly mis-specified
model. Serial correlation (also known as “autocorrelation”) is sometimes
a byproduct of a violation of the linearity assumption, as in the case of
a simple (i.e., straight) trend line fitted to data which are growing
exponentially over time.
Linear Regression
• Independence can also be violated in non-time-series models if errors
tend to always have the same sign under particular conditions, i.e., if
the model systematically underpredicts or overpredicts what will happen
when the independent variables have a particular configuration.
• Violations of homoscedasticity (which are called “heteroscedasticity”)
make it difficult to gauge the true standard deviation of the forecast
errors, usually resulting in confidence intervals that are too wide or too
narrow.
• In particular, if the variance of the errors is increasing over time,
confidence intervals for out-of-sample predictions will tend to be
unrealistically narrow.
• Heteroscedasticity may also have the effect of giving too much weight
to a small subset of the data (namely the subset where the error
variance was largest) when estimating coefficients.
Linear Regression
• Violations of normality create problems for determining
whether model coefficients are significantly different from
zero and for calculating confidence intervals for forecasts.
Sometimes the error distribution is “skewed” by the presence
of a few large outliers. Since parameter estimation is based
on the minimization of squared error, a few extreme
observations can exert a disproportionate influence on
parameter estimates. Calculation of confidence intervals and
various significance tests for coefficients are all based on the
assumptions of normally distributed errors. If the error
distribution is significantly non-normal, confidence intervals
may be too wide or too narrow.
Linear Regression
• Techniques to build a Linear Regression model
• The most common techniques through which a Linear Regression model is
built are:
• 1. Ordinary Least Squares
• 2. Gradient Descent
• 3. Regularization
Linear Regression
• Ordinary Least Squares Method :
• The Ordinary Least Squares method is used for both simple and multiple
linear regression. The OLS method corresponds to minimizing the sum of
squared differences between the observed and predicted values.
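A minimal OLS sketch with NumPy's least-squares solver (toy data assumed); adding a column of ones lets the intercept be estimated alongside the slopes:

```python
import numpy as np

# Assumed toy data with two input variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.0, 4.5, 10.2, 9.8, 14.9])

# Add a column of ones so the intercept b0 is estimated as well.
X_design = np.column_stack([np.ones(len(X)), X])

# OLS: minimize the sum of squared differences between observed and predicted y.
coeffs, residual_ss, rank, _ = np.linalg.lstsq(X_design, y, rcond=None)
print("b0, b1, b2:", coeffs)
```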
Linear Regression
• Gradient Descent
• When there are one or more inputs you can use a process of
optimizing the values of the coefficients by iteratively
minimizing the error of the model on your training data.
• This operation is called Gradient Descent and works by starting with
random values for each coefficient. The sum of the squared errors is
calculated for each pair of input and output values.
• A learning rate is used as a scale factor and the coefficients
are updated in the direction towards minimizing the error.
The process is repeated until a minimum sum squared error
is achieved or no further improvement is possible.
Linear Regression
• When using this method, you must select a learning rate (alpha)
parameter that determines the size of the improvement step to take on
each iteration of the procedure.
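A minimal sketch of gradient descent for simple linear regression, illustrating the role of the learning rate alpha (the data, starting values and alpha are all assumed for illustration):

```python
import numpy as np

# Assumed toy data roughly following y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

b0, b1 = 0.0, 0.0      # initial coefficients (zeros here; random values also work)
alpha = 0.01           # learning rate: scales the size of each update step

for _ in range(5000):
    y_pred = b0 + b1 * x
    error = y_pred - y
    # Gradients of the mean squared error with respect to b0 and b1.
    grad_b0 = 2 * error.mean()
    grad_b1 = 2 * (error * x).mean()
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print(b0, b1)  # should approach the least-squares intercept and slope
```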
Linear Regression
• Regularization-
• There are extensions of the training of the linear model called
regularization methods.
• These seek both to minimize the sum of the squared error of the model
on the training data (using ordinary least squares) and to reduce the
complexity of the model (such as the number or the absolute size of the
sum of all coefficients in the model).
• Two popular examples of regularization procedures for linear
regression are:
• Lasso Regression:
• Ridge Regression:
Linear Regression
• Lasso Regression: where Ordinary Least
Squares is modified to also minimize the
absolute sum of the coefficients (called L1
regularization).
• Ridge Regression: where Ordinary Least Squares is modified to also
minimize the sum of the squared coefficients (called L2 regularization).
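Both procedures are available in scikit-learn; a sketch with assumed data and regularization strengths (the alpha values here are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Assumed toy data with two correlated input columns.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
X[:, 2] = 0.9 * X[:, 1] + rng.normal(scale=0.1, size=100)   # collinear pair
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: absolute sum of coefficients
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: sum of squared coefficients
print("Lasso coefficients:", lasso.coef_)
print("Ridge coefficients:", ridge.coef_)
```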
Linear Regression
• These methods are effective to use when there is collinearity in your
input values and ordinary least squares would overfit the training data.
Linear Regression
• Applications of Linear Regression
• Linear Regression is a very powerful statistical technique and can be
used to generate insights into consumer behaviour, business performance
and the factors influencing profitability.
• Linear regression can be used in business to evaluate trends and make
estimates or forecasts.
– For example, if a company’s sales have increased steadily every month
for the past few years, then by running a linear regression on the
monthly sales data, the company could forecast sales in future months.
Linear Regression
• Applications of Linear Regression
• Linear regression can also be used to analyze the effectiveness of
marketing, pricing and promotions on sales of a product.
– For instance, if company XYZ wants to know whether the funds it has
invested in marketing a particular brand have given it a substantial
return on investment, it can use linear regression.
– The beauty of linear regression is that it enables us to capture the
isolated impact of each marketing campaign while controlling for the
other factors that could influence sales.
– In real-life scenarios, multiple advertising campaigns run during the
same time period.
– Supposing two campaigns are run on TV and Radio in parallel, a linear
regression can capture the isolated as well as the combined impact of
running these ads together.
Linear Regression
• Applications of Linear Regression
• Linear Regression can also be used to assess risk in financial services
or insurance domain.
– For example, a car insurance company might conduct a linear regression to
come up with a suggested premium table using predicted claims to Insured
Declared Value ratio.
– The risk can be assessed based on the attributes of the car, driver
information or demographics. The results of such an analysis might guide
important business decisions.
• In the credit card industry, a financial company may be interested in
minimizing its risk portfolio and wants to understand the top five
factors that cause a customer to default. Based on the results, the
company could implement specific EMI options so as to minimize defaults
among risky customers.
Applications of Linear Regression
• Real-time example-
• We have a dataset which contains information about the relationship
between ‘number of hours studied’ and ‘marks obtained’. Many students
have been observed, and their hours of study and grades are recorded.
This will be our training data. The goal is to design a model that can
predict the marks given the number of hours studied. Using the training
data, a regression line is obtained which gives the minimum error. This
linear equation is then used for any new data: if we give the number of
hours studied by a student as an input, our model should predict their
mark with minimum error.
• Y(pred) = b0 + b1*x
Applications of Linear Regression

• The values b0 and b1 must be chosen so that they minimize the error. If
the sum of squared errors is taken as the metric to evaluate the model,
then the goal is to obtain the line that best minimizes this error
(Figure 2: Error Calculation):

    Error = Sum((Actual – Predicted)^2)

• If we don’t square the errors, then positive and negative errors will
cancel each other out.
• For a model with one predictor, the intercept (Figure 3: Intercept
Calculation) and the coefficient (Figure 4: Co-efficient Formula) are:

    b0 = mean(y) – b1 * mean(x)

    b1 = Sum((x – mean(x)) * (y – mean(y))) / Sum((x – mean(x))^2)

• Exploring ‘b1’
• If b1 > 0, then x (predictor) and y (target) have a positive
relationship; that is, an increase in x will increase y.
• If b1 < 0, then x (predictor) and y (target) have a negative
relationship; that is, an increase in x will decrease y.
• Exploring ‘b0’
• If the model does not include x = 0, then the prediction becomes
meaningless with only b0.
• For example, suppose we have a dataset that relates height (x) and
weight (y). Taking x = 0 (that is, height as 0) leaves the equation with
only the b0 value, which is completely meaningless, as in real life
height and weight can never be zero. This results from considering the
model beyond its scope.
• If the model includes the value 0, then ‘b0’ will be the average of all
predicted values when x = 0. But setting zero for all the predictor
variables is often impossible.
• The value of b0 guarantees that the residuals have mean zero. If there
is no ‘b0’ term, then the regression will be forced to pass through the
origin, and both the regression coefficient and the prediction will be
biased.
Ref: https://towardsdatascience.com/linear-regression-detailed-view-ea73175f6e86
Example-Linear Regression
• The values of x and the corresponding values of y are given in the table below:
  i. Find the least squares regression line y = ax + b.
  ii. Estimate the value of y when x = 10.

  x:  0  1  2  3  4
  y:  2  3  5  4  6
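For reference, a short sketch that computes this least-squares line with NumPy and evaluates it at x = 10 (np.polyfit is used here simply as a convenient least-squares fitter):

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

# Least-squares fit of y = a*x + b (np.polyfit returns [slope, intercept]).
a, b = np.polyfit(x, y, 1)
print("a (slope):", a, "b (intercept):", b)   # a = 0.9, b = 2.2

# ii. Estimate y when x = 10.
print("y(10) =", a * 10 + b)                  # 11.2
```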
