Linear regression is a type of machine-learning algorithm, more specifically a supervised machine-learning algorithm, that learns from a labelled dataset and maps the data points to the best-fitting linear function, which can then be used for prediction on new data.
Supervised learning has two types:
Classification: predicts the class of a data point based on the independent input variables. A class is a categorical or discrete value, e.g. whether the image of an animal shows a cat or a dog.
Regression: predicts a continuous output variable based on the independent input variables, e.g. predicting house prices from parameters such as house age, distance from the main road, location, and area.
What is Linear Regression?
Linear regression is a type of supervised machine learning algorithm that computes the linear
relationship between the dependent variable and one or more independent features by fitting a
linear equation to observed data.
When there is only one independent feature, it is known as Simple Linear Regression; when there is more than one feature, it is known as Multiple Linear Regression.
Similarly, when there is only one dependent variable, it is considered Univariate Linear Regression, while when there is more than one dependent variable, it is known as Multivariate Regression.
Simple Linear Regression
In its simplest form, linear regression involves modeling the relationship between one
independent variable and one dependent variable with a straight line.
The equation for simple linear regression is:
y = β0 + β1X
Explanation of the Components
1. y (Dependent Variable):
This is the variable we are trying to predict or explain.
It is also known as the response variable or the outcome.
2. X (Independent Variable):
It is also known as the predictor variable, explanatory variable, or feature.
This is the variable we use to make predictions about y.
3. β0 (Intercept):
This is the value of y when X is zero.
It represents the point where the regression line crosses the y-axis.
It is a constant term in the equation.
4. β1 (Slope):
This indicates the change in y for a one-unit change in X.
It represents the steepness or incline of the regression line.
A larger absolute value of β1 means a steeper slope.
Visualization
To help visualize this, imagine plotting X on the x-axis and y on the y-axis. The simple
linear regression model fits a straight line through the data points in such a way that the sum
of the squared differences between the observed values and the predicted values is
minimized. This is known as the "least squares" method.
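The least-squares fit described above has a simple closed form, sketched below in plain Python. The hours/scores data is illustrative, chosen so the fitted line matches the worked example later in this article.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # beta1 = covariance(x, y) / variance(x)
    beta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    # The least-squares line always passes through (mean_x, mean_y)
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

# Illustrative data: study hours vs. test scores, lying exactly on y = 50 + 5x
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 65, 70, 75]
b0, b1 = fit_simple_linear_regression(hours, scores)
print(b0, b1)  # 50.0 5.0
```

Because the data here is perfectly linear, the fitted coefficients recover the underlying line exactly; with noisy real data they would only approximate it.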
Example
Suppose we have data on the number of hours students study (independent variable X) and
their scores on a test (dependent variable y). We want to predict test scores based on the
number of study hours.
If our simple linear regression model results in the equation:
y = 50 + 5X
β0 = 50: This means that if a student studies for 0 hours, their predicted test score would be 50.
β1 = 5: This means that for each additional hour studied, the test score increases by 5 points.
Interpretation
Intercept (β0): The predicted test score when no hours are studied is 50.
Slope (β1): Each additional hour of study is associated with an increase of 5 points in
the test score.
So, if a student studies for 3 hours, the predicted test score would be:
y = 50 + 5 × 3 = 50 + 15 = 65
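The prediction step above can be sketched in a few lines of Python. The default coefficients are the ones from this article's example equation, y = 50 + 5X.

```python
def predict_score(hours_studied, beta0=50.0, beta1=5.0):
    """Predict a test score from study hours using y = beta0 + beta1 * X."""
    return beta0 + beta1 * hours_studied

print(predict_score(3))  # 65.0 — three hours of study predicts a score of 65
print(predict_score(0))  # 50.0 — the intercept: predicted score with no study
```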
Conclusion
The simple linear regression equation
y = β0 + β1X
provides a way to model the relationship between an independent variable X and a dependent variable y. By understanding the intercept (β0) and slope (β1), we can make predictions and interpret how changes in X affect y.
What is the best Fit Line?
Our primary objective when using linear regression is to locate the best-fit line, the line for which the error between the predicted and actual values is as small as possible.
The best Fit Line equation provides a straight line that represents the relationship between
the dependent and independent variables. The slope of the line indicates how much the
dependent variable changes for a unit change in the independent variable(s).
Here Y is called the dependent or target variable and X is called the independent variable, also known as the predictor of Y. Many types of functions can be used for regression; a linear function is the simplest. X may be a single feature or multiple features representing the problem.
Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x); hence the name Linear Regression. In the figure above, X (input) is the work experience and Y (output) is the salary of a person. The regression line is the best-fit line for our model.
Since different values for the weights (the coefficients of the line) result in different regression lines, we use a cost function to compute the values that produce the best-fit line.
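A common choice of cost function for linear regression is the mean squared error (MSE), sketched below. The data is the same illustrative hours/scores set used earlier; note that coefficients further from the true line receive a higher cost.

```python
def mse_cost(xs, ys, beta0, beta1):
    """Mean squared error of the line y = beta0 + beta1 * x on the data."""
    n = len(xs)
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys)) / n

hours = [1, 2, 3]
scores = [55, 60, 65]
print(mse_cost(hours, scores, 50.0, 5.0))  # 0.0 — this line fits the data exactly
print(mse_cost(hours, scores, 40.0, 5.0))  # 100.0 — a worse intercept, higher cost
```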
Evaluation Metrics for Linear Regression
A variety of evaluation measures can be used to determine the strength of any linear
regression model. These assessment metrics often give an indication of how well the model is
producing the observed outputs.
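As a sketch, the standard regression metrics MAE, MSE, RMSE, and R² can be computed directly from predictions; the `y_true`/`y_pred` values below are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average squared residual (penalizes large errors)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: fraction of variance explained."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [55, 60, 65, 70]
y_pred = [54, 61, 66, 69]
print(mae(y_true, y_pred))              # 1.0
print(mse(y_true, y_pred))              # 1.0
print(math.sqrt(mse(y_true, y_pred)))   # RMSE = 1.0
print(r2(y_true, y_pred))               # close to 1 means a good fit
```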
Optimization Techniques
Normal Equation
For linear regression, the coefficients can be computed in closed form with the normal equation, β = (XᵀX)⁻¹Xᵀy, where X is the design matrix (with a column of ones for the intercept) and y is the vector of observed outputs.
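For the simple (one-feature) case, the normal equation β = (XᵀX)⁻¹Xᵀy reduces to a 2×2 linear system that can be solved by hand, as this sketch shows; the hours/scores data is the same illustrative set used earlier.

```python
def normal_equation_simple(xs, ys):
    """Solve beta = (X^T X)^{-1} X^T y for a design matrix with columns [1, x]."""
    n = len(xs)
    sx = sum(xs)                                  # sum of x
    sxx = sum(x * x for x in xs)                  # sum of x^2
    sy = sum(ys)                                  # sum of y
    sxy = sum(x * y for x, y in zip(xs, ys))      # sum of x*y
    det = n * sxx - sx * sx                       # determinant of X^T X
    beta0 = (sxx * sy - sx * sxy) / det
    beta1 = (n * sxy - sx * sy) / det
    return beta0, beta1

hours = [1, 2, 3, 4, 5]
scores = [55, 60, 65, 70, 75]
print(normal_equation_simple(hours, scores))  # (50.0, 5.0)
```

Unlike iterative optimizers such as gradient descent, the normal equation gives the exact least-squares solution in one step, though inverting XᵀX becomes expensive when the number of features is large.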
Conclusion
Linear Regression is a powerful and widely used algorithm in Machine Learning for predicting continuous outcomes. Understanding its key concepts, assumptions, and evaluation metrics is essential for effectively applying this technique to real-world problems.