Application of
Regression Analysis
SDS Sir
Introduction. . .
• Father of Regression Analysis: Carl F. Gauss (1777–1855), known for contributions to physics, mathematics & astronomy.
• The term “Regression” was first used in 1877 by Francis Galton.
Regression Analysis. . .
• It is the study of the relationship between variables.
• It is one of the most commonly used tools for business analysis.
• It is easy to use and applies to many situations.
Regression types. . .
• Simple Regression: a single explanatory variable.
• Multiple Regression: any number of explanatory variables.
• Dependent variable: the single variable being explained/predicted by the regression model.
• Independent variable: the explanatory variable(s) used to predict the dependent variable.
• Coefficients (β): values, computed by the regression tool, reflecting the relationship between each explanatory variable and the dependent variable.
• Residuals (ε): the portion of the dependent variable that isn’t explained by the model; the model’s under- and over-predictions.
Regression Analysis. . .
• Linear Regression: straight-line relationship
  – Form: y = mx + b
• Non-linear: implies curved relationships
  – e.g., logarithmic relationships
Regression Analysis. . .
• Cross Sectional: data gathered from the same time period.
• Time Series: data observed over equally spaced points in time.
Simple Linear Regression Model. . .
• Only one independent variable, x.
• The relationship between x and y is described by a linear function.
• Changes in y are assumed to be caused by changes in x.
Types of Regression Models. . .
Estimated Regression Model. . .
The sample regression line provides an estimate of the population regression line:

ŷᵢ = b0 + b1x

where ŷᵢ is the estimated (or predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and x is the independent variable.
The individual random error terms ei have a mean of zero.
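The estimates b0 and b1 come from the least-squares formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄. A minimal plain-Python sketch (the tiny data set is made up for illustration only):

```python
# Least-squares estimates for the simple linear model y-hat = b0 + b1*x
def fit_simple_ols(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = S_xy / S_xx
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar   # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data lying exactly on y = 2x + 1
b0, b1 = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # → 1.0 2.0
```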
Simple Linear Regression Example. . .
• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
• A random sample of 10 houses is selected.
  – Dependent variable (y) = house price in $1000s
  – Independent variable (x) = square feet

Sample Data:

House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
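The coefficients reported in the output that follows can be reproduced directly from this sample with the least-squares formulas; a quick plain-Python check:

```python
# 10-house sample from the slides: price in $1000s (y) vs. square feet (x)
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
s_xx = sum((x - x_bar) ** 2 for x in sqft)

b1 = s_xy / s_xx          # slope
b0 = y_bar - b1 * x_bar   # intercept

print(round(b0, 5), round(b1, 5))  # → 98.24833 0.10977
```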
Output. . .

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

ANOVA        df   SS           MS           F        Significance F
Regression    1   18934.9348   18934.9348   11.0848  0.01039
Residual      8   13665.5652   1708.1957
Total         9   32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept    98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580
Graphical Presentation . . .
• House price model: scatter plot and regression line
[Scatter plot: Square Feet (0–3000) on the x-axis vs. House Price ($1000s, 0–450) on the y-axis, with the fitted line: intercept = 98.248, slope = 0.10977]

house price = 98.24833 + 0.10977 (square feet)
Interpretation of the Intercept, b0
house price = 98.24833 + 0.10977 (square feet)
• b0 is the estimated average value of Y when the value of X is zero (if x = 0 is in the range of observed x values).
  – Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet.
Interpretation of the Slope Coefficient, b1
house price = 98.24833 + 0.10977 (square feet)
• b1 measures the estimated change in the average value of Y as a result of a one-unit change in X.
  – Here, b1 = .10977 tells us that the average value of a house increases by .10977($1000) = $109.77, on average, for each additional one square foot of size.
Example: House Prices
Estimated Regression Equation:
house price = 98.25 + 0.1098 (sq.ft.)

Predict the price for a house with 2000 square feet.
(Sample data as above.)
Example: House Prices
Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq.ft.)
            = 98.25 + 0.1098 (2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850.
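The same prediction as a one-line Python function, using the rounded coefficients from the estimated equation:

```python
# Predict house price ($1000s) from the estimated regression equation
def predict_price(sqft, b0=98.25, b1=0.1098):
    """house price = b0 + b1 * (square feet)"""
    return b0 + b1 * sqft

print(round(predict_price(2000), 2))  # → 317.85, i.e. $317,850
```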
Coefficient of Determination, R2

R² = SSR / SST = (sum of squares explained by regression) / (total sum of squares)

Note: In the single independent variable case, the coefficient of determination is

R² = r²

where:
R² = coefficient of determination
r = simple correlation coefficient
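Both identities can be verified numerically on the 10-house sample; a plain-Python sketch:

```python
# R^2 = SSR/SST for the 10-house sample, and the single-variable check R^2 = r^2
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
s_xx = sum((x - x_bar) ** 2 for x in sqft)
s_yy = sum((y - y_bar) ** 2 for y in price)   # SST, total sum of squares

ssr = s_xy ** 2 / s_xx              # sum of squares explained by regression
r2 = ssr / s_yy                     # coefficient of determination
r = s_xy / (s_xx * s_yy) ** 0.5     # simple correlation coefficient

print(round(r2, 5))      # → 0.58082
print(round(r * r, 5))   # → 0.58082 (same value: R^2 = r^2)
```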
Examples of Approximate R2 Values

R2 = 1: Perfect linear relationship between x and y; 100% of the variation in y is explained by variation in x.
[Two scatter plots of y vs. x, one with positive and one with negative slope, all points lying exactly on the line]
Examples of Approximate R2 Values

0 < R2 < 1: Weaker linear relationship between x and y; some but not all of the variation in y is explained by variation in x.
[Two scatter plots of y vs. x with points scattered around the fitted line]
Examples of Approximate R2 Values

R2 = 0: No linear relationship between x and y; the value of y does not depend on x (none of the variation in y is explained by variation in x).
[Scatter plot of y vs. x with a horizontal fitted line]
Output. . .

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

R² = SSR / SST = 18934.9348 / 32600.5000 = 0.58082
58.08% of the variation in house prices is explained by variation in square feet.

ANOVA        df   SS           MS           F        Significance F
Regression    1   18934.9348   18934.9348   11.0848  0.01039
Residual      8   13665.5652   1708.1957
Total         9   32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept    98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580
Standard Error of Estimate. . .
• The standard deviation of the variation of observations around the regression line is estimated by

s_ε = sqrt( SSE / (n − k − 1) )

where:
SSE = sum of squares error
n = sample size
k = number of independent variables in the model
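For the house-price fit, the standard error of the estimate can be computed from the residuals (here k = 1 predictor); a plain-Python sketch:

```python
# Standard error of the estimate, s_e = sqrt(SSE / (n - k - 1)),
# from the residuals of the 10-house fit (k = 1 independent variable)
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
s_xx = sum((x - x_bar) ** 2 for x in sqft)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# SSE = sum of squared residuals (observed y minus predicted y)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
k = 1
s_e = (sse / (n - k - 1)) ** 0.5

print(round(s_e, 5))  # → 41.33032, matching the output's Standard Error
```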
The Standard Deviation of the Regression Slope
• The standard error of the regression slope coefficient (b1) is estimated by

s_b1 = s_ε / sqrt( Σ(x − x̄)² ) = s_ε / sqrt( Σx² − (Σx)²/n )

where:
s_b1 = estimate of the standard error of the least squares slope
s_ε = sqrt( SSE / (n − 2) ) = sample standard error of the estimate
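Applying this formula to the 10-house sample reproduces the slope's standard error, and dividing the slope by it gives the t statistic reported in the output; a plain-Python sketch:

```python
# Standard error of the slope, s_b1 = s_e / sqrt(sum((x - x_bar)^2)),
# and the resulting t statistic t = b1 / s_b1, for the 10-house sample
sqft  = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
s_xx = sum((x - x_bar) ** 2 for x in sqft)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
s_e = (sse / (n - 2)) ** 0.5      # sample standard error of the estimate
s_b1 = s_e / s_xx ** 0.5          # standard error of the slope

print(round(s_b1, 5))        # → 0.03297
print(round(b1 / s_b1, 3))   # → 3.329, the slope's t statistic
```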
Output. . .

Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

s_ε = 41.33032 (Standard Error of the estimate)
s_b1 = 0.03297 (Standard Error of the Square Feet coefficient)

ANOVA        df   SS           MS           F        Significance F
Regression    1   18934.9348   18934.9348   11.0848  0.01039
Residual      8   13665.5652   1708.1957
Total         9   32600.5000

             Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept    98.24833      58.03348        1.69296  0.12892  -35.57720  232.07386
Square Feet  0.10977       0.03297         3.32938  0.01039  0.03374    0.18580