Lecture 2.3: Testing the Validity of the Model


Learning Outcomes: When you finish this lecture you should be able to
• Understand the foundational assumptions underlying the Ordinary Least Squares (OLS) technique
and their importance for accurate regression analysis.
• Explore the properties of the OLS technique, focusing on its unbiasedness, consistency, and efficiency
in estimating regression coefficients.
• Learn to calculate and interpret the variance and standard error of OLS estimators to assess
estimation accuracy.
• Develop skills in hypothesis testing for regression coefficients to evaluate their statistical
significance and implications in the regression model.
• Examine the Coefficient of Determination (R²) as a key measure of goodness of fit, understanding its
role in assessing the explanatory power of the estimated regression.
• Conduct an overall significance test for the regression model to determine its effectiveness in
explaining the dependent variable.
• Perform and interpret tests for the correlation coefficient to understand the strength and direction
of relationships between variables.
• Apply the OLS technique for forecasting, utilizing the regression model to make informed predictions
based on historical data.

This lecture aims to equip students with a comprehensive understanding of the OLS technique and its
applications in regression analysis and forecasting.

Short content: This lecture covers the following sections:


2-3-1- Assumptions Underlying the OLS Technique
2-3-2- Properties of the OLS Technique
2-3-3- Variance and Standard Error of OLS Estimators
2-3-4- Testing Hypotheses about the Regression Coefficients
2-3-5- Confidence Intervals for the Slope and Intercept
2-3-6- Test of the Overall Significance of the Regression (F Test)
2-3-7- The Coefficient of Determination (R²): a Measure of Goodness of Fit of the Estimated Regression
2-3-8- Test of the Correlation Coefficient
2-3-9- Forecasting
I-OLS TECHNIQUE
2.3.1. The Assumptions Underlying the Method of Least Squares
If our objective is to estimate β0 and β1 only, the method of OLS discussed in the
preceding lecture will suffice. But recall from Lecture 1 that in regression analysis our objective
is not only to obtain β̂0 and β̂1 but also to draw inferences about the true β0 and β1. For
example, we would like to know how close β̂0 and β̂1 are to their counterparts in the
population, or how close Ŷi is to the true E(Y | Xi). To that end, we must not only specify the
functional form of the model, as in Eq. (Y = β0 + β1Xi + u), but also make certain assumptions
about the manner in which the Yi are generated. To see why this requirement is needed, look at
the PRF: (Yi = β0 + β1Xi + µi). It shows that Yi depends on both Xi and µi. Therefore, unless we
are specific about how Xi and µi are created or generated, there is no way we can make any
statistical inference about the Yi and also, as we shall see, about β0 and β1. Thus, the
assumptions made about the Xi variable(s) and the error term are extremely critical to the valid
interpretation of the regression estimates.

In conducting a regression analysis, we begin by making an assumption about the appropriate
model for the relationship between the dependent and independent variable(s). For the case of
simple linear regression, the assumed regression model is

    Y = β0 + β1X + µ

ASSUMPTION 1: Linearity of the Regression Model:

The regression model is linear in the parameters, though it may or may not be linear in the
variables.

Implication: This is the regression model shown in Eq. Yi = β0 + β1Xi + ui. As will be
discussed in Chapter 3, this model can be extended to include more explanatory variables.

ASSUMPTION 2: Fixed X Values or X Values Independent of the Error Term:

Values taken by the regressor X may be considered fixed in repeated samples (the case of fixed
regressor) or they may be sampled along with the dependent variable Y (the case of stochastic
regressor).

Implication: In the latter case, it is assumed that the X variable(s) and the error term are
independent; that is, cov(Xi, ui) = 0.

ASSUMPTION 3: The errors have constant variance (σ²)

The variance of µ, denoted by σ², is the same for all values of x.

Implication: The variance of y about the regression line equals σ² and is the same for all values
of x.

ASSUMPTION 4: The errors are independent of each other (cov(µi, µj) = 0)

The values of µ are independent of one another; that is, cov(µi, µj) = 0 for i ≠ j.

Implication: The value of µ for a particular value of x is not related to the value of µ for any
other value of x; thus, the value of y for a particular value of x is not related to the value of y
for any other value of x.

ASSUMPTION 5: The errors are normally distributed

The error term µ is normally distributed. Even though the error term µ is not observable, we
assume that it is a normally distributed random variable with mean 0 and standard
deviation σ; that is, E(µ) = 0 and var(µ) = σ².


Implication: β0 and β1 are constants, therefore E(β0)=β0 and E(β1)=β1; thus, for a given
value of x, the expected value of y is: E(y)=β0 + β1 X. And because y is a linear function of µ,
y is also a normally distributed random variable.

Figure 2.3.1 illustrates the model assumptions and their implications; note that in this
graphical interpretation, the value of E(y) changes according to the specific value of x
considered. However, regardless of the x value, the probability distribution of µ, and hence the
probability distribution of y, is normal, each with the same variance. The specific value of the
error at any particular point depends on whether the actual value of y is greater than or less
than E(y).
Figure 2.3.1 ASSUMPTIONS FOR THE REGRESSION MODEL

Violations of these assumptions can lead to inaccurate predictions, unreliable parameter
estimates, or inefficiency in the model, affecting its applicability in real-world scenarios. The
key assumptions underlying the least squares method are:

1. Linearity: The relationship between the dependent and independent variables must be
linear.
2. Independence of Errors: The errors in the model should be independent of each other.


3. Homoscedasticity: The variance of the errors should be constant across all values of the
independent variable.
4. Normality of Errors: The errors should be normally distributed.


2.3.2 Properties of the OLS estimator:


If assumptions 1-5 hold, then the estimators β̂0 and β̂1 determined by OLS will have a number
of desirable properties, and are known as Best Linear Unbiased Estimators (BLUE). What
does this acronym stand for?

• ‘Estimator’ - β̂0 and β̂1 are estimators of the true values of β0 and β1
• ‘Linear’ - β̂0 and β̂1 are linear estimators - that means that the formulae for β̂0 and
β̂1 are linear combinations of the random variables (in this case, y)
• ‘Unbiased’ - on average, the values of β̂0 and β̂1 will be equal to their true values
• ‘Best’ - means that the OLS estimators have minimum variance among the class of
linear unbiased estimators; any alternative linear unbiased estimator can be shown to
have a variance no smaller than that of the OLS estimator.

Under assumptions 1-5 listed above, the OLS estimator can be shown to have the desirable
properties that it is consistent, unbiased and efficient. Unbiasedness and efficiency have already
been discussed above, and consistency is an additional desirable property.
Figure 2.3.2: OLS estimators have the lowest variance among all linear unbiased estimators
Table 2-3-1: The least-squares method yields estimators with desirable statistical properties
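To make the unbiasedness property concrete, here is a short Python sketch (an illustration, not part of the original lecture): it repeatedly draws samples from an assumed "true" model Y = 2 + 0.5X + u and shows that the average OLS slope estimate is close to the true value 0.5. The parameter values, sample size, and number of replications are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    beta0_true, beta1_true = 2.0, 0.5      # assumed true population parameters
    n, n_replications = 50, 5000           # illustrative sample size and number of samples

    slope_estimates = []
    x = rng.uniform(0, 10, size=n)         # fixed regressor values (Assumption 2)
    for _ in range(n_replications):
        u = rng.normal(0, 1.0, size=n)     # errors: mean 0, constant variance (Assumptions 3-5)
        y = beta0_true + beta1_true * x + u
        # OLS slope: b1 = sum(x_dev * y_dev) / sum(x_dev**2)
        x_dev, y_dev = x - x.mean(), y - y.mean()
        slope_estimates.append((x_dev @ y_dev) / (x_dev @ x_dev))

    print("Average estimated slope:", np.mean(slope_estimates))   # close to 0.5 (unbiasedness)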


II-TESTS FOR SIGNIFICANCE


2.3.3 Variances and the Standard Errors of OLS Estimators
Because the least-squares estimates are functions of the sample data, and the data are likely
to change from sample to sample, the estimates will change ipso facto. Therefore, what is
needed is some measure of the reliability or precision of the estimators β̂0 and β̂1. In statistics
the precision of an estimate is measured by its standard error (se). The standard errors of the
OLS estimates can be obtained as follows:

    var(β̂1) = σ² / Σxi²,              se(β̂1) = σ / √(Σxi²)
    var(β̂0) = σ² ΣXi² / (n Σxi²),     se(β̂0) = √var(β̂0)

where xi = (Xi − X̄) is the deviation of Xi from its sample mean, var = variance, se = standard
error, and σ² is the constant or homoscedastic variance of ui. All the quantities entering into the
preceding equations except σ² can be estimated from the data. σ² itself is estimated by the
following formula:

    σ̂² = Σûi² / (n − 2)

where σ̂² is the OLS estimator of the true but unknown σ², and the expression n − 2 (more
generally n − k, where k is the number of parameters included in the model) is known as the
number of degrees of freedom (df); Σûi² is the sum of the squared residuals, or the residual sum
of squares. Once Σûi² is known, σ̂² can easily be computed, and its positive square root σ̂ is
known as the standard error of the regression (the standard error of estimate).
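As an illustration of these formulas, the sketch below (a minimal NumPy example with made-up data) estimates b0 and b1 by OLS and then computes σ̂², se(β̂1), and se(β̂0) exactly as defined above.

    import numpy as np

    # Hypothetical data (illustration only)
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    Y = np.array([53.0, 60.0, 58.0, 65.0, 70.0, 68.0, 75.0, 81.0])
    n = len(X)

    x = X - X.mean()                         # deviations from the mean
    b1 = (x @ (Y - Y.mean())) / (x @ x)      # OLS slope
    b0 = Y.mean() - b1 * X.mean()            # OLS intercept

    residuals = Y - (b0 + b1 * X)
    sigma2_hat = (residuals @ residuals) / (n - 2)          # sigma^2-hat = sum(u^2) / (n - 2)
    se_b1 = np.sqrt(sigma2_hat / (x @ x))                   # se(b1) = sigma-hat / sqrt(sum x_i^2)
    se_b0 = np.sqrt(sigma2_hat * (X @ X) / (n * (x @ x)))   # se(b0)

    print(b0, b1, np.sqrt(sigma2_hat), se_b0, se_b1)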

Example 2-3-1:
The table below shows the calculations for a regression of amount demanded (Y) on price (X),
where b1 = 44.25 (the intercept) and b2 = −0.25 (the slope). Use these values and the table to:
1. Visualize the relationship between the data points and the fitted line: plot the actual data
points and overlay the regression line, clearly indicating the error sum of squares (SSE),
the regression sum of squares (SSR), and the total sum of squares (SST) on the graph.
2. Calculate the fitted values Ŷi.
3. Calculate the sum of squared residuals (SSE).
4. Calculate the standard error of the regression (σ̂) and the standard errors SE(b1) and SE(b2).


Table 2-3-1: The calculations for a regression equation of amount demanded on price

Observation   Price X ($)   Amount Demanded Y (Q)   (X−X̄)   (Y−Ȳ)   (X−X̄)(Y−Ȳ)   (X−X̄)²
1             10            40                      −3       −1        3            9
2             12            38                      −1       −3        3            1
3             13            43                       0        2        0            0
4             12            45                      −1        4       −4            1
5             16            37                       3       −4      −12            9
6             15            43                       2        2        4            4
∑             78            246                      /        /       −6           24
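The following Python sketch (not part of the original lecture) works through items 2-4 of Example 2-3-1 with the data above; the plotting step is omitted. With b1 = 44.25 and b2 = −0.25, the fitted values are Ŷi = 44.25 − 0.25Xi and SSE works out to 48.5.

    import numpy as np

    X = np.array([10.0, 12, 13, 12, 16, 15])     # Price
    Y = np.array([40.0, 38, 43, 45, 37, 43])     # Amount demanded
    b1, b2 = 44.25, -0.25                        # intercept and slope from the example
    n = len(X)

    Y_hat = b1 + b2 * X                          # item 2: fitted values
    residuals = Y - Y_hat
    SSE = (residuals ** 2).sum()                 # item 3: sum of squared residuals (= 48.5)

    sigma_hat = np.sqrt(SSE / (n - 2))           # item 4: standard error of the regression
    x = X - X.mean()
    SE_b2 = sigma_hat / np.sqrt((x ** 2).sum())                           # se of the slope
    SE_b1 = sigma_hat * np.sqrt((X ** 2).sum() / (n * (x ** 2).sum()))    # se of the intercept

    print(Y_hat, SSE, sigma_hat, SE_b1, SE_b2)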

2.3.4 Testing Hypotheses about the Regression Coefficients

Is the true slope different from zero? This is an important question because if β1 = 0, then X
does not influence Y and the regression model collapses to a constant β0 plus a random error
term: Y = β0 + µ.

In other words, in a simple linear regression equation, the mean or expected value of y is a
linear function of x: E(Y)= β0 + β1 X. If the value of β1 is zero, E(Y)= β0 + β1 X= β0. In this
case, the mean value of Y does not depend on the value of X and hence we would conclude that
X and Y are not linearly related. Alternatively, if the value of β1 is not equal to zero, we would
conclude that the two variables are related. Thus, to test for a significant regression
relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
Two tests are commonly used. Both require an estimate of σ², the variance of the error term in
the regression model.
• t Test
The simple linear regression model is Y= β0 + β1X+µ. If X and Y are linearly related, we must
have β1≠0. The purpose of the t test is to see whether we can conclude that β1≠0. We will use
the sample data to test the following hypotheses about the parameter β1:

H0: β1 = 0
H1: β1 ≠ 0

Suppose we want to test the hypothesis that a regression coefficient βk = 0. To test this
hypothesis, we use the t test statistic

    t = bk / SE(bk)

which, when H0 is true, follows a t distribution with n − 2 degrees of freedom.


• Test for Zero Slope (Example: Exam Scores)

In this example, we would anticipate a positive slope (i.e., more study hours should improve exam
scores), so we will use a right-tailed test.
Step 1: State the Hypotheses:
H0: β1 = 0
H1: β1 > 0
Step 2: Specify the Decision Rule: For a right-tailed test with α = .05 and d.f. = 10 − 2 = 8,
t.05 = 1.860. Our decision rule states:
Reject H0 if tcalc > 1.860 or p-value < α
Step 3: Calculate the Test Statistic: To calculate the test statistic, we use the slope estimate
(b1 = 1.9641) and the standard error (sb1 = 0.86095):

    tcalc = b1 / sb1 = 1.9641 / 0.86095 = 2.281
Step 4: Make a Decision: Because tcalc > t.05 (2.281 > 1.860), we can reject the hypothesis of a zero
slope in a right-tailed test. (We would be unable to do so in a two-tailed test because the critical value
of the t statistic would be 2.306.) Once we calculate the test statistic for the slope or intercept, we
can find the p-value by using Excel's function =T.DIST.RT(2.281, 8) = .0260. Because .0260 <
.05, we would reject H0. We conclude the slope is positive.
If H0 is rejected, we will conclude that β1≠0 and that a statistically significant relationship
exists between the two variables. However, if H0 cannot be rejected, we will have insufficient
evidence to conclude that a significant relationship exists. To test the significance of the
intercept, the same test is carried out for β0, using t = b0 / SE(b0).
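The same right-tailed test can be reproduced in Python (a sketch using scipy; the slope and standard error are the values quoted in the example above).

    from scipy import stats

    b1, se_b1, df = 1.9641, 0.86095, 8            # values from the exam-scores example
    t_calc = b1 / se_b1                           # = 2.281

    t_crit = stats.t.ppf(0.95, df)                # right-tailed critical value, alpha = .05 (= 1.860)
    p_value = stats.t.sf(t_calc, df)              # right-tailed p-value (about .026)

    print(t_calc, t_crit, p_value, t_calc > t_crit)   # reject H0: the slope is positive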

Example 2-3-2: Use the data from Example 2-3-1 to test β0 and β1.

2-3-5- Confidence Intervals for the Slope and Intercept


The confidence intervals for β0 (the intercept) and β1 (the slope) in regression analysis
provide insight into the range within which we expect the true values of β0 and β1 to fall with
a certain level of confidence, typically 95%.
1. Confidence Interval for β1 (Slope)
• The interval for β1 shows where the true relationship between the independent and
dependent variable lies, considering sample data. This interval is crucial for
understanding the strength and direction of the relationship. For instance, a 95%
confidence interval for β1 that does not contain zero indicates a statistically significant
relationship.
• The width of this interval depends on factors like sample size and data variability. A
narrower interval suggests greater precision, while a wider interval indicates more
uncertainty about the slope's true value.
2. Confidence Interval for β0 (Intercept)
• The confidence interval for β0 provides insight into where the dependent variable is
expected to fall when the independent variable equals zero. Though the intercept might
not always hold practical interpretation, its interval is still important for assessing
model accuracy.


• When the interval for β0 includes zero, it suggests the intercept is not statistically
significant, meaning there may be no systematic effect on the dependent variable when
the independent variable is absent or minimal.
The form of a confidence interval for β0 and β1 is as follows:

    b0 − tα/2 SE(b0) ≤ β0 ≤ b0 + tα/2 SE(b0),   or in brief b0 ± tα/2 SE(b0)
    b1 − tα/2 SE(b1) ≤ β1 ≤ b1 + tα/2 SE(b1),   or in brief b1 ± tα/2 SE(b1)

The point estimators are b0 and b1, and the margin of error is tα/2 SE(bk). The confidence
coefficient associated with this interval is 1 − α, and tα/2 is the t value providing an
area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.
Example 2-3-3

Suppose that we want to develop a 95% confidence interval estimate for a regression model
between gross leasable area (X) and retail sales (Y) in shopping malls. Based on the data in
the following table, calculate the confidence interval knowing that n= 24.

where: Y (Retail Sales, $ billions) = 0.3852 + 0.2590 X (Gross Leasable Area, million sq ft)

REGRESSION OUTPUT
Variables    Coefficients   Std. Error   t (df = 26)   p-value    95% Lower   95% Upper
Intercept    0.3852         1.9853       0.194         .8479      …           …
Area         0.2590         0.0084       30.972        1.22E-19   …           …
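A minimal sketch of how the missing 95% limits could be filled in from the coefficients and standard errors in the output above; it assumes the degrees of freedom reported in the table (df = 26).

    from scipy import stats

    df = 26                                   # degrees of freedom reported in the output
    t_crit = stats.t.ppf(0.975, df)           # two-tailed critical value, about 2.056

    for name, coef, se in [("Intercept", 0.3852, 1.9853), ("Area", 0.2590, 0.0084)]:
        lower = coef - t_crit * se            # 95% lower limit
        upper = coef + t_crit * se            # 95% upper limit
        print(f"{name}: {lower:.4f} to {upper:.4f}")

    # The Area interval (roughly 0.2417 to 0.2763) excludes zero, so the slope is significant;
    # the Intercept interval includes zero.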

2-3-6-F Test: Significance test of all model parameters simultaneously.


The F Test is a statistical method used in regression analysis to evaluate the overall significance
of a model. Specifically, in the context of linear regression, the F Test assesses whether there
is a statistically significant relationship between the dependent variable Y and the set of
independent variables¹ included in the model. It tests the null hypothesis that all regression
coefficients (except the intercept) are equal to zero, meaning that none of the independent
variables have a significant effect on Y.

Purpose of the F Test

In multiple regression, we have several predictors (X’s), and the F Test helps determine
whether they collectively have an explanatory power over Y. The test is crucial in assessing the
overall validity of the model before interpreting individual coefficients, as a significant F Test
indicates that at least one predictor contributes meaningfully to the prediction of Y.

Hypotheses for the F Test

The F Test in a regression model examines the following hypotheses:

¹ With only one independent variable, the F test will provide the same conclusion as the t
test; that is, if the t test indicates β1 ≠ 0 and hence a significant relationship, the F test will
also indicate a significant relationship. But with more than one independent variable, only the
F test can be used to test for an overall significant relationship.


• Null Hypothesis (H0): β1=β2=…=βk=0 (All coefficients are equal to zero. This implies
that the independent variables do not jointly explain any of the variability in Y).

• Alternative Hypothesis (H1): At least one βi≠0 (At least one predictor has a non-zero
coefficient. Indicating that the model has explanatory power).

• Decision Rule: If the F statistic is large enough (or the p-value is small enough), we
reject the null hypothesis, concluding that the regression model is statistically
significant. In other words: Reject H0 if Fcalculated > Fcritical or p-value < α.

Alternatively, if using a p-value approach:

• If the p-value is less than α, reject the null hypothesis.


• If the p-value is greater than or equal to α, fail to reject the null hypothesis.

• F Test Statistic: The F statistic is calculated as:

    F = MSR / MSE = (SSR / p) / (SSE / (n − p − 1))

where:

• MSR (Mean Square Regression): This measures the variance explained by the
regression model and is calculated as SSR/p, where SSR (Sum of Squares Regression)
is the explained variation by the model, and p is the number of predictors.
• MSE (Mean Square Error): This measures the unexplained variance, or the variance of
the residuals. It is calculated as SSE/(n−p−1), where SSE (Sum of Squares Error)
represents the residual or unexplained variation, n is the sample size, and p is the
number of predictors.
• Intuitively, F is the ratio of explained variation to unexplained variation, each measured
per degree of freedom.

The F statistic follows an F distribution, and its critical value depends on the degrees of freedom
associated with the explained and unexplained variances.
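A small sketch of the F calculation (illustrative numbers only; SSR, SSE, n, and p below are assumptions, not values from the lecture):

    from scipy import stats

    SSR, SSE = 180.0, 120.0       # assumed explained and unexplained sums of squares
    n, p = 30, 3                  # assumed sample size and number of predictors

    MSR = SSR / p                 # mean square regression
    MSE = SSE / (n - p - 1)       # mean square error
    F = MSR / MSE

    p_value = stats.f.sf(F, p, n - p - 1)        # upper-tail area of the F distribution
    F_crit = stats.f.ppf(0.95, p, n - p - 1)     # critical value at alpha = .05

    print(F, F_crit, p_value, F > F_crit)        # reject H0 if F > F_crit (or p-value < alpha)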

Interpreting the F Test:

1. If the F statistic is large and the p-value is small (typically less than 0.05), we reject
H0, suggesting that the model significantly explains the variance in Y. A higher F value
implies that the predictors explain a substantial portion of the variance in the dependent
variable.

2. To determine whether the F statistic is statistically significant, it is compared to a critical F
value from the F-distribution table (with p and n − p − 1 degrees of freedom) or assessed by
calculating the p-value associated with the F statistic.

3. If the F statistic is small and the p-value is large, we fail to reject H0, suggesting that
the independent variables do not collectively explain a significant portion of the variance
in Y.

For instance, in a regression model predicting income (dependent variable) based on education
level, years of experience, and industry (independent variables), a significant F Test result
would indicate that at least one of these predictors is meaningfully related to income.

In short:

• The F Test is a global test for the regression model, assessing the collective impact of
all predictors.

• A significant F Test allows us to proceed with analyzing individual predictors.

• The F Test does not tell us which specific variables are significant; we would look at
individual t-tests for each coefficient for that information.


2-3-7- The Coefficient of Determination (R²): a Measure of Goodness of Fit of the Estimated Regression

The Coefficient of Determination, often denoted R², is a key statistic in regression analysis
used to assess how well the estimated regression model fits the observed data. Essentially, R²
measures the proportion of variance in the dependent variable that can be explained by the
independent variables in the model.

• R² values range from 0 to 1, where 0 indicates that the model explains none of the
variance in the dependent variable, while 1 means the model explains all of the
variance.
• A higher R² value suggests a better fit of the model to the data, meaning the
independent variables explain a large portion of the variability in the dependent
variable.

How to Calculate R²:


• R² is calculated by comparing the total sum of squares (SST), which reflects the total
variation in the data, with the error sum of squares (SSE), which measures the
unexplained variation. The formula is:

    R² = 1 − SSE/SST = SSR/SST

• This ratio shows the extent to which the regression model reduces prediction error
compared with a simple mean model.
• This quantity varies from 0 to 1, and higher values indicate a better regression.
• Caution should be used in making general interpretations of R², because a high value can
result from either a small SSE, a large SST, or both.
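In code, R² can be obtained directly from the observed and fitted values; the sketch below uses made-up arrays purely for illustration.

    import numpy as np

    def r_squared(y, y_hat):
        """Coefficient of determination: R^2 = 1 - SSE/SST."""
        y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
        SSE = np.sum((y - y_hat) ** 2)       # unexplained (residual) variation
        SST = np.sum((y - y.mean()) ** 2)    # total variation about the mean
        return 1 - SSE / SST

    # Hypothetical example
    y_obs = [12.0, 15.0, 14.0, 19.0, 22.0]
    y_fit = [12.5, 14.0, 15.5, 18.5, 21.5]
    print(r_squared(y_obs, y_fit))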

Uses and Limitations:

• While a high R² indicates a strong model fit, it does not necessarily mean the model is
appropriate, since R² does not account for potential overfitting or omitted-variable bias.
• For multiple regression, an adjusted R² is often used to account for the number of
predictors in the model, providing a more reliable metric when comparing models with
different numbers of variables.

In summary, R² serves as a useful measure for evaluating the goodness of fit of a regression
model, helping analysts understand the explanatory power of their models, although it should
be interpreted in context with other diagnostic measures.

• Decomposing the Variance in Y:

In a regression, we seek to explain the variation in the dependent variable around its mean. We
express the total variation as a sum of squares (denoted SST):

    SST = Σ(yi − ȳ)²

We can split the total variation into two parts:

The explained variation in Y (denoted SSR) is the sum of the squared differences between the
conditional mean ŷi (conditioned on a given value xi) and the unconditional mean ȳ (the same
for all xi):

    SSR = Σ(ŷi − ȳ)²

The unexplained variation in Y (denoted SSE) is the sum of squared residuals, sometimes
referred to as the error sum of squares:²

    SSE = Σ(yi − ŷi)²

so that SST = SSR + SSE.

² But bear in mind that the residual ei (observable) is not the same as the true error µi (unobservable).


If the fit is good, SSE will be relatively small compared to SST. If each observed data value yi
is exactly the same as its estimate yˆi (i.e., a perfect fit), then SSE will be zero. There is no
upper limit on SSE. Table 2.3.2 shows the calculation of SSE for the exam scores.

Figure 2-3-3: Decomposing the deviation of an observed y-value from the mean into the
deviations explained and not explained by the regression

Table 2.3.2: calculations of Sum of Squares
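Using the demand data from Example 2-3-1, the sketch below checks the decomposition numerically; with Ŷ = 44.25 − 0.25X, the pieces come out to SST = 50, SSR = 1.5, and SSE = 48.5, so SST = SSR + SSE and R² = 0.03 (a very poor fit for those data).

    import numpy as np

    X = np.array([10.0, 12, 13, 12, 16, 15])
    Y = np.array([40.0, 38, 43, 45, 37, 43])
    Y_hat = 44.25 - 0.25 * X                 # fitted values from Example 2-3-1

    SST = np.sum((Y - Y.mean()) ** 2)        # total variation        (= 50)
    SSR = np.sum((Y_hat - Y.mean()) ** 2)    # explained variation    (= 1.5)
    SSE = np.sum((Y - Y_hat) ** 2)           # unexplained variation  (= 48.5)

    print(SST, SSR, SSE, np.isclose(SST, SSR + SSE))   # SST = SSR + SSE
    print("R^2 =", SSR / SST)                          # = 0.03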

Example 2-3-4: The data in the following table relate to Y and X,

and the estimated regression equation for these data is represented as:

a. Compute SSE, SST, and SSR.


b. Compute the coefficient of determination R².
c. Comment on the goodness of fit.
d. Compute the sample correlation coefficient r.

2-3-8- Test of Significance of the Correlation Coefficient

The sample correlation coefficient (r) is:

    r = Σ(xi − x̄)(yi − ȳ) / √[Σ(xi − x̄)² · Σ(yi − ȳ)²]

The sample correlation coefficient r is an estimate of the population correlation coefficient ρ
(the Greek letter rho). There is no flat rule for a “high” correlation because sample size must
be taken into consideration. To test the hypothesis H0: ρ = 0, the test statistic is:

    tcalc = r √(n − 2) / √(1 − r²)    (test for zero correlation)


We compare this t test statistic with a critical value of t for a one-tailed or two-tailed test using
d.f. = n − 2 degrees of freedom and any desired α. Recall that we lose a degree of freedom for each
parameter that we estimate when we calculate a statistic. Because both x̄ and ȳ are used to
calculate r, we lose 2 degrees of freedom, so d.f. = n − 2. After calculating the t statistic, we
can find its p-value.
Step 1: State the Hypotheses
We will use a two-tailed test for significance at α=.05. The hypotheses are
H0: ρ = 0
H1: ρ ≠ 0
Step 2: Specify the Decision Rule
For a two-tailed test using, for example, d.f. = n − 2 = 30 − 2 = 28 degrees of freedom, Appendix
(01) gives t.025 = 2.048. The decision rule is:

where the random variable tn-2 follows a Student’s t distribution with (n-2) degrees of freedom.


In this example, we reject H0 if tcalc > 2.048 or if tcalc < −2.048.

Step 3: Calculate the Test Statistic
To calculate the test statistic, we first need the value of r. For example, we find r = .4356 for
the variables y and x. We then calculate:

    tcalc = r √(n − 2) / √(1 − r²) = .4356 √28 / √(1 − .4356²) = 2.561

Step 4: Make a Decision
The test statistic value (tcalc = 2.561) exceeds the critical value t.025 = 2.048, so we reject the
hypothesis of zero correlation at α = .05. We can also find the p-value using the Excel function
=T.DIST.2T(t, deg_freedom). The two-tailed p-value is =T.DIST.2T(2.561, 28) = .0161.
We would reject ρ = 0 since the p-value < .05.
Step 5: Take Action
Based on the results of the test, managers can take the necessary or corrective actions with
great confidence.
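The steps above can be reproduced with a short Python sketch (r = .4356 and n = 30, as in the illustration):

    import numpy as np
    from scipy import stats

    r, n = 0.4356, 30
    df = n - 2

    t_calc = r * np.sqrt(df) / np.sqrt(1 - r ** 2)     # = 2.561
    t_crit = stats.t.ppf(0.975, df)                    # two-tailed critical value = 2.048
    p_value = 2 * stats.t.sf(abs(t_calc), df)          # two-tailed p-value, about .016

    print(t_calc, t_crit, p_value, abs(t_calc) > t_crit)   # reject H0: rho = 0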
- Testing the significance of the correlation coefficient is important for several reasons:
1. Determining if the Relationship is Real or Due to Chance:
A correlation coefficient measures the strength and direction of a linear relationship between
two variables. However, just because a correlation exists in a sample does not mean it's a true
relationship in the population. The significance test helps determine if the observed correlation
is likely due to a real relationship or if it occurred by random chance.
2. Understanding the Reliability of the Correlation:
By testing for significance, you can assess the reliability of the correlation in predicting the
strength of the relationship in the population. If the correlation is statistically significant, you
can have more confidence that the relationship exists beyond just the sample data.
3. Making Informed Decisions:
In research and analysis, especially in fields like economics, psychology, and finance,
conclusions are often drawn based on relationships between variables. A significant correlation
allows researchers to make more confident decisions about the nature of these relationships.
4. Avoiding False Interpretations:
Without testing significance, a researcher might overinterpret a weak correlation that is
actually not meaningful. Significance tests act as a safeguard against such misinterpretations
by providing a way to assess the strength of the evidence for the correlation.
5. Quantifying Uncertainty:
The test helps quantify the uncertainty around the correlation coefficient. It gives you a p-value,
which indicates the probability of observing a correlation as extreme as the one in your sample,
under the assumption that no true correlation exists in the population (null hypothesis).
6. Assessing the Impact of Sample Size:
The significance of a correlation can be affected by sample size. A small sample might show a
strong correlation by chance, while a large sample might reveal that the correlation is weak or
not significant. The test adjusts for sample size, allowing a clearer understanding of the
relationship.


In summary, testing for the significance of the correlation coefficient ensures that the
relationship you observe between variables is meaningful, not random, and that your findings
can be generalized to the population you are studying.
Example 2-3-5:
A research team was attempting to determine whether political risk in countries is related to
inflation in those countries. In this research, a survey of political risk analysts produced a
mean political risk score for each of 49 countries.
The political risk score is scaled so that the higher the score, the greater the political risk.
The sample correlation between the political risk score and inflation for these countries was
0.43.
We wish to determine whether the population correlation, ρ, between these measures is greater
than 0. Specifically, we want to test H0: ρ = 0 against H1: ρ > 0.
• Use the information above and Appendix (01) to test H0: ρ = 0.
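A sketch of the required calculation (the critical value would normally be read from Appendix (01); here scipy supplies it): with r = 0.43 and n = 49, the test statistic is roughly 3.27, well above the one-tailed 5% critical value of about 1.68, so H0: ρ = 0 would be rejected in favour of ρ > 0.

    import numpy as np
    from scipy import stats

    r, n = 0.43, 49
    df = n - 2

    t_calc = r * np.sqrt(df) / np.sqrt(1 - r ** 2)   # about 3.27
    t_crit = stats.t.ppf(0.95, df)                   # right-tailed critical value, about 1.68
    p_value = stats.t.sf(t_calc, df)

    print(t_calc, t_crit, p_value, t_calc > t_crit)  # reject H0: risk and inflation are positively correlated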

References:
• David P. Doane and Lori E. Seward (2016). Applied Statistics in Business and
Economics. 5th edition. McGraw-Hill Companies, Inc. Boston.
• Damodar N. Gujarati and Dawn C. Porter (2009). Basic Econometrics. 5th edition.
McGraw-Hill Companies, Inc. Boston.
• Damodar Gujarati (2012). Econometrics by Example. Palgrave Macmillan. London.
• Neil A. Weiss (2012). Introductory Statistics. 9th edition. Pearson Education, Inc.
Boston, USA.
• Neil A. Weiss (2017). Introductory Statistics. 10th edition. Pearson Education, Inc.
Boston, USA.
• David R. Anderson, Dennis J. Sweeney, and Thomas A. Williams (2008). Statistics for
Business and Economics. 10th edition. Thomson South-Western. Mason, OH, USA.
• Paul Newbold, William L. Carlson, and Betty M. Thorne (2013). Statistics for Business
and Economics. 8th edition. Pearson Education, Inc. Boston, USA.

Appendix 1
