Practice Question Set Linear Regression

This document provides an overview of linear regression concepts and practices, including ordinary least squares (OLS) regression, hypothesis testing, and interpretation of regression results. It includes practical examples involving the Phillips curve, investment fund performance, and real estate rental prices, along with associated questions and answers. The content is intended for personal use and should not be distributed freely.

Uploaded by

Sagar Chav
Copyright © All Rights Reserved

Licensed to at [email protected]. Downloaded February 19, 2022.

The information provided in this document is intended solely for you. Please do not freely distribute.

P1.T2. Quantitative Analysis

Chapter 7: Linear Regression

Bionic Turtle FRM Practice Questions

By David Harper, CFA FRM CIPM


www.bionicturtle.com

LINEAR REGRESSION: KEY IDEAS

Chapter 7: Linear Regression

P1.T2.20.16. LINEAR REGRESSION MODELS
P1.T2.20.17. HYPOTHESIS TESTS OF UNIVARIATE LINEAR REGRESSION MODEL
P1.T2.214. REGRESSION LINES (STOCK & WATSON)
P1.T2.215. PROPERTIES OF LINEAR REGRESSION (STOCK & WATSON)
P1.T2.216. REGRESSION SUMS OF SQUARES: ESS, SSR, AND TSS

Regression: Gujarati
P1.T2.86. OLS REGRESSION
P1.T2.87. OLS REGRESSION INTERPRETATION
P1.T2.91. OLS REGRESSION HYPOTHESIS


Linear Regression: Key Ideas


An ordinary least squares (OLS) linear regression with one regressor (a.k.a., independent or
explanatory variable) is given by:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

• The error term, u(i), contains all the other factors aside from X that determine the value of the
regressand (dependent variable), Y, for a specific observation.
• The t-statistic tests the null hypothesis that the population mean (or, more generally, a
population parameter such as a regression coefficient) equals a certain value.
• The key assumptions of the OLS linear regression model are:
o The conditional distribution of u(i) given X(1i), X(2i), …, X(ki) has a mean of zero
o X(1i), X(2i), …, X(ki), Y(i) are independent and identically distributed (i.i.d.)
o Large outliers are unlikely
o No perfect collinearity (in the case of a multiple regression; i.e., two or more
regressors)
• To test the significance of a coefficient (since we do not know the population variance), we
compute a t-ratio, which has a Student's t distribution:

$$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} = \frac{\text{regression coefficient} - \text{null hypothesis } [0]}{SE(\text{regression coefficient})}$$

• The coefficient of determination is given by:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

• The adjusted R^2 is given by:

$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{SSR}{TSS}$$

• The standard error of the regression (SER) is given by:

$$SER = s_{\hat{u}} = \sqrt{\frac{SSR}{n-k-1}}$$

where k = number of slope coefficients; e.g., in the case of a single variable regression,
the denominator is (n-2).
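To make these formulas concrete, here is a minimal Python sketch (standard library only, not the R code used for these questions) of a one-regressor OLS fit. It computes the slope and intercept, ESS, SSR, TSS, R^2, adjusted R^2 (with k = 1), the SER, and the t-ratio for H0: β_1 = 0. The function name `ols_summary` and the toy data are hypothetical, and the slope's standard error uses the homoskedasticity-only formula SER/sqrt(Σ(X_i - X̄)^2), which is an assumption; regression software often reports heteroskedasticity-robust errors instead.

```python
import math


def ols_summary(x, y):
    """One-regressor OLS using the textbook formulas (k = 1 slope coefficient)."""
    n, k = len(x), 1
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b1 = s_xy / s_xx                # slope = sample Cov(X, Y) / sample Var(X)
    b0 = mean_y - b1 * mean_x       # the OLS line passes through (mean X, mean Y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]

    tss = sum((yi - mean_y) ** 2 for yi in y)        # total sum of squares
    ess = sum((fi - mean_y) ** 2 for fi in fitted)   # explained sum of squares
    ssr = sum(u ** 2 for u in resid)                 # sum of squared residuals

    r2 = 1 - ssr / tss
    adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss
    ser = math.sqrt(ssr / (n - k - 1))               # denominator is n - 2 when k = 1
    se_b1 = ser / math.sqrt(s_xx)                    # homoskedasticity-only SE (assumption)
    t_b1 = (b1 - 0.0) / se_b1                        # t-ratio against H0: beta_1 = 0
    return {"b0": b0, "b1": b1, "tss": tss, "ess": ess, "ssr": ssr,
            "r2": r2, "adj_r2": adj_r2, "ser": ser, "t_b1": t_b1}


res = ols_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical toy data
print({name: round(value, 4) for name, value in res.items()})
```

For this toy data the slope is 0.6, the intercept is 2.2 and R^2 = 0.6 (up to floating-point rounding), and ESS + SSR = TSS as expected.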


Chapter 7: Linear Regression


P1.T2.20.16. Linear regression models
P1.T2.20.17. Hypothesis tests of univariate linear regression model
P1.T2.214. Regression lines (Stock & Watson)
P1.T2.215. Properties of linear regression (Stock & Watson)
P1.T2.216. Regression sums of squares: ESS, SSR, and TSS

P1.T2.20.16. Linear regression models


Learning objectives: Describe the models which can be estimated using linear regression
and differentiate them from those which cannot. Interpret the results of an ordinary least
squares (OLS) regression with a single explanatory variable. Describe the key assumptions
of OLS parameter estimation. Characterize the properties of OLS estimators and their
sampling distributions.

20.16.1. Debra is an analyst at a governmental agency. Her boss asked her to investigate whether
the Phillips curve applies during high-inflation regimes. To answer the question, Debra collected
data from the FRED database at the St. Louis Fed (https://fred.stlouisfed.org/). The Phillips curve
describes an inverse relationship between unemployment rates and inflation rates;
https://en.wikipedia.org/wiki/Phillips_curve. Debra collected monthly data and regressed the
inflation rate against the unemployment rate (conditional on high-inflation regimes). Her
independent variable is the unemployment rate (FRED code: UNRATE) and her dependent
variable is the inflation rate (CPIAUCSL). The units are percentages, not decimals; e.g., the dataset
includes the month of January 1982, when the unemployment rate was 8.90 and the inflation rate
was 6.38. Her regression results are presented below.

Debra wants to know if an inverse relationship is observed. Which of the following statements
about the regression is TRUE?
a) The regression is not useful because the intercept is too far away from (different than) zero
b) The pattern of the standard errors, t-statistics, and p-values suggest there is a violation in
some assumption(s) of the classical linear regression model (CLRM)
c) There is an inverse relationship because, for each unit increase in the unemployment rate
(i.e., +1.0%), the inflation rate is expected to decrease on average by 1.10%
d) There is not an inverse relationship because, for each unit increase in the unemployment
rate (i.e., +1.0%), the inflation rate is expected to increase on average by 5.60%


20.16.2. Peter is an analyst who is evaluating an investment fund whose managers claim it has
outperformed its benchmark. He collected monthly returns for the last five years; i.e., the sample
consists of n = 60 monthly pairs of excess returns. He plots excess returns, which are defined as the
returns in excess of the riskfree rate; i.e., an excess return equals the gross return minus the
riskfree rate. The scatterplot is displayed below:

The correlation coefficient is 0.708. In regard to the univariate data, the standard deviation of the
portfolio's returns is 22.84% and the standard deviation of the benchmark's returns is 9.79%. The
average excess return of the benchmark was -0.37% and the average excess return of the portfolio
was 2.61%. Each of the following statements is true EXCEPT which is false?

a) The slope of the regression line is approximately 1.65 and the intercept is approximately
3.22%
b) Visual inspection confirms the error variance is not constant and we can, therefore, assert
the presence of heteroskedastic shocks
c) This regression line passes through the coordinates of averages, (μ_x, μ_y) = (-0.37%,
+2.61%), although this is not an actual pairwise observation
d) This model appears to at least meet the three essential restrictions of a linear regression
model including linearity in the coefficients (aka, parameters)


20.16.3. Sally works at a real estate firm and was asked by her client to quantify the relationship
between rental size (in square feet) and rental price. She explained to her client that the
relationship is multivariate but, given that caveat, she offered to perform a linear regression with a
single explanatory variable. She retrieved a massive dataset (n = 360,400 observations, including
rentals across the United States) and regressed monthly rental price (aka, the explained
variable) against rental size as measured by square feet. To illustrate the units, one of the data
points in the dataset is (y = $1,200 per month, X = 1,000 feet^2). The results are displayed below.

In regard to Sally's interpretation of these regression results (above), each of the following
statements is true EXCEPT which is false?

a) The model predicts a rent of $2,072 for a size of 1,800 feet^2
b) The mean residual is zero; i.e., the average of 360,400 residuals is zero
c) Both the intercept and slope coefficients are significant; aka, significantly different than zero
d) Each increase in the rental size of 100 feet^2 is associated with an average increase of
$57.90 in monthly rent


Answers:

20.16.1. C. True: There is an inverse relationship because, for each unit increase in the
unemployment rate (i.e., +1.0%), the inflation rate is expected to decrease on average by 1.10%
(note this would be true even if we misinterpreted the units as decimals rather than percentages;
i.e., the slope implies an almost 1:1 inverse relationship).

In regard to (A), (B) and (D), each is false:

• (A) A non-zero intercept does not make the regression useless: the intercept is simply the
expected inflation rate when the unemployment rate is zero, and nothing requires it to be zero
(just as, in a regression of fund returns on market returns, a non-zero intercept is a viable
alpha due to luck or skill).
• (B) There are no obvious, apparent violations, although please note that regression
diagnostics are not shown.
• (D) The slope is negative and highly significant (i.e., the t-stat is -19.843 and the associated
p-value is infinitesimal: the slope coefficient is about twenty standard errors away from zero)
such that we can conclude an inverse relationship between the unemployment rate and the
inflation rate.
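The link between a t-statistic and its p-value can be sketched in Python. The sample size for 20.16.1 is not stated in the text, so this sketch assumes it is large enough that the standard normal approximates Student's t; the helper name `two_sided_p_normal` is hypothetical.

```python
import math


def two_sided_p_normal(t_stat):
    """Two-sided p-value P(|Z| > |t|) under a standard normal approximation to Student's t."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


# t-stat on the slope reported in the answer to 20.16.1
p_slope = two_sided_p_normal(-19.843)
print(f"p-value for t = -19.843: {p_slope:.1e}")
print(f"sanity check, t = 1.96: p = {two_sided_p_normal(1.96):.3f}")
```

The identity used here is P(|Z| > |t|) = 2(1 - Φ(|t|)) = erfc(|t|/√2), which avoids subtracting two numbers close to 1 for large |t|.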

20.16.2. B. False. There is NOT a visual pattern of heteroskedasticity; further, given this
dataset, test statistics fail to reject a null hypothesis of homoskedasticity.

In regard to (A), (C) and (D), each is TRUE. Specifically,

• The slope of the regression line is given by ρ(F,B)*σ(F)/σ(B), where F is the fund (portfolio)
and B is the benchmark; in this case, 0.708*0.2284/0.0979 = 1.652. Because the OLS line passes
through the averages, the intercept is μ_y - slope*μ_x = 2.61% - 1.652*(-0.37%) ≈ 3.22%.
• This regression line passes through the coordinates of averages, (μ_x, μ_y) = (-0.37%,
+2.61%), although this is not an actual pairwise observation.
• This model appears to meet at least the three essential assumptions (aka, restrictions) of a
classical linear regression model (CLRM), including linearity in the parameters.
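The slope and intercept in (A) follow from summary statistics alone. A minimal Python sketch, using the figures from 20.16.2 converted to decimals (the helper name `ols_line_from_moments` is hypothetical):

```python
def ols_line_from_moments(rho, sd_x, sd_y, mean_x, mean_y):
    """Univariate OLS line from summary statistics.

    slope = rho * sd_y / sd_x; the intercept forces the line through (mean_x, mean_y).
    """
    slope = rho * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# X = benchmark excess return, Y = portfolio excess return (decimals, from 20.16.2)
slope, intercept = ols_line_from_moments(
    rho=0.708, sd_x=0.0979, sd_y=0.2284, mean_x=-0.0037, mean_y=0.0261
)
print(f"slope = {slope:.3f}, intercept = {intercept:.2%}")
```

Plugging the benchmark's mean back into the line returns the portfolio's mean, which is the "passes through the averages" property in (C).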

20.16.3. A. False. Given a size of 1,800 feet^2, the model predicts a monthly rent of 624.4 +
0.579 × 1,800 = 624.4 + 1,042.2 ≈ $1,667.

In regard to (B), (C) and (D), each is TRUE. Specifically,

• The mean residual is zero by construction of the OLS regression line (which includes an
intercept); in this case, the average of the 360,400 residuals is zero.
• Both the intercept and slope coefficients are significant: the t-stats are in excess of 200.
• Each increase in the rental size of 100 feet^2 is associated with an average increase of
$57.90 in monthly rent: the 0.579 slope coefficient indicates that a one-unit increase in the
independent variable is associated, on average, with a 0.579 unit increase in the dependent
variable. In this case, a one-square-foot increase is associated with an increase of
+$0.579, or 57.9 cents, in the rent. Consequently, an increase of +100 ft^2 is associated with
an increase of 100*$0.579 = $57.90 in the monthly rent.
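A short Python sketch of the prediction and the marginal effect, using the rounded coefficients quoted above (624.4 and 0.579); with the unrounded estimates the cents could differ slightly. The constant and function names are illustrative only.

```python
INTERCEPT = 624.4  # dollars per month (rounded coefficient)
SLOPE = 0.579      # dollars per month per square foot (rounded coefficient)


def predicted_rent(square_feet):
    """Fitted monthly rent from the univariate OLS line."""
    return INTERCEPT + SLOPE * square_feet


print(f"predicted rent at 1,800 ft^2: ${predicted_rent(1800):,.2f}")
# A linear model's marginal effect is constant: +100 ft^2 adds 100 * slope.
print(f"effect of +100 ft^2: ${predicted_rent(1100) - predicted_rent(1000):.2f}")
```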


Additional Notes:

For those who might be interested, these regressions are run in R (#rstats), often with actual
datasets for added realism. If you would like to learn more about data science, see the following
links:
• Question 20.16.1: My blog (i.e., about the use of gt table to prettify the output) at
https://www.davidsdatablog.com/post...employment-using-gt-package-to-display-table/; and
code at github https://github.com/bionicturtle/frm...-regression-inflation-versus-unemployment.Rmd
As the table indicates, data is pulled from the very useful FRED database at https://fred.stlouisfed.org/
• Question 20.16.2: My blog at https://www.davidsdatablog.com/post...egression-portfolio-versus-benchmark-returns/
(github at https://github.com/bionicturtle/frm...ate-regression-portfolio-versus-benchmark.Rmd)
• Question 20.16.3: My blog at https://www.davidsdatablog.com/post...ate-regression-monthly-rental-versus-footage/
and github https://github.com/bionicturtle/frm...gression-monthly-rental-versus-footage.en.Rmd

Discuss here in the forum: https://www.bionicturtle.com/forum/threads/p1-t2-20-16-linear-regression-models.23437/


P1.T2.20.17. Hypothesis tests of univariate linear regression model


Learning objectives: Construct, apply, and interpret hypothesis tests and confidence
intervals for a single regression coefficient in a regression. Explain the steps needed to
perform a hypothesis test in a linear regression. Describe the relationship between a t-
statistic, its p-value, and a confidence interval.

20.17.1. The results of a linear regression analysis are displayed below. The dataset is monthly
returns over a six-year period; i.e., n = 72 months. The gross returns of Apple's stock (ticker:
AAPL) were regressed against the S&P 1500 Index (the S&P 1500 is our proxy for the market).
The explanatory variable is SP_1500 and the response (aka, dependent) variable is AAPL.

Which is nearest to the 90.0% confidence interval for the beta of Apple's (AAPL) stock?

a) 90.0% CI = (0.56; 1.98)
b) 90.0% CI = (0.70; 1.84)
c) 90.0% CI = (0.91; 1.63)
d) 90.0% CI = (-0.004; 0.020)


20.17.2. Peter wants to add a low-beta stock to his portfolio. One candidate is Kroger's stock
(ticker: KR). As a proxy for the market, he uses the S&P 1500. He wrangled gross monthly returns
for KR and SP_1500 over ten years such that his sample size is 120 pairwise returns. The
regression results are displayed here.

Peter wants to make two decisions. In both cases, his test is a two-sided hypothesis test with
99.0% confidence. In the first test, the null hypothesis is that KR's beta is zero. In the second
test, the null hypothesis is that KR's beta is one (1.0). Based on these regression results, which
of the following is TRUE as a valid inference?

a) At two-sided 99.0% confidence, Kroger's beta is significantly different from BOTH zero AND one (1.0)
b) At two-sided 99.0% confidence, Kroger's beta is significantly different from NEITHER zero
NOR one (1.0)
c) At two-sided 99.0% confidence, Kroger's beta is significantly different from one (1.0) but is
NOT significantly different from zero
d) At two-sided 99.0% confidence, Kroger's beta is significantly different from zero but is NOT
significantly different from one (1.0)

20.17.3. Debra is an economist who is interested in the relationship between consumer spending
and the gross domestic product (GDP). From the FRED database at the Federal Reserve Bank of St. Louis
(https://fred.stlouisfed.org/), she collects quarterly data from 1980 through the first quarter of 2020;
her series includes n = 161 quarters of data. She regresses consumer spending (C_SPEND), as
the response (aka, dependent) variable, against GDP as the explanatory (aka, independent)
variable. Each series is not a level, but rather a seasonally adjusted percent change. The
regression results are displayed here.

Additionally, Debra calculates the (univariate) standard deviation of each variable over the
period: σ(C_SPEND) = 2.484 and σ(GDP) = 3.412. In regard to these regression results, each of
the following statements is true EXCEPT which is false?

a) The coefficients are jointly insignificant because each two-sided 95.0% confidence interval
contains zero
b) The GDP's t-statistic ("t-stat") of 7.3285 is equal to the coefficient estimate (0.3659) divided
by its standard error (0.0499)
c) Given the variables' cross-volatility, σ(C_SPEND)/σ(GDP) = 2.484 ÷ 3.412 = 0.7280, the
correlation between the variables is approximately 0.50
d) The coefficient of determination (R-squared; aka, R^2) is about 0.25 such that we can say
about 25% of the variation in consumer spending (C_SPEND) is explained by gross
domestic product (GDP)


Answers:

20.17.1. C. True: 90.0% CI = (0.91; 1.63)

The two-tailed critical-Z at 90.0% confidence is 1.645 such that the CI = 1.270 +/- 1.645 × 0.216 =
(0.91; 1.63). The confidence interval is given by: coefficient ± (standard error) × (critical value). The
sample size is large, so we can use the normal deviate of 1.645 associated with 90.0% two-tailed
confidence; note this should not require any lookup because we already know the 95.0% one-tailed
normal deviate is 1.645. With n - 2 = 70 degrees of freedom, the critical t value is
T.INV.2T(0.10, 70) ≈ 1.667, so we can see that the normal Z is a close approximation (using 1.667
instead of 1.645 still gives (0.91; 1.63)).
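The confidence interval arithmetic can be reproduced with Python's standard library (`statistics.NormalDist`, Python 3.8+). This sketch uses the normal critical value, as in the answer; a Student's t critical value would need another library (for example SciPy's `scipy.stats.t.ppf`) and, with a value near 1.667, rounds to the same interval. The function name `normal_ci` is hypothetical.

```python
from statistics import NormalDist


def normal_ci(coef, se, confidence):
    """Two-sided confidence interval coef ± z * se using the standard normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90% confidence
    return coef - z * se, coef + z * se, z


low, high, z = normal_ci(coef=1.270, se=0.216, confidence=0.90)  # AAPL beta and its SE
print(f"z = {z:.3f}, 90.0% CI = ({low:.2f}; {high:.2f})")
```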

20.17.2. D. True: At two-sided 99.0% confidence, Kroger's beta is significantly different from
zero, but is NOT significantly different from one (1.0)

The regression table already tests the first null hypothesis that the parameter's value is zero; we
can immediately observe by the t-value (4.119) and associated low p-value that this first null
hypothesis can be rejected at 99.0% confidence. The displayed p-value of 7.08 × 10^(-5) =
0.00007080 = 0.007080%; as the p-value is (by definition) the exact significance level, we can
reject this null, H0: β(KR, M) = 0, at any two-sided confidence less than 99.992920% (= 100% -
0.007080%).

The general form of the test statistic is given by (Observation - Null)/SE. When the null is given by
H0: β(KR, M) = 0, the test statistic is (0.696 - 0)/0.1690 = 4.118; i.e., this is similar to the displayed
t-value and differs only slightly due to rounding. As already mentioned, this first null hypothesis
can be rejected with a high degree of confidence given that the p-value is only 0.00708%.

When the null is instead given by H0: β(KR, M) = 1.0, the test statistic is (0.696 - 1.0)/0.1690 =
-1.7988, or about -1.80. Its absolute value is too small (the estimate is too near to the hypothesized
value) to reject with 99.0% confidence: to reject, the absolute test statistic would need to exceed
2.58 (assuming a two-sided normal test; the corresponding two-sided Student's t critical value with
118 degrees of freedom is 2.62). Please note that, given the setup, Peter may plausibly conduct
instead the one-sided test given by H0: β(KR, M) ≥ 1.0 and H1: β(KR, M) < 1.0. Nonetheless, he
would also fail to reject this null, because -1.80 is above the one-sided 99.0% critical value of
about -2.33. In this way, the beta of 0.6960 is not significantly different than one (1.0) under
either a one- or two-sided test.
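Both tests can be sketched in Python with the standard library. The inputs are the beta (0.696) and standard error (0.1690) for Kroger; the normal critical values stand in for Student's t, which is an approximation (with 118 degrees of freedom the two-sided t critical value is about 2.62 rather than 2.58). Names are illustrative.

```python
from statistics import NormalDist

BETA_HAT, SE_BETA = 0.696, 0.1690   # Kroger's estimated beta and its standard error
ALPHA = 0.01                        # 99.0% confidence

z_two_sided = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 2.576
z_one_sided = NormalDist().inv_cdf(ALPHA)          # about -2.326 (lower tail)


def t_ratio(estimate, null_value, se):
    """(Observation - Null) / SE."""
    return (estimate - null_value) / se


t_zero = t_ratio(BETA_HAT, 0.0, SE_BETA)
t_one = t_ratio(BETA_HAT, 1.0, SE_BETA)
print(f"H0: beta = 0  -> t = {t_zero:.3f}, reject (two-sided)? {abs(t_zero) > z_two_sided}")
print(f"H0: beta = 1  -> t = {t_one:.3f}, reject (two-sided)? {abs(t_one) > z_two_sided}")
print(f"H0: beta >= 1 -> reject (one-sided)? {t_one < z_one_sided}")
```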

20.17.3. A. False: Both coefficients are SIGNIFICANT (as we can immediately observe by their
small p-values) such that neither two-sided 95.0% confidence interval contains zero:
• At 95.0% confidence, the CI for the intercept coefficient is given by 0.9000 +/- 1.96 ×
0.3156 = (0.281, 1.519).
• At 95.0% confidence, the CI for the beta (aka, slope coefficient) is given by 0.3659 +/- 1.96
× 0.0499 = (0.268, 0.464).
In regard to (B), (C) and (D), each is TRUE. Specifically:
• True: The GDP's t-statistic ("t-stat") of 7.3285 is equal to the coefficient estimate (0.3659)
divided by its standard error (0.0499), up to rounding of the displayed values.


• True: Given the variables' cross-volatility, σ(C_SPEND)/σ(GDP) = 2.484 ÷ 3.412 = 0.7280,
the correlation between the variables is approximately 0.50. Because β(C_SPEND, GDP) =
ρ(C_SPEND, GDP) × σ(C_SPEND)/σ(GDP), it follows that ρ(C_SPEND, GDP) =
β(C_SPEND, GDP) × σ(GDP)/σ(C_SPEND); in this case, ρ(C_SPEND, GDP) = 0.3659 ×
3.412/2.484 = 0.5026, or about 0.50.
• True: The coefficient of determination (R^2) is about 0.25 such that we can say about 25%
of the variation in consumer spending (C_SPEND) is explained by gross domestic product
(GDP). In a univariate regression, the R^2 is the square of the correlation coefficient; in this
case, the R^2 = 0.50^2 = 0.25 (indeed, the dataset's regression produces an exact R-squared
of 0.2525 and an adjusted R-squared of 0.2478, both of which round to 0.25).
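The relationships used in (A), (C) and (D) can be checked with a short Python sketch. It takes the rounded values shown in this answer, so results may differ in the last digit from the exact regression output (for example, R^2 comes out near 0.2526 rather than the reported 0.2525). Variable names are illustrative.

```python
INTERCEPT, SE_INTERCEPT = 0.9000, 0.3156
BETA, SE_BETA = 0.3659, 0.0499          # slope on GDP and its standard error
SD_C_SPEND, SD_GDP = 2.484, 3.412       # univariate standard deviations
N, K = 161, 1                           # quarters, number of slope coefficients
Z_95 = 1.96                             # two-sided 95% normal critical value

ci_intercept = (INTERCEPT - Z_95 * SE_INTERCEPT, INTERCEPT + Z_95 * SE_INTERCEPT)
ci_beta = (BETA - Z_95 * SE_BETA, BETA + Z_95 * SE_BETA)
t_beta = BETA / SE_BETA

# beta = rho * sd(Y) / sd(X)  =>  rho = beta * sd(X) / sd(Y)
rho = BETA * SD_GDP / SD_C_SPEND
r_squared = rho ** 2                    # univariate regression: R^2 = rho^2
adj_r_squared = 1 - (N - 1) / (N - K - 1) * (1 - r_squared)

print(f"95% CI intercept = ({ci_intercept[0]:.3f}, {ci_intercept[1]:.3f})")
print(f"95% CI beta = ({ci_beta[0]:.3f}, {ci_beta[1]:.3f}), t = {t_beta:.2f}")
print(f"rho = {rho:.4f}, R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
```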

Additional Notes:

For added realism, I generated these regressions in R (#rstats) with real datasets. If you would like
to learn more about data science, or just see the typical regression summary output, see the
following links:
• As a post on my data science blog at https://www.davidsdatablog.com/post...-univariate-regressions-continued-2nd-set-v2/
• The code is also at my github https://github.com/bionicturtle/frm...univariate-regressions-cont-2nd-set-v2.en.Rmd

Discuss in the forum here: https://www.bionicturtle.com/forum/threads/p1-t2-20-17-hypothesis-tests-of-univariate-linear-regression-model.23450/


P1.T2.214. Regression lines (Stock & Watson)


Learning Objectives: Explain how regression analysis in econometrics measures the
relationship between dependent and independent variables. Define and interpret a
population regression function, regression coefficients, parameters, slope and the
intercept. Define and interpret the stochastic error term (or noise component).

214.1. According to the capital asset pricing model (CAPM), the expected return of a security is
E[R(i)] = Rf + B(i,M)*RiskPrice(M), where R(i) is the security's return, Rf is the riskfree rate, B(i,M)
is the security's beta with respect to the market, and RiskPrice(M) is the market risk premium, which
is also known as the market's "price of risk." The riskfree rate is 3.0% and RiskPrice(M) is 4.0%. We
conduct a regression analysis for a stock and discover that, with respect to the market, the stock's
correlation and beta are, respectively, 0.50 and 1.50. That is, rho(stock, market) = 0.50 and
beta(stock, market) = 1.50. If the volatilities of the overall market and the stock do not change, but
their correlation, rho(stock, market), increases to 0.80, what is the CHANGE in the stock's
expected return?
a) +0.30% (30 basis points)
b) +1.2%
c) +2.4%
d) +3.6%

214.2. A regression of average weekly earnings (AWE, measured in dollars) on age (AGE, in years)
using a random sample of college-educated full-time workers aged 25-65 is given by: AWE = $600
+ 8.3*AGE. According to the regression model, what is the expected weekly pay difference
between a 35-year-old worker and a 45-year-old worker? (S&W 4.3 adapted).
a) $52.50
b) $83.00
c) $973.50
d) Not enough information

214.3. Pretend GARP regressed the exam scores (FRMScore) against preparation time (Hours)
and returned the following regression: FRMScore(i) = 23.2 + 0.18*Hours(i) + u(i). Which of the
following is the BEST interpretation of the error term, u(i)?
a) It allows for users to adjust to inform the intercept with a "real world" interpretation
b) It contains the assumed but unobserved correlation between the error term and the
regressor (independent variable)
c) The error term represents all of the factors other than preparation time that influence the
score
d) It is the estimator of the standard deviation of the regression error


Answers:

214.1. D. +3.6%
The beta (stock, market) increases from 1.50 to 2.40: 0.80/0.50 * 1.50 = 2.40.
Or, put another way, since beta = cov(stock,market)/variance(market) =
rho(stock,market)*volatility(stock)/volatility(market), in this case:
1.50 beta = 0.50 correlation * volatility(stock)/volatility(market) = 0.50 correlation * cross-volatility. If
cross-volatility is constant, then 0.80 correlation implies (0.80/0.50)*1.50 = 2.40 beta revised.
If the beta increases by 2.40 - 1.50 = 0.90, then the expected return increases by 0.90 * MRP =
0.90 * 4.0% = 3.60%.

Note: the riskfree rate has no impact on the change in expected return.
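The same logic in a short Python sketch, using the question's inputs (riskfree rate 3.0%, market risk premium 4.0%, beta 1.50, correlation rising from 0.50 to 0.80 with both volatilities held fixed). Names are illustrative.

```python
RF = 0.03        # riskfree rate
MRP = 0.04       # market risk premium, RiskPrice(M)
BETA_OLD = 1.50
RHO_OLD, RHO_NEW = 0.50, 0.80

# beta = rho * sigma(stock) / sigma(market); with both volatilities unchanged,
# beta scales in proportion to the correlation.
beta_new = BETA_OLD * RHO_NEW / RHO_OLD


def capm_expected_return(beta):
    """E[R(i)] = Rf + beta * RiskPrice(M)."""
    return RF + beta * MRP


change = capm_expected_return(beta_new) - capm_expected_return(BETA_OLD)
print(f"new beta = {beta_new:.2f}, change in expected return = {change:.2%}")
```

The riskfree rate cancels in the difference, which is why it has no impact on the change.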

214.2. B. $83.00
10 years * $8.3 = $83.00
E[earnings for 35 year old] = 600 + 8.3*35 = $890.50;
E[earnings for 45 year old] = 600 + 8.3*45 = $973.50;
973.50 - 890.50 = $83.00.
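A one-line check in Python: because the regression is linear, the expected difference depends only on the slope and the 10-year age gap (the function name is illustrative).

```python
def expected_awe(age):
    """Expected average weekly earnings (dollars) from AWE = 600 + 8.3*AGE."""
    return 600 + 8.3 * age


difference = expected_awe(45) - expected_awe(35)
print(f"${expected_awe(35):.2f} vs ${expected_awe(45):.2f}: difference ${difference:.2f}")
```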

214.3. C. The error term represents all of the factors other than preparation time that
influence the score
• Stock & Watson: “The intercept and the slope are the coefficients of the population
regression line, also known as the parameters of the population regression line. The slope
is the change in Y associated with a unit change in X. The intercept is the value of the
population regression line when X = 0; it is the point at which the population regression line
intersects the Y-axis. In some econometric applications, the intercept has a meaningful
economic interpretation. In other applications, the intercept has no real-world meaning; for
example, when X is the class size, strictly speaking, the intercept is the predicted value of
test scores when there are no students in the class! When the real-world meaning of the
intercept is nonsensical, it is best to think of it mathematically as the coefficient that
determines the level of the regression line.
• The term u(i) in Equation (4.5) is the error term. The error term incorporates all of the factors
responsible for the difference between the ith district’s average test score and the value
predicted by the population regression line. This error term contains all the other factors
besides X that determine the value of the dependent variable, Y, for a specific observation,
i. In the class size example, these other factors include all the unique features of the ith
district that affect the performance of its students on the test, including teacher quality,
student economic background, luck, and even any mistakes in grading the test."

Discuss in the forum here: http://www.bionicturtle.com/forum/threads/p1-t2-214-regression-lines-stock-watson.5384/


P1.T2.215. Properties of linear regression (Stock & Watson)


Learning Objectives: Define and interpret a sample regression function, regression
coefficients, parameters, slope and the intercept. Describe the key properties [assumption]
of a linear regression.

215.1. We regressed the monthly returns of Apple (AAPL) against the S&P 500 ($SPX) for the last
thirty-six months ending January 31st; Apple's monthly return is the dependent variable (Y,
regressand), the index's monthly return is the independent variable (X, regressor) and the number
of pair observations, n = 36. In regard to the dependent variable, Apple's average monthly return
over the period was +4.837% with a standard deviation of 6.686%. In regard to the independent
variable, the average monthly return of the index was +1.69% with a standard deviation of 4.687%.
The covariance between the two series, Covariance(X,Y), was 0.00216. What is the equation for
the sample regression line? (note: I did use actual data, trying to keep it real folks!)
a) AAPL = 0.01 + 0.33*SPX
b) AAPL = 0.02 + 0.67*SPX
c) AAPL = 0.03 + 0.98*SPX
d) AAPL = 0.04 + 1.29*SPX

215.2. A dataset consists of the price of gasoline (Price), the regressor, and the weekly household
demand for gas in terms of gallons (Quantity), the regressand. An ordinary least squares (OLS)
regression line produces the following demand function:

Quantity = 11 - 1.5*Price.

One of the data points in the scatterplot is a household that "demands" 8.0 gallons when the price
is $3.00 per gallon; i.e., Quantity(i) = 8.0 gallons, Price(i) = $3.00. What is the residual of this
observation, u(i)?
a) -1.5
b) zero
c) +1.5
d) Impossible, the observation must lie on the line

215.3. Each of the following is a key property [assumption], according to Stock & Watson, of a
linear regression EXCEPT for:
a) The conditional distribution of the error term, u(i), given X(i), has a mean of zero
b) The variance of the conditional distribution of the error term given X(i), variance[u(i) | X(i) =
x], converges to ZERO as sample (n) and X(i) increase
c) Each observation [X(i), Y(i)] for i = 1, ..., n, is independent and identically distributed (i.i.d.)
d) Large outliers are unlikely; i.e., X and Y have nonzero finite kurtosis


Answers:

215.1. C. AAPL = 0.03 + 0.98*SPX


We need to apply S&W (4.7): slope (B1) = Covariance(X,Y)/Variance(X) <-- you should know this!
And S&W (4.8): intercept (B0) = average_Y - B1*average_X; i.e., the OLS line must pass through the
point (average X, average Y).
• The slope (B1) = 0.00216/4.687%^2 = 0.00216/0.0021968 = 0.983
• The intercept (B0) = 4.837% - (0.983)(1.69%) = 0.032, or about 3.2%
As correlation is covariance/(StdDev * StdDev), the correlation = 0.00216/(6.686%*4.687%) =
0.6893. The R^2 = 0.6893^2 = 47.51%
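The calculation can be reproduced in Python from the summary statistics given in the question (returns in decimals); variable names are illustrative, and the inputs are the rounded figures quoted in the question.

```python
cov_xy = 0.00216                  # Covariance(SPX, AAPL)
sd_x, sd_y = 0.04687, 0.06686     # std dev of S&P 500 (X) and AAPL (Y) monthly returns
mean_x, mean_y = 0.0169, 0.04837  # average monthly returns

b1 = cov_xy / sd_x ** 2           # S&W (4.7): slope = Cov(X, Y) / Var(X)
b0 = mean_y - b1 * mean_x         # S&W (4.8): line passes through (mean X, mean Y)
rho = cov_xy / (sd_x * sd_y)      # correlation = Cov / (StdDev * StdDev)
r_squared = rho ** 2              # univariate regression: R^2 = rho^2

print(f"AAPL = {b0:.3f} + {b1:.3f}*SPX; correlation = {rho:.4f}, R^2 = {r_squared:.2%}")
```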

215.2. C. +1.5

The predicted value is Q^(i) = 11 - 1.5*3 = 6.5 gallons. The residual, u(i), is the difference between the
observed value and the predicted value. In this case, Q(i) - Q^(i) = 8.0 - 6.5 = 1.5.
• In regard to (D), please make sure you understand why (D) is utterly false: the OLS line
generates a series of conditional means (fitted values), so in general it does not pass through
all of the points. Notice that we know the OLS line runs through (average X, average Y), but
even this is not necessarily an observation itself! The OLS line may "travel" through none of the
observations. It exists as the solution to minimizing the sum of the squared residuals; i.e., the
OLS line is derived to solve for MINIMUM[sum of series([Q(i) - Q^(i)]^2)]
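A minimal Python check of the residual, using the fitted demand function from the question; the function name is illustrative.

```python
def predicted_quantity(price):
    """Fitted demand from the OLS line Quantity = 11 - 1.5*Price (gallons per week)."""
    return 11 - 1.5 * price


observed_quantity, observed_price = 8.0, 3.00
residual = observed_quantity - predicted_quantity(observed_price)  # observed minus fitted
print(f"fitted = {predicted_quantity(observed_price)} gallons, residual = {residual:+}")
```

A positive residual means the household demands more than the line predicts at that price.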

215.3. B. This is not one of the key assumptions: the related, extended assumption is
homoskedasticity; i.e., that the variance of the error term is CONSTANT (nothing requires it to converge to zero).

In regard to (A), (C) and (D), these are the three basic OLS assumptions in Stock & Watson:
1. The conditional distribution of the error term, u(i), given X(i), has a mean of zero:
"The conditional distribution of u(i) given X(i) has a mean of zero. This assumption is a formal
mathematical statement about the ‘other factors’ contained in u(i) and asserts that these
other factors are unrelated to X(i) in the sense that, given a value of X(i), the mean of the
distribution of these other factors is zero."
2. Each observation [X(i), Y(i)] for i = 1, ..., n, is independent and identically distributed
(i.i.d.): "The assumption is that [X(i), Y(i)], i = 1, ..., n, are independently and identically
distributed (i.i.d.) across observations. This is a statement about how the sample is drawn.
If the observations are drawn by simple random sampling from a single large population,
then [X(i), Y(i)], i = 1, ..., n are i.i.d. ... The i.i.d. assumption is a reasonable one for many
data collection schemes. For example, survey data from a randomly chosen subset of the
population typically can be treated as i.i.d."
3. Large outliers are unlikely; i.e., X and Y have nonzero finite kurtosis: "The assumption
is that large outliers— that is, observations with values of X(i), Y(i), or both that are far
outside the usual range of the data— are unlikely. Large outliers can make OLS regression
results misleading. In this book, the assumption that large outliers are unlikely is made
mathematically precise by assuming that X and Y have nonzero finite fourth moments ...
Another way to state this assumption is that X and Y have finite kurtosis."
Discuss in the forum here: http://www.bionicturtle.com/forum/threads/p1-t2-215-properties-of-linear-regression-stock-watson.5392/


P1.T2.216. Regression sums of squares: ESS, SSR, and TSS


Learning Objectives: Define and interpret the explained sum of squares (ESS), the total sum
of squares (TSS), the sum of squared residuals (SSR), the standard error of the regression
(SER), and the regression R^2.

216.1. For the last three years, we regressed monthly dollar change in gasoline prices
(regressand; dependent) against the monthly change in oil prices (regressor; independent). The
number of observations (n) is therefore 36. If the coefficient of determination (R^2) is 0.18 and the
total sum of squares (TSS) is 3.23 dollars^2, what is the standard error of the regression (SER)?
a) $0.28
b) $0.42
c) $2.65
d) $3.23

216.2. We regressed daily returns of a stock (the regressand or dependent variable) against a
market index (e.g., S&P 1500; regressor or independent variable). The regression produced a beta
for the stock, with respect to the market index, of 1.050. The stock's volatility was 30.0% and the
market's volatility was 20.0%. If the regression's total sum of squares (TSS) is 0.300, what is the
regression's explained sum of squares (ESS)?
a) 0.0960
b) 0.1470
c) 0.4900
d) 1.2500

216.3. A five-year regression of monthly cotton price changes, such that the number of
observations (n) equals 60, against average temperature changes produced a standard error of the
regression (SER) of $1.20. If the total sum of squares (TSS) was 90.625 dollars^2, what is the
implied correlation coefficient?
a) 0.08
b) 0.16
c) 0.28
d) 0.77


Answers:

216.1. A. $0.28
As R^2 = 1 - SSR/TSS, SSR = (1-R^2)*TSS. In this case, SSR = (1-0.18)*3.23 = 2.6486 dollars^2.
SER = SQRT[SSR/(n - 2)], where we subtract 2 because the regression estimates 2 coefficients
(intercept and slope).
Then, SER = SQRT(2.6486/34) = $0.279; i.e., SER is in the same units as the dependent variable.
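The 216.1 steps can be wrapped in a small helper. This is a sketch; `ser_from_r2` is a hypothetical name, and it assumes the two-variable regression (k = 2 estimated coefficients) used throughout this set:

```python
import math


def ser_from_r2(r_squared, tss, n, k=2):
    """Standard error of the regression from R^2 and TSS.

    k is the number of estimated coefficients (intercept + slope = 2 here).
    """
    ssr = (1 - r_squared) * tss       # R^2 = 1 - SSR/TSS  =>  SSR = (1 - R^2) * TSS
    return math.sqrt(ssr / (n - k))   # SER = sqrt(SSR / (n - k))


ser = ser_from_r2(r_squared=0.18, tss=3.23, n=36)
print(f"SER = ${ser:.3f}")
```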

216.2. B. 0.1470
As beta (stock, index) = covariance(stock,index)/variance(index) = correlation(stock,
index)*volatility(stock)/volatility(index), it follows that:
correlation(stock, index) = beta (stock, index)*volatility(index)/volatility(stock); in this case,
correlation(stock, index) = 1.050*20%/30% = 0.70, and:
R^2 = correlation^2 = 0.70^2 = 0.49.
Since R^2 = ESS/TSS, ESS = R^2*TSS. In this case,
ESS = 0.49*0.30 = 0.1470
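A minimal sketch of the 216.2 chain (beta to correlation to R^2 to ESS), with the inputs from the question hard-coded:

```python
beta = 1.050       # regression slope of the stock on the index
vol_stock = 0.30   # volatility of the stock (regressand)
vol_index = 0.20   # volatility of the index (regressor)
tss = 0.300        # total sum of squares

correlation = beta * vol_index / vol_stock  # beta = correlation * vol_stock / vol_index
r_squared = correlation**2                  # one regressor: R^2 = correlation^2
ess = r_squared * tss                       # R^2 = ESS / TSS

print(f"correlation = {correlation:.2f}, R^2 = {r_squared:.2f}, ESS = {ess:.4f}")
```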

216.3. C. 0.28
As SER = SQRT[SSR/(n - 2)], SSR = SER^2*(n - 2). In this case (again, 2 estimated coefficients, so
n - 2 = 58):
SSR = 1.20^2*(60-2) = 83.52;
R^2 = ESS/TSS = 1 - SSR/TSS = 1 - 83.52/90.625 = 0.07840
correlation = SQRT(0.07840) = 0.280 (in absolute value; R^2 alone does not reveal the sign)
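And the 216.3 chain run in reverse (SER to SSR to R^2 to correlation), again as a sketch that assumes two estimated coefficients:

```python
import math

ser = 1.20    # standard error of the regression, dollars
tss = 90.625  # total sum of squares, dollars^2
n = 60        # monthly observations over five years
k = 2         # estimated coefficients (intercept and slope)

ssr = ser**2 * (n - k)              # SER = sqrt(SSR / (n - k))  =>  SSR = SER^2 * (n - k)
r_squared = 1 - ssr / tss           # R^2 = 1 - SSR/TSS
abs_correlation = math.sqrt(r_squared)  # sign is not recoverable from R^2

print(f"SSR = {ssr:.2f}, R^2 = {r_squared:.4f}, |correlation| = {abs_correlation:.3f}")
```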

Discuss in the forum here: http://www.bionicturtle.com/forum/threads/p1-t2-216-regression-sums-of-squares-ess-ssr-and-tss.5408/


Regression: Gujarati
P1.T2.86. OLS Regression
P1.T2.87. OLS regression interpretation
P1.T2.91. OLS regression hypothesis

P1.T2.86. OLS Regression
Learning objectives: Describe the method of ordinary least squares for estimation of
parameters. Define and interpret the residual sum of squares.

86.1. Let Y(i) be the Actual Y and let Y^(i) be the Predicted Y. Assume the stochastic version of the
sample regression function (SRF) is given by Y(i) = b1 + b2*X(i) + e(i). Which of the following is
equal to the residual?
a) Y(i) - b1 - b2*X(i)
b) Y(i) - b1 - b2*X(i) - e(i)
c) Y^(i) - b1 - b2*X(i)
d) Y^(i) - b1 - b2*X(i) - e(i)

86.2. By what criterion does the method of ordinary least squares (OLS) produce the slope and
intercept estimates?
a) By minimizing: [e(1) + e(2) + … e(n)]^2
b) By minimizing: e(1)^2 + e(2)^2 + … e(n)^2
c) By maximizing: [e(1) + e(2) + … e(n)]^2
d) By maximizing: e(1)^2 + e(2)^2 + … e(n)^2

86.3. An OLS regression is run on a sample of 10 (X,Y) observations. The mean value of the
explanatory variable (X) is 263.0 and the mean value of the explained variable (Y) is 31.0. If the
OLS estimate of the slope coefficient is 0.086, what is the estimate of the intercept?
a) -6.382
b) 8.382
c) 22.618
d) Not enough information

86.4. Assume the same OLS sample regression as above. What is the average (mean) value of all
of the residuals; i.e., what is SUM[e(i)]/n = [e(1) + e(2) + … + e(n)]/n?
a) Zero
b) 15.54
c) 70.25
d) Not enough information

86.5. Let A = the sum of the product of each residual and explanatory variable; i.e., SUM[e(i)*X(i)].
Let B = the sum of the product of each residual and Predicted Y; i.e., SUM[e(i)*Y^(i)]. If the
correlation between X and Y is greater than zero, what are, respectively, the values of A and B?
a) zero and zero
b) zero and positive
c) positive and zero
d) positive and positive


Answers:

86.1. A. Y(i) - b1 - b2*X(i)

The residual, e(i), is the difference between the Actual (observed) Y and the Predicted Y.
e(i) = Y(i) - Y^(i) and since the Predicted Y = Y^(i) = b1 + b2*X(i),
e(i) = Y(i) - b1 - b2*X(i)

86.2. B. By minimizing: e(1)^2 + e(2)^2 + … e(n)^2

OLS minimizes the residual sum of squares (RSS), which is just the sum of the squared
residuals. In regard to (A), summing the residuals before squaring lets negative residuals cancel
positive ones; squaring each residual first avoids this canceling effect.

86.3. B. 8.382
b1 = mean Y - b2*mean X = 31.0 - 0.086*263.0 = 31.0 - 22.618 = 8.382

Please note: the OLS line passes through the point (average X, average Y), whether or not that
point is itself an observed data point.
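The 86.3 intercept is a one-line calculation; this sketch also confirms that the fitted line evaluated at mean X returns mean Y:

```python
mean_x, mean_y = 263.0, 31.0  # sample means of X and Y from 86.3
b2 = 0.086                    # OLS slope estimate

b1 = mean_y - b2 * mean_x             # intercept: b1 = mean Y - b2 * mean X
fitted_at_mean_x = b1 + b2 * mean_x   # reproduces mean Y

print(f"b1 = {b1:.3f}, fitted value at mean X = {fitted_at_mean_x:.3f}")
```

Note that 22.618, distractor (c), is only the b2*mean X term, not the intercept.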

86.4. A. (zero)

The mean value of the residuals is zero; this follows from the OLS minimization (specifically, the
first-order condition for the intercept, so it holds whenever the model includes an intercept).

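86.5. A. (zero and zero) Under OLS, SUM[e(i)*X(i)] = 0 is a first-order condition of the
minimization; and because Y^(i) = b1 + b2*X(i) while the residuals sum to zero, SUM[e(i)*Y^(i)] =
b1*SUM[e(i)] + b2*SUM[e(i)*X(i)] = 0 as well. That is, the residuals are uncorrelated with both the
explanatory X(i) and the Predicted Y^(i).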

Key points:
- OLS minimizes the residual sum of squares (RSS); i.e., OLS = MIN(RSS)
- The OLS line passes through the point (Average X, Average Y)
- OLS produces a line such that the mean of the residuals is zero
- OLS implies: residuals are uncorrelated with X(i) and Predicted Y^(i)
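The four key points can be checked numerically. This minimal sketch fits OLS to a small hypothetical data set (not from any question here) and verifies each property up to floating-point rounding; `ols_fit` and `rss` are illustrative helper names:

```python
def ols_fit(xs, ys):
    """Two-variable OLS (Gujarati notation): returns (b1, b2) = (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b2 = sxy / sxx
    b1 = mean_y - b2 * mean_x
    return b1, b2


def rss(b1, b2, xs, ys):
    """Residual sum of squares for the line b1 + b2*X."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical, positively correlated sample
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, b2 = ols_fit(xs, ys)
fitted = [b1 + b2 * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]

mean_resid = sum(resid) / len(resid)                     # mean residual
sum_e_x = sum(e * x for e, x in zip(resid, xs))          # SUM[e(i)*X(i)]
sum_e_yhat = sum(e * f for e, f in zip(resid, fitted))   # SUM[e(i)*Y^(i)]

# Nudging either coefficient away from the OLS estimate raises the RSS
best = rss(b1, b2, xs, ys)
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
worse = all(rss(b1 + d1, b2 + d2, xs, ys) > best for d1, d2 in nudges)

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, RSS = {best:.4f}")
print(f"mean e = {mean_resid:.1e}, SUM[e*X] = {sum_e_x:.1e}, SUM[e*Y^] = {sum_e_yhat:.1e}")
print(f"every nudge raises the RSS: {worse}")
```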

Discuss in the forum here: http://www.bionicturtle.com/forum/threads/l1-t2-86-ols-regression.3850/


P1.T2.87. OLS regression interpretation


Learning Objective: Interpret the results of an ordinary least squares regression.

87.1 Assume that according to Okun’s Law, the relationship between the change in the
unemployment rate and growth in real GDP is given by: Y(t) = -0.4*[X(t) - 2.5%] where Y(t) =
change in the unemployment rate in percentage points and X(t) = percent growth rate in real
output, as measured by real GDP. In order to reduce the unemployment rate by 3 percentage
points, what is the required growth in real GDP?
a) 2.5%
b) 5.0%
c) 7.5%
d) 10.0%

87.2 Let Y(t) = the S&P 500 Index and let X(t) = the three-month Treasury bill rate. Assume our
linear regression model finds the following relationship: Y(t) = -15 + 26*[1/X(t)]. For example, if X(t)
= 2.0%, then Y(t) = -15 + 26/2% = 1,285. If the Treasury bill rate starts at 3.0%, what is the
average decline in the S&P Index (predicted by the model) if the rate were to increase 1%; i.e.,
from 3% to 4%? (note: do you know why this is a valid “linear” regression?)
a) -289
b) -489
c) -650
d) -28,889

87.3 If Y(t) is the explained (dependent) variable and X(t) is the explanatory (independent) variable,
according to Gujarati, what is the correct way to economically interpret the INTERCEPT
coefficient?
a) The mean or Predicted Y(t) if X(t) is equal to zero
b) The mean or average effect on Y(t) of all the variables omitted from the regression model
c) A value without any particular economic meaning
d) Any of the above depending on the application of common sense


Answers:

87.1 D. (10.0%)
Setting -0.4*[X(t) - 2.5%] = -3.0% gives X(t) - 2.5% = 7.5%, so X(t) = 10.0%.
Check: -0.4 * (10% - 2.5%) = -3.0%
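Solving the Okun's Law relation for the required growth takes one line; in this sketch the variable names are illustrative, and all quantities are in percent or percentage points:

```python
okun_coef = -0.4      # slope on [X(t) - 2.5%]
growth_offset = 2.5   # the 2.5% constant inside the brackets, in percent
target_change = -3.0  # desired change in the unemployment rate, percentage points

# Solve -0.4 * (X - 2.5) = -3.0 for X
required_growth = growth_offset + target_change / okun_coef
check = okun_coef * (required_growth - growth_offset)  # plug back in

print(f"required real GDP growth = {required_growth:.1f}%, implied change = {check:.1f} pp")
```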

87.2 A. -289

The rate of change is given by the first derivative: dY(t)/dX(t) = -B2/X(t)^2 = -B2*[1/X(t)^2].

Note this parallels the more typical linear regression, Y(t) = B1 + B2*X(t), in which the slope
coefficient is the first (partial) derivative and so is the rate of change: dY(t)/dX(t) = B2.

Where B2 = 26 and X(t) = 3%, the rate of change (first derivative) = -26/3%^2 = -28,888.89.

That is, for each 1.0 unit change in the T-bill rate, the S&P Index declines by about 28,889.

As 1.0 unit = 100%, a 1% change implies a decline of 28,889/100 = 289 index points.

This is also why we often see 10,000 in the denominator (e.g., for DV01): 1 basis point is 1/100th
of 1%, which is 1/100th of 1.0 (100%), so 1 basis point is 1/10,000th of 1.0 unit.

Finally, the regression is linear because it is linear in the parameters; it is okay that it is
non-linear in the explanatory variable X.
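The first-derivative approach can be reproduced in a short Python sketch (the function names are illustrative). One caveat worth seeing numerically: the derivative at 3% is a local, linear approximation. Because the fitted curve flattens as the rate rises, evaluating the model directly at 3% and at 4% gives a smaller decline, roughly 217 index points, than the 289 implied by the slope at 3%:

```python
B1, B2 = -15.0, 26.0  # fitted model: Y(t) = B1 + B2 * [1 / X(t)]


def sp_index(rate):
    """Predicted S&P 500 level; rate is a decimal (0.03 = 3%)."""
    return B1 + B2 / rate


def slope_at(rate):
    """dY/dX = -B2 / X^2: index change per 1.0 (= 100%) change in the rate."""
    return -B2 / rate**2


rate = 0.03
per_unit = slope_at(rate)        # about -28,889 per 1.0 unit of rate
linear_approx = per_unit * 0.01  # scaled to a 1% (0.01) increase: about -289
exact_change = sp_index(0.04) - sp_index(0.03)  # model evaluated at both rates

print(f"Y at 2%: {sp_index(0.02):,.0f}")
print(f"slope at 3%: {per_unit:,.2f} per 1.0 unit; times 0.01 = {linear_approx:,.1f}")
print(f"change from evaluating the model at 3% and 4%: {exact_change:,.1f}")
```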

87.3 D.

In regard to the intercept, we may interpret per either (A), (B), or (C) but “in general you have to
use common sense in interpreting the intercept term, for very often the sample range of the X
values may not include zero as one of the observed values.”

Discuss in the forum here: http://www.bionicturtle.com/forum/threads/l1-t2-87-ols-regression-interpretation.3854/


P1.T2.91. OLS regression hypothesis


Learning Objective: Describe hypothesis testing in an OLS regression model.

91.1 A two-variable regression of lotto expenditure against monthly income, based on a sample of
twelve observations (n = 12), produces the following SRF: Y(i) = 11.436 + 0.0642*X(i). The
standard errors of the intercept and slope coefficients, respectively, are 5.448 and 0.020. Is the
slope significant with 99% confidence?
a) No, because critical t = 3.055
b) No, because critical t = 3.169
c) Yes, because critical t = 3.055
d) Yes, because critical t = 3.169

91.2. Using the same regression, what is a 95% confidence interval for the “true” population
(parameter) slope?
a) 0.0008 < B2 < 0.1277
b) 0.0012 < B2 < 0.1212
c) 0.0196 < B2 < 0.1088
d) 0.0224 < B2 < 0.0864


Answers:

91.1. D.
The test statistic = 0.0642/0.020 = 3.21.
The 99% two-tailed critical t with 10 degrees of freedom (i.e., n - 2 = 12 - 2, as two coefficients
are estimated) is 3.169.
As 3.21 > 3.169, we reject the null (null: slope = 0) and find the slope coefficient to be significant
(significantly different from zero).
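Rather than reading the critical value from a t table, it can be computed. This is a self-contained sketch using only the standard library: it integrates the Student's t density with Simpson's rule and finds the two-tailed critical value by bisection (in practice a statistics library would normally be used; the helper names here are illustrative):

```python
import math


def t_pdf(t, df):
    """Student's t probability density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def central_prob(t, df, steps=2000):
    """P(-t < T < t) = 2 * integral of the pdf from 0 to t (Simpson's rule; steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def critical_t(confidence, df):
    """Two-tailed critical value: the t with P(-t < T < t) = confidence, via bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 12
df = n - 2                      # two estimated coefficients
slope, se_slope = 0.0642, 0.020
t_stat = slope / se_slope       # test of H0: slope = 0
t_crit = critical_t(0.99, df)

print(f"t = {t_stat:.2f}, critical t (99%, {df} df) = {t_crit:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")
```

The same function with 12 df returns about 3.055, which is why distractors (a) and (c) use that value: it is the critical t you get if you forget to subtract the two estimated coefficients.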

91.2. C.
The two-tailed critical t at 95% and 10 df is 2.228. The CI is given by:
0.0642 - 2.228*0.02 = 0.0196
0.0642 + 2.228*0.02 = 0.1088
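The interval is simple enough to script; this sketch hard-codes the 2.228 table value quoted in the answer rather than computing it:

```python
slope, se_slope = 0.0642, 0.020  # estimated slope and its standard error
t_crit_95 = 2.228                # two-tailed 95% critical t with 10 df (table value)

margin = t_crit_95 * se_slope
lower, upper = slope - margin, slope + margin
print(f"95% CI: {lower:.4f} < B2 < {upper:.4f}")
```

Because the interval excludes zero, it agrees with rejecting H0: B2 = 0 at the 5% level.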

Discuss in the forum here: http://www.bionicturtle.com/forum/threads/l1-t2-91-ols-regression-hypothesis.3871/
