

Unit 2

Class 4
Today’s Agenda
Last class:
• Covariance
• Correlation
• Simple linear regression
• Motivating example 

Today:
• Interpolation and extrapolation
• Influential observations
• Confounding

Example: Trees Dataset
This data set provides measurements of the diameter, height and
volume of timber in 31 felled black cherry trees. Note that the
diameter (in inches) is erroneously labelled Girth in the data. It is
measured at 4ft 6in above the ground.

trees[1:5,]
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
A useful command…
Model <- lm(Height ~ Girth, data = trees)
summary(Model)
# NOTE: I have removed a lot of stuff you don’t need to see yet!
Coefficients:
            Estimate
(Intercept)  62.0313
Girth         1.0544
Multiple R-squared: 0.2697
Our line of best fit is…
ĥ = 62.0313 + 1.0544 × g

We can use this line to predict the height of a tree for a particular girth.

Example
If the girth is 10 inches, the model estimates the height will be:
ĥ = 62.0313 + 1.0544 × 10 ≈ 72.6 feet

Note! This is NOT the height of a particular cherry tree. Instead, it is an average height for a cherry tree with girth 10 inches.
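As a sketch, this prediction can be reproduced in R. The lm() call below is an assumption on my part, inferred from the coefficients shown on the summary(Model) slide (it uses R's built-in trees data):

```r
# Assumed fit, matching the coefficients shown by summary(Model)
Model <- lm(Height ~ Girth, data = trees)

# Estimated average height for a tree with girth 10 inches
predict(Model, newdata = data.frame(Girth = 10))
# by hand: 62.0313 + 1.0544 * 10 = 72.5753, i.e. about 72.6 feet
```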
Clicker Question
The line of best fit is ĥ = 62.0313 + 1.0544 × g. By approximately how much does our average tree height change for every inch increase in girth?

A) 63 feet
B) 1 foot
C) 64 feet
D) Cannot determine from the information given
E) None of the above
Interpolation
Interpolation is the use of the regression line for predicting the value of a response using an explanatory variate within the range of x.

(i.e., here a girth value between about 8 and 20 inches)
Clicker Question
The line of best fit is ĥ = 62.0313 + 1.0544 × g. This model says the average height of a tree when the girth is 14 inches is approximately…
A) 14 ft
B) 63 ft
C) 1 ft
D) 77 ft
E) None of the above
Extrapolation
Extrapolation is the use of the regression line for predicting the value of a response using an explanatory variate beyond the range of x.

(i.e., here a girth value outside of about 8 to 20 inches)
Warnings!

Interpolation tends to be a fairly safe thing to do, but extrapolation can have some scary consequences.
Clicker Question
The line of best fit is ĥ = 62.0313 + 1.0544 × g. This model says the average height of a tree when the girth is 0 inches is approximately…
A) 14 ft
B) 63 ft
C) 1 ft
D) 62 ft
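A small sketch of why the answer here should worry us (again assuming the lm() fit on R's built-in trees data): girth 0 is far outside the observed range, so the prediction is just the intercept.

```r
Model <- lm(Height ~ Girth, data = trees)  # assumed fit from earlier

range(trees$Girth)   # observed girths run from roughly 8 to 21 inches

# Extrapolating to girth 0 returns the intercept: a 62-foot tree
# with no trunk, which is physically meaningless
predict(Model, newdata = data.frame(Girth = 0))
```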
Today’s Agenda
Today:
• Interpolation and extrapolation
• Influential observations
• Confounding

Influential Observations
In one dimension, it is an outlier.

In more than one dimension, it is an outlier in the x direction or in the y direction (or in both).
• An example will help us see this!
Influential Observations affect…
• Slope
• Covariance
• Correlation

Influential Observations
Example

Dataset
xᵢ   yᵢ       xᵢ   yᵢ
69   207      64   193
70   212      73   219
68   206      76   230
69   206      66   195
73   219      70   209
Plot the data
The least squares regression line is
ŷ = −9.432 + 3.138x

r = 0.990
Let’s add an outlier
x₁₁ = 100; y₁₁ = 299
Extreme in both directions!

The least squares regression line is
ŷ = 1.444 + 2.981x

r = 0.999

(the red line is the original)
Let’s change the outlier
x₁₁ = 100; y₁₁ = 205
Extreme in the X direction only!

The least squares regression line is
ŷ = 191.915 + 0.238x

r = 0.216

(the red line is the original)
Let’s change the outlier
x₁₁ = 64; y₁₁ = 299
Extreme in the Y direction only!

The least squares regression line is
ŷ = 276.77 − 0.85x

r = −0.111

(the red line is the original)
Notes:
• An influential observation has a large influence on the statistical calculations being done.
• Identification: if removing it from the data causes our line of best fit to change markedly (see previous examples), then it’s an influential observation.
• Investigators should try to determine if it’s due to an error or some other factor surrounding the unit/process from which this observation was collected.
• Under certain scenarios, the investigator may choose to remove such points from the analysis.
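The three outlier scenarios above can be reproduced with a short R sketch. The vectors x and y are the ten points from the dataset slide; the fitted values may differ slightly from the rounded numbers quoted.

```r
# The ten original points from the dataset slide
x <- c(69, 70, 68, 69, 73, 64, 73, 76, 66, 70)
y <- c(207, 212, 206, 206, 219, 193, 219, 230, 195, 209)

coef(lm(y ~ x))                 # about -9.432 + 3.138x
cor(x, y)                       # about 0.990

# Add a point extreme in x only: slope and correlation collapse
coef(lm(c(y, 205) ~ c(x, 100))) # about 191.915 + 0.238x
cor(c(x, 100), c(y, 205))       # about 0.216

# Add a point extreme in y only: the slope even changes sign
coef(lm(c(y, 299) ~ c(x, 64)))  # about 276.77 - 0.85x
cor(c(x, 64), c(y, 299))        # about -0.111
```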
Today’s Agenda
Today:
• Interpolation and extrapolation
• Influential observations
• Confounding

Confounding

Example:
We are interested in how previous work experience affects income.
Wages are in thousands of dollars and Experience is in years.

Experience = c(5, 7, 9, 11, 13, 15, 17, 19)
Wages = c(45, 48, 50, 45, 65, 62, 65, 70)
We build a regression line…
The line is
Ŵages = 33.6786 + 1.881 × Exp
The correlation is 0.9.

Look at that high correlation!
Clicker Question
The line is Ŵages = 33.6786 + 1.881 × Exp and the correlation is 0.9. What is the R squared value?

A) 0.9
B) 0.81
C) -69.774
D) 1.881
E) None of the above
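A sketch of this fit in R, using the Experience and Wages vectors from the earlier slide:

```r
Experience <- c(5, 7, 9, 11, 13, 15, 17, 19)
Wages <- c(45, 48, 50, 45, 65, 62, 65, 70)

coef(lm(Wages ~ Experience))  # intercept about 33.679, slope about 1.881
cor(Experience, Wages)        # about 0.90
cor(Experience, Wages)^2      # R-squared, about 0.81 (i.e. 0.9^2)
```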
Clicker Question
Therefore, we can say an increase in
previous experience CAUSES Wages to
increase…
A) TRUE
B) FALSE

Example continued…
In this example, there is another piece of data that I should give you that may
or may not have been recorded by the researcher…

Experience = c(5, 7, 9, 11, 13, 15, 17, 19)
Wages = c(45, 48, 50, 45, 65, 62, 65, 70)
Education = c(0, 0, 0, 0, 1, 1, 1, 1)  # 1 = university, 0 = no university

And if we look at the correlation between Wages and Previous Experience among the non-university-educated, we obtain a correlation of 0.11. Among the university-educated, the correlation is 0.7.
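These within-group correlations can be checked with a short sketch, splitting on the Education indicator:

```r
Experience <- c(5, 7, 9, 11, 13, 15, 17, 19)
Wages <- c(45, 48, 50, 45, 65, 62, 65, 70)
Education <- c(0, 0, 0, 0, 1, 1, 1, 1)  # 1 = university, 0 = no university

# Correlation within each education group
cor(Experience[Education == 0], Wages[Education == 0])  # about 0.11
cor(Experience[Education == 1], Wages[Education == 1])  # about 0.70
```

The strong overall correlation of 0.9 largely disappears once education is held fixed, which is the point of the slide that follows.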
Warnings!
Education is called a lurking variable. Every study has them!

A lurking variable is a variable that is not one of the explanatory or response variables in a study but that may influence the interpretation of relationships among those variables!
• It can falsely identify a strong relationship between variables or hide a true relationship.

This is why CORRELATION DOES NOT IMPLY CAUSATION!
Correlation does NOT imply Causation!

Activity: Suppose you want to study the effect of
diet and exercise on a person’s blood pressure.

What lurking variable(s) might we need to consider?

Confounding
Two variables (either explanatory variables or lurking variables) are
confounded when their effects on a response cannot be distinguished
from each other.

Example
Weight and Age are confounding variables for Height in children.

Today’s Agenda
Today:
• Interpolation and extrapolation
• Influential observations
• Confounding

Homework Questions
• Chapter 4
• 4.15-4.23 (odds), 4.29b, 4.33 (use R)
