Chapter 14: Correlation
What happens to one variable when another one changes?
Why do we care about correlation?
Correlation answers the question of whether (not why) any two variables are related. On a graph, one dot represents each observation for the two variables.
With a scatterplot we see lots of dots at once; we can characterize the pattern of those dots, and adding a line to the graph helps us see that pattern.
Positive correlation – the two variables move in the same direction; both may be increasing or both decreasing
Negative correlation – the two variables move in opposite directions
No correlation – changes in one variable tell us nothing about the other; there is no relationship between the two
Perfect correlation – all the points are on the same line, and you can perfectly predict one
variable if you know the value of the other variable
Variance is the average squared deviations from the mean, while standard deviation is the
square root of this number. Both measures reflect variability in a distribution, but their units
differ: Standard deviation is expressed in the same units as the original values
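A minimal sketch of these two measures in Python, using made-up values and the divide-by-n (population) form of the "average squared deviation" described above:

    values = [4, 7, 6, 8, 5]  # made-up data for illustration

    n = len(values)
    mean = sum(values) / n

    # Variance: the average of the squared deviations from the mean
    variance = sum((v - mean) ** 2 for v in values) / n

    # Standard deviation: the square root of the variance,
    # expressed in the same units as the original values
    std_dev = variance ** 0.5

    print(mean, variance, std_dev)  # 6.0, 2.0, about 1.41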
Correlation coefficient for interval/ratio data:
Pearson's product-moment correlation coefficient is used for interval/ratio data. It is called a coefficient because it is a single estimated number that represents the relationship between two variables
o Requires the variables to be normally distributed because the calculation uses the mean and standard deviation, and those statistics need to be meaningful
Covariance is a measure of how much two variables co-vary, i.e., vary together
o If one goes up and the other goes up, they covary positively and vary together
o If one goes down and the other goes up, they covary negatively. They still vary together, because you can predict one value if you know the other
Here we multiply the deviation of one variable from its mean by the deviation of the other. Because we are multiplying two numbers that can each be positive or negative, the products can be positive or negative.
With many observations the sum of these products can be large simply because there are many of them, so we divide by the sample size to get the average amount of covariation. We use this to make predictions
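A minimal sketch of that calculation in Python; the x and y values are made up, and dividing by the sample size n follows the description above (many textbooks divide by n - 1 instead):

    x = [2, 4, 6, 8, 10]  # made-up data for illustration
    y = [1, 3, 2, 5, 4]

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Multiply each observation's deviation in x by its deviation in y,
    # then divide the sum of those products by the sample size
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

    print(cov_xy)  # 3.2 here: positive, so the two variables tend to move together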
If COVxy is greater than zero, then on average the two variables move in the same direction. If
COVxy is negative, then the two variables move in the opposite direction, and if COVxy is close
to zero then on average, the variables do not move together because sometimes the deviations
are in the same direction and other times the opposite direction
If the two variables move in the same direction for one observation but in opposite directions for the next, the deviations cancel out on average; in that case you cannot use the value of one variable to make predictions about the other
Covariance is an absolute measure of association, and we need to control for the amount of
covariation that we would “expect” there to be for each set of variables
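One standard way to do this (not spelled out above) is to divide the covariance by the product of the two standard deviations; the result is Pearson's r, which always falls between -1 and +1. A minimal sketch, reusing the made-up values from the covariance example:

    x = [2, 4, 6, 8, 10]  # made-up data for illustration
    y = [1, 3, 2, 5, 4]

    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n

    # Covariance and the two standard deviations (divide-by-n form)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sd_x = (sum((xi - mean_x) ** 2 for xi in x) / n) ** 0.5
    sd_y = (sum((yi - mean_y) ** 2 for yi in y) / n) ** 0.5

    # Standardize the covariance to get Pearson's r
    r = cov_xy / (sd_x * sd_y)
    print(r)  # 0.8 here; always between -1 and +1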
Thinking of each variable as a circle (as in a Venn diagram), overlap represents shared variation, or covariation: the more the circles overlap, the greater their covariation and, subsequently, their correlation
Another way to interpret how strong this relationship is, is to calculate the slope of the line that would be drawn through the points on the plot
You can have a line with a slope of +1 or -1 without every point falling exactly on that line; the points can show a little variation around it
Nonparametric correlation coefficient:
Used when the data is not normally distributed
Here you rank all of the values for each variable, X and Y. The rankings will be the same for each observation only when there is a perfect positive correlation between the two variables
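A minimal sketch of the ranking idea, using the common no-ties formula for Spearman's rho; the data values are made up for illustration:

    # Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    # which assumes no tied values
    x = [10, 20, 30, 40, 50]  # made-up data
    y = [3, 1, 4, 2, 5]

    def ranks(values):
        # Rank 1 goes to the smallest value, rank n to the largest
        order = sorted(values)
        return [order.index(v) + 1 for v in values]

    rank_x = ranks(x)  # [1, 2, 3, 4, 5]
    rank_y = ranks(y)  # [3, 1, 4, 2, 5]

    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    rho = 1 - 6 * d_squared / (n * (n ** 2 - 1))
    print(rho)  # 0.5 here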
The last thing we do is test our null and alternative hypotheses
Application using textbook data:
When our variables are not normally distributed we need to use the nonparametric Spearman's correlation coefficient rather than Pearson's, which is calculated with statistics (the mean and standard deviation) whose expected properties depend on normality
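Since the textbook data are not reproduced here, the sketch below uses made-up stand-in values with scipy.stats.spearmanr, which returns both the coefficient and a p-value for testing the null hypothesis of no association:

    from scipy.stats import spearmanr

    # Made-up stand-ins for the textbook variables
    x = [12, 7, 19, 3, 15, 9, 21, 5]
    y = [30, 22, 41, 10, 33, 31, 45, 18]

    # spearmanr returns the rank correlation and a p-value for the null
    # hypothesis that there is no monotonic association between x and y
    rho, p_value = spearmanr(x, y)
    print(rho, p_value)

    # If p_value is below the chosen significance level (e.g. 0.05),
    # reject the null hypothesis of no correlation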