Correlation vs. Regression Basics

What is the goal?

Correlation quantifies the degree to which two variables are related. Correlation does not fit a line through
the data points. You are simply computing a correlation coefficient (r) that tells you how much one variable
tends to change when the other one does. When r is 0.0, there is no linear relationship. When r is positive,
one variable tends to go up as the other goes up. When r is negative, one variable tends to go up as the
other goes down.
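
As a minimal sketch of what that computation looks like (the numbers below are made up purely for illustration, and NumPy is assumed to be available):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # np.corrcoef returns the 2x2 correlation matrix; r is the off-diagonal entry.
    r = np.corrcoef(x, y)[0, 1]
    print(r)  # close to +1 here: y tends to go up as x goes up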

Linear regression finds the best line that predicts Y from X. Correlation does not fit a line.
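
As a sketch of what fitting that line might look like (same made-up numbers as above, using scipy.stats.linregress):

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Fit the line that best predicts Y from X.
    fit = stats.linregress(x, y)
    print(fit.slope, fit.intercept)         # parameters of the best-fit line
    y_pred = fit.intercept + fit.slope * x  # predicted Y at each X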

What kind of data?

Correlation is almost always used when you measure both variables. It is rarely appropriate when one
variable is something you experimentally manipulate.

Linear regression is usually used when X is a variable you manipulate (time, concentration, etc.).

Does it matter which variable is X and which is Y?

With correlation, you don't have to think about cause and effect. It doesn't matter which of the two
variables you call "X" and which you call "Y". You'll get the same correlation coefficient if you swap the
two.

The decision of which variable you call "X" and which you call "Y" matters in regression, as you'll get a
different best-fit line if you swap the two. The line that best predicts Y from X is not the same as the line
that best predicts X from Y (although both lines have the same value for R²).
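
A small sketch of that point, again with made-up data: regressing Y on X and X on Y give different lines but the same R².

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([1.2, 2.7, 2.9, 4.4, 4.8, 6.3])

    y_on_x = stats.linregress(x, y)  # line that predicts Y from X
    x_on_y = stats.linregress(y, x)  # line that predicts X from Y

    # The two lines differ (the slopes are not reciprocals unless r is exactly +/-1) ...
    print(y_on_x.slope, 1.0 / x_on_y.slope)
    # ... but the goodness of fit is the same.
    print(y_on_x.rvalue**2, x_on_y.rvalue**2)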

Assumptions

The correlation coefficient itself is simply a way to describe how two variables vary together, so it can be
computed and interpreted for any two variables. Further inferences, however, require an additional
assumption -- that both X and Y are measured, and both are sampled from Gaussian distributions. This is
called a bivariate Gaussian distribution. If that assumption is true, then you can interpret the
confidence interval of r and the P value testing the null hypothesis that there really is no correlation
between the two variables (and that any correlation you observed is a consequence of random sampling).
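
As a sketch of that inferential side (illustrative, randomly generated data; scipy.stats.pearsonr reports the P value, and in SciPy 1.9 or later the result object can also give a confidence interval for r):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(size=30)
    y = 0.6 * x + rng.normal(scale=0.8, size=30)  # roughly bivariate Gaussian

    result = stats.pearsonr(x, y)
    print(result.statistic, result.pvalue)   # r and the two-sided P value
    print(result.confidence_interval(0.95))  # 95% CI for r (SciPy >= 1.9)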

With linear regression, the X values can be measured or can be a variable controlled by the experimenter.
The X values are not assumed to be sampled from a Gaussian distribution. The vertical distances of the
points from the best-fit line (the residuals) are assumed to follow a Gaussian distribution, with the SD of
the scatter not related to the X or Y values.
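
As a rough sketch of examining that assumption (made-up data again), you can compute the residuals from the fit and look at their scatter:

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([1.2, 2.7, 2.9, 4.4, 4.8, 6.3])

    fit = stats.linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)  # vertical distances from the line
    print(residuals.std(ddof=2))  # SD of the scatter (two parameters were estimated)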

Relationship between results

Correlation computes the value of the Pearson correlation coefficient, r. Its value ranges from -1 to +1.

Linear regression quantifies goodness of fit with r², sometimes shown in uppercase as R². If you put the
same data into correlation (which is rarely appropriate; see above), the square of r from correlation will
equal r² from regression.
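
A quick sketch of that relationship on the same made-up data used above:

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([1.2, 2.7, 2.9, 4.4, 4.8, 6.3])

    r = np.corrcoef(x, y)[0, 1]   # r from correlation
    fit = stats.linregress(x, y)  # r-squared from regression is fit.rvalue**2
    print(r**2, fit.rvalue**2)    # the two agree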
