Correlation
Correlation is a statistical measure that expresses the extent to which two variables are
linearly related.
It shows how one variable changes with another.
Correlation ≠ Causation
(Just because two things are correlated doesn't mean one causes the other.)
Direction:
Positive Correlation:
Both variables increase or decrease together.
E.g., height and weight.
Negative Correlation:
As one variable increases, the other decreases.
E.g., price and demand.
Zero Correlation:
No linear relationship between the variables.
Value of r         Degree
r = +1             Perfect Positive Correlation
r = -1             Perfect Negative Correlation
r close to ±1      Strong Correlation
r close to 0       Weak or No Correlation
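As a minimal sketch of how r lands on these values, the snippet below computes Pearson's r from its usual definition (the co-variation of the two variables divided by the product of their spreads). The function name pearson_r and the sample data are illustrative assumptions, not part of the original notes.

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_variation / (spread_x * spread_y)

# Illustrative data (assumed): points lying exactly on a straight line.
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * v + 1 for v in x])   # exact increasing line
r_neg = pearson_r(x, [10 - 3 * v for v in x])  # exact decreasing line
print(round(r_pos, 6), round(r_neg, 6))
```

Points that fall exactly on an increasing line give r = +1, and points on a decreasing line give r = -1; real data usually fall somewhere in between.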
Factors affecting Pearson Correlation
Linearity of Relationship:
Pearson’s r measures only linear relationships.
Non-linear relationships can give misleading results.
Outliers:
Extreme values can distort the correlation value significantly.
Range of Data:
Restricting the range can weaken the apparent correlation.
Measurement Error:
Random errors in data collection add noise, which typically weakens r and makes it less reliable.
Sample Size:
Small samples may produce unstable r values.
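Two of the factors above, linearity and outliers, can be illustrated with a short sketch. The helper pearson_r and all data values are assumptions chosen for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_variation / (spread_x * spread_y)

# Linearity: y is fully determined by x (y = x^2), yet r is 0 because
# the relationship is curved and symmetric rather than linear.
x = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(x, [v ** 2 for v in x])

# Outliers: eight points with no linear trend, then one extreme point added.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 3, 6, 4, 5, 3, 6, 4]
r_before = pearson_r(xs, ys)
r_after = pearson_r(xs + [30], ys + [30])

print(r_curve, r_before, round(r_after, 3))
```

In this made-up data a single extreme point moves r from 0 to above 0.9, which is why a scatter plot is worth inspecting before trusting the coefficient.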
Spearman Correlation
Used when:
Data is ordinal or in ranks.
Relationship is monotonic but non-linear, or the data are not normally distributed.
1. Values range between -1 and +1
2. Less affected by extreme values than Pearson's r (robust to outliers)
3. Suitable for ordinal (ranked) data
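One common way to obtain Spearman's coefficient is to replace each value by its rank and then compute Pearson's r on the ranks. The sketch below follows that approach under the simplifying assumption that there are no tied values; the helper names ranks, pearson_r, and spearman_rho are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_variation / (spread_x * spread_y)

def ranks(values):
    """Rank values from 1 (smallest) to n (largest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's coefficient as Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Illustrative data (assumed): y = x^3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(round(rho, 3), round(r, 3))
```

Because the ranks of x and y agree perfectly, Spearman's coefficient is 1 here, while Pearson's r is lower since the points do not lie on a straight line.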
Simple Linear Regression
Regression estimates the value of one variable (dependent) based on another (independent).
Meaning of Regression Coefficient (b):
Indicates the rate of change in Y with respect to X (the expected change in Y per one-unit increase in X).
Sign shows direction (positive/negative).
Relationship between Regression and Correlation:
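Regression treats the relationship as directional (X is used to predict Y), but this by itself does not prove that X causes Y.
The regression coefficient can be calculated from Pearson’s r: b = r × (SD of Y / SD of X).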
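The link between Pearson's r and the regression coefficient can be sketched as follows: the least-squares slope b equals r multiplied by the ratio of the standard deviation of Y to that of X. The variable names and the five data points are assumptions for illustration.

```python
import math

# Illustrative data (assumed).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Least-squares slope and intercept of the line Y = a + bX.
b_least_squares = s_xy / s_xx
a_intercept = mean_y - b_least_squares * mean_x

# The same slope obtained from Pearson's r and the ratio of standard deviations.
r = s_xy / math.sqrt(s_xx * s_yy)
b_from_r = r * math.sqrt(s_yy / s_xx)

print(round(b_least_squares, 4), round(b_from_r, 4), round(a_intercept, 4))
```

Both routes give the same slope, so the sign of b always matches the sign of r.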
Use correlation to understand association.
Use regression to make predictions.
Always check the type of data, the presence of outliers, and the form of the relationship before applying these tools.
Pearson Correlation
Pearson's r measures the strength and direction of the linear relationship between two
continuous (quantitative) variables.
Assumptions:
Both variables are quantitative and measured on an interval/ratio scale.
Relationship is linear.
No significant outliers.
Both variables are approximately normally distributed (ideally bivariate normal).
If the heights and weights of students are analyzed, Pearson's r will tell how strongly the two
are linearly related.
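For reference, the standard formula for Pearson's r can be written as below. The notation is an assumption, since the notes define no symbols: x_i and y_i are the paired observations (for example, each student's height and weight), n is the number of pairs, and the barred symbols are the sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator captures how the two variables vary together, and the denominator rescales it so that r always lies between -1 and +1.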