Statistical Analysis Example with Mathematical Steps
This document includes a fully worked example covering correlation, regression, observed
vs expected values, chi-square test, and hypothesis testing using the alpha (α) level.
Step 1: Dataset
We will use the following dataset of 5 students (for simplicity):
Student | Hours Studied (X) | Exam Score (Y)
--------|-------------------|---------------
1       | 2                 | 50
2       | 4                 | 65
3       | 6                 | 75
4       | 8                 | 85
5       | 10                | 95
Step 2: Correlation Coefficient (r)
Formula:
r = Σ[(X − X̄)(Y − Ȳ)] / sqrt[Σ(X − X̄)² × Σ(Y − Ȳ)²]
Mean of X = (2+4+6+8+10)/5 = 6
Mean of Y = (50+65+75+85+95)/5 = 74
Now compute deviations and products:
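X  | Y  | (X−X̄) | (Y−Ȳ) | (X−X̄)(Y−Ȳ) | (X−X̄)² | (Y−Ȳ)²
---|----|--------|--------|-------------|---------|--------
2  | 50 | -4     | -24    | 96          | 16      | 576
4  | 65 | -2     | -9     | 18          | 4       | 81
6  | 75 | 0      | 1      | 0           | 0       | 1
8  | 85 | 2      | 11     | 22          | 4       | 121
10 | 95 | 4      | 21     | 84          | 16      | 441
Σ(X−X̄)(Y−Ȳ) = 220
Σ(X−X̄)² = 40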
Σ(Y−Ȳ)² = 1220
r = 220 / sqrt(40 × 1220) ≈ 220 / 220.9 ≈ 0.996
Interpretation: r is close to +1, so hours studied and exam score have a very strong positive linear relationship.
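The hand calculation above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name pearson_r and the variable names are illustrative choices, not part of the original example.

```python
import math

# Dataset from Step 1: hours studied (X) and exam scores (Y).
hours = [2, 4, 6, 8, 10]
scores = [50, 65, 75, 85, 95]


def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations (table columns).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(hours, scores)
print(round(r, 3))  # 0.996
```

The three sums inside the function correspond to the column totals 220, 40, and 1220 in the table.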
Step 3: Simple Linear Regression
Regression line: Y = a + bX
b = Σ[(X−X̄)(Y−Ȳ)] / Σ(X−X̄)² = 220 / 40 = 5.5
a = Ȳ − bX̄ = 74 − (5.5)(6) = 74 − 33 = 41
So the regression equation is:
Y = 41 + 5.5X
Example: predict the score for X = 8:
Y = 41 + 5.5×8 = 41 + 44 = 85 → Matches the actual score of student 4.
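As a check on the slope and intercept, the least-squares formulas can be coded directly. This is a sketch under the same data as Step 1; fit_line is a hypothetical helper name introduced here for illustration.

```python
# Dataset from Step 1.
hours = [2, 4, 6, 8, 10]
scores = [50, 65, 75, 85, 95]


def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # b = 220 / 40
    intercept = mean_y - slope * mean_x  # a = 74 - 5.5 * 6
    return intercept, slope


a, b = fit_line(hours, scores)
print(a, b)       # 41.0 5.5
print(a + b * 8)  # 85.0, the predicted score for 8 hours of study
```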
Step 4: Observed vs Expected Scores
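Use the regression line to compute the expected (predicted) score for each student and compare it with the actual score. Error = Actual Y − Predicted Y.
X  | Actual Y | Predicted Y = 41 + 5.5X | Error
---|----------|-------------------------|------
2  | 50       | 52                      | -2
4  | 65       | 63                      | +2
6  | 75       | 74                      | +1
8  | 85       | 85                      | 0
10 | 95       | 96                      | -1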
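The table can be regenerated in a few lines, which also confirms a property of least-squares fits with an intercept: the errors sum to zero. This sketch assumes the coefficients a = 41 and b = 5.5 from Step 3; the sum of squared errors (sse) is an extra quantity not computed in the text above.

```python
# Dataset from Step 1 and coefficients from Step 3.
hours = [2, 4, 6, 8, 10]
scores = [50, 65, 75, 85, 95]
intercept, slope = 41.0, 5.5

# Predicted score for each student and error = actual - predicted.
predicted = [intercept + slope * x for x in hours]
errors = [y - p for y, p in zip(scores, predicted)]

for x, y, p, e in zip(hours, scores, predicted, errors):
    print(f"{x:>2} | {y} | {p:.0f} | {e:+.0f}")

print(sum(errors))           # 0.0
sse = sum(e ** 2 for e in errors)
print(sse)                   # 10.0
```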
Step 5: Chi-Square Test
Null hypothesis (H₀): Attendance and Result are independent.
Contingency Table (observed counts for 8 students):
Attendance  | Pass | Fail | Total
------------|------|------|------
Low Att.    | 1    | 2    | 3
High Att.   | 4    | 1    | 5
Total       | 5    | 3    | 8
Expected count for each cell: E = (row total × column total) / grand total
Expected (Low, Pass) = (3×5)/8 = 1.875
Expected (Low, Fail) = (3×3)/8 = 1.125
Expected (High, Pass) = (5×5)/8 = 3.125
Expected (High, Fail) = (5×3)/8 = 1.875
χ² = Σ[(O−E)² / E]
= (1−1.875)²/1.875 + (2−1.125)²/1.125 + (4−3.125)²/3.125 + (1−1.875)²/1.875
≈ 0.408 + 0.681 + 0.245 + 0.408 ≈ 1.74
Degrees of freedom = (2−1)(2−1) = 1
Critical value (α=0.05) = 3.841
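Conclusion: 1.74 < 3.841 → Fail to reject H₀ → No significant association between attendance and result.
Note: every expected count here is below 5, so the chi-square approximation is rough for a table this small; Fisher's exact test is the usual alternative.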
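The expected counts and the test statistic can be verified with the sketch below. It uses plain Python and no continuity correction, matching the hand calculation; the critical value 3.841 is taken from the text rather than computed, and the variable names are illustrative.

```python
# Observed counts: rows = low/high attendance, columns = pass/fail.
observed = [[1, 2],
            [4, 1]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic summed over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_VALUE = 3.841  # alpha = 0.05, df = 1, as stated in the text
print(expected)                     # [[1.875, 1.125], [3.125, 1.875]]
print(round(chi_square, 3), dof)    # 1.742 1
print(chi_square > CRITICAL_VALUE)  # False, so H0 is not rejected
```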
Step 6: Alpha (α) and Hypothesis Testing
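All tests use a significance level of α = 0.05: a result is called significant when its test statistic exceeds the critical value at that level.
- Correlation: r = 0.996 exceeds the critical value 0.878 for n = 5 (df = 3, two-tailed) → Significant
- Regression: slope b = 5.5, errors within ±2 points, R² = r² ≈ 0.992; in simple linear regression the slope test is equivalent to the correlation test → Significant
- Chi-Square: χ² ≈ 1.74 < 3.841 → Not significant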
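The significance of the correlation can also be checked with the t-statistic t = r × sqrt(n − 2) / sqrt(1 − r²), which has n − 2 degrees of freedom. The sketch below reuses the sums from Step 2; the critical value 3.182 (two-tailed, α = 0.05, df = 3) is an assumption taken from a standard t table, not from the original text.

```python
import math

# Sums from the Step 2 table.
n = 5
sxy, sxx, syy = 220, 40, 1220

r = sxy / math.sqrt(sxx * syy)

# t-statistic for H0: no linear correlation, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRITICAL = 3.182  # two-tailed, alpha = 0.05, df = 3 (standard t table)
print(round(t_stat, 2))     # 19.05
print(t_stat > T_CRITICAL)  # True, so the correlation is significant
```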
Final Conclusion: In this sample, study hours are a strong linear predictor of exam scores. The attendance data, however, give no statistically significant evidence at α = 0.05 of an association between attendance and results.