1 Inferential Statistics
Inference Table

Population Parameter    Sample Statistic
µ                       x̄
σ                       s
p                       p̂
Mean:
µ = Σx / N,    x̄ = Σx / n
Variance:
σ² = Σ(x − µ)² / N,    s² = Σ(x − x̄)² / (n − 1)
Empirical Rule
For µ ± σ, 2σ, 3σ, a bell-shaped symmetric distribution covers approximately 68%, 95%, and 99.7% of all data, respectively.
Chebyshev's Theorem
P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k², for any random variable X and any k > 1.
For example, k = 2 guarantees that at least 75% of the values lie within two standard deviations of the mean.
Identify Outliers and Unusual Events
By Mean and Standard Deviation:
x < µ − 2σ or x > µ + 2σ
By Quartile:
x < Q1 − 1.5·IQR or x > Q3 + 1.5·IQR
2 Probability
Type of Probability and Probability Rules in Symbols
Classical Probability: P(E) = (number of outcomes in event E) / (number of outcomes in sample space)
Empirical Probability: P(E) = (frequency of event E) / (total frequency) = f / n
Range of Probabilities Rule: 0 ≤ P(E) ≤ 1
Complementary Events: P(E′) = 1 − P(E)
Multiplication Rule: P(A and B) = P(A) · P(B | A); for independent events, P(A and B) = P(A) · P(B)
Addition Rule: P(A or B) = P(A) + P(B) − P(A and B); for mutually exclusive events, P(A or B) = P(A) + P(B)
Bayes' Rule
P(A | B) = P(A) · P(B | A) / [P(A) · P(B | A) + P(A′) · P(B | A′)]
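Bayes' rule can be checked with a small Python sketch; the prevalence, sensitivity, and false-positive rate below are made-up numbers, not from the source.

```python
# Sketch of Bayes' rule: P(A|B) from P(A), P(B|A), and P(B|A').
# Hypothetical screening example: P(A) = 0.01 (prevalence),
# P(B|A) = 0.95 (sensitivity), P(B|A') = 0.05 (false-positive rate).
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    p_not_a = 1 - p_a
    return (p_a * p_b_given_a) / (p_a * p_b_given_a + p_not_a * p_b_given_not_a)

posterior = bayes(0.01, 0.95, 0.05)
print(round(posterior, 3))  # P(disease | positive test) is small despite the accurate test
```

Even with a 95%-sensitive test, the posterior stays low because the prior P(A) is so small.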
3 Random Variable
Discrete Random Variable
Mean/Expected Value:
E(X) = µ = Σ x·P(x)
Variance:
Var(X) = σ² = Σ (x − µ)²·P(x) = E(X²) − [E(X)]²
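The two formulas above can be sketched in Python; the fair-die distribution is an illustrative example, not from the source.

```python
# Sketch: E(X) and Var(X) for a discrete distribution given as {x: P(x)}.
def mean_var(dist):
    """Return (mu, sigma^2) using Var(X) = E(X^2) - [E(X)]^2."""
    mu = sum(x * p for x, p in dist.items())
    ex2 = sum(x * x * p for x, p in dist.items())
    return mu, ex2 - mu ** 2

die = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die
mu, var = mean_var(die)
print(mu, var)  # 3.5 and 35/12
```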
Common Discrete Distributions

X ∼ Binomial(n, p)
  Random variable: X = # of successes during n trials
  Values: X = 0, 1, 2, . . . , n
  Parameters: n = # of trials, p = P(success), q = 1 − p = P(failure)
  p.m.f.: P(x) = C(n, x) p^x q^(n−x)
  Mean: µ = np    Variance: σ² = npq

X ∼ Geometric(p)
  Random variable: X = trial number of the first success
  Values: X = 1, 2, 3, . . .
  Parameters: p = P(success), q = 1 − p = P(failure)
  p.m.f.: P(x) = p q^(x−1)
  Mean: µ = 1/p    Variance: σ² = q/p²

X ∼ Poisson(µ)
  Random variable: X = # of times an event occurs
  Values: X = 0, 1, 2, 3, . . .
  Parameter: µ = mean number of occurrences
  p.m.f.: P(x) = µ^x e^(−µ) / x!
  Mean: µ    Variance: σ² = µ
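The binomial column can be verified numerically; n and p below are chosen arbitrarily.

```python
from math import comb

# Sketch: check that the Binomial(n, p) p.m.f. sums to 1 and that its
# mean and variance match the formulas mu = np and sigma^2 = npq.
def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
mu = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mu) ** 2 * f for x, f in enumerate(pmf))
print(sum(pmf), mu, var)  # ≈ 1, np = 3.0, npq = 2.1
```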
Continuous Random Variable
Probability:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx,    P(X = a) = ∫_a^a f(x) dx = 0
Normal Standardization:
Z = (X − µ) / σ ∼ N(0, 1),    Z = (X̄ − µ) / (σ/√n) ∼ N(0, 1)
Normal Approximation for Binomial
P(X ≤ a) ≈ P(Z < ((a + 1/2) − np) / √(npq))
P(X ≥ a) ≈ P(Z > ((a − 1/2) − np) / √(npq))
P(a ≤ X ≤ b) ≈ P(((a − 1/2) − np) / √(npq) < Z < ((b + 1/2) − np) / √(npq))
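The continuity correction above can be compared against the exact binomial tail; the numbers n = 50, p = 0.4, a = 18 are illustrative, and the standard normal CDF is built from math.erf.

```python
from math import comb, erf, sqrt

# Sketch: exact Binomial(n, p) lower tail vs. the normal approximation
# with continuity correction.
def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, a = 50, 0.4, 18
q = 1 - p
exact = sum(comb(n, x) * p**x * q**(n - x) for x in range(a + 1))
approx = phi(((a + 0.5) - n * p) / sqrt(n * p * q))
print(round(exact, 4), round(approx, 4))  # the two values should be close
```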
4 Confidence Interval
Confidence interval for µ, σ known:
x̄ ± z_c · σ/√n
Confidence interval for µ, σ unknown:
x̄ ± t_c · s/√n,    d.f. = n − 1
Confidence interval for p:
p̂ ± z_c · √(p̂q̂/n)
Confidence interval for σ:
√((n − 1)s² / χ²_R) < σ < √((n − 1)s² / χ²_L),    d.f. = n − 1
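The first interval can be sketched directly; the sample mean, σ, and n below are made up, and z_c = 1.96 is the usual 95% critical value.

```python
from math import sqrt

# Sketch: confidence interval for mu with sigma known,
# xbar ± z_c * sigma / sqrt(n). Default z_c = 1.96 (95% confidence).
def ci_mean_known_sigma(xbar, sigma, n, zc=1.96):
    e = zc * sigma / sqrt(n)  # margin of error
    return xbar - e, xbar + e

lo, hi = ci_mean_known_sigma(xbar=50.0, sigma=8.0, n=64)
print(lo, hi)  # 50 ± 1.96 * 8/8 = (48.04, 51.96)
```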
5 Hypothesis Testing
Hypothesis Testing for One-Sample Tests
Hypothesis test for µ, σ known:
z = (x̄ − µ) / (σ/√n)
Hypothesis test for µ, σ unknown:
t = (x̄ − µ) / (s/√n),    d.f. = n − 1
Hypothesis test for p:
z = (p̂ − p) / √(pq/n)
Hypothesis test for σ²:
χ² = (n − 1)s² / σ²,    d.f. = n − 1
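The one-sample z statistic can be sketched as follows; the hypothesized mean, sample values, and 1.96 cutoff are illustrative.

```python
from math import sqrt

# Sketch: one-sample z statistic for H0: mu = mu0 with sigma known.
def z_stat(xbar, mu0, sigma, n):
    return (xbar - mu0) / (sigma / sqrt(n))

z = z_stat(xbar=52.0, mu0=50.0, sigma=8.0, n=64)
print(z)  # (52 - 50) / (8/8) = 2.0; |z| > 1.96 rejects H0 at the 5% level
```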
Hypothesis Testing for Two Sample Test
Two-Sample z-Test for the Difference Between Means (independent samples)
z = ((x̄1 − x̄2) − (µ1 − µ2)) / σ_(x̄1−x̄2),    σ_(x̄1−x̄2) = √(σ1²/n1 + σ2²/n2)
Two-Sample t-Test for the Difference Between Means, variances equal (independent samples / pooled t-test)
t = ((x̄1 − x̄2) − (µ1 − µ2)) / s_(x̄1−x̄2),    s_(x̄1−x̄2) = σ̂ · √(1/n1 + 1/n2),
σ̂ = √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)),    d.f. = n1 + n2 − 2
Two-Sample t-Test for the Difference Between Means, variances unequal (independent samples)
t = ((x̄1 − x̄2) − (µ1 − µ2)) / s_(x̄1−x̄2),    s_(x̄1−x̄2) = √(s1²/n1 + s2²/n2),    d.f. = smaller of n1 − 1 and n2 − 1
Two-Sample t-Test for the Difference Between Means (dependent samples / paired t-test)
d_i = x1i − x2i,    t = (d̄ − µ_d) / (s_d/√n),    d.f. = n − 1
Two-Sample z-Test for the Difference Between Proportions (independent samples)
z = ((p̂1 − p̂2) − (p1 − p2)) / σ_(p̂1−p̂2),    σ_(p̂1−p̂2) = √(p̄q̄ (1/n1 + 1/n2)),
p̄ = (x1 + x2) / (n1 + n2),    q̄ = 1 − p̄
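The pooled two-sample t-test can be sketched in Python; all sample statistics below are made up, chosen so the arithmetic is easy to check by hand.

```python
from math import sqrt

# Sketch: pooled two-sample t statistic (equal variances assumed).
def pooled_t(x1, x2, s1, s2, n1, n2, delta0=0.0):
    """Return (t, d.f.) testing H0: mu1 - mu2 = delta0."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = sp * sqrt(1 / n1 + 1 / n2)
    return (x1 - x2 - delta0) / se, n1 + n2 - 2

t, df = pooled_t(x1=10.0, x2=8.0, s1=2.0, s2=2.0, n1=16, n2=16)
print(round(t, 4), df)  # sp = 2, se = 2*sqrt(1/8), t = 2*sqrt(2), d.f. = 30
```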
6 Simple Linear Regression
Linear Model
ŷ = mx + b,    m = (n Σxy − (Σx)(Σy)) / (n Σx² − (Σx)²),    b = ȳ − m x̄ = Σy/n − m · Σx/n
Correlation Coefficient
r = (n Σxy − (Σx)(Σy)) / (√(n Σx² − (Σx)²) · √(n Σy² − (Σy)²))
The t-Test for the Correlation Coefficient
t = r / σ_r = r / √((1 − r²) / (n − 2)),    d.f. = n − 2
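The summation formulas for the slope, intercept, and correlation coefficient can be implemented directly; the four data points below are illustrative and perfectly linear so the expected output is obvious.

```python
# Sketch: least-squares line and correlation coefficient from the
# summation formulas above.
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - m * sx / n
    r = (n * sxy - sx * sy) / ((n * sxx - sx**2) ** 0.5 * (n * syy - sy**2) ** 0.5)
    return m, b, r

m, b, r = fit_line([1, 2, 3, 4], [2, 4, 6, 8])  # exactly y = 2x
print(m, b, r)  # 2.0, 0.0, 1.0
```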