18-Dec If F: D+E
If G: C+E
Population/Parameter; Sample/Statistic
Qualitative: nominal, ordinal
Quantitative: continuous, discrete
Numeric Representation: Measures of Center, Position, and Spread
Arithmetic mean μ = (Σ x) / N < P vs. S > x̄ = (Σ x) / n
Geometric mean nth root of (x1)(x2)…(xn)
Harmonic mean reciprocate, arithmetic mean, reciprocate
Quartiles divide by 4. Whole #, take average of n & n+1. Fraction #, round up.
IQR Q3 - Q1
SIQR IQR/2
Midrange (H + L) / 2
Rel. pos. from median (x - median) / SIQR
Mid-Quartile range (Q1 + Q3) / 2
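The measures of center and position above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the data values and the helper name `quartile` are made up for the example, and the quartile rule follows the positional rule on this sheet (other textbooks and libraries use different interpolation rules).

```python
import math

def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) using the positional rule from this sheet."""
    position = k * len(sorted_values) / 4
    if position == int(position):
        # Whole number: average the value at that position and the next one.
        i = int(position)
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    # Fraction: round the position up (positions are 1-based).
    return sorted_values[math.ceil(position) - 1]

data = sorted([1, 2, 4, 8, 16])  # hypothetical sample
n = len(data)

arithmetic_mean = sum(data) / n
geometric_mean = math.prod(data) ** (1 / n)       # nth root of the product
harmonic_mean = n / sum(1 / x for x in data)      # reciprocate, average, reciprocate

q1, median, q3 = (quartile(data, k) for k in (1, 2, 3))
iqr = q3 - q1
siqr = iqr / 2
mid_quartile = (q1 + q3) / 2
rel_pos_16 = (16 - median) / siqr  # relative position of x = 16 from the median

print(arithmetic_mean, geometric_mean, harmonic_mean)
print(q1, median, q3, iqr, siqr, mid_quartile, rel_pos_16)
```

For positive data the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, so the three printed values make a quick sanity check.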
Standard Deviation & Bivariate Analysis
Deviation x-μ < P vs. S > x - x̄
SSX Σ (x - x̄)² or Σ (x²) - ((Σ x)² / n)
SSY Σ (y - y̅)² or Σ (y²) - ((Σ y)² / n)
SSXY Σ [(x - x̄)(y - y̅)] or Σ (xy) - ((Σ x)(Σ y) / n)
Variance X σ² = (Σ (x - x̄)²) / N < P vs. S > s² = (Σ (x - x̄)²) / (n - 1)
σ² = SSX / N s² = SSX / (n - 1)
Variance Y σ² = (Σ (y - y̅)²) / N < P vs. S > s² = (Σ (y - y̅)²) / (n - 1)
σ² = SSY / N s² = SSY / (n - 1)
Covariance SSXY / N < P vs. S > SSXY / (n - 1)
Standard Deviation σ < P vs. S > s
Z Score (x - μ) / σ < P vs. S > (x - x̄ ) / s
Chebyshev's Theorem 1 - (1 / k²) (min. proportion of data within k sd of the mean, k > 1)
Coefficient of Variation (sd rel. to mean) σ/μ < P vs. S > s / x̄
Empirical Rule 68 / 95 / 99.7
platy-, meso-, leptokurtic
correlation coefficient ρ = (covariance) / (σx * σy) < P vs. S > r = (covariance) / (sx * sy)
ρ = SSXY / √(SSX * SSY) < P vs. S > r = SSXY / √(SSX * SSY)
coefficient of determination ρ² < P vs. S > r²
ρ² = 1 - SSE/SSY < P vs. S > r² = 1 - SSE/SSY
% of Y variation due to error (ANOVA) 1 - ρ² = SSE/SSY < P vs. S > 1 - r² = SSE/SSY
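As a worked check of the sums of squares and the sample correlation, here is a small Python sketch. The x and y values are hypothetical, and the sample (n - 1) forms are used throughout; it computes each sum of squares both from deviations and from the shortcut form so the two can be compared.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical paired data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Definitional (deviation) forms
ssx = sum((xi - x_bar) ** 2 for xi in x)
ssy = sum((yi - y_bar) ** 2 for yi in y)
ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Shortcut (computational) forms
ssx_short = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ssy_short = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
ssxy_short = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Sample variance, covariance, and standard deviations
var_x, var_y = ssx / (n - 1), ssy / (n - 1)
cov_xy = ssxy / (n - 1)
s_x, s_y = math.sqrt(var_x), math.sqrt(var_y)

# Correlation two ways: from covariance and from sums of squares
r_from_cov = cov_xy / (s_x * s_y)
r = ssxy / math.sqrt(ssx * ssy)
r_squared = r ** 2

print(ssx, ssy, ssxy)        # 10.0 6.0 6.0
print(round(r, 4), round(r_squared, 4))
```

Both routes to r agree because the (n - 1) factors cancel between the covariance and the two standard deviations.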
Regression
Regression (best fit) line ŷ = b0 + b1x
slope b1 = SSXY / SSX
y-intercept b0 = y̅ - b1x̄ or b0 = (Σy - b1 Σx) / n
Errors e = y - ŷ
Errors squared e² = (y - ŷ)²
SSE Σ (y - ŷ)² or SSY - (b1 * SSXY)
Standard Error of est. (se) √(SSE / (n - 2)) df = n - 2, b/c 2 terms (b0 and b1) are estimated
Assumptions: linearity, independence, normality, homoscedasticity, no outliers
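The regression formulas above fit together as in the following sketch. The data are hypothetical and the variable names are chosen for the example; it fits the line, computes the residuals, and confirms that the two SSE forms match.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ssx = sum((xi - x_bar) ** 2 for xi in x)
ssy = sum((yi - y_bar) ** 2 for yi in y)
ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ssxy / ssx              # slope
b0 = y_bar - b1 * x_bar      # y-intercept
b0_alt = (sum(y) - b1 * sum(x)) / n

y_hat = [b0 + b1 * xi for xi in x]               # fitted values
errors = [yi - yh for yi, yh in zip(y, y_hat)]   # e = y - y_hat

sse = sum(e ** 2 for e in errors)
sse_short = ssy - b1 * ssxy
se = math.sqrt(sse / (n - 2))  # df = n - 2 because b0 and b1 are estimated

print(b1, b0)                 # slope and intercept
print(round(sse, 4), round(se, 4))
```

With an intercept in the model, the residuals sum to zero (up to floating-point error), which is a handy check on the fit.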
Further Topics in Regression
CI (sample estimate) ± t (se of the estimate)
slope sample estimate b1 seb1 = (se / √SSX)
y-intercept sample estimate b0 seb0 = se * √((1 / n) + (x̄² / SSX))
Confidence Interval for average y is narrower than Prediction Interval for an individual y (at the same x)
T-test stat for hyp for a pop. slope t* = (b1 - β1) / sb1
F-test stat for hyp for a pop. slope MSR/MSE = (SSR / df R) / (SSE / df E)
MSR (regression mean square) SSR / 1 (df R = 1 in simple regression)
MSE (mean square error) SSE / (n - 2)
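A sketch of the slope inference on hypothetical data follows. The critical value `T_CRIT` is an assumption taken from a standard t table (two-sided 95%, df = 3), since the Python standard library has no t distribution. In simple regression the F statistic equals the square of the slope's t statistic when testing β1 = 0, and the code checks that.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ssx = sum((xi - x_bar) ** 2 for xi in x)
ssy = sum((yi - y_bar) ** 2 for yi in y)
ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ssxy / ssx
sse = ssy - b1 * ssxy
ssr = ssy - sse                 # SST = SSR + SSE, with SST = SSY
se = math.sqrt(sse / (n - 2))   # standard error of the estimate

se_b1 = se / math.sqrt(ssx)     # standard error of the slope
t_star = (b1 - 0) / se_b1       # test of H0: beta1 = 0

msr = ssr / 1                   # df R = 1 in simple regression
mse = sse / (n - 2)
f_star = msr / mse

T_CRIT = 3.182                  # assumed table value: t(alpha/2 = 0.025, df = 3)
ci_slope = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

print(round(t_star, 4), round(f_star, 4))
print(tuple(round(v, 3) for v in ci_slope))
```

Here the interval for the slope contains 0, which agrees with t* being smaller than the critical value at this sample size.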
Multiple Regression
Regression (best fit) line ŷ = b0 + b1x1 + b2x2 + b3x3+...
b0 = average y value when all x values = 0
b1 = average change in y per unit change in x1 when all other x values stay fixed
coeff. of multiple determination r2 = SSR/SST
rADJ² 1 - [(1 - r²) * ((n - 1) / (n - k - 1))], where k is # of independent variables
Standard Error of est. (se) √(SSE / (n - k - 1)), for k independent variables (k + 1 estimated terms, incl. b0)
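To illustrate how the adjustment penalizes extra predictors, here is a short sketch. The function name `adjusted_r2` and the SSR/SST values are hypothetical, with k counting independent variables only (not the intercept).

```python
def adjusted_r2(r2, n, k):
    """Adjusted r-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

ssr, sst = 80.0, 100.0   # hypothetical sums of squares
r2 = ssr / sst           # coefficient of multiple determination

r2_adj_3 = adjusted_r2(r2, n=25, k=3)
r2_adj_6 = adjusted_r2(r2, n=25, k=6)  # same fit, more predictors

print(round(r2, 4), round(r2_adj_3, 4), round(r2_adj_6, 4))
```

For the same r², adding predictors lowers the adjusted value, which is why it is preferred when comparing models of different sizes.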
Probability
Marginal probability: values in margins
Conditional probability: probability of A given B
Independence P(A|B)= P(A)
Non-independent without replacement
If events are mutually exclusive, they affect each other, and are dependent
Bernoulli Trial p + q = 1 (Random experiment with 2 outcomes)
Bayes' Theorem P(A|B)= (P(B|A) * P(A))/ P(B) or P(A|B) = (P(A & B))/ P(B)
Expected Value E(x) = μ = Σ(xp)
Variance σ² = Σ(x²p) - μ²
Standard Deviation σ
Permutations nPx = n! / (n - x)!
Combinations nCx = n! / (x!(n - x)!)
Probability P(x successes) = nCx * p^x * q^(n - x)
Binomial model
-- mean μ = np
-- variance σ² = npq
-- Standard Deviation σ = √(npq)
P(z test) (x - μ) / σ
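The counting and binomial formulas above can be checked numerically. In this sketch the function name `binomial_pmf` and the values n = 10, p = 0.5 are chosen for illustration; it uses `math.perm` and `math.comb` (Python 3.8+) and recovers the binomial mean and variance from the general expected-value formulas.

```python
import math

def binomial_pmf(x, n, p):
    """P(x successes in n Bernoulli trials) = nCx * p^x * q^(n - x)."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

permutations_5_2 = math.perm(5, 2)   # 5! / (5 - 2)!
combinations_5_2 = math.comb(5, 2)   # 5! / (2! * 3!)

n, p = 10, 0.5   # hypothetical binomial model
q = 1 - p
prob_3 = binomial_pmf(3, n, p)

mean = n * p
variance = n * p * q
sd = math.sqrt(variance)

# Same quantities from the general discrete formulas E(x) and variance
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
expected = sum(x * px for x, px in enumerate(pmf))
var_general = sum(x ** 2 * px for x, px in enumerate(pmf)) - expected ** 2

print(permutations_5_2, combinations_5_2)   # 20 10
print(prob_3)                               # 0.1171875
print(mean, variance, round(sd, 4))
```

The probabilities over x = 0..n sum to 1, and the general formulas give the same mean and variance as np and npq.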
Sampling Distributions, per CLT
-- mean μx̅ = μ
-- variance σx̅² = σ² / n
-- St. Deviation / St. Error σx̅ = σ / √n
Z Score (for x̄ values) (x̄ - μ) / σx̅ or (x̄ - μ) / (σ / √n)
Solve for x̄ values x̄ = μ + Zσx̅ or x̄ = μ + Z(σ / √n)
Sample Size for means n = (Zα/2 * (σ / E))²
Margin of Error (E) for means Zα/2 (σ / √n)
CI for unknown pop. μ x̄ - Zα/2 (σ / √n) < μ < x̄ + Zα/2 (σ / √n)
Sample Size for proportions n = p̂q̂ (Zα/2 / E)²
Margin of Error (E) for proportions Zα/2(√(p̂q̂ / n))
CI for unknown pop. proportion p̂ - Zα/2(√(p̂q̂ / n)) < P < p̂ + Zα/2(√(p̂q̂ / n))
Confidence Intervals 90 / 95 / 99: 1.645 / 1.96 / 2.576
Correction factor to stand. error √((N - n) / (N - 1))
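The interval and sample-size formulas above can be sketched with the standard library's `statistics.NormalDist` (Python 3.8+). The helper names and the inputs (σ = 15, n = 36, x̄ = 105, N = 500) are hypothetical; sample sizes are rounded up because n must be a whole number that meets the target margin.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z(alpha/2) for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def sample_size_mean(sigma, margin, confidence):
    return math.ceil((z_critical(confidence) * sigma / margin) ** 2)

def sample_size_proportion(p_hat, margin, confidence):
    return math.ceil(p_hat * (1 - p_hat) * (z_critical(confidence) / margin) ** 2)

sigma, n, x_bar = 15, 36, 105          # hypothetical inputs
std_error = sigma / math.sqrt(n)       # sigma / sqrt(n)
z = z_critical(0.95)
margin = z * std_error
ci = (x_bar - margin, x_bar + margin)

# Finite population correction for a hypothetical population of N = 500
N = 500
std_error_fpc = std_error * math.sqrt((N - n) / (N - 1))

print([round(z_critical(c), 3) for c in (0.90, 0.95, 0.99)])  # [1.645, 1.96, 2.576]
print(tuple(round(v, 2) for v in ci))
print(sample_size_mean(15, 3, 0.95), sample_size_proportion(0.5, 0.03, 0.95))
```

Using p̂ = 0.5 in the proportion formula gives the most conservative (largest) sample size, a common choice when no prior estimate is available.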
Hypothesis Testing
Type I Error α = probability of Type I error (reject true H0): significance of test
Type II Error β = probability of Type II error (not reject false H0): [] / operating characteristic curve
1 - β = probability of rejecting false H0: power of test / power curve
z-test statistic for pop. mean z* = (x̄ - μ0) / (σ / √n)
z-test statistic for pop. prop. z* = (p̂ - p0) / √((p0q0) / n)
t-test statistic (σ unknown) t* = (x̄ - μ0) / (s / √n)
CI for t dist x̄ - t (s / √n) < μ < x̄ + t (s / √n) df = n - 1
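The three one-sample test statistics above can be computed as follows. This is a sketch with hypothetical inputs and function names; the two-sided p-value is shown only for the z-test, because the standard library provides a normal distribution but no t distribution.

```python
import math
from statistics import NormalDist, mean, stdev

def z_stat_mean(x_bar, mu0, sigma, n):
    return (x_bar - mu0) / (sigma / math.sqrt(n))

def z_stat_proportion(p_hat, p0, n):
    q0 = 1 - p0
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)

def t_stat(sample, mu0):
    """Return (t*, df) for a one-sample t-test with sigma unknown."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean(sample) - mu0) / (s / math.sqrt(n)), n - 1

z_mean = z_stat_mean(x_bar=105, mu0=100, sigma=15, n=36)
p_value = 2 * (1 - NormalDist().cdf(abs(z_mean)))   # two-sided

z_prop = z_stat_proportion(p_hat=0.56, p0=0.5, n=100)

t_star, df = t_stat([12, 15, 11, 14, 13], mu0=12)   # hypothetical sample

print(z_mean, round(p_value, 4))
print(round(z_prop, 4))
print(round(t_star, 4), df)
```

With α = 0.05, the z-test for the mean rejects H0 here (p-value below α), while the proportion test does not (|z*| below 1.96).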
Two-Sample Tests
Cp. means of 2 diff. gps. H0: μ1 = μ2
Pooled est. of equal var. Sp² = ((n1 - 1)S1² + (n2 - 1)S2²) / (n1 + n2 - 2)
Pooled var. t-test for diff bt 2 means t* = ((x̄1 - x̄2) - (μ1 - μ2)) / √(Sp² * ((1 / n1) + (1 / n2))) df = n1 + n2 - 2
CI for diff. b/t means of 2 indep. Pop (x̄1 - x̄2) - tα/2√(Sp² * ((1 / n1) + (1 / n2))) < (μ1 - μ2) < (x̄1 - x̄2) + tα/2√(Sp² * ((1 / n1) + (1 / n2)))
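A sketch of the pooled-variance t-test on two small hypothetical groups is shown below. The function name `pooled_t` is made up for the example, and the null hypothesis is taken as μ1 - μ2 = 0. The whole numerator (difference in sample means minus the hypothesized difference) is divided by the pooled standard error.

```python
import math
from statistics import mean, variance

def pooled_t(group1, group2, hypothesized_diff=0.0):
    """Return (t*, df, pooled variance) assuming equal population variances."""
    n1, n2 = len(group1), len(group2)
    s1_sq, s2_sq = variance(group1), variance(group2)  # sample variances
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    std_error = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t_star = ((mean(group1) - mean(group2)) - hypothesized_diff) / std_error
    return t_star, n1 + n2 - 2, sp_sq

group1 = [12, 15, 11, 14, 13]   # hypothetical samples
group2 = [10, 9, 12, 11, 8]

t_star, df, sp_sq = pooled_t(group1, group2)
print(round(t_star, 4), df, sp_sq)
```

The resulting t* is compared against a t table with df = n1 + n2 - 2.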
Cp. means of 2 rel./paired gps. H0: μD = 0
μD = average difference among all pairs (population) use t-distribution
d̅ = average difference among sample pairs use t-distribution
Paired t-test for mean difference t* = (d̅ - μD) / (sD / √n) df = n - 1
CI for Mean diff d̅ - tα/2(sD / √n ) < μD < d̅ + tα/2(sD / √n )
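For paired data the test reduces to a one-sample t-test on the differences. The sketch below uses hypothetical before/after measurements and a made-up function name, `paired_t`, with the hypothesized mean difference set to 0.

```python
import math
from statistics import mean, stdev

def paired_t(first, second, mu_d=0.0):
    """Return (t*, df) for the mean of the pairwise differences second - first."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)    # average difference among sample pairs
    s_d = stdev(diffs)     # sample sd of the differences
    return (d_bar - mu_d) / (s_d / math.sqrt(n)), n - 1

before = [10, 12, 9, 11, 13]   # hypothetical paired measurements
after = [12, 14, 10, 13, 16]

t_star, df = paired_t(before, after)
print(round(t_star, 4), df)
```

Because each pair acts as its own control, the degrees of freedom depend on the number of pairs, not the total number of observations.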
Cp. proportions of 2 diff. gps. H0: P1 - P2 = 0
Two-proportion z-test, pooled z* = ((p̂1 - p̂2) - (P1 - P2)) / √((p̅(1 - p̅)) * ((1 / n1) + (1 / n2)))
p̅ = pooled estimate (x1 + x2) / (n1 + n2)
CI for diff b/t 2 prop. (p̂1 - p̂2) ± zα/2 * √((p̂1(1 - p̂1) / n1) + (p̂2(1 - p̂2) / n2))
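The two-proportion procedures can be sketched as below, with hypothetical counts and function names. The test uses the pooled estimate in its standard error (H0 assumes a common proportion), while the confidence interval uses the unpooled standard error and multiplies it by zα/2 to get the margin.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z* for H0: P1 - P2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)   # pooled estimate
    std_error = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Unpooled confidence interval for P1 - P2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * std_error, diff + z * std_error

z_star = two_proportion_z(45, 100, 30, 100)   # hypothetical counts
low, high = two_proportion_ci(45, 100, 30, 100)

print(round(z_star, 4))
print(round(low, 4), round(high, 4))
```

In this example the interval excludes 0 and |z*| exceeds 1.96, so both approaches point to a difference at the 5% level.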
Two-sample F test for equality of variances F* = S1² / S2²
SST (Total variation) = SSR (variation explained by model) + SSE (natural variation)