Formulas

The document provides a comprehensive overview of statistical concepts, including measures of central tendency, variance, standard deviation, regression analysis, and hypothesis testing. It covers both simple and multiple regression, probability theory, sampling distributions, and confidence intervals. Additionally, it addresses errors in hypothesis testing and various statistical tests for comparing means and proportions.

Uploaded by

yuliewong5
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as XLSX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views3 pages

Formulas

The document provides a comprehensive overview of statistical concepts, including measures of central tendency, variance, standard deviation, regression analysis, and hypothesis testing. It covers both single and multiple regression, probability theory, sampling distributions, and confidence intervals. Additionally, it addresses errors in hypothesis testing and various statistical tests for comparing means and proportions.

Uploaded by

yuliewong5
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as XLSX, PDF, TXT or read online on Scribd
You are on page 1/ 3

18-Dec If F: D+E

If G: C+E
Population/Parameter; Sample/Statistic
Qualitative: nominal, ordinal
Quantitative: continuous, discrete
Numeric Representation: Measures of Center, Position, and Spread
Arithmetic mean μ = (Σ x) / N < P vs. S > x̄ = (Σ x) / n
Geometric mean nth root of the product of the n values: ⁿ√((x1)(x2)…(xn))
Harmonic mean take the reciprocals, find their arithmetic mean, then take its reciprocal: n / Σ(1/x)
Quartiles divide ordered data into 4 parts. If the position is a whole #, average values n & n+1; if it is a fraction, round up.
IQR Q3 - Q1
SIQR IQR/2
Midrange (H + L) / 2
Rel. pos. from median (x - median) / SIQR
Mid-Quartile range (Q1 + Q3) / 2
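A minimal Python sketch of the measures above; the data values are made up for illustration, and the standard-library quartile interpolation can differ slightly from the hand rule given here.

```python
import math
import statistics

x = [2, 4, 4, 5, 7, 9, 12, 15]             # illustrative sample data
n = len(x)

mean = sum(x) / n                           # arithmetic mean x̄ = Σx / n
geo_mean = math.prod(x) ** (1 / n)          # geometric mean = nth root of the product
harm_mean = n / sum(1 / v for v in x)       # harmonic mean = reciprocal of the mean of reciprocals

q1, q2, q3 = statistics.quantiles(x, n=4)   # quartiles (interpolated; may differ from the hand rule)
iqr = q3 - q1                               # interquartile range
siqr = iqr / 2                              # semi-interquartile range
midrange = (max(x) + min(x)) / 2            # midrange = (H + L) / 2
midquartile = (q1 + q3) / 2                 # mid-quartile range

print(mean, geo_mean, harm_mean, q1, q3, iqr, midrange, midquartile)
```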
Standard Deviation & Bivariate Analysis
Deviation x-μ < P vs. S > x - x̄
SSX Σ(x - x̄)² or Σ(x²) - (Σx)² / n
SSY Σ(y - y̅)² or Σ(y²) - (Σy)² / n
SSXY Σ(x - x̄)(y - y̅) or Σ(xy) - (Σx)(Σy) / n
Variance X σ² = Σ(x - μ)² / N < P vs. S > s² = Σ(x - x̄)² / (n - 1)
σ² = SSX / N s² = SSX / (n - 1)
Variance Y σ² = Σ(y - μy)² / N < P vs. S > s² = Σ(y - y̅)² / (n - 1)
σ² = SSY / N s² = SSY / (n - 1)
Covariance SSXY / N < P vs. S > SSXY / (n - 1)
Standard Deviation σ < P vs. S > s
Z Score (x - μ) / σ < P vs. S > (x - x̄ ) / s
Chebyshev's Theorem 1 - (1 / k²)
Coefficient of Variation (sd rel. to mean) σ/μ < P vs. S > s / x̄
Empirical Rule 68 / 95 / 99.7
Kurtosis: platy-, meso-, leptokurtic
correlation coefficient ρ = covariance / (σx · σy) < P vs. S > r = covariance / (sx · sy)
ρ = SSXY / √(SSX · SSY) < P vs. S > r = SSXY / √(SSX · SSY)
coefficient of determination ρ² < P vs. S > r²
ρ² = 1 - SSE/SSY < P vs. S > r² = 1 - SSE/SSY
% of Y variation due to error (ANOVA) 1 - ρ² = SSE/SSY < P vs. S > 1 - r² = SSE/SSY
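A minimal Python sketch of the sums of squares, variance, covariance, and correlation formulas above, using made-up paired data.

```python
import math

x = [1, 2, 3, 4, 5]                      # illustrative paired data
y = [2, 1, 4, 3, 5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ssx = sum((xi - x_bar) ** 2 for xi in x)                          # SSX
ssy = sum((yi - y_bar) ** 2 for yi in y)                          # SSY
ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # SSXY

s2_x = ssx / (n - 1)                     # sample variance of x
s2_y = ssy / (n - 1)                     # sample variance of y
cov_xy = ssxy / (n - 1)                  # sample covariance
s_x, s_y = math.sqrt(s2_x), math.sqrt(s2_y)

r = ssxy / math.sqrt(ssx * ssy)          # correlation coefficient (same as cov / (s_x * s_y))
r_squared = r ** 2                       # coefficient of determination
print(s2_x, s2_y, cov_xy, r, r_squared)
```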
Regression
Regression (best fit) line ŷ = b0 + b1x
slope b1 = SSXY / SSX
y-intercept b0 = y̅ - b1x̄ or b0 = (Σy - b1 Σx) / n

Errors e = y - ŷ
Errors squared e² = (y - ŷ)²
SSE Σ(y - ŷ)² or SSY - b1·SSXY
Standard Error of est. (se) √(SSE / (n - 2)) df = n - 2, b/c 2 terms (b0 and b1) are estimated
Assumptions: linearity, independence, normality, homoscedasticity, no outliers
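A minimal Python sketch of fitting the least-squares line and the error measures above; the x and y values are illustrative only.

```python
import math

x = [1, 2, 3, 4, 5, 6]                   # illustrative data
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ssx = sum((xi - x_bar) ** 2 for xi in x)
ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ssxy / ssx                          # slope = SSXY / SSX
b0 = y_bar - b1 * x_bar                  # y-intercept = y̅ - b1·x̄

y_hat = [b0 + b1 * xi for xi in x]       # fitted values ŷ = b0 + b1·x
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # SSE = Σ(y - ŷ)²  (also SSY - b1·SSXY)
se = math.sqrt(sse / (n - 2))            # standard error of the estimate, df = n - 2
print(b0, b1, sse, se)
```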
Further Topics in Regression
CI (sample estimate) ± t (se of the estimate)
slope sample estimate b1 seb1 = (se / √SSX)
y-intercept sample estimate b0 seb0 = se · √((1/n) + (x̄² / SSX))

Confidence Interval for the average y (at a given x) is narrower than (<) the Prediction Interval for an individual y


t-test stat for hyp. about a pop. slope t* = (b1 - β1) / seb1
F-test stat for hyp. about a pop. slope F* = MSR/MSE = (SSR / dfR) / (SSE / dfE)
MSR (regression mean square) SSR / 1 = SSR (simple regression has one regression df)
MSE (mean square error) SSE / (n - 2)
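A minimal Python sketch of slope inference using the formulas above; the summary values (n, SSX, SSXY, SSE) and the t critical value are illustrative, not tied to any real data set.

```python
import math

# Illustrative summary quantities for a simple regression (made up):
n, ssx, ssxy, sse = 6, 17.5, 17.35, 0.11
b1 = ssxy / ssx                          # slope
se = math.sqrt(sse / (n - 2))            # standard error of the estimate

se_b1 = se / math.sqrt(ssx)              # seb1 = se / √SSX
t_star = (b1 - 0) / se_b1                # t* for H0: β1 = 0, df = n - 2

ssr = b1 * ssxy                          # SSR = SSY - SSE = b1·SSXY
msr = ssr / 1                            # MSR (one regression df in simple regression)
mse = sse / (n - 2)                      # MSE
f_star = msr / mse                       # F* = MSR / MSE (equals t*² in simple regression)

t_crit = 2.776                           # ≈ t(0.025, df = 4) from a t-table, for a 95% CI
ci_slope = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # b1 ± t·seb1
print(t_star, f_star, ci_slope)
```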
Multiple Regression
Regression (best fit) line ŷ = b0 + b1x1 + b2x2 + b3x3+...
b0 = average y value when all x values = 0
b1 = average change in y per unit change in x1 when all other x values stay fixed
coeff. of multiple determination r² = SSR/SST
r²ADJ 1 - [(1 - r²) * ((n - 1) / (n - k - 1))], where k is the # of independent variables
Standard Error of est. (se) √(SSE / (n - k)), for k terms in the model
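A minimal multiple-regression sketch of the formulas above, assuming numpy is available; the two predictors and the response values are made up, and the fit is ordinary least squares.

```python
import numpy as np

# Illustrative data: two predictors and a response
X1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
X2 = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)
y  = np.array([3.1, 3.9, 6.2, 6.8, 9.1, 9.7, 12.2, 12.6])

n, k = len(y), 2                           # n observations, k independent variables
X = np.column_stack([np.ones(n), X1, X2])  # design matrix with intercept column

b, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients b0, b1, b2
y_hat = X @ b                              # fitted values

sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
sse = np.sum((y - y_hat) ** 2)             # error sum of squares
ssr = sst - sse                            # regression sum of squares

r2 = ssr / sst                             # coefficient of multiple determination
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # adjusted r²
se = np.sqrt(sse / (n - k - 1))            # standard error of estimate (df = n minus estimated terms)
print(b, r2, r2_adj, se)
```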
Probability
Marginal probability: values in margins
Conditional probability: probability of A given B
Independence P(A|B)= P(A)
Non-independent: sampling without replacement (each draw changes the remaining probabilities)
If events are mutually exclusive, the occurrence of one rules out the other, so they are dependent
Bernoulli Trial p + q = 1 (Random experiment with 2 outcomes)
Bayes' Theorem P(A|B)= (P(B|A) * P(A))/ P(B) or P(A|B) = (P(A & B))/ P(B)
Expected Value E(x) = μ = Σ(xp)
Variance σ² = Σ(x²·p) - μ²
Standard Deviation σ
Permutations nPx = n! / (n - x)!
Combinations nCx = n! / (x!(n - x)!)
Probability P(x successes) = nCx · p^x · q^(n-x)
Binomial model
-- mean μ = np
-- variance σ² = npq
-- Standard Deviation σ
P via z (normal approx.) z = (x - μ) / σ, then read the probability from the z table
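A minimal Python sketch of the counting rules, the binomial model, and the discrete expected value/variance formulas above; n, x, p, and the small discrete distribution are illustrative.

```python
import math

n, x, p = 10, 3, 0.25                     # illustrative Bernoulli-trial setup
q = 1 - p

perm = math.perm(n, x)                    # nPx = n! / (n - x)!
comb = math.comb(n, x)                    # nCx = n! / (x!(n - x)!)

p_x = comb * p**x * q**(n - x)            # P(x successes) = nCx · p^x · q^(n-x)
mu = n * p                                # binomial mean μ = np
var = n * p * q                           # binomial variance σ² = npq
sd = math.sqrt(var)                       # binomial standard deviation

# Expected value and variance of a general discrete distribution:
values = [0, 1, 2, 3]
probs  = [0.1, 0.4, 0.3, 0.2]
ev = sum(v * pr for v, pr in zip(values, probs))             # E(x) = Σ(x·p)
vr = sum(v**2 * pr for v, pr in zip(values, probs)) - ev**2  # σ² = Σ(x²·p) - μ²
print(perm, comb, p_x, mu, var, sd, ev, vr)
```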
Sampling Distributions, per CLT
-- mean μx̄ = μ
-- variance σx̄² = σ² / n
-- St. Deviation / St. Error σx̄ = σ / √n
Z Score (for x̄ values) (x̄ - μ) / σx̄ or (x̄ - μ) / (σ / √n)
Solve for x̄ values x̄ = μ + Z·σx̄ or x̄ = μ + Z(σ / √n)
Sample Size for means n = (Zα/2 · σ / E)²

Margin of Error (E) for means Zα/2 (σ / √n)


CI for unknown pop. μ x̄ - Zα/2 (σ / √n) < μ < x̄ + Zα/2 (σ / √n)
Sample Size for proportions n = p̂q̂ (Zα/2 / E)²
Margin of Error (E) for proportions Zα/2 √(p̂q̂ / n)
CI for unknown pop. proportion p̂ - Zα/2 √(p̂q̂ / n) < P < p̂ + Zα/2 √(p̂q̂ / n)
Confidence Intervals 90 / 95 / 99: 1.645 / 1.96 / 2.576
Correction factor to stand. error √((N - n) / (N - 1))
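A minimal Python sketch of the margin-of-error, confidence-interval, and sample-size formulas above; the sample statistics and target margins of error are made up.

```python
import math

z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}   # common Zα/2 critical values

# CI for an unknown population mean (σ known): x̄ ± Zα/2 · σ/√n
x_bar, sigma, n = 52.0, 8.0, 64              # illustrative values
se_mean = sigma / math.sqrt(n)               # standard error σ/√n
E = z[0.95] * se_mean                        # margin of error
ci_mean = (x_bar - E, x_bar + E)

# Sample size needed for a target margin of error E for a mean:
E_target = 1.5
n_mean = math.ceil((z[0.95] * sigma / E_target) ** 2)

# CI and sample size for a proportion:
p_hat, n_p = 0.40, 200
q_hat = 1 - p_hat
E_prop = z[0.95] * math.sqrt(p_hat * q_hat / n_p)
ci_prop = (p_hat - E_prop, p_hat + E_prop)
n_prop = math.ceil(p_hat * q_hat * (z[0.95] / 0.03) ** 2)   # for a 3-point margin of error

print(ci_mean, n_mean, ci_prop, n_prop)
```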
Hypothesis Testing
Type I Error α = probability of Type I error (rejecting a true H0): significance level of the test
Type II Error β = probability of Type II error (failing to reject a false H0): operating characteristic (OC) curve
1 - β = probability of rejecting a false H0: power of the test / power curve

z-test statistic for pop. mean z* = (x̄ - μ0) / (σ / √n)


z-test statistic for pop. prop. z* = (p̂ - p0) / √(p0·q0 / n)

t-test statistic (σ unknown) t* = (x̄ - μ0) / (s / √n)


CI for t dist x̄ - tα/2 (s / √n) < μ < x̄ + tα/2 (s / √n), df = n - 1
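A minimal Python sketch of the one-sample test statistics above; all inputs are made up, and the t critical value is an approximate table lookup.

```python
import math

# z-test for a population mean when σ is known:
x_bar, mu0, sigma, n = 52.0, 50.0, 8.0, 64   # illustrative values
z_star = (x_bar - mu0) / (sigma / math.sqrt(n))

# z-test for a population proportion:
p_hat, p0, n_p = 0.46, 0.50, 400
z_prop = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n_p)

# t-test for a population mean when σ is unknown (use sample s, df = n - 1):
s = 9.2
t_star = (x_bar - mu0) / (s / math.sqrt(n))

# t-based CI for μ: x̄ ± t(α/2, n-1) · s/√n, with the critical value from a t-table
t_crit = 1.998                               # ≈ t(0.025, df = 63)
ci = (x_bar - t_crit * s / math.sqrt(n), x_bar + t_crit * s / math.sqrt(n))
print(z_star, z_prop, t_star, ci)
```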
Two-Sample Tests
Cp. means of 2 diff. gps. H0: μ1 = μ2
Pooled est. of equal var. Sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)
Pooled var. t-test for diff bt 2 means t* = ((x̄1 - x̄2) - (μ1 - μ2)) / √(Sp² * ((1 / n1) + (1 / n2)))
CI for diff. b/t means of 2 indep. pop. (x̄1 - x̄2) - tα/2 √(Sp² * ((1 / n1) + (1 / n2))) < (μ1 - μ2) < (x̄1 - x̄2) + tα/2 √(Sp² * ((1 / n1) + (1 / n2)))
Cp. means of 2 rel./paired gps. H0: D = 0
D = average difference among all pairs use t-distribution
d̅ = average difference among sample pairs use t-distribution
Paired t-test for mean difference t* = (d̅ - μD) / (sD / √n) df = n - 1
CI for Mean diff d̅ - tα/2(sD / √n ) < μD < d̅ + tα/2(sD / √n )
Cp. proportions of 2 diff. gps. H0: P1 - P2 = 0
Two-proportion z-test, pooled z* = ((p̂1 - p̂2) - (P1 - P2)) / √(p̅(1 - p̅) * ((1 / n1) + (1 / n2)))
p̅ = pooled estimate (x1 + x2) / (n1 + n2)
CI for diff. b/t 2 prop. (p̂1 - p̂2) ± zα/2 √((p̂1(1 - p̂1) / n1) + (p̂2(1 - p̂2) / n2))
Two-sample F test for equality of variances F* = s1² / s2²
SST (total variation) = SSR (variation explained by the model) + SSE (natural/unexplained variation)
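A minimal Python sketch of the two-sample test statistics above; every sample summary used here is made up for illustration.

```python
import math

# Pooled-variance t-test for the difference between two independent means:
n1, x1_bar, s1 = 25, 78.0, 10.0              # illustrative group summaries
n2, x2_bar, s2 = 30, 73.0, 9.0
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)    # pooled variance Sp²
t_pooled = (x1_bar - x2_bar) / math.sqrt(sp2 * (1/n1 + 1/n2))  # H0: μ1 = μ2, df = n1 + n2 - 2

# Paired t-test for the mean difference (d̄ = mean of the pair differences):
d_bar, s_d, n_pairs = 1.8, 3.1, 20
t_paired = d_bar / (s_d / math.sqrt(n_pairs))                  # df = n - 1

# Two-proportion z-test with a pooled proportion:
x1, x2 = 54, 40                              # successes in each group
m1, m2 = 120, 115                            # group sizes
p_pool = (x1 + x2) / (m1 + m2)               # p̅ = (x1 + x2) / (n1 + n2)
z_prop = ((x1/m1) - (x2/m2)) / math.sqrt(p_pool * (1 - p_pool) * (1/m1 + 1/m2))

# F test for equality of variances (larger sample variance on top by convention):
f_star = max(s1**2, s2**2) / min(s1**2, s2**2)
print(t_pooled, t_paired, z_prop, f_star)
```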
