Statistic Cheat Sheet
C01: Data Type, Graphs, 1-2 Variables C03: Categorical Data , Distribution (1 Sample) Multiple-Sample Test of Categorical Data: (chi-square table)
Chi-Square Test - Right Skewed Curve %,P +QP +789 * * No CLT
Univariate: Pie/Bar/Pareto   Population Mean: π   Sample Mean: p̂
Constructing Confidence Interval: Calculate the Row & Column Total
1) Let π be the true proportion of _______
Bivariate: Side by side Bar Population Size: N Sample Size: n
Identify O (Observed Data)
p̂ = __% & n = __
Uni: Histogram/Box & Whisker
*Pareto: Cumulative % Line, Highest to Lowest, Variable
*
2) Given:
3) Since np̂ = ____ > 5 & n(1 − p̂) = ___ > 5, by CLT p̂ is
Bi: Scatterplot *Histogram Shape: Uniform, Bell, :;<)=;>?@AB;@CD2)=;>?@
Focus on 80%-20% (weed out trivial) Right/Left Skewed Calculate E = ( EF?27)=;>?@
approximately normal
Categorical Numerical
4) For a 90% / 95% C.I for π: ( p̂ − z_{α/2}·√(p̂(1 − p̂)/n) , p̂ + z_{α/2}·√(p̂(1 − p̂)/n) ) = ( ____% , ____% )
Calculate (O − E)² / E
1) H0: _____ & _____ are independent, no relationship
2) H1: _____ & _____ are related
3) α = 0.05 by default (unless stated in qn)
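The confidence-interval steps for a proportion can be sketched in Python. This is a minimal sketch using only the standard library; the function name `proportion_ci` and the sample figures (40% of n = 200) are hypothetical and not taken from the sheet.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Large-sample (CLT) confidence interval for a true proportion."""
    # Step 3 check: n*p_hat > 5 and n*(1 - p_hat) > 5
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("sample too small for the normal approximation")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # z * standard error
    return p_hat - margin, p_hat + margin


# Hypothetical sample: p_hat = 40%, n = 200
low, high = proportion_ci(0.40, 200, confidence=0.95)
print(f"95% C.I. for pi: ({low:.1%}, {high:.1%})")
```

Passing `confidence=0.90` gives the 90% interval from the same function.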
Continuous Discrete
Measurable Countable
1 Sample Test of Hypothesis of π: Right Tail
1) H0: π ≤ ___   Shaded Area (100% − α)
2) H1: π > ___
* Left Tailed: H0: π ≥ ___ , H1: π < ___
* 2 Tailed: H0: π = ___ , H1: π ≠ ___
4) Since each E > 5 & since sample size is large enough, χ² = Σ (O − E)²/E ~ χ² with (r − 1)(c − 1) degrees of freedom
Nominal Ordinal Interval Ratio   Right or Left Tail Area (α/2)
"#
Critical Value
9) Conclusion: There is enough evidence
* *
)*#
Population S.D:
5) Reject H0 when P(Z > z) < α = 0.05 (default) - Right Tailed   χ²_α   that ____ & ____ are dependent, related /
* Left Tailed: Reject H0 when P(Z < z) < α = 0.05
independent, no relationship at ___%
6) Under H0: π = Value in (1)   * 2 Tailed: Reject H0 when P(|Z| > |z|) < α = 0.05
Sample Mean:
confidence
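The chi-square test of independence (row and column totals, expected counts E, then the sum of (O − E)²/E) can be sketched as follows. This is a minimal sketch: the 2×2 table is hypothetical, E is computed with the usual row total × column total / grand total rule, and the critical value is read from a chi-square table rather than computed.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected counts) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Hypothetical observed counts (rows = groups, columns = categories)
observed = [[30, 20], [20, 30]]
stat, df, expected = chi_square_independence(observed)

critical_value = 3.841   # chi-square table, df = 1, alpha = 0.05
reject_h0 = stat > critical_value
print(stat, df, reject_h0)
```

Every expected count should exceed 5 before the statistic is compared with the table value.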
)4 = Critical Value
*Sample Average
&'(
25% 7) Test: Substitute value into (4), Z = ___
Since P(Z > ___) = ____ (find p-value in table) < α = 0.05
#$%&"
!" "# ' Range: Max - Min
Sample SD:
8) We reject H0 (When P is low, Null will go)
$%#&' *Five Number Summary C04: Numerical Data , Continuous Distribution (1 Sample)
*Describe: Symmetrical/Left or Right Skewed / Do Not Reject (When P is high, Null will Fly)
9) Conclusion: There is enough evidence that ______ than that
Population Mean: μ
Median: Arrange in order (Middle/Sum of 2 Middle)
Sample Mean: x̄
*Degrees of Freedom: The final unknown number is
of _____ at 5% significance level automatically known by default as all numbers add up to
Mode: Most reoccurring
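The numerical summaries listed here (mean, median, mode, range, standard deviations, five-number summary) can be computed with Python's `statistics` module. This is a minimal sketch on hypothetical data; quartile conventions differ between textbooks, and the `"inclusive"` method is only one of them.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9, 10, 12]   # hypothetical sample, already sorted

mean = st.mean(data)                 # sample average
median = st.median(data)             # middle value, or mean of the 2 middle values
mode = st.mode(data)                 # most frequently occurring value
data_range = max(data) - min(data)   # Range: Max - Min
sample_sd = st.stdev(data)           # divides by n - 1
population_sd = st.pstdev(data)      # divides by N

# Five-number summary: min, Q1, median, Q3, max
q1, q2, q3 = st.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, q2, q3, max(data))
print(mean, median, mode, data_range, five_number)
```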
Population S.D: σ
certain value. So (X-1) values are FREE to vary
Do Not Reject - There is not enough evidence that ____ than ____ at ___% significance level Sample S.D: s
* 2 Types of Errors
Type I Error: Reject H0 when H0 is true (False Positive)   Population Size: N   Sample Size: n
Type II Error: Not Rejecting H0 when H0 is false (False Negative)
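The one-sample test of a proportion can be sketched as below. The sheet's step 4 (the test statistic) did not survive extraction, so this sketch assumes the standard large-sample statistic Z = (p̂ − π0) / √(π0(1 − π0)/n); the function name and the sample figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_hat, n, pi0, tail="right", alpha=0.05):
    """One-sample z-test for a proportion; returns (z, p_value, reject_h0)."""
    se = sqrt(pi0 * (1 - pi0) / n)       # standard error under H0
    z = (p_hat - pi0) / se
    cdf = NormalDist().cdf
    if tail == "right":                  # H1: pi > pi0
        p_value = 1 - cdf(z)
    elif tail == "left":                 # H1: pi < pi0
        p_value = cdf(z)
    else:                                # two-tailed, H1: pi != pi0
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value, p_value < alpha   # reject H0 when the p-value is low


# Hypothetical sample: p_hat = 56% from n = 100, testing H0: pi <= 0.50
z, p_value, reject = proportion_z_test(p_hat=0.56, n=100, pi0=0.50, tail="right")
print(round(z, 2), round(p_value, 4), reject)
```

A Type I error corresponds to `reject` being true when H0 actually holds; lowering `alpha` makes that less likely at the cost of more Type II errors.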
C02: Chance
x̄ ~ N(μ, σ²/n) when n ≥ 30
Classical Empirical C04: Categorical Data (2 Samples, Multiple Samples)
a) Non-normal population, sample size n ≥ 30: x̄ is approximately normal by CLT
Base on Assumption (Fair & Independent) Base on Survey
b) Non-normal population, sample size n < 30: x̄ is not normal (no CLT)
Outcome equally likely Approximately Test for difference between proportions from 2 independent populations c) normal (Bell-Shaped) = For any (n) will still be normally distributed
},~•,€•‚ƒ•
%&
P(2B + 1G) = (B*B*G ), (B*G*B), (G*B*B) P(2B + 1G) = #$# * If no CLT, assume normality & not too skewed
Population Mean: π1 & π2   Let π1 be true proportion of _______ & π2 be true proportion of
! ! ! !
(B*B*G ) = # ! # ! # = " # #$
Sample Size: n1 & n2   2) H1: π1 ≠ π2   *Formulas can be manipulated to find z / w,|'0
3) α = 0.05 by default (unless stated in qn)
E.g., uQv%n9 A+)
To Find No. of Combo: nC2 = 10 Combo
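The classical-probability example and the combination count can be checked by enumeration. This is a minimal sketch: it assumes a boy and a girl are equally likely and births are independent, and it assumes n = 5 for the "nC2 = 10" line, since 5C2 = 10.

```python
from fractions import Fraction
from itertools import product
from math import comb

# All equally likely outcomes for 3 children: BBB, BBG, ..., GGG
outcomes = list(product("BG", repeat=3))

# Favourable outcomes: exactly 2 boys and 1 girl (BBG, BGB, GBB)
favourable = [o for o in outcomes if o.count("B") == 2]
p_2b_1g = Fraction(len(favourable), len(outcomes))

# Number of ways to choose 2 items out of n = 5 (assumed n)
pairs = comb(5, 2)
print(p_2b_1g, pairs)
```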
E( E+ E( E+
x̄ ± z_{α/2}·(σ/√n)
Right or Left
Parent Population Normal OR L Tail Area (α/2)
Conditional Probability: ‘A Given B’/ ‘event given condition’ N is large enough (normal by CLT) MN
"
> 2 Share Birthday: 1 - P(No One Shares) = 1 - *,- 2 *,- 2 3, 5) Reject ;3 when P(Z >| z|) < + +,0.05 Where 5G + , 2$12"
)4 THEN when % is known
*,- *,. 5 65
- '!K21(
Area under z = p-value area (rejection region) < α
8) We reject H0   *If n not large enough (<30, no CLT but Assume not too skewed)
+'!K21(
A’ (A’) " "
9) We have enough evidence that there is a significant difference
between true population of ___ & ___ Why: t distribution arrives with 1 condition, that sampled
population is normally distributed, or at least not too skewed
Interpret Confidence Interval of !"
We are __% confident that _____ is between x̄ ± t_{α/2, n−1}·(s/√n). Since lower confidence limit for
For 90% / 95% C.I: = ( ____ , ____)
___ is bigger/smaller than upper/lower limit of that for the ___, the true mean of ___ significantly
exceeds that of the ___.
(p̂1 − p̂2) ± z_{α/2}·√( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )   or   (p̂1 − p̂2) ± z_{α/2}·√( π̄(1 − π̄)(1/n1 + 1/n2) )
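The comparison of two independent proportions can be sketched as follows. This is a minimal sketch under stated assumptions: the pooled proportion is taken as (x1 + x2)/(n1 + n2) and used for the test statistic, the unpooled standard error is used for the confidence interval, and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(x1, n1, x2, n2, alpha=0.05):
    """Two-tailed z-test and C.I. for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                 # pooled proportion under H0
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value

    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, ci


# Hypothetical counts: 30 of 100 in group 1, 45 of 100 in group 2
z, p_value, ci = two_proportion_test(x1=30, n1=100, x2=45, n2=100)
print(round(z, 3), round(p_value, 4), ci)
```

When both limits of `ci` are negative, the interval excludes 0 and the first true proportion is smaller than the second.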
1 Sample Test of Hypothesis of μ: Right Tail
C.I for π1 − π2   Let μ be true mean of _______
+ +
2) H1: μ > ___
$ "
Since both limits are negative, the true proportion of ____ is smaller than ______. This is because π1 − π2   3) α = 0.05 by default (unless stated in qn)
7) Test: Substitute working in (4) = __ >
$%
&' ()
*+,- 9) We have enough evidence that the true
mean ___ is ______
Downloaded by Annie Adora ([email protected])
; 7!
:
+ "#
2-Sample Test of Spreads of σ1² & σ2² (F-test, no CLT but need Normal Distribution)
the sample correlation coefficient, thus conclude this is both significant & positive
~ &$-
<0 &
[ \ +(7
*Keywords: Test the equality of population variances of ___ & ___
Significance of Slope
Let σ1 be true s.d of __ in units & σ2 be true s.d of __ in units
5
Evidence: x̄1 = __, n1 = __, s1 = __ & x̄2 = __, n2 = __, s2 = __ (s1 > s2)   1) H0: β1 = 0 (No Slope)
Unshaded Area (100% - 4 )
1) H0: σ1² = σ2²   2) H1: β1 ≠ 0 (Yes Slope)
5)
2) H1: σ1² ≠ σ2²   Reject H0 when to < ____   3) α = 0.05 by default (unless stated in qn)
3) α = 0.05 by default (unless stated in qn)   4) Since p-value = ___ < α, we Reject H0 (where β1 ≠ 0, β1 > 0)
6) Under ;3< !C + &=
Right or Left
4) Assuming Normality of ____ for ___ & ___, F = s1²/s2² ~ F_{α/2, n1 − 1, n2 − 1}
Tail Area (α/2)
/"))
5)Conclusion: Enough evidence of a significantly non-zero slope between Y & that due to X.
" & $& "'$
5) 7) Test: Substitute working in (4) = ___ < ___ As shown by sample correlation coefficient, we know this is a significantly positive/negative
$,
4) Since p-value = ___ < α, we Reject H0 (where β0 ≠ 0, β0 > 0)
"
, ) $*;A B9 7[
˜46
F > ___
2-Sample Test of Means of μ1 & μ2 (After F-test, uses t-test, assumed not too skewed in absence of CLT)
) 5)Conclusion: Enough evidence of a significantly non-zero true intercept in this regression
)
(1) σ1 & σ2 are UNKNOWN &   (1) σ1 & σ2 are UNKNOWN &
C06: Numerical Data , Continuous (Multiples Samples) :# 9#
(2) σ1 & σ2 are unequal or different   (2) σ1 & σ2 are equal   Linear Regression:
7 " # )9$ 5
¥
4J %( . 4J ) +9$ ( .–
¥
8"
* *
:
$$#%&” ' # %& “( ) $*;<=>* + ++&,-7 . & ,- 8 / 0& &$-$ ?-"(8 1238 + 5 //& 8"
Variation in Y that is explained by variation in X
# #
¥ Sample
’+ ’, 47 48 Bigger › , more variability explained by data in model (X)
:
2–
) :
2–
+ Size
) ¥
¥ xx% of variation in Y is explained by variability in X. This constitutes a very strong testimonial to the regression
model’s accounting for variability in Y. Only (100% - xx%) = __% of variability in Y is due to variables absent in
1) H0: The Wheel is balanced
2) H1: The Wheel is not balanced (What we are trying to find)
(1) σ1 & σ2 are KNOWN
Regression line: Straight line that fits the data the best (best fit line). Method to get it = Least Squares because it
3) α = 0.05 by default (unless stated in qn)
Building Confidence Interval:
minimizes the squared deviations between every observed y value & the corresponding prediction
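The least-squares line can be computed directly from the data. This is a minimal sketch using the standard closed-form estimates (slope b1 = Sxy/Sxx, intercept b0 = ȳ − b1·x̄); the function name and the data points are hypothetical.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (b0, b1, r, r_squared) for the best-fit line Y-hat = b1*X + b0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx               # sample slope
    b0 = y_bar - b1 * x_bar      # sample intercept
    r = sxy / sqrt(sxx * syy)    # sample correlation coefficient
    return b0, b1, r, r ** 2     # r**2: share of variation in Y explained by X


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r, r_squared = least_squares(xs, ys)

# Prediction for a new X value using the fitted line
y_hat = b1 * 6 + b0
print(b0, b1, r_squared, y_hat)
```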
4) Identify O, Calculate Probability under H0, Calculate E (P*Total), Calculate (O − E)²/E
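The goodness-of-fit calculation in step 4 can be sketched for the wheel example. This is a minimal sketch: the four-sector wheel and its observed counts are hypothetical, and the critical value is read from a chi-square table with (number of categories − 1) degrees of freedom.

```python
def goodness_of_fit(observed, probabilities):
    """Chi-square goodness-of-fit statistic; E = P * Total for each category."""
    total = sum(observed)
    expected = [p * total for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


# Hypothetical wheel with 4 equal sectors, spun 100 times
observed = [22, 28, 25, 25]
stat, expected = goodness_of_fit(observed, [0.25, 0.25, 0.25, 0.25])

critical_value = 7.815   # chi-square table, df = 3, alpha = 0.05
reject_h0 = stat > critical_value
print(stat, reject_h0)
```

Here the statistic falls below the critical value, so H0 (the wheel is balanced) is not rejected.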
Regression Equation: Unknown Population -> Y = β1X + β0 + ε (error in Y) / Known Sample -> Ŷ = b1X + b0
$$#%&” ' # %& “( ) 9* * %a 6 b*+
4Y ) . 4Z . 5)
4- 4.
3
P +789 + ` QP +#F1(&*
)
1 Y) 8 Z & c
X ,X , ) ¥ Serve to give regression coefficients but also to predict numerical outcomes based on independent values
(> 6?7
0+ 0,
(> 6 7 )
¥ Sample Slope (b1): ,
6) Under ;3, Wheel is balanced
, $8 9'
0
7) Test: χ² = ___ < Critical Value
P +4K)789
Paired-Samples Test of Mean Difference Between 2 Dependent Samples:
8) Do Not Reject H0
9) Conclusion: There is not enough
*Samples are related/dependent when matched/paired to characteristic (measurement) o Sample regression slope represent estimated expected rise in Y per unit rise in X. For every 1
= Critical Value
„ instead of u,
*The difference is 1 Sample => Perform 1 Sample Test on (D’s) [use —, „)
*Need to find Each Difference Score (D) of a pair
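The paired-samples procedure (take each difference score D, then run a one-sample test on the D's) can be sketched as below. This is a minimal sketch with hypothetical before/after measurements; it assumes H0 states a mean difference of 0 and leaves the critical t value to be read from a t-table with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements on the same 5 subjects
before = [72, 75, 68, 80, 77]
after = [70, 71, 69, 75, 73]

# Difference score D for each pair
diffs = [b - a for b, a in zip(before, after)]

d_bar = mean(diffs)                  # mean of the differences
s_d = stdev(diffs)                   # sample s.d. of the differences
n = len(diffs)
t_stat = d_bar / (s_d / sqrt(n))     # one-sample t statistic on the D's (H0: mean difference = 0)
df = n - 1                           # degrees of freedom for the t-table lookup
print(diffs, d_bar, round(t_stat, 3), df)
```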
evidence that wheel is not balanced,
unit more/less in X, the rate of Y rises/drops by ___ unit
therefore Wheel is not unbalanced.
+> ?/ +> 6/
Sample Intercept (b0): %- ) . /A % ) = 01 . /A 23
When σ_D is Known   When σ_D is Unknown
@ @
¥
(Almost Never) (Almost Always)
; 7!
: ; 7!
:
! "# ~ $%&'() + "#
Sample intercept says that for 0 unit of X, the average rate of Y is ____ unit which is
— —
o
~ &$-
<0 & <0 &
meaningful/not meaningful here
Predictions: Let Linear Regression Equation be represented by Ŷ = b1X + b0
+(7
X[ \[
$, $,
¥ 3
, ) 9* 4[
, ) $*;A B9 7[
˜46 ˜46
) ) ¥ We predict for X increase/decrease of _ unit will be __ unit on average
) , )