Statistic Cheat Sheet

This document is a cheat sheet for statistics, covering various topics such as data types, graphs, hypothesis testing, and confidence intervals. It includes formulas and methods for analyzing categorical and numerical data, as well as error types and sampling distributions. The content is tailored for students at Singapore Management University and serves as a quick reference guide for statistical concepts and calculations.

Uploaded by

Khanh Dương
Introduction to statistic (Singapore Management University)

C01: Data Type, Graphs, 1-2 Variables

Population Mean: $\mu$   Sample Mean: $\bar{x}$
Population Size: N   Sample Size: n

Variable:
Categorical: Nominal (No Order), Ordinal (Ordered)
Numerical: Continuous (Measurable) or Discrete (Countable); Interval (No True 0), Ratio (True 0)

Univariate (categorical): Pie/Bar/Pareto
Bivariate (categorical): Side by side Bar
Uni (numerical): Histogram/Box & Whisker
Bi (numerical): Scatterplot
*Pareto: Cumulative % Line, Highest to Lowest, Focus on 80%-20% (weed out trivial)
*Histogram Shape: Uniform, Bell, Right/Left Skewed

Numerical Univariate Data
Population Mean (*True Average): $\mu = \frac{\sum x_i}{N}$
Population S.D: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
Sample Mean (*Sample Average): $\bar{x} = \frac{\sum x_i}{n}$
Sample SD: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Median: Arrange in order (middle value, or the average of the 2 middle values)
Mode: Most reoccurring
Range: Max - Min
Interquartile Range: Q3 - Q1
*Five Number Summary
*Describe: Symmetrical / Left or Right Skewed

Multiple-Sample Test of Categorical Data: (chi-square table)
Chi-Square Test - Right Skewed Curve ($\chi^2 \sim \chi^2_{df}$) * No CLT
Calculate the Row & Column Total
Identify O (Observed Data)
Calculate $E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$
Calculate $\chi^2 = \sum \frac{(O - E)^2}{E}$
1) $H_0$: _____ & _____ are independent, no relationship
2) $H_1$: _____ & _____ are related
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Since each E > 5 & since sample size is large enough, $\chi^2 = \sum \frac{(O - E)^2}{E} \sim \chi^2_{(r-1)(c-1)}$
5) Reject $H_0$ when $\chi^2$ > Critical Value $\chi^2_{\alpha, df}$ (right tail area $\alpha$; remaining area 100% - $\alpha$)
6) Under $H_0$, Independent
7) Test: $\chi^2$ = ___ > Critical Value
8) Reject / Do Not Reject $H_0$
9) Conclusion: There is enough evidence that ____ & ____ are dependent, related / independent, no relationship at ___% confidence

C04: Numerical Data, Continuous Distribution (1 Sample)
Population Mean: $\mu$   Sample Mean: $\bar{x}$
Population S.D: $\sigma$   Sample S.D: s
Population Size: N   Sample Size: n
*Degrees of Freedom: The final unknown number is automatically known by default as all numbers add up to a certain value. So (n - 1) values are FREE to vary

C03: Categorical Data, Distribution (1 Sample)

Constructing Confidence Interval:
1) Let $p$ be true proportion of _______
2) Given: $\hat{p}$ = __% & n = __
3) Since $n\hat{p}$ = ____ > 5 & $n(1-\hat{p})$ = ___ > 5, by CLT $\hat{p}$ is approximately normal
4) A 90% / 95% C.I for $p$ is: $\left(\hat{p} - z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$ = ( ____% , ____%)

1 Sample Test of Hypothesis of $p$: Right Tail
1) $H_0$: $p \le$ ___
2) $H_1$: $p >$ ___
* Left Tailed: $H_0$: $p \ge$ ___, $H_1$: $p <$ ___
* 2 Tailed: $H_0$: $p =$ ___, $H_1$: $p \ne$ ___
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Since $n\hat{p}$ = ____ & $n(1-\hat{p})$ = ___ > 5, by CLT $Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1^2)$
5) Reject $H_0$ when P(Z > z) < $\alpha$ = 0.05 (default) - Right Tailed
* Left Tailed: Reject $H_0$ when P(Z < z) < $\alpha$ = 0.05
* 2 Tailed: Reject $H_0$ when P(|Z| > |z|) < $\alpha$ = 0.05 (right or left tail area $\alpha/2$)
6) Under $H_0$: $p$ = Value in (1)
7) Test: Substitute value into (4), Z = ___
Since P(Z > ___) = ____ (find p-value in table) < $\alpha$ = 0.05
8) We reject $H_0$ (When P is low, Null will go) / Do Not Reject (When P is high, Null will fly)
9) Conclusion: There is enough evidence that ______ than that of _____ at 5% significance level
Do Not Reject - There is not enough evidence that ____ than ____ at ___% significance level
* 2 Types of Errors
Type I Error: Reject $H_0$ when $H_0$ is true (False Positive)
Type II Error: Not Rejecting $H_0$ when $H_0$ is false (False Negative)
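
The one-sample proportion steps (C03) can be sketched in Python as below. This is a minimal illustration using only the standard library, not part of the original sheet: the function names `proportion_ci` and `proportion_z_test` and the `tail` argument are hypothetical, and the normal table lookup is replaced by `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """C.I. for a true proportion: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    # CLT check used on the sheet: n*p_hat and n*(1 - p_hat) must both exceed 5
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical z
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def proportion_z_test(p_hat, p0, n, tail="right"):
    """Return (z, p_value) for H0 about p0; tail is 'right', 'left' or 'two'."""
    # Under H0 the standard error uses the hypothesised value p0
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "left":
        p_value = std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 56% of 200 respondents, testing H1: p > 0.5
    z, p_value = proportion_z_test(0.56, 0.5, 200, tail="right")
    print(f"z = {z:.3f}, p-value = {p_value:.4f}")
    print("reject H0" if p_value < 0.05 else "do not reject H0")
    print(proportion_ci(0.56, 200, confidence=0.95))
```

Comparing the p-value with $\alpha$ mirrors steps 5 to 8: a p-value below 0.05 leads to rejecting $H_0$.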
C02: Chance

Probability of 2 Faces: e.g., Gender/Coin
Classical: Based on Assumption (Fair & Independent), Outcome equally likely
P(2B + 1G): (B*B*G), (B*G*B), (G*B*B)
P(B*B*G) = $\frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \left(\frac{1}{2}\right)^3$
P(2B + 1G) = $\left(\frac{1}{2}\right)^3 + \left(\frac{1}{2}\right)^3 + \left(\frac{1}{2}\right)^3 = \frac{3}{8}$
Empirical: Based on Survey, Approximately
P(2B + 1G) = observed count of (2B + 1G) / total count surveyed

To Find No. of Combo: nC2, e.g., 5C2 = 10 Combo

Monty's Paradox: Switch -> x2 Chance of Winning (2/3 if you switch vs 1/3 if you stay)

Simpson's Paradox: The 1st variable & the 2nd variable are associated with the 3rd variable. The 3rd variable completely reverses the initial correlation between the 1st & 2nd. This reversal is the hallmark of Simpson's Paradox, which materializes with a confounding/lurking variable.

Conditional Probability: 'A Given B' / 'event given condition'

$\ge$ 2 Share Birthday: 1 - P(No One Shares) = $1 - \left(\frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \dots\right)$

Bayes' Rule: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A')\,P(A')}$

C04: Categorical Data (2 Samples, Multiple Samples)

Test for difference between proportions from 2 independent populations
Population Proportions: $p_1$ & $p_2$   Sample Proportions: $\hat{p}_1$ & $\hat{p}_2$   Sample Size: $n_1$ & $n_2$
Let $p_1$ be true proportion of _______ & $p_2$ be true proportion of _______
Evidence: $\hat{p}_1$ = __ & $n_1$ = __, $\hat{p}_2$ = __ & $n_2$ = __
Where $\hat{p}_1 = \frac{x_1}{n_1}$ & $\hat{p}_2 = \frac{x_2}{n_2}$
1) $H_0$: $p_1 = p_2$
2) $H_1$: $p_1 \ne p_2$
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Since $n_1$ = ____ & $n_2$ = ___ is large enough by CLT,
Separate Proportions: $Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} \sim N(0, 1)$
OR
Pooled Proportions: $Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0, 1)$, Where $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$
5) Reject $H_0$ when P(|Z| > |z|) < $\alpha$ = 0.05 (unshaded area 100% - $\alpha$; right or left tail area $\alpha/2$)
6) Under $H_0$: $p_1 = p_2 \Rightarrow p_1 - p_2 = 0$
7) Test: P(Z > |z (Substitute Working)|) = P(Z > |__|) -> [RT Area - Area under z = p-value area (rejection region)] < $\alpha$
8) We reject $H_0$
9) We have enough evidence that there is a significant difference between the true proportions of ___ & ___

C.I for $p_1 - p_2$
For 90% / 95% C.I: = ( ____ , ____ )
$(\hat{p}_1 - \hat{p}_2) \pm z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ OR $(\hat{p}_1 - \hat{p}_2) \pm z\sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Since both limits are negative, the true proportion of ____ is smaller than ______. This is because $p_1 - p_2$ is negative, by between ___% and ___%, with ___ confidence

C04: Numerical Data, Continuous Distribution (1 Sample), continued

Effect of Sample Size on Sampling Distribution of $\bar{x}$
$X \sim N(\mu, \sigma^2)$ with s.d. $\sigma$; $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ with s.d. $\frac{\sigma}{\sqrt{n}}$ when n $\ge$ 30
$Z = \frac{X - \mu}{\sigma}$ OR $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
a) non-normal = Sampling Size (n) $\ge$ 30 (Normal by CLT)
b) non-normal = Sampling Size (n) < 30 (Not Normal)
c) normal (Bell-Shaped) = For any (n) will still be normally distributed
* If no CLT, assume normality & not too skewed
*Formulas can be rearranged to find z or another unknown in them
E.g., $X \sim N(9, 1^2)$: $P(X \le 6) = P\left(Z < \frac{6 - 9}{1}\right) = P(Z < -3)$ = 0.5 - 0.49865 = 0.00135

Building Confidence Interval for $\mu$ (t table)
*Keywords: Confidence Interval / Interval Estimate
Parent Population Normal OR n is large enough (normal by CLT), THEN when $\sigma$ is known: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Parent Population Normal, OR Parent Population NOT Normal AND n is large enough (normal by CLT), THEN when $\sigma$ is unknown: $\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$ (right or left tail area $\alpha/2$)
*If n not large enough (<30, no CLT but Assume not too skewed)
Why: t distribution arrives with 1 condition, that sampled population is normally distributed, or at least not too skewed

Interpret Confidence Interval for $\mu$
We are __% confident that _____ is between the limits of $\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$. Since lower confidence limit for ___ is bigger/smaller than upper/lower limit of that for the ___, the true mean of ___ significantly exceeds that of the ___.

1 Sample Test of Hypothesis of $\mu$: Right Tail
Let $\mu$ be true mean of _______
Evidence: n = __, $\bar{x}$ = __ & s = __
1) $H_0$: $\mu \le$ ___
2) $H_1$: $\mu >$ ___
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Since n = ____ > 30, by CLT & since $\sigma$ is unknown, $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
5) Reject $H_0$ when t > Critical Value $t_{\alpha,\,n-1}$
6) Under $H_0$: $\mu$ = ___
7) Test: Substitute working in (4) = __ > Critical Value
8) We reject $H_0$
9) We have enough evidence that the true mean ___ is ______
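
For the one-sample numerical case (C04), the worked sampling example, the $\sigma$-known interval and the t statistic can be sketched as follows. This is a hedged illustration using only the standard library; the function names are hypothetical, and the critical t value still has to be read from a t table because the standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist


def normal_probability_below(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), i.e. the area to the left of x."""
    return NormalDist(mu=mu, sigma=sigma).cdf(x)


def mean_ci_sigma_known(x_bar, sigma, n, confidence=0.95):
    """z interval for the true mean when sigma is known: x_bar +/- z * sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def t_statistic(x_bar, mu0, s, n):
    """Test statistic (x_bar - mu0) / (s / sqrt(n)), compared against t with n - 1 df."""
    return (x_bar - mu0) / (s / sqrt(n))


if __name__ == "__main__":
    # The sheet's own example: X ~ N(9, 1^2), P(X <= 6) = P(Z < -3)
    print(round(normal_probability_below(6, mu=9, sigma=1), 5))
    # Hypothetical numbers for the interval and the test statistic
    print(mean_ci_sigma_known(x_bar=50, sigma=10, n=100, confidence=0.95))
    print(t_statistic(x_bar=52, mu0=50, s=8, n=64))
```

When $\sigma$ is unknown, the same interval shape applies with s in place of $\sigma$ and a table value $t_{\alpha/2,\,n-1}$ in place of z.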

Linear Regression (Continue): Testing Significance

Significance of Model
1) $H_0$: $\rho = 0$
2) $H_1$: $\rho \ne 0$
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Since p-value = ___ < $\alpha$, We Reject $H_0$ (Where $\rho \ne 0$, $r > 0$)
5) Conclusion: Strong evidence of a significant correlation between Y & that due to X. Given the sample correlation coefficient, thus conclude this is both significant & positive

Significance of Slope
1) $H_0$: $\beta_1 = 0$ (No Slope)
2) $H_1$: $\beta_1 \ne 0$ (Yes Slope)
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Since p-value = ___ < $\alpha$, We Reject $H_0$ (Where $\beta_1 \ne 0$, $b_1 > 0$)
5) Conclusion: Enough evidence of a significantly non-zero slope between Y & that due to X. As shown by sample correlation coefficient, we know this is a significantly positive/negative slope, which lies between __ & __, with ___% confidence

Significance of Intercept
1) $H_0$: $\beta_0 = 0$
2) $H_1$: $\beta_0 \ne 0$
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Since p-value = ___ < $\alpha$, We Reject $H_0$ (Where $\beta_0 \ne 0$, $b_0 > 0$)

Paired-Samples Test of Mean Difference Between 2 Dependent Samples: (Continue)

Let $\mu_d$ be true mean of the price difference of _______ in Units
Evidence: $\sum d$ = __, $\sum d^2$ = __, $n_d$ = __, $\bar{d}$ = __, $s_d$ = ___
1) $H_0$: $\mu_d \ge$ / $\le$ 0
2) $H_1$: $\mu_d <$ / $>$ 0
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Assuming the paired price differences are not too skewed, & since $\sigma_d$ is unknown, $t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \sim t_{n-1}$
5) Reject $H_0$ when $t_0$ < ____ (critical value; unshaded area 100% - $\alpha$)
6) Under $H_0$: $\mu_d = 0$
7) Test: Substitute working in (4) = ___ < ___
8) Do not Reject
9) Do not have enough evidence of a significant difference between the ____ & ____ at ___% significance
Building Confidence Interval: $\bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}$

C05: Numerical Data, Continuous Distribution (2 Samples)

Population Mean: $\mu_1$ & $\mu_2$   Sample Mean: $\bar{x}_1$ & $\bar{x}_2$
Population S.D: $\sigma_1$ & $\sigma_2$   Sample S.D: $s_1$ & $s_2$
Sample Size: $n_1$ & $n_2$

2-Sample Test of Spreads of $\sigma_1^2$ & $\sigma_2^2$ (F-test, no CLT but need Normal Distribution)
*Keywords: Test the equality of population variances of ___ & ___
Let $\sigma_1$ be true s.d of __ in units & $\sigma_2$ be true s.d of __ in units
Evidence: $\bar{x}_1$ = __, $n_1$ = __, $s_1^2$ = __ & $\bar{x}_2$ = __, $n_2$ = __, $s_2^2$ = __ (* $s_1^2 > s_2^2$)
1) $H_0$: $\sigma_1^2 = \sigma_2^2$
2) $H_1$: $\sigma_1^2 \ne \sigma_2^2$
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Assuming Normality of ____ for ___ & ___, $F = \frac{s_1^2}{s_2^2} \sim F_{n_1 - 1,\,n_2 - 1}$
5) Reject $H_0$ when F > ___ (critical value $F_{\alpha/2;\,n_1 - 1,\,n_2 - 1}$; right or left tail area $\alpha/2$; the P-Value is the tail area beyond the evidence F)
6) Under $H_0$: $\sigma_1^2 = \sigma_2^2$
7) Test: Substitute working in (4) = ___ < ___
8) Do not Reject
9) Do not have enough evidence of a significant difference between the variances of ____ for ____ & ____ at ___% significance
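
The F statistic for the 2-sample test of spreads can be sketched as below. This is a minimal illustration, not the sheet's own working: `f_statistic` is a hypothetical helper, it follows the sheet's note that the larger sample variance goes on top, and the critical value must still come from an F table.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a = variance(sample_a)  # sample variance, divides by n - 1
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


if __name__ == "__main__":
    # Hypothetical data for two independent samples
    first = [2, 4, 4, 4, 5, 5, 7, 9]
    second = [1, 2, 3, 4, 5]
    f_value, df1, df2 = f_statistic(first, second)
    print(f"F = {f_value:.4f} with ({df1}, {df2}) degrees of freedom")
```

Putting the larger variance in the numerator keeps F at or above 1, so only the upper critical value of the two-tailed test needs to be looked up.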
2-Sample Test of Means of $\mu_1$ & $\mu_2$ (After F-test, Uses t-test, Assumed not too skewed in absence of CLT)
*Keywords: Is there evidence, back up with C.I, on Average

Welch-Satterthwaite Separate Variances
(1) $\sigma_1$ & $\sigma_2$ are UNKNOWN & (2) $\sigma_1$ & $\sigma_2$ are unequal or different
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t_{df}$
Where $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
Building Confidence Interval: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Pooled Variances
(1) $\sigma_1$ & $\sigma_2$ are UNKNOWN & (2) $\sigma_1$ & $\sigma_2$ are equal
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t_{n_1 + n_2 - 2}$
Where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Building Confidence Interval: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1 + n_2 - 2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

Very Rare: 2-Sample Z Test of $\mu_1$ & $\mu_2$
(1) $\sigma_1$ & $\sigma_2$ are KNOWN
$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1^2)$
Building Confidence Interval: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Paired-Samples Test of Mean Difference Between 2 Dependent Samples:
*Samples are related/dependent when matched/paired to characteristic (measurement)
*Need to find Each Difference Score (D) of a pair
*The difference is 1 Sample => Perform 1 Sample Test on (D's) [use $\bar{d}$, $s_d$ instead of $\bar{x}$, s]
When $\sigma_d$ is Known (Almost Never): $Z = \frac{\bar{d} - \mu_d}{\sigma_d/\sqrt{n}} \sim N(0, 1^2)$, C.I: $\bar{d} \pm z_{\alpha/2}\frac{\sigma_d}{\sqrt{n}}$
When $\sigma_d$ is Unknown (Almost Always): $t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \sim t_{n-1}$, C.I: $\bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}$
Interpret Confidence Interval: Since the hypothesised difference D falls within the C.I, there is no significant difference between ____ & ___ on average with a ____% Confidence

C07: Sample Size Determination (More Tests)

Margin of Error: $E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Sample Size: $n = \frac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{E^2}$

Goodness of Fit Test: Univariate Data / Chi-Square distribution
1) $H_0$: The Wheel is balanced
2) $H_1$: The Wheel is not balanced (What we are trying to find)
3) $\alpha$ = Default is 0.05 (unless stated in qn)
4) Identify O, Calculate Probability under $H_0$, Calculate E (P*Total), Calculate $\chi^2 = \sum \frac{(O - E)^2}{E} \sim \chi^2_{(k-1)}$
5) Reject $H_0$ when $\chi^2$ > Critical Value $\chi^2_{\alpha, df}$
6) Under $H_0$, Wheel is balanced
7) Test: $\chi^2$ = ___ < Critical Value
8) Do Not Reject $H_0$
9) Conclusion: There is not enough evidence that Wheel is not balanced, therefore Wheel is not unbalanced.

C06: Numerical Data, Continuous (Multiple Samples)

Linear Regression:

Linear Correlation Coefficient (r): $r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}$
Since r = ___ is very close to +1, there exists a ________ linear correlation between Y & that due to X. The higher the X, the higher the Y & vice versa.

Coefficient of Determination ($r^2$): $R^2 = r^2$ = __ in %
Variation in Y that is explained by variation in X
Bigger $r^2$, more variability explained by data in model (X)
xx% of variation in Y is explained by variability in X. This constitutes a very strong testimonial to the regression model's accounting for variability in Y. Only (100% - xx%) = __% of variability in Y is due to variables absent in the model. Amount of variability in predictions this model yields is reduced by xx%

Regression line: Straight line that fits the data the best (best fit line). Method to get it = Least Squares because it minimizes the squared deviations between every observed y value & the corresponding prediction

Regression Equation: Unknown Population -> $Y = \beta_1 X + \beta_0 + \varepsilon$ (error in Y) / Known Sample -> $\hat{Y} = b_1 X + b_0$
Serves to give regression coefficients but also to predict numerical outcomes based on independent values

Sample Slope ($b_1$): $b_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$
o Sample regression slope represents the estimated expected rise in Y per unit rise in X. For every 1 unit more/less in X, the rate of Y rises/drops by ___ unit

Sample Intercept ($b_0$): $b_0 = \bar{y} - b_1\bar{x}$
o Sample intercept says that for 0 unit of X, the average rate of Y is ____ unit which is meaningful/not meaningful here

Significance of Intercept, 5) Conclusion: Enough evidence of a significantly non-zero true intercept in this regression model, which falls between ____ & ____ with ___% confidence. It is meaningful/not meaningful

Predictions: Let Linear Regression Equation be represented by $\hat{Y} = b_1 X + b_0$
For X of __ unit, substitute values into $\hat{Y} = b_1 X + b_0$
We predict that for an X increase/decrease of _ unit, Y will be __ unit on average
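
The least-squares quantities in the linear regression section (sample slope $b_1$, sample intercept $b_0$, correlation r and $r^2$) can be computed directly from the sums, as in this minimal sketch. The helper names `fit_line` and `predict` and the data are hypothetical and only illustrate the formulas.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (b0, b1, r) for the least-squares line y_hat = b1 * x + b0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Corrected sums: sum(xy) - n*x_bar*y_bar, sum(x^2) - n*x_bar^2, sum(y^2) - n*y_bar^2
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    syy = sum(y * y for y in ys) - n * y_bar ** 2
    b1 = sxy / sxx               # sample slope
    b0 = y_bar - b1 * x_bar      # sample intercept
    r = sxy / sqrt(sxx * syy)    # linear correlation coefficient
    return b0, b1, r


def predict(b0, b1, x):
    """Substitute x into the fitted regression equation."""
    return b1 * x + b0


if __name__ == "__main__":
    # Hypothetical paired observations
    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 5, 4, 5]
    b0, b1, r = fit_line(xs, ys)
    print(f"y_hat = {b1:.2f} x + {b0:.2f}, r = {r:.4f}, r^2 = {r * r:.2%}")
    print(predict(b0, b1, 10))
```

Squaring r gives the coefficient of determination, the share of variation in Y explained by X.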