Chapter 2
Statistical modeling & introduction to probability
Population vs samples
We collect data on a sample,
but we make statements
(inferences) about the
population
What is a statistical model?
• One or more mathematical equations having at least one variable that
exhibits stochastic (i.e. probabilistic) variation, representing the
inherent uncertainty of observing its potential values.
• A response variable (Y) that is predicted by explanatory variables (X):
Y ~ X …. Y ~ X1 + X2 + X3
• Explanatory variables can be numeric, categories denoting groups, or a
combination of both (i.e. interactions).
• Statistical models typically carry assumptions
• Response variables are assumed to be random variables: we can’t know
everything, so there are unknowns, measurement error, and randomness
• If there were no variability, there would be no need for statistics
• Models are built off probability distributions to help understand
relationships between response & explanatory variables
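A minimal sketch of the Y ~ X idea, written in Python purely for illustration. It builds a response from a deterministic part plus random error and then recovers the relationship by ordinary least squares. The intercept, slope and noise level are hypothetical values chosen for the demonstration, not values from the chapter.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical "true" parameters, chosen only for illustration.
TRUE_INTERCEPT = 2.0
TRUE_SLOPE = 0.5
NOISE_SD = 1.0

# Explanatory variable X, and response Y = deterministic part + random error.
x = [i / 10 for i in range(200)]
y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + random.gauss(0, NOISE_SD) for xi in x]

# Ordinary least-squares estimates for the model Y ~ X.
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope_hat = sxy / sxx
intercept_hat = mean_y - slope_hat * mean_x

print(f"estimated intercept: {intercept_hat:.2f}, estimated slope: {slope_hat:.2f}")
```

The estimates land near, but not exactly on, the values used to generate the data. That gap is the stochastic variation the model is meant to represent.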
How do we know what distribution
to use with our data?
• Depends on the attributes and ‘shape’ of the response
variable
• Requires you to do lots of data exploration and ‘looking’
at data prior to navigating down statistical models
Distributions in R
• Let’s jump over….
Why Learn Probability?
• Nothing in life is certain. In everything we do, we
gauge the chances of successful outcomes, from
business to medicine to the weather
• A probability provides a quantitative description of the
chances or likelihoods associated with various
outcomes
• It provides a bridge between descriptive and
inferential statistics
[Figure: diagram relating Population and Sample, with the labels Probability and Statistics]
What is this thing called probability?
• Probability – principled way of quantifying uncertainty by
assigning plausibility or credibility to a set of mutually exclusive
possibilities or results of an experiment or observations
• Roots in understanding previous & potential gains tied to
gambling (1600s)
• Bernoulli – The Art of Conjecturing (1713)
• An index of uncertainty bounded between 0 and 1
• Expanded probability beyond gambling… human mortality, criminal
justice
• Human existence is an existential lottery, akin to a game of chance
The Probability of an Event
• The probability of an event A measures “how often” A
will occur. We write P(A).
• Suppose that an experiment is performed n times. The
relative frequency for an event A is
$$\frac{\text{number of times } A \text{ occurs}}{n} = \frac{f}{n}$$
• If we let n get infinitely large,
$$P(A) = \lim_{n \to \infty} \frac{f}{n}$$
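The limit can be illustrated with a short simulation. This is a sketch in Python, assuming a fair coin and taking the event A to be "heads"; the function name `relative_frequency` is made up for this example. As n grows, f/n should settle near P(A) = 0.5.

```python
import random

random.seed(42)  # fixed seed so repeated runs give the same numbers

def relative_frequency(n):
    """Toss a simulated fair coin n times and return f/n,
    where f is the number of times heads (event A) occurs."""
    f = sum(1 for _ in range(n) if random.random() < 0.5)
    return f / n

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: f/n = {relative_frequency(n):.4f}")
```

For small n the relative frequency can be far from 0.5; the fluctuations shrink as n increases.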
Example
• A bag of M&Ms contains 25 candies:
• Raw Data: m m m m m m m m m m
m m m m m m m m m m
m m m m m
• Statistical Table:
Color    Tally    Frequency   Relative Frequency   Percent
Red      mmm      3           3/25 = .12           12%
Blue     mmmmmm   6           6/25 = .24           24%
Green    mmmm     4           4/25 = .16           16%
Orange   mmmmm    5           5/25 = .20           20%
Brown    mmm      3           3/25 = .12           12%
Yellow   mmmm     4           4/25 = .16           16%
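The table can be rebuilt from the color counts alone. The sketch below, in Python for illustration, takes the frequencies from the table and derives the relative frequency and percent columns. The variable names are arbitrary.

```python
from fractions import Fraction

# Frequencies copied from the table above.
counts = {"Red": 3, "Blue": 6, "Green": 4, "Orange": 5, "Brown": 3, "Yellow": 4}

total = sum(counts.values())  # number of candies in the bag

# Relative frequency = frequency / total, kept exact with Fraction.
rel_freq = {color: Fraction(f, total) for color, f in counts.items()}

for color, f in counts.items():
    rf = float(rel_freq[color])
    print(f"{color:<7} {f:>2}   {f}/{total} = {rf:.2f}   {rf:.0%}")

# The relative frequencies of all colors add up to 1.
print("sum of relative frequencies:", sum(rel_freq.values()))
```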
The Probability of an Event
• P(A) must be between 0 and 1.
• If event A can never occur, P(A) = 0. If
event A always occurs when the experiment
is performed, P(A) =1.
• The sum of the probabilities for all simple
events in S equals 1.
• The probability of an event A is found by adding the
probabilities of all the simple events contained in A.
Finding Probabilities
• Probabilities can be found using
• Estimates from empirical studies
• Common sense estimates based on equally likely
events.
• Examples:
– Toss a fair coin. P(Head) = 1/2
– Suppose that 10% of the U.S. population
has red hair. Then for a person selected at
random, P(Red hair) = .10
Using Simple Events
• The probability of an event A is equal to the
sum of the probabilities of the simple events
contained in A
• If the simple events in an experiment are
equally likely, you can calculate
$$P(A) = \frac{n_A}{N} = \frac{\text{number of simple events in } A}{\text{total number of simple events}}$$
Example 1
Toss a fair coin twice. What is the probability of
observing at least one head?
1st Coin   2nd Coin   Ei         P(Ei)
H          H          E1 = HH    1/4
H          T          E2 = HT    1/4
T          H          E3 = TH    1/4
T          T          E4 = TT    1/4
P(at least 1 head) = P(E1) + P(E2) + P(E3)
= 1/4 + 1/4 + 1/4 = 3/4
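The same answer can be checked by listing the sample space in code. This is a small Python sketch that assumes a fair coin, so the four simple events are equally likely.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two tosses: HH, HT, TH, TT.
sample_space = ["".join(p) for p in product("HT", repeat=2)]

# Fair coin: every simple event has the same probability.
p_simple = Fraction(1, len(sample_space))

# Event A: at least one head.
event = [e for e in sample_space if "H" in e]
p_at_least_one_head = p_simple * len(event)

print(sample_space)
print("P(at least 1 head) =", p_at_least_one_head)
```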
Example 2
A bowl contains three M&Ms®, one red, one blue
and one green. A child selects two M&Ms at
random. What is the probability that at least one is
red?
1st M&M   2nd M&M   Ei   P(Ei)
Red       Blue      RB   1/6
Red       Green     RG   1/6
Blue      Red       BR   1/6
Blue      Green     BG   1/6
Green     Blue      GB   1/6
Green     Red       GR   1/6
P(at least 1 red) = P(RB) + P(BR) + P(RG) + P(GR)
= 4/6 = 2/3
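Because the child draws without replacement, the simple events are ordered pairs of distinct candies. A Python sketch of the enumeration, assuming every ordered pair is equally likely:

```python
from fractions import Fraction
from itertools import permutations

# Ordered draws of two different candies from Red, Blue, Green.
outcomes = ["".join(p) for p in permutations("RBG", 2)]

# Event A: at least one of the two candies is red.
with_red = [o for o in outcomes if "R" in o]

p_at_least_one_red = Fraction(len(with_red), len(outcomes))

print(outcomes)
print("P(at least 1 red) =", p_at_least_one_red)
```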
Example 3
The sample space of throwing a pair of dice is the set of 36 ordered
pairs (red die, green die), from (1,1) to (6,6).
Event              Simple events                         Probability
Dice add to 3      (1,2),(2,1)                           2/36
Dice add to 6      (1,5),(2,4),(3,3),(4,2),(5,1)         5/36
Red die shows 1    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)   6/36
Green die shows 1  (1,1),(2,1),(3,1),(4,1),(5,1),(6,1)   6/36
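The probabilities in the table can be reproduced by counting simple events. The Python sketch below assumes two fair dice and writes each outcome as (red, green). The helper `prob` is a name invented for this example.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (red die, green die).
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(A) = (number of simple events in A) / (total number of simple events)."""
    favorable = [s for s in sample_space if event(s)]
    return Fraction(len(favorable), len(sample_space))

p_sum_3 = prob(lambda s: s[0] + s[1] == 3)
p_sum_6 = prob(lambda s: s[0] + s[1] == 6)
p_red_1 = prob(lambda s: s[0] == 1)
p_green_1 = prob(lambda s: s[1] == 1)

# Fraction reduces automatically, so 2/36 prints as 1/18 and 6/36 as 1/6.
print(p_sum_3, p_sum_6, p_red_1, p_green_1)
```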
Counting Rules
• Sample space of throwing 3 dice has 216 entries,
sample space of throwing 4 dice has 1296 entries, …
• At some point, we have to stop listing and start thinking
…
• We need some counting rules
The mn Rule
• If an experiment is performed in two stages, with
m ways to accomplish the first stage and n ways to
accomplish the second stage, then there are mn
ways to accomplish the experiment.
• This rule is easily extended to k stages, with the
number of ways equal to
$$n_1 n_2 n_3 \cdots n_k$$
Example: Toss two coins. The total number of
simple events is: 2 × 2 = 4
Examples
Example: Toss three coins. The total number of simple
events is: 2 × 2 × 2 = 8
Example: Toss two dice. The total number of simple
events is: 6 × 6 = 36
Example: Toss three dice. The total number of simple
events is: 6 × 6 × 6 = 216
Example: Two M&Ms are drawn from a dish containing two
red and two blue candies. The total number of simple
events is: 4 × 3 = 12
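The counts above can be cross-checked by comparing the product of the stage sizes with a brute-force listing. This is a Python sketch; `count_by_rule` and the candy labels R1, R2, B1, B2 are names made up for illustration.

```python
from itertools import permutations, product
from math import prod

def count_by_rule(*ways):
    """mn rule extended to k stages: multiply the number of ways per stage."""
    return prod(ways)

# Brute-force listings of the same experiments.
three_coins = list(product("HT", repeat=3))
two_dice = list(product(range(1, 7), repeat=2))
three_dice = list(product(range(1, 7), repeat=3))

# Two candies drawn without replacement from two red and two blue:
# 4 choices for the first draw, then 3 for the second.
candies = ["R1", "R2", "B1", "B2"]
two_candies = list(permutations(candies, 2))

print(count_by_rule(2, 2, 2), len(three_coins))
print(count_by_rule(6, 6), len(two_dice))
print(count_by_rule(6, 6, 6), len(three_dice))
print(count_by_rule(4, 3), len(two_candies))
```

Each line prints the same number twice, once from the rule and once from the listing. The rule is what remains practical once the listing becomes too long to write out.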
Operations on Sets: Union, Intersection, Difference
[Figure panels: Adding probabilities; Multiplying probabilities]
The Multiplicative Rule for Intersections
• For any two events, A and B, the probability that both A and B
occur is
$$P(A \cap B) = P(A)\,P(B \text{ given that } A \text{ occurred}) = P(A)\,P(B \mid A)$$
• If the events A and B are independent, then the probability
that both A and B occur is
$$P(A \cap B) = P(A)\,P(B)$$
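A sketch of the general rule for events that are not independent, reusing the earlier dish of two red and two blue candies. Take A = "first candy is red" and B = "second candy is red". The Python below computes P(A)P(B|A) and compares it with a direct count. The labels R1, R2, B1, B2 are assumptions made so that the candies can be told apart.

```python
from fractions import Fraction
from itertools import permutations

candies = ["R1", "R2", "B1", "B2"]

# Multiplicative rule: P(A and B) = P(A) * P(B | A).
p_a = Fraction(2, 4)          # 2 of the 4 candies are red
p_b_given_a = Fraction(1, 3)  # after a red is removed, 1 of the remaining 3 is red
p_rule = p_a * p_b_given_a

# Direct count over all equally likely ordered draws of two candies.
draws = list(permutations(candies, 2))
both_red = [d for d in draws if d[0].startswith("R") and d[1].startswith("R")]
p_count = Fraction(len(both_red), len(draws))

print("rule:", p_rule, " count:", p_count)
```

Multiplying P(A)P(B) as if the draws were independent would give 1/4 instead, which is why the conditional probability matters when sampling without replacement.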
Example
Suppose we decide to flip a coin twice in a row. We
know that the probability of tails is 50%. We know our two flips
will be independent events. What is the probability that we
get tails then tails?
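Reading the question as two independent flips that both land tails, the rule for independent events gives 0.5 × 0.5 = 0.25. The Python sketch below computes that value and compares it with a simulation. The number of trials and the seed are arbitrary choices.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

P_TAILS = 0.5

# Independent events: P(tails and tails) = P(tails) * P(tails).
p_exact = P_TAILS * P_TAILS

def two_tails():
    """Simulate two independent flips; True if both land tails."""
    first = random.random() < P_TAILS
    second = random.random() < P_TAILS
    return first and second

trials = 100_000
p_simulated = sum(two_tails() for _ in range(trials)) / trials

print("exact:", p_exact, " simulated:", round(p_simulated, 3))
```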
Ok, now we have a little bit of
basic probability…. Let’s get back
to the chapter
Linking probability with statistics
• Fisher 1922 – coined the terms parameter, statistic, variance,
sufficiency, consistency, information, estimation, maximum
likelihood estimation, and optimality
• Developed maximum likelihood estimates – a universal
method for parameter estimation applicable even to samples
of moderate size and without the restrictive assumptions of
other existing methods, calculated precision of error
estimates (standard error), & developed modern hypothesis
testing
A new statistical hypothesis paradigm
• ‘every experiment may be said to exist only to give the facts a
chance of disproving a null hypothesis’
• Used data to calculate a p-value – the probability of observing
similar or more extreme data in potential random samples
from the same hypothetically infinite statistical population
• Why is 0.05 the universal metric? A beef between Fisher &
Pearson…
• Fisher was a total force; Bayesian methods went by the
wayside
• Until the 1990s, when computers combined with Markov chain Monte Carlo
(MCMC) algorithms made it possible to explore parameter space