Statistics Class Notes 2
Example: My drive time from home to USF has a mean of 30 minutes and
a standard deviation of 2 minutes.
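a. If nothing is known about the shape of the distribution of drive times, what
proportion of times are less than 24 minutes?
Answer: At most 1/9, about 11 percent. 24 minutes is 3 standard deviations below
the mean (24 = 30 − 3(2)). Chebyshev's rule says at least 1 − 1/3² = 8/9 of the data
fall within 3 standard deviations of the mean, so at most 1/9 fall outside.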
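b. Assume a mound-shaped, symmetric distribution. What proportion
of drive times fall between 32 and 36 minutes?
Answer: Approximately 16 percent. 32 is 1 standard deviation above the mean and
36 is 3 standard deviations above it. About 68% fall within 1 standard deviation
and approximately 100% within 3, so by symmetry about (100% − 68%)/2 = 16% fall
between 32 and 36.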
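Both answers can be checked with a few lines of Python. This is a minimal sketch that assumes Chebyshev's bound (at most 1/k² of the data lie more than k standard deviations from the mean) and the empirical-rule figures used above (68% within 1 standard deviation, approximately 100% within 3); the function name is illustrative.

```python
def chebyshev_max_outside(k):
    """Chebyshev: at least 1 - 1/k**2 of the data lie within k std devs,
    so at most 1/k**2 lie outside (both tails combined, so one tail is also at most this)."""
    return 1 / k**2

mean, sd = 30, 2

# a. 24 minutes is (30 - 24) / 2 = 3 standard deviations below the mean
k = (mean - 24) / sd
part_a = chebyshev_max_outside(k)        # at most 1/9

# b. 32 = mean + 1 sd and 36 = mean + 3 sd; for a mound-shaped symmetric
# distribution, half of (within 3 sd - within 1 sd) lies between them
within_1sd, within_3sd = 0.68, 1.00
part_b = (within_3sd - within_1sd) / 2   # about 0.16

print(f"a. at most {part_a:.3f}")        # a. at most 0.111
print(f"b. about {part_b:.2f}")          # b. about 0.16
```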
Modified Empirical Rule – works for skewed data sets
Approximately 60-90% of the data fall within x̄ ± s
Approximately 90-100% fall within x̄ ± 2s
Approximately 100% fall within x̄ ± 3s
We use these rules to interpret the standard deviation by describing where we
expect most of the data to fall:
- For the (modified) Empirical Rule: we expect most to fall within x̄ ± 2s
- For Chebyshev's rule: we expect most to fall within x̄ ± 3s
Measures of relative standing
Ex. ACT vs SAT
1. Percentiles – give the percent of the data in the distribution that fall
below a particular value.
Ex. 75th percentile – upper quartile (75% fall below and 25% fall above)
25th percentile – lower quartile (25% fall below and 75% fall above)
50th percentile – median
2. Z-scores: z = (x − m)/σ, where m is the mean and σ the standard deviation
The z-score tells us the number of standard deviations an observation falls from
the mean, and in which direction.
Ex. Drive times: x̄ = 30, s = 2
x = 27: z = (27 − 30)/2 = −3/2 = −1.5
x = 38: z = (38 − 30)/2 = 8/2 = 4
Suppose 33 minutes is the 78th percentile. Then 78 percent of my drive times are
less than 33 minutes and 22 percent exceed it.
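Both measures of relative standing are one-liners in Python. A minimal sketch: the z-scores use the drive-time mean and standard deviation from the notes, while the drive-time list is hypothetical, made up only to illustrate percentile rank (percent of observations below a value). `statistics.quantiles` uses one of several quartile conventions, so its quartiles may differ slightly from a textbook or Excel.

```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean; the sign gives the direction."""
    return (x - mean) / sd

def percent_below(data, value):
    """Percent of observations strictly below value (its percentile rank)."""
    return 100 * sum(x < value for x in data) / len(data)

mean, sd = 30, 2                   # drive times from the notes
print(z_score(27, mean, sd))       # -1.5 -> 1.5 std deviations below the mean
print(z_score(38, mean, sd))       # 4.0  -> 4 std deviations above the mean

# Hypothetical drive times (illustrative only, not data from the notes)
times = [26, 27, 28, 29, 29, 30, 30, 31, 31, 32, 33, 34]
print(round(percent_below(times, 33), 1))        # 83.3 -> about the 83rd percentile here
q1, median, q3 = statistics.quantiles(times, n=4)  # lower quartile, median, upper quartile
print(q1, median, q3)
```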
Outliers
- Outside x̄ ± 2s for mound-shaped data
- Outside x̄ ± 3s for all other data
Possible explanations for an outlier:
1. The observation is miscoded (recorded or entered incorrectly).
2. The observation comes from a population different from the one
specified.
3. The observation is the result of a rare chance event.
Ex. Poker Chip
Population: 95% red, 5% white
Pick: red
Methods of Detecting Outliers
1. Z-score method: calculate the z-score for an observation and,
using the appropriate rule, determine whether it is an outlier:
Chebyshev: outlier if |z| ≥ 3
(Modified) Empirical: outlier if |z| ≥ 2
2. Box Plot Method: a graphical method that uses the quartiles of the
data as the basis for identifying outliers.
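Both detection methods can be sketched in Python. The z-score cutoffs (2 for mound-shaped data, 3 otherwise) follow the notes; the box-plot fences Q1 − 1.5·IQR and Q3 + 1.5·IQR are an assumption (a common textbook convention the notes do not spell out), and the drive-time data are hypothetical.

```python
import statistics

def z_outliers(data, mound_shaped):
    """Z-score method: flag |z| >= 2 for mound-shaped data (empirical rule),
    |z| >= 3 otherwise (Chebyshev)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)              # sample standard deviation
    cutoff = 2 if mound_shaped else 3
    return [x for x in data if abs((x - mean) / s) >= cutoff]

def boxplot_outliers(data):
    """Box-plot method, assuming the common 1.5 * IQR inner fences."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical drive times (minutes) with one suspicious value
times = [28, 29, 29, 30, 30, 30, 31, 31, 32, 45]
print(z_outliers(times, mound_shaped=True))    # [45]  (z is about 2.77)
print(z_outliers(times, mound_shaped=False))   # []    (2.77 < 3)
print(boxplot_outliers(times))                 # [45]
```

The same observation can be flagged by one rule and not another, which is why the notes stress choosing the rule that matches the shape of the data.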
Statistics Class Notes 3
Probability
Experiment – the process of making an observation.
Sample points – the most basic outcomes of an experiment.
Event – an outcome of the experiment; a specific collection of sample points.
Probability Rules (for sample points)
1. 0 ≤ P(x) ≤ 1
2. Σ P(x) = 1 (summing over all sample points)
Symbols:
Union ("or" probability) – P(A or B) = P(A ∪ B)
Intersection ("and" probability) – P(A and B) = P(A ∩ B)
Conditional Probability: P(A | B) – given that event B has occurred, what is
the probability that event A occurs? P(A | B) = P(A ∩ B)/P(B)
Mutually Exclusive Events – two events are mutually exclusive if they
cannot occur at the same time
Independent Events – the outcome of one event does not change or affect
the probability of observing another event
Complement – the complement of event A is the event that A does not
occur
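To make the rules and definitions above concrete, here is a small Python sketch on a hypothetical experiment: one roll of a fair die. The events A, B, C, D are made up for illustration; exact fractions are used so the equalities can be checked directly.

```python
from fractions import Fraction

# Hypothetical experiment: roll one fair die; sample points 1..6, each with probability 1/6
P_point = {s: Fraction(1, 6) for s in range(1, 7)}

def P(event):
    """Probability of an event = sum of the probabilities of its sample points."""
    return sum(P_point[s] for s in event)

assert all(0 <= p <= 1 for p in P_point.values())   # rule 1
assert sum(P_point.values()) == 1                   # rule 2

A = {2, 4, 6}   # even number
B = {4, 5, 6}   # at least 4
C = {1, 3, 5}   # odd number
D = {1, 2}      # at most 2

union = P(A | B)                  # P(A or B)  = 4/6
intersection = P(A & B)           # P(A and B) = 2/6
conditional = P(A & B) / P(B)     # P(A | B)   = P(A and B) / P(B)
complement_A = 1 - P(A)           # P(A does not occur)

print(union, intersection, conditional, complement_A)  # 2/3 1/3 2/3 1/2
print(P(A & C) == 0)              # True: A and C are mutually exclusive
print(P(A & B) == P(A) * P(B))    # False: A and B are not independent
print(P(A & D) == P(A) * P(D))    # True: A and D are independent
```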
Ex. Birthday Problem
What is the probability that at least two of us have the same birthday?
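The usual way to attack the birthday problem is the complement rule: P(at least two share) = 1 − P(all birthdays different). A Python sketch, assuming 365 equally likely birthdays and no leap days; the class size n is a parameter because the notes do not give it. Under these assumptions, 23 people already give a probability just over 1/2.

```python
def prob_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming `days`
    equally likely birthdays and ignoring leap days."""
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (days - i) / days   # person i+1 avoids the i birthdays already taken
    return 1 - p_all_different                 # complement rule

for n in (10, 23, 30, 50):
    print(n, round(prob_shared_birthday(n), 3))
```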
Probability Tables
Car Inventory (table: 50 cars classified by color and by transmission type, manual or automatic)
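a. What is the probability that a randomly selected car has a manual
transmission? P(manual) = 30/50 = 0.60
b. What proportion of cars are red or blue? (11 + 15)/50 = 26/50 = 0.52
c. What proportion of cars are black or have a manual transmission?
Use the additive rule: P(black or manual) = P(black) + P(manual) − P(black and manual)
d. What proportion are red and have an automatic transmission? 7/50 = 0.14
e. Given that the car has a manual transmission, what is the
probability it is red? P(red | manual) = P(red and manual)/P(manual) = (4/50)/(30/50) = 4/30 ≈ 0.133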
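The answers can be checked with exact fractions in Python. The counts below are read off the answers themselves (30 manual cars; 11 red and 15 blue; 7 red automatics and 4 red manuals, consistent with 7 + 4 = 11 red). Part c needs the black-car counts, which are not in these notes, so it is left out of this sketch.

```python
from fractions import Fraction

# Counts inferred from the answers to a, b, d, and e (the full table is not available)
total = 50
manual = 30
red, blue = 11, 15
red_auto, red_manual = 7, 4

p_manual = Fraction(manual, total)               # a. P(manual)
p_red_or_blue = Fraction(red + blue, total)      # b. red and blue are mutually exclusive
p_red_and_auto = Fraction(red_auto, total)       # d. P(red and automatic)
# e. conditional probability: P(red | manual) = P(red and manual) / P(manual)
p_red_given_manual = Fraction(red_manual, total) / p_manual

print(p_manual, p_red_or_blue, p_red_and_auto, p_red_given_manual)
# 3/5 13/25 7/50 2/15
```

Note that the conditional answer 4/30 is just the red-manual count over the manual count: restricting to "manual" shrinks the sample space to those 30 cars.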
Statistics Class Notes 3 (Week 5)
Random Variables and Probability Distributions
Claim: 30% of all college students have a tattoo.
We randomly sample five college students and find one with a tattoo.
What can we say about the claim?
Possible outcomes (T = tattoo, N = no tattoo) include NNNTN, NNNNN, TNNNN, NTNTN, …
Number of outcomes: 2^5 = 32
What if n = 50? 2^50 ≈ 1.126 × 10^15
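Listing the outcomes by brute force shows why counting by hand stops working. A minimal Python sketch, using T for a student with a tattoo and N for one without:

```python
from itertools import product

# Every outcome of sampling 5 students: each student is T (tattoo) or N (no tattoo)
outcomes = ["".join(o) for o in product("TN", repeat=5)]
print(len(outcomes))    # 32 = 2**5
print(outcomes[:4])     # first few outcomes in the listing

# For n = 50 the list would be far too long to write out
print(2**50)            # 1125899906842624, about 1.126 * 10**15
```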
Random Variable – a variable that assigns a number to every outcome
of the experiment. Ex. x = the number of sampled students with a tattoo.
n = 5: x = 0, 1, 2, 3, 4, 5
n = 50: x = 0, 1, 2, …, 50
Two Types:
1. Discrete - the variable assigns a “countable” number of values to
the outcomes of the experiment
Ex. The number of students with a tattoo; Exam 1 grades; The
number of phones that ring during class.
2. Continuous - the variable can assign any value in an interval of
values.
Ex. My drive time to USF, Heights of Students.
Probability Distribution – a table, graph, or formula that gives the
probability of observing each value of the random variable.
Must follow:
1. 0 ≤ p(x) ≤ 1
2. Σ p(x) = 1
Ex. n = 5 students (assume p = 0.3)
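The distribution for n = 5 can be built directly from the 32 outcomes: if students are independent and p = 0.3, an outcome with x tattoos has probability 0.3^x · 0.7^(5 − x), and p(x) adds these up over all outcomes with that x. A Python sketch of that enumeration (independence is an assumption here; the binomial criteria below state it explicitly):

```python
from itertools import product
from collections import defaultdict

p = 0.3   # assumed probability that a student has a tattoo (the claim)
n = 5

dist = defaultdict(float)
for outcome in product("TN", repeat=n):
    x = outcome.count("T")                   # random variable: number with a tattoo
    prob = p ** x * (1 - p) ** (n - x)       # multiply, since students are independent
    dist[x] += prob                          # collect outcomes that share the same x

for x in range(n + 1):
    print(x, round(dist[x], 4))
print(round(sum(dist.values()), 10))         # 1.0 -> the probabilities sum to 1
```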
Expected Value of a Discrete Random Variable
The expected number of children for a US Female is 1.78
The average SAT score of USF freshmen is 1306
Expected value of x: E(x) = μ (m) = Σ x·p(x)
Ex. n = 5 students (p = 0.3)
Expected number with a tattoo: Σ x·p(x) = 1.5
Expected values should be interpreted as long-term averages, not as
values we expect to observe on a single attempt.
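A Python sketch of E(x) = Σ x·p(x) for the tattoo example, followed by a simulation illustrating the long-term-average interpretation. It borrows the binomial counting formula (`math.comb`) that appears in the next section; the random seed and number of simulated samples are arbitrary choices.

```python
import random
from math import comb

n, p = 5, 0.3
# p(x) for x = number with a tattoo (binomial counting formula, covered below)
px = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

expected = sum(x * prob for x, prob in px.items())   # E(x) = Σ x·p(x)
print(round(expected, 4))                            # 1.5

# Long-run interpretation: average x over many simulated samples of 5 students
random.seed(0)
trials = 100_000
total = sum(sum(random.random() < p for _ in range(n)) for _ in range(trials))
print(round(total / trials, 2))   # close to 1.5, though no single sample can have x = 1.5
```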
Binomial Probability Distribution
Criteria:
1. The experiment consists of “n” identical trials.
2. There are two outcomes, success and failure, possible for each trial.
3. The probability of success = p and the probability of failure = 1 − p, the same for every trial.
4. The trials are independent.
The binomial random variable is:
x = the number of successes in the n trials (x = 0, 1, 2, …, n)
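Formula: p(x) = [n!/(x!(n − x)!)] · p^x · (1 − p)^(n − x), x = 0, 1, …, n
The left part, n!/(x!(n − x)!): the number of outcomes with x successes and (n − x) failures
The right part, p^x · (1 − p)^(n − x): the probability of each outcome that has x successes and
(n − x) failures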
Mean: μ = np; Standard deviation: σ = √(np(1 − p))
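The formula and the mean and standard deviation translate directly into Python; `math.comb(n, x)` (Python 3.8+) computes n!/(x!(n − x)!). A sketch, with function names chosen for illustration:

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """Binomial probability p(x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p))."""
    return n * p, sqrt(n * p * (1 - p))

# Tattoo example: n = 5, p = 0.3
print([round(binom_pmf(x, 5, 0.3), 4) for x in range(6)])

mean, sd = binom_mean_sd(20, 0.5)   # 20-question true/false exam, guessing
print(mean, round(sd, 3))           # 10.0 2.236
```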
Ex. Suppose you decide to guess on every question on a 20-question
true/false exam. We are determining the number of questions you guess
correctly.
a. Is this a binomial experiment? Yes:
- n = 20 questions
- Success = guess correctly
- P(success) = p = 0.5, so 1 − p = 0.5
- Guesses on different questions are independent
x = the number of questions guessed correctly
b. Find the probability that you guess exactly half the questions
correctly.
P(x = 10) = [20!/(10!(20 − 10)!)] · (0.5)^10 · (1 − 0.5)^(20 − 10) =
0.176197
c. Find the probability that you pass the exam with at least a C (at least 14 of the 20 correct).
P(x ≥ 14) = P(14) + P(15) + … + P(20)
It’s Tough!!!
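Part c is tedious by hand, but a short loop handles it. A Python sketch of parts b and c, taking "at least a C" to mean at least 14 of 20 correct, as in the notes:

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability p(x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.5
print(round(binom_pmf(10, n, p), 6))       # b. P(x = 10) = 0.176197

# c. P(x >= 14) = P(14) + P(15) + ... + P(20)
at_least_c = sum(binom_pmf(x, n, p) for x in range(14, n + 1))
print(round(at_least_c, 4))                # c. 0.0577
```

So guessing gives well under a 6% chance of a C or better.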
Cumulative Binomial Probabilities
- Software or tables provide P(x ≤ k) for certain values of n and p
- Get clever! Rethink binomial problems in terms of “≤” probabilities
Ex. Binomial: n = 15, p = 0.4
a. P(x ≤ 8) = 0.905
b. P(x ≥ 4) = P(x ≤ 15) − P(x ≤ 3) = 1 − P(x ≤ 3)
Similarly for a single value: P(x = 10) = P(x ≤ 10) − P(x ≤ 9)
c. P(x ≥ 14) = P(x ≤ 20) − P(x ≤ 13) = 1 − P(x ≤ 13)
d. P(4 < x ≤ 14) = P(x ≤ 14) − P(x ≤ 4)
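Here is a Python sketch of the “≤” rewriting for n = 15, p = 0.4. `binom_cdf` is a hypothetical helper playing the role of the cumulative table (or Binom.Dist with TRUE), and each part is expressed only through it, as in the notes.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(x <= k), the value a cumulative binomial table lists."""
    if k < 0:
        return 0.0
    k = min(k, n)   # P(x <= k) = 1 once k reaches n
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

n, p = 15, 0.4
a = binom_cdf(8, n, p)                                  # P(x <= 8)
b = 1 - binom_cdf(3, n, p)                              # P(x >= 4)
exactly_10 = binom_cdf(10, n, p) - binom_cdf(9, n, p)   # P(x = 10)
c = 1 - binom_cdf(13, n, p)                             # P(x >= 14)
d = binom_cdf(14, n, p) - binom_cdf(4, n, p)            # P(4 < x <= 14)

print(round(a, 3))   # 0.905
```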
Excel: use the Binom.Dist function
Binom.Dist(k, n, p, TRUE/FALSE): TRUE returns P(x ≤ k) (less than or equal to); FALSE returns P(x = k) (equal to)
Ex. Mars Inc. claims that 40% of their M&Ms are red. To test this claim, we
are going to randomly sample 20 M&Ms.
a. Find the probability that at least 7 but fewer than 12 of the M&Ms
are red
Success = red: n = 20, p = 0.4
P(7 ≤ x < 12) = P(x ≤ 11) − P(x ≤ 6)
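b. Find the probability that more than 7 are not red.
Either make success “not red” (p = 0.6) and draw the dots: with y = the number not red,
P(y > 7) = 1 − P(y ≤ 7). Or keep success = red: more than 7 not red means at most 12 red,
so the answer is P(x ≤ 12).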
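Both routes for the M&M questions can be checked numerically; they must agree because “more than 7 not red” and “at most 12 red” are the same event when n = 20. A Python sketch with a hypothetical `binom_cdf` helper standing in for Binom.Dist(k, n, p, TRUE):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(x <= k) for a binomial random variable (empty sum = 0 when k < 0)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(0, min(k, n) + 1))

n, p_red = 20, 0.4

# a. x = number of red: P(7 <= x < 12) = P(x <= 11) - P(x <= 6)
part_a = binom_cdf(11, n, p_red) - binom_cdf(6, n, p_red)

# b. More than 7 not red, computed two equivalent ways
via_not_red = 1 - binom_cdf(7, n, 1 - p_red)   # success = "not red", p = 0.6
via_red = binom_cdf(12, n, p_red)              # success = red: at most 12 red

print(round(part_a, 4), round(via_not_red, 4), round(via_red, 4))
```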
Statistics Class Notes 4
Examples
a. P(Z ≤ 1.00) = 0.8413
b. P(Z > 0.32) = 1 − 0.6255 = 0.3745
c. P(−1.50 < Z < 0.61) = P(Z ≤ 0.61) − P(Z ≤ −1.50)
= 0.7291 − 0.0668 = 0.6623
Find the z-score, z0, such that:
d. P(Z ≤ z0) = 0.8944; z0 = +1.25
e. P(Z < z0) = 0.1056; z0 = −1.25
f. P(Z > z0) = 0.9582, so P(Z ≤ z0) = 1 − 0.9582 = 0.0418; z0 = −1.73
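Python's standard library has the standard normal built in: `statistics.NormalDist().cdf` gives the cumulative (“≤”) probability and `inv_cdf` works backward from a probability to z0, playing the role of the z-table. A sketch reproducing a through f:

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

print(round(Z.cdf(1.00), 4))                  # a. P(Z <= 1.00)          0.8413
print(round(1 - Z.cdf(0.32), 4))              # b. P(Z > 0.32)           0.3745
print(round(Z.cdf(0.61) - Z.cdf(-1.50), 4))   # c. P(-1.50 < Z < 0.61)   0.6623
print(round(Z.inv_cdf(0.8944), 2))            # d. P(Z <= z0) = 0.8944 -> 1.25
print(round(Z.inv_cdf(0.1056), 2))            # e. P(Z < z0) = 0.1056  -> -1.25
print(round(Z.inv_cdf(1 - 0.9582), 2))        # f. P(Z > z0) = 0.9582  -> -1.73
```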
We can use the standard normal distribution to solve all normal curve
questions.
Key: z = (x − m)/sd
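A quick check of the key idea, borrowing the GPA distribution from the example below (m = 2.8, sd = 0.4): the probability computed on the original scale equals the standard normal probability of the z-score. A Python sketch:

```python
from statistics import NormalDist

m, sd, x = 2.8, 0.4, 3.0
z = (x - m) / sd                         # standardize: z = (x - m) / sd

direct = NormalDist(m, sd).cdf(x)        # P(X <= 3.0) on the original scale
standardized = NormalDist().cdf(z)       # P(Z <= 0.5) on the standard normal

print(round(z, 2), round(direct, 4), round(standardized, 4))   # 0.5 0.6915 0.6915
```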
Working with normals in Excel
Norm.Dist(x, m, sd, TRUE)
- returns the “≤” probability for x
Norm.Inv(prob, m, sd)
- returns the value x0 such that P(x ≤ x0) = prob
Ex. The GPAs of students follow a normal distribution with m = 2.8 and
sd = 0.4.
a. What proportion of students have GPAs above 3.0?
Method 1: Convert to a z-score; solve using the z-table
z = (x − m)/sd
z = (3.0 − 2.8)/0.4 = 0.50; P(Z > 0.50) = 1 − 0.6915 = 0.3085
Method 2: Use Excel (I love Excel)
=1-Norm.Dist(3.0, 2.8, 0.4, TRUE) = 0.3085
b. Find the proportion of students with GPAs between 2.0 and 2.5.
z1 = (2.0 − 2.8)/0.4 = −2.00
z2 = (2.5 − 2.8)/0.4 = −0.75
P(−2.00 < Z < −0.75) = 0.2266 − 0.0228 ≈ 0.204
=Norm.Dist(2.5, 2.8, 0.4, TRUE)-Norm.Dist(2.0, 2.8, 0.4, TRUE)
c. Identify the GPA that marks the cutoff for the lowest 9% of student GPAs.
Method 1: Use the “≤” probability 0.09 and the cumulative normal table
to find z0: z0 = −1.34
Solve for x0 = m + z0·sd = 2.8 + (−1.34)(0.4) = 2.264
Method 2: Use Excel
=Norm.Inv(0.09, 2.8, 0.4)
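For readers without Excel, `statistics.NormalDist` offers rough equivalents: `cdf` corresponds to Norm.Dist(..., TRUE) and `inv_cdf` to Norm.Inv. A Python sketch of parts a through c (exact inverse values differ slightly from the rounded table value z0 = −1.34):

```python
from statistics import NormalDist

gpa = NormalDist(mu=2.8, sigma=0.4)

# a. P(GPA > 3.0); Excel: =1-Norm.Dist(3.0, 2.8, 0.4, TRUE)
above_3 = 1 - gpa.cdf(3.0)
# b. P(2.0 < GPA < 2.5); Excel: =Norm.Dist(2.5, 2.8, 0.4, TRUE)-Norm.Dist(2.0, 2.8, 0.4, TRUE)
between = gpa.cdf(2.5) - gpa.cdf(2.0)
# c. GPA cutting off the lowest 9%; Excel: =Norm.Inv(0.09, 2.8, 0.4)
cutoff = gpa.inv_cdf(0.09)

print(round(above_3, 4), round(between, 4), round(cutoff, 3))   # 0.3085 0.2039 2.264
```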
Assessing the normality of data
1. Chapter 2 plots – construct a histogram and/or stem-and-leaf plot of
the data (don’t be too judgy!!)
2. Empirical analysis – create the x̄ ± s, x̄ ± 2s, and x̄ ± 3s intervals.
Compare the percentage of your data in these intervals to the
Empirical Rule's 68%, 95%, and 100% (pay particular attention to
the 68%).
3. Calculate IQR/s = (Q3 − Q1)/s. For normal data this ratio is approximately 1.3,
so the closer it is to 1.3, the more nearly normal your data are.
4. Construct a normal probability plot. The straighter the plot, the more
nearly normal the data.
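Checks 2 and 3 are easy to automate. A Python sketch: the data are simulated from a normal distribution (not real data), so the output should land near 68/95/100 and near 1.3; with real data, compare the same numbers. For an exact normal distribution IQR = 2 × 0.6745σ ≈ 1.349σ, which is where the 1.3 benchmark comes from. Quartiles come from `statistics.quantiles`, one of several conventions.

```python
import random
import statistics

def empirical_percentages(data):
    """Percent of data within 1, 2, and 3 sample std deviations of the mean."""
    mean, s = statistics.mean(data), statistics.stdev(data)
    return [100 * sum(abs(x - mean) <= k * s for x in data) / len(data) for k in (1, 2, 3)]

def iqr_over_s(data):
    """(Q3 - Q1) / s; roughly 1.3 for normal data."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / statistics.stdev(data)

# Simulated (not real) data from a normal distribution, for illustration only
random.seed(1)
sample = [random.gauss(30, 2) for _ in range(1000)]
pct = empirical_percentages(sample)
print([round(v, 1) for v in pct])          # compare with 68, 95, 100
print(round(iqr_over_s(sample), 2))        # compare with 1.3

# Benchmark for an exact normal distribution: IQR / sigma
print(round(2 * statistics.NormalDist().inv_cdf(0.75), 3))   # 1.349
```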