Probability Distributions Course

The document covers topics on discrete probability distributions including the binomial and Poisson distributions. It provides examples and formulas for calculating the mean and variance of these distributions. Worked examples are given to demonstrate calculating probabilities using the binomial distribution.

Uploaded by

donbradman334
Copyright
© © All Rights Reserved

CE23216 Numerik 1

Semester 2023/1

Week 12

Rehan Hussain, Ph.D.

Topics covered in Part 1

Week Title
Week 9 Introduction to statistics and probability
Week 10 Descriptive statistics and sampling techniques
Week 11 Probability theory
Week 12 Discrete and continuous probability distributions
Week 13 Variance, co-variance, and correlation
Week 14 Statistical inference methods
Week 15 Statistical analysis using Octave/MATLAB
Week 16 UAS (final exam)

Outline

What's in this week’s lectures?

1. Discrete probability distributions

2. Continuous probability distributions

3. Sampling distribution

Topic 1: discrete probability distributions

Mean and variance of a probability distribution

— As seen earlier, the mean is a measure of the center of a sample, while the variance is a measure of its dispersion or variability.

— If X is a discrete random variable with probability mass function f(x), the mean and variance of X are defined as follows:

$$\mu = \sum_i x_i \, f(x_i) = E(X)$$

$$\sigma^2 = \sum_i (x_i - \mu)^2 f(x_i) = \sum_i x_i^2 f(x_i) - \mu^2 = V(X)$$

Mean and variance of a probability distribution
Recall the 3-coin-tosses example with X = number of heads

x    f(x)    x f(x)    (x − μ)² f(x)
0    1/8     0         (−1.5)²(1/8)
1    3/8     3/8       (−0.5)²(3/8)
2    3/8     6/8       (0.5)²(3/8)
3    1/8     3/8       (1.5)²(1/8)

Mean: $\mu = \sum x f(x) = 12/8 = 1.5$

Variance: $\sigma^2 = \sum (x - \mu)^2 f(x) = 6/8 = 0.75$
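The column sums in the table can be checked with a few lines of Python. This is a minimal sketch: the helper names `mean` and `variance` are illustrative, and exact fractions are used so that no rounding enters the result.

```python
from fractions import Fraction

# pmf of X = number of heads in 3 fair coin tosses (values from the table)
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def mean(pmf):
    """E(X) = sum of x * f(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = sum of (x - mu)^2 * f(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = mean(pmf)
var = variance(pmf)
# Shortcut form: sum of x^2 f(x) minus mu^2 should agree with the definition.
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2
print(mu, var, var_shortcut)  # 3/2 3/4 3/4
```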

Bernoulli trials

— Consider the following random experiments:

1. Flip a coin 10 times. Let X = number of heads obtained.

2. A worn machine tool produces 1% defective parts. Let X = number of defective parts in the next 25 parts produced.

3. In the next 20 births at a hospital, let X = the number of female births.

— In all of these examples, note that each individual trial (a coin flip, a part, a birth) has only two possible outcomes. Such a trial is known as a Bernoulli trial.

Bernoulli trials

— Bernoulli trials are named after the Swiss mathematician Jacob Bernoulli (1654–1705). His work on probability was published posthumously, in 1713.

— The two outcomes of a Bernoulli trial are often labelled as ‘success’ and ‘failure’, but these labels do not necessarily imply that one outcome is more desirable.

[Figure: portrait of Jacob Bernoulli]

Types of discrete probability distributions
1. Uniform: probability is equally distributed

2. Binomial: number of successes in n trials

3. Geometric: number of trials until the first success is observed

4. Hypergeometric: number of successes when sampling without replacement from a small population

5. Negative binomial: number of trials necessary to observe r successes

6. Poisson: number of successes in an interval of fixed length (e.g. time, distance, or area).

The binomial distribution
• If a random experiment consists of n Bernoulli trials such that:

1. The trials are independent

2. The probability of a success in each trial, denoted as p, remains constant

• Then, the random variable ‘X = number of successful trials’ is known as a binomial random variable, with probability mass function as follows:

$$f(x) = P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$$
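
The binomial probability mass function translates directly into code. The sketch below uses Python's `math.comb` for the binomial coefficient; the function name `binomial_pmf` and the choice n = 10, p = 0.5 (ten coin flips) are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# The probabilities over x = 0..n should sum to 1.
n, p = 10, 0.5
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(round(total, 12))  # 1.0
```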

Constructing a binomial distribution
— Let X = number of heads from flipping a coin 5 times. Hence, X is binomially distributed with n = 5, p = 0.5.

$$P(X = 0) = \binom{5}{0} \, 0.5^0 \, (1 - 0.5)^{5 - 0} = \frac{1}{32}$$

…and so on. Hence, the probability distribution is as shown in the figure.

[Figure: bar chart of P(x) against x for x = 0, 1, …, 5]

Note that for p = 0.5, the binomial distribution will be symmetrical.
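
The remaining probabilities for n = 5, p = 0.5 can be generated the same way. This is a small sketch using exact fractions; the variable names are illustrative. The last line checks the symmetry noted for p = 0.5.

```python
from fractions import Fraction
from math import comb

n, p = 5, Fraction(1, 2)
# P(X = x) for x = 0, 1, ..., 5
dist = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

for x, prob in enumerate(dist):
    print(x, prob, float(prob))

# Symmetry check: P(X = x) equals P(X = n - x) when p = 0.5
is_symmetric = dist == dist[::-1]
```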

Worked Example 1: binomial distributions
Each sample of water taken from a lake has a 10% chance of containing a particular organic pollutant. In total, 18 samples were taken. The pollutant content is independent between samples.
(a) Find the probability that, in the 18 samples, exactly 2 contain the pollutant.
(b) Determine the probability that at least 4 samples contain the pollutant.

Solution:
(a) $P(X = 2) = \binom{18}{2} \, 0.1^2 \, 0.9^{16} = 0.2835$
(b) $P(X \ge 4) = 1 - P(X < 4)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$
$= 0.0982$
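
Both answers can be reproduced numerically. The following is a sketch only, with `binomial_pmf` as an illustrative helper rather than part of the course material.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 18, 0.1
p_exactly_2 = binomial_pmf(2, n, p)
# "At least 4" is the complement of "0, 1, 2 or 3"
p_at_least_4 = 1 - sum(binomial_pmf(x, n, p) for x in range(4))
print(round(p_exactly_2, 4), round(p_at_least_4, 4))  # 0.2835 0.0982
```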

Class Exercise 1: binomial distributions
A marksman hits a target 80% of the time. He fires 5 shots at the target.

(a) What is the probability that exactly 3 shots hit the target?
(b) What is the probability that exactly 2 shots hit the target?

Binomial distribution table (cumulative)
• Since the binomial distribution contains only 2 distinct parameters, n and p, it can
be tabulated as follows:

Using the cumulative probability table
— We can use the cumulative probability table to find probabilities for selected binomial distributions.
1. Find the correct value of n in the table.
2. Find the column for the correct value of p.
3. Remember, “c” gives the cumulative probability,
P(X ≤ c) = P(X = 0) + … + P(X = c)
4. Hence, to calculate the probability at a fixed value, subtract as appropriate, e.g.:

$$P(X = 2) = P(X \le 2) - P(X \le 1)$$
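
A cumulative table column is just a running sum of the probability mass function, so the subtraction step can be sketched as follows. The function name `binomial_cdf_table` and the choice n = 5, p = 0.5 are assumptions made for illustration.

```python
from math import comb

def binomial_cdf_table(n, p):
    """Return [P(X <= 0), ..., P(X <= n)] for a binomial(n, p) variable."""
    table, running = [], 0.0
    for c in range(n + 1):
        running += comb(n, c) * p ** c * (1 - p) ** (n - c)
        table.append(running)
    return table

cdf = binomial_cdf_table(5, 0.5)   # illustrative choice of n and p
p_exactly_2 = cdf[2] - cdf[1]      # P(X = 2) = P(X <= 2) - P(X <= 1)
print([round(v, 3) for v in cdf])
print(p_exactly_2)  # 0.3125
```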

Using the cumulative probability table

For Class Exercise 1, the relevant data can be obtained from the cumulative probability table as shown:

Class Exercise 2: using the binomial table

For the marksman from Class Exercise 1, what is the probability that:
(a) more than 3 of his shots hit the target?
(b) fewer than 4 but more than 1 of his shots hit the target?

Cumulative probabilities P(X ≤ c) for n = 5, p = 0.80:

c    P(X ≤ c)
0    0.000
1    0.007
2    0.058
3    0.263
4    0.672
5    1.000

Mean and variance of the binomial distribution

— For a binomial experiment with n trials and a probability p of success for a given trial,
$$\mu = np$$
$$\sigma^2 = np(1 - p)$$

— For example, the mean, variance, and standard deviation of the binomial distribution with n = 4 and p = 0.1 are
$$\mu = np = 4(0.1) = 0.4$$
$$\sigma^2 = np(1 - p) = 4(0.1)(0.9) = 0.36$$
$$\sigma = \sqrt{0.36} = 0.6$$

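As a sanity check, the closed-form results μ = np and σ² = np(1 − p) can be compared with the general definitions of the mean and variance of a discrete distribution. A minimal sketch for n = 4, p = 0.1 (variable names are illustrative):

```python
from math import comb, sqrt

n, p = 4, 0.1
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean and variance from the general definitions for a discrete distribution
mu = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mu) ** 2 * f for x, f in enumerate(pmf))

# Closed-form results for the binomial distribution
mu_formula = n * p
var_formula = n * p * (1 - p)
sigma = sqrt(var_formula)
print(round(mu, 6), round(var, 6), round(sigma, 6))  # 0.4 0.36 0.6
```
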
The Poisson distribution
— The Poisson distribution (after the French mathematician Siméon Denis Poisson, 1781–1840) is the discrete probability distribution for the number of events occurring in a given time period/area, given the average number of times, λ, the event occurs over that interval. It has the following conditions:

1. An event can occur any number of times over the given time period/area.
2. Events occur independently.
3. The average rate of occurrence is constant (that is, it does not change with time).
4. The probability of an event occurring is proportional to the magnitude of the time period/area.

[Figure: portrait of S. D. Poisson]

The Poisson distribution

The probability mass function for the Poisson distribution is given by:

$$f(x) = P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

The mean and variance of the Poisson distribution are given by:

mean, $\mu = \lambda$ and variance, $\sigma^2 = \lambda$

Note that the mean and variance of the Poisson distribution are equal!

The Poisson distribution
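• To illustrate the Poisson probability distribution, consider the case where λ = 2.5:

$$P(X = 0) = \frac{\lambda^0 e^{-\lambda}}{0!} = \frac{2.5^0 e^{-2.5}}{0!} = 0.082$$
$$P(X = 1) = \frac{\lambda^1 e^{-\lambda}}{1!} = \frac{2.5^1 e^{-2.5}}{1!} = 0.205$$
$$P(X = 2) = \frac{\lambda^2 e^{-\lambda}}{2!} = \frac{2.5^2 e^{-2.5}}{2!} = 0.257$$
$$P(X = 3) = 0.214$$
$$P(X = 4) = 0.134$$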
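
The same values can be generated from the probability mass function in code. This is a sketch in which `poisson_pmf` is an illustrative helper name; results are rounded to three decimal places.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 2.5
probs = [poisson_pmf(x, lam) for x in range(5)]
print([round(q, 3) for q in probs])  # [0.082, 0.205, 0.257, 0.214, 0.134]
```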

Worked Example 2: Poisson distribution
In the World Cup, an average of 2.5 goals are scored each game. Modeling this situation with a Poisson distribution, what is the probability that
(a) 3 or fewer goals are scored in 2 games?
(b) more than 3 goals are scored in 2 games?

Solution:
λ = 2.5 × 2 = 5 (the time period is now 2 games!)

(a) $P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= \frac{5^0 e^{-5}}{0!} + \frac{5^1 e^{-5}}{1!} + \frac{5^2 e^{-5}}{2!} + \frac{5^3 e^{-5}}{3!}$
$= 0.26503$

(b) $P(X > 3) = 1 - P(X \le 3) = 0.73497$
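
A short numerical check of this example, written as a sketch with an illustrative helper `poisson_pmf`. Note how the rate is rescaled to the two-game interval before the probabilities are summed.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 2.5 * 2  # 2.5 goals per game, over an interval of 2 games
p_at_most_3 = sum(poisson_pmf(x, lam) for x in range(4))
p_more_than_3 = 1 - p_at_most_3
print(round(p_at_most_3, 5), round(p_more_than_3, 5))  # 0.26503 0.73497
```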

Topic 2: continuous probability distributions

Continuous random variables

— A continuous random variable is one which takes values in an uncountable set, such as an interval of real numbers (meaning that its possible values cannot be listed one by one).

— They are used to model quantities such as height, weight, time, volume, position, etc.

Discretization of continuous random variables

• Suppose we measure the heights of students in this class. If we round to the nearest foot, the resulting probability histogram (left) is coarse.
• Next, if height is measured to the nearest inch, the resulting probability histogram (middle) has more bins and a much smoother appearance.
• As we continue to measure height more and more precisely, the resulting probability histogram (right) approaches a smooth curve. However, there will always be some amount of discretization when measuring random variables.

Probability density function (pdf)
• The probability density function of a continuous random variable X, normally denoted as f(x), is the function such that:

1. $f(x) \ge 0$ (i.e., the function is always non-negative)
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (i.e., the total area under the curve is 1)
3. $P(a \le X \le b) = \int_a^b f(x)\,dx$ (i.e., the probability of x in the interval [a, b] is given by the area under the curve from a to b)

Some features of pdfs

— Note that for a specific value of a continuous random variable,
$$P(x = a) = 0$$
This is because the “area above a point” is a line segment and hence has 0 area. Specifically, this means that:
$$P(x < a) = P(x \le a)$$
$$P(a < x < b) = P(a \le x < b) = P(a < x \le b) = P(a \le x \le b)$$

[Figure: pdf curve with shaded tail areas P(x < a) and P(x > b), with a and b marked on the x-axis]

Probability calculation
The probability that a continuous random variable x lies between a lower limit a and an upper limit b is

P(a < x < b) = P(x < b) – P(x < a)

where P(x < b) is the cumulative area to the left of b and P(x < a) is the cumulative area to the left of a.

Mean and variance of a continuous random variable

• The mean (μ or E(X)) and variance (σ² or V(X)) of a continuous random variable with pdf f(x) are given by the following:

$$\mu = E(X) = \int_{-\infty}^{\infty} x \, f(x)\,dx$$

$$V(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$$
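
These integrals can be approximated numerically when they are awkward to do by hand. The sketch below uses a simple midpoint rule and a hypothetical pdf, f(x) = 2x on 0 < x < 1, chosen purely for illustration. Its exact mean is 2/3 and its exact variance is 1/18.

```python
def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# Hypothetical pdf chosen for illustration: f(x) = 2x on 0 < x < 1, zero elsewhere
def f(x):
    return 2 * x

a, b = 0.0, 1.0
area = integrate(f, a, b)                               # total area, should be 1
mu = integrate(lambda x: x * f(x), a, b)                # exact value 2/3
var = integrate(lambda x: (x - mu) ** 2 * f(x), a, b)   # exact value 1/18
var_shortcut = integrate(lambda x: x ** 2 * f(x), a, b) - mu ** 2
print(round(area, 6), round(mu, 6), round(var, 6))
```

Because f(x) is zero outside (0, 1), integrating over [0, 1] is equivalent to integrating over the whole real line.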

Class Exercise 3: Mean and variance of a continuous random variable

Suppose $f(x) = 1.5x^2$ for $-1 < x < 1$.

Determine:
(a) the mean
(b) the variance
of the continuous random variable X.

The normal distribution

— The most important continuous probability distribution of all is the normal distribution.

— Due to its symmetrical, bell-like shape, it is also referred to as the bell curve. Another name for it is the Gaussian distribution.

— Its name comes from the fact that it comes up so often in science and nature that it was considered by past statisticians to be the ‘default’ distribution!

— Its importance comes partly from the central limit theorem, which we will discuss later on.

The normal distribution

• The pdf of the normal distribution with mean μ and variance σ² is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$$

The standard normal distribution

— The simplest case of a normal distribution is when μ = 0 and σ = 1. This is called a standard normal distribution.

— In the standard normal distribution, the standard normal variable is denoted as Z.

— Since the total area under the standard normal distribution is 1, the areas under the curve correspond to probabilities, e.g.:
$$P(-1 \le Z \le 1) = 0.683$$
$$P(-2 \le Z \le 2) = 0.954$$
$$P(-3 \le Z \le 3) = 0.997$$
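
These areas can be computed without a table, because the standard normal cumulative probability can be written in terms of the error function as Φ(z) = ½[1 + erf(z/√2)]. A minimal sketch, where the name `phi` is illustrative:

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

for k in (1, 2, 3):
    print(k, round(phi(k) - phi(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```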

The standard normal distribution

Note: Due to the symmetry of the curve,

$$P(Z \le 0) = 0.5$$

$$P(Z \le -z) = P(Z \ge z)$$

The standard normal distribution

• We can use the standard normal probability table to look up the values of P(Z ≤ z):

• P(Z ≤ z) = Area under curve to the left of z

• P(a ≤ Z ≤ b) = [Area to left of b] − [Area to left of a]

The standard normal distribution

• Note: the first column gives z to 1 decimal place, while the column headings supply the second decimal place!

• For example,

P(Z ≤ 1.37) = 0.9147

Worked Example 3: standard normal probability

Using the standard normal probability table:

(a) Calculate P(−0.155 < Z < 1.60)
(b) Locate the value of z that satisfies P(Z > z) = 0.025

Solution: (a) From the table, we see that:

P(Z ≤ 1.60) = area to left of 1.60 = 0.9452

Note that the table only lists z to two decimal places. Hence, to find P(Z ≤ −0.155), we need to interpolate linearly between P(Z ≤ −0.15) = 0.4404 and P(Z ≤ −0.16) = 0.4364, which gives
P(Z ≤ −0.155) = 0.4384

Therefore,
P(−0.155 < Z < 1.60) = 0.9452 – 0.4384 = 0.5068

Worked Example 3: standard normal probability

(b) Since the total area is 1, the area to the left of z is 1 − 0.0250 = 0.9750.
From the table, we see that the marginal value with the tabular entry 0.9750 is
z = 1.96
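
Both parts of this example can be cross-checked in software instead of a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# (a) exact cumulative values instead of table lookup plus interpolation
p_a = Z.cdf(1.60) - Z.cdf(-0.155)

# (b) the z with P(Z > z) = 0.025 has cumulative probability 0.975
z_b = Z.inv_cdf(1 - 0.025)
print(round(p_a, 4), round(z_b, 2))  # 0.5068 1.96
```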

Probability of normally distributed variables

• For any normally distributed variable X with E(X) = μ and V(X) = σ², we can standardize it (i.e., convert it to the standard normal distribution) by calculating the Z-score:

$$Z = \frac{X - \mu}{\sigma}$$

$$\therefore P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$$

• This allows us to use the same table seen earlier to calculate the probability of any normally distributed variable.

Worked Example 4: standardizing a normal probability
Given that X is normally distributed with µ = 60 and σ = 4, find P(55 ≤ X ≤ 63).

Solution:
First, calculate Z:
x = 55 → z = (55 − 60)/4 = −1.25
x = 63 → z = (63 − 60)/4 = 0.75
Hence, P(55 ≤ X ≤ 63) = P(−1.25 ≤ Z ≤ 0.75)
Using the standard normal table,
P(−1.25 ≤ Z ≤ 0.75) = P(Z ≤ 0.75) – P(Z ≤ −1.25)
= 0.7734 – 0.1056 = 0.6678
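
The standardization step can be verified by computing the probability in two ways: directly on X, and on the standardized Z. This is a sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=60, sigma=4)
z_low = (55 - X.mean) / X.stdev    # -1.25
z_high = (63 - X.mean) / X.stdev   # 0.75

p_direct = X.cdf(63) - X.cdf(55)
p_standardized = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
print(round(p_direct, 4), round(p_standardized, 4))
```

The two results agree with each other. They can differ from the table-based answer in the fourth decimal place, because each table entry is itself rounded.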

Class Exercise 4: percentile of a normal population
Data on sleep suggest that the number of hours of sleep per person per night can be modeled as a normal distribution with μ = 7.2 hours and σ = 1.3 hours.

(a) Determine the probability of sleeping less than 6.5 hours.

(b) Find the value of sleep hours which has a cumulative probability of 70%.

Central limit theorem
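— One of the main reasons why the normal distribution is so important is because of the central limit theorem, which states that: if $\bar{X}$ is the mean of a random sample of size n drawn from a population with mean μ and finite variance σ², then as n becomes large the distribution of $\bar{X}$ approaches a normal distribution with mean μ and variance σ²/n.

— The distribution of $\bar{X}$ is known as the sampling distribution of the mean.
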
Central limit theorem

The figure illustrates the central limit theorem at work for an asymmetric population distribution. We can see that as n becomes larger, the distributions of $\bar{X}$ increasingly approach the symmetrical bell curve shape.
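
The same behaviour can be reproduced with a small simulation. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1, strongly right-skewed), which is an illustrative choice and not the population in the figure. For each sample size it reports the centre, spread and skewness of the simulated sample means.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(n, repeats=5_000):
    """Means of `repeats` random samples of size n from an exponential(1) population."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(repeats)]

results = {}
for n in (1, 5, 30):
    xs = sample_means(n)
    centre = sum(xs) / len(xs)
    spread = (sum((x - centre) ** 2 for x in xs) / len(xs)) ** 0.5
    # Skewness (third standardized moment): 0 for a symmetric distribution
    skew = sum((x - centre) ** 3 for x in xs) / len(xs) / spread ** 3
    results[n] = (centre, spread, skew)
    print(n, round(centre, 3), round(spread, 3), round(skew, 3))
```

As n grows, the centre stays near the population mean, the spread shrinks towards σ/√n, and the skewness moves towards 0, which is the symmetric shape of the bell curve.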

Topic 3: sampling distribution

Sampling distribution
— A sampling distribution is the probability distribution of a statistic (such as mean
or variance) – i.e., it is the distribution of the statistic if we were to repeatedly
draw random samples from a population.

— Sampling distributions are used to calculate the probability that sample statistics could have occurred by chance, and thus to decide whether something that is true of a sample statistic is also likely to be true of a population parameter.

— They are an important part of statistical inference methods, which will be covered in more detail in Week 14.

Calculating a sampling distribution
— Consider a population consisting of three housing units, where the value of X, the number of rooms for rent in each unit, is shown in the illustration.

— Let us select a random sample of size n = 2 (with replacement). The observations of X in the sample are denoted as X1 and X2. Hence, the sample mean, $\bar{X}$, is given by:

$$\bar{X} = \frac{X_1 + X_2}{2}$$

Calculating a sampling distribution

— The nine possible samples are equally likely so, for instance, $P(\bar{X} = 2.5) = \frac{2}{9}$. Continuing in this manner, we obtain the distribution of $\bar{X}$, which is given in Table 2.
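
The enumeration can be sketched in code. The room counts of the three units appear only in the illustration, so the values 2, 3 and 4 used below are an assumption, chosen so that two of the nine equally likely samples give a sample mean of 2.5.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed room counts for the three housing units (not given in the text)
population = [2, 3, 4]
n = 2

samples = list(product(population, repeat=n))   # sampling with replacement
counts = Counter(Fraction(sum(s), n) for s in samples)
dist = {xbar: Fraction(c, len(samples)) for xbar, c in sorted(counts.items())}

for xbar, prob in dist.items():
    print(float(xbar), prob)

mean_of_xbar = sum(xbar * prob for xbar, prob in dist.items())
```

Under this assumption the sample mean takes the values 2, 2.5, 3, 3.5 and 4, and the mean of the sampling distribution is 3.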

Calculating a sampling distribution
— Compare the population distribution, X, with the sampling distribution of the mean, $\bar{X}$:

— Note that the two distributions are differently shaped – the population follows a rectangular distribution, whereas the sampling distribution is bell-shaped.

— Note also that the mean of the sampling distribution is equal to the population mean. This is always true!

$$\mu = 3, \qquad \mu_{\bar{X}} = 3$$

Worked Example 5: sampling distributions of the mean and median

A large population is described by the probability distribution

Let X1, X2, X3 be a random sample of size 3 from this distribution.


(a) List all the possible samples and determine their probabilities.
(b) Determine the sampling distribution of the sample mean.
(c) Determine the sampling distribution of the sample median.
(d) Calculate the mean and variance of the population and the sampling distribution of the mean.

Worked Example 5: sampling distributions of the mean and median

Solution:
(a) First, list out each possible sample. Since the observations in a random sample are independent, the probability of drawing a particular sample is:

$$P(x_1, x_2, x_3) = f(x_1) \cdot f(x_2) \cdot f(x_3)$$

To find the median, arrange the sample in ascending order and pick the middle value.

Worked Example 5: sampling distributions of the mean and median

(b) To calculate the probabilities of the means, add up the probabilities for each sample with the same mean.

(c) To calculate the probabilities of the medians, add up the probabilities for each sample with the same median.

Worked Example 5: sampling distributions of the mean and median
(d) The population mean is given by
$$\mu = \sum x f(x) = 6.9$$

The mean of the sampling distribution of $\bar{X}$ is given by
$$E(\bar{X}) = \sum \bar{x} f(\bar{x}) = 6.9$$

The population variance is given by
$$\sigma^2 = \sum x^2 f(x) - \mu^2 = 27.09$$

The variance of the sampling distribution of $\bar{X}$ is given by
$$V(\bar{X}) = \sum \bar{x}^2 f(\bar{x}) - \mu^2 = 9.03 = \frac{27.09}{3} = \frac{\sigma^2}{3}$$
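
Steps (a) to (d) can be sketched in code for any small discrete population. The population distribution for this example is not reproduced in the text, so the pmf `f` below is hypothetical and serves only to illustrate the procedure. Exact fractions are used so that the relation V(X̄) = σ²/n can be checked without rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from statistics import median

# Hypothetical population pmf for illustration only (not the one in the example)
f = {0: Fraction(1, 2), 1: Fraction(3, 10), 4: Fraction(1, 5)}
n = 3

mean_dist = defaultdict(Fraction)
median_dist = defaultdict(Fraction)
for sample in product(f, repeat=n):                     # (a) every possible sample
    prob = f[sample[0]] * f[sample[1]] * f[sample[2]]   # independent draws
    mean_dist[Fraction(sum(sample), n)] += prob         # (b) group by sample mean
    median_dist[median(sample)] += prob                 # (c) group by sample median

# (d) compare the population with the sampling distribution of the mean
mu = sum(x * p for x, p in f.items())
var = sum(x ** 2 * p for x, p in f.items()) - mu ** 2
mean_of_xbar = sum(x * p for x, p in mean_dist.items())
var_of_xbar = sum(x ** 2 * p for x, p in mean_dist.items()) - mean_of_xbar ** 2
print(mu, mean_of_xbar, var, var_of_xbar)
```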

Sampling distribution for normal populations

— Hence, the z-factor for the sampling distribution is

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

and the probability of $\bar{X}$ is found from this z-value. We can use the standard normal table as before!

Worked Example 6: probability of sample mean
A random sample of size n = 16 is taken from a normal distribution with μ = 10 and σ = 8. Determine the probability that the sample mean is greater than 12.

Solution:
$$P(\bar{x} > 12) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} > \frac{12 - 10}{8/\sqrt{16}}\right)$$
$$= P(z > 1)$$
$$= 1 - P(z \le 1)$$
$$= 1 - 0.8413 = 0.1587$$

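The calculation in Worked Example 6 can be cross-checked as follows. This is a sketch using the standard library's `statistics.NormalDist`, with illustrative variable names. It computes the tail probability once through the z-value and once by treating the sample mean as normally distributed with standard deviation σ/√n.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 10, 8, 16
standard_error = sigma / sqrt(n)     # sigma / sqrt(n) = 2
z = (12 - mu) / standard_error       # z = 1
p_tail = 1 - NormalDist().cdf(z)

# Equivalent: treat the sample mean as Normal(mu, sigma / sqrt(n))
p_direct = 1 - NormalDist(mu, standard_error).cdf(12)
print(round(p_tail, 4))  # 0.1587
```
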
Class Exercise 5: probability of sample mean
A soda filling machine is supposed to fill cans of soda with 12 fluid ounces. Suppose that the fills are normally distributed with a mean of 12.1 oz and a standard deviation of 0.2 oz.
(a) What is the probability that a single can contains less than 12 oz?
(b) What is the probability that the average fill for a 6-pack of soda is less than 12 oz?

Problem Set 4

Questions 1 and 2

Questions 3 and 4

Question 5

Given a random variable X which is normally distributed with a mean of 70 and a standard deviation of 4.5, find:

(a) the probability that a value is between 65 and 80, inclusive.

(b) the probability that a value is greater than or equal to 75.

(c) the probability that a value is less than 62.

(d) the 90th percentile for this distribution.

Questions 6 and 7

Question 8