
Conditional Probability
Multiplication rule
Theorem of total probability
Rejection sampling
Bayes' theorem
Use of Bayes' theorem
Urn Models
What are they?
Why care?
Fallacies regarding conditional probability
Mistaking P(A|B) for P(B|A)

Simpson's Paradox
Monty Hall problem
Problems for practice

Conditional Probability
The probability that a coin toss results in a head is a statement more
about our ignorance regarding the outcome than about an absolute property
of the coin. If our level of ignorance changes (e.g., if we get some new
information), the probability may change. We deal with this
mathematically using the concept of conditional probability.

EXAMPLE 1: Here is a box full of shapes.


A box of shapes
I pick one at random. What is the probability that it is a triangle? The
answer is P(triangle) = 5/12.

Now, someone gives me some extra information: the randomly selected
shape happens to be green in colour. What is the probability of its being
a triangle in light of this extra information?

Now my sample space is narrowed down to only the green shapes.

Narrowed sample space


Here the probability of a triangle is different: 2/7.

We cannot use the same notation P(triangle) for this new quantity. We
need a new notation that reflects our extra information. The new
notation is P(triangle|green). We call it the conditional probability of
the selected shape being a triangle given that it is green. ■

In general, the notation is P(A|B), where A, B are any two events. The
mathematical definition is just as it should be. Instead of the entire
sample space Ω you now narrow your focus down to only B. So A is now
narrowed down to A ∩ B. So P(A|B) actually measures P(A ∩ B)
relative to P(B). Hence the definition is:
Definition: Conditional probability
If A, B are any two events with P(B) > 0 then

P(A|B) = P(A ∩ B) / P(B).

If P(B) = 0, then P(A|B) is undefined.
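
To make this concrete, here is a minimal R sketch assuming a box consistent with the counts of Example 1: 5 triangles among 12 shapes, 7 green shapes of which 2 are triangles (the non-green colours are arbitrary choices).

shape  = c(rep('triangle',5), rep('other',7))
colour = c('green','green','blue','blue','blue',    # the 5 triangles: 2 green
           'green','green','green','green','green', # 5 green non-triangles
           'blue','blue')                           # 2 other non-triangles
pick = sample(12, 100000, replace=TRUE)             # repeated random picks
mean(shape[pick] == 'triangle')                              # near 5/12
mean(shape[pick][colour[pick] == 'green'] == 'triangle')     # near 2/7

Both estimates should settle near 5/12 ≈ 0.417 and 2/7 ≈ 0.286.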

Theorem
Consider a probability P on some sample space. Fix any event B
with P(B) > 0. For every event A define P′(A) = P(A|B).
Then P′ is again a probability.


Proof: We have to check that the three axioms are satisfied by P′.
The first two axioms obviously hold! For the third axiom, let A1, A2, . . .
be countably many disjoint events. Then

P′(A1 ∪ A2 ∪ ⋯) = P((A1 ∪ A2 ∪ ⋯) ∩ B) / P(B) = P((A1 ∩ B) ∪ (A2 ∩ B) ∪ ⋯) / P(B)
= ∑ P(Ai ∩ B) / P(B) = ∑ P′(Ai).

[QED]
::

EXERCISE 1: Show that if P(A|B) = P(A) then A, B must be
independent. Is the converse true? Be careful with the second part!

Multiplication rule
::

EXERCISE 2: Show that if P (A) > 0 then P (A ∩ B) = P (A)P (B|A).

This result is just a minor rearrangement of the definition. But it has an
intuitive interpretation. A ∩ B means both A and B have happened. We
are finding its probability in two steps: first the probability that A has
happened, P(A). Then, P(B|A), the conditional probability that B has
happened given that A has happened. This is often represented
diagrammatically:

This form is particularly useful when A, B are events such that A indeed
occurs before B in the real world. Here is an example.

EXAMPLE 2: A box contains 5 red and 3 green balls. One ball is drawn
at random, its colour is noted, and it is replaced. Then one more ball
of the same colour is added. Then a second ball is drawn. What is the
probability that both balls are green?

SOLUTION: Notice that randomness enters in two stages, since there


are two random selections involved. Let A be the event that the first ball
is green, and B be the event that the second ball is green.

We are to find P (A ∩ B) = P (A)P (B|A).

What is the probability that the first ball is green? The answer is
P(A) = 3/8. Before drawing the second ball, the composition of the box
has changed depending on the outcome of the first stage. This is where
conditional probability helps. Given that the first ball was green, we
know the composition of the box before the second drawing: 5 red and
3 + 1 = 4 green. So P(B|A) = 4/9.

The final answer therefore is 3/8 × 4/9 = 1/6.

It is instructive to check this by simulation.

balls = c('r','r','r','r','r','g','g','g')    # 5 red, 3 green
event = c()
for(i in 1:5000) {
  first.draw = sample(balls,1)                # draw, note colour, replace
  newballs = c(balls,first.draw)              # add one more ball of that colour
  second.draw = sample(newballs,1)            # second draw
  event[i] = (first.draw=='g' && second.draw=='g')
}
mean(event)                                   # should be close to 1/6


Often, in the case of multistage random experiments, it is easier to think
about the diagram than about the definition of conditional probability.

In a similar way, you can prove (by induction) the following theorem.

Multiplication rule
Let A1, . . . , An, B be events such that P(A1 ∩ ⋯ ∩ An) > 0. Then

P(A1 ∩ ⋯ ∩ An ∩ B) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) ⋯ P(B|A1 ∩ ⋯ ∩ An).
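
For instance, continuing the urn of Example 2 for a third draw (one ball of the drawn colour being added after each draw), the rule gives P(first three balls all green) = 3/8 × 4/9 × 5/10 = 1/12.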

Theorem of total probability

Sometimes an event can occur via different paths. To find the probability
of such an event we need to add the probabilities of all the paths. This
leads to the theorem of total probability.

Theorem of total probability
Let A1, . . . , An be mutually exclusive and exhaustive events, where
P(Ai) > 0 for all i. Let B be any event. Then

P(B) = ∑_{i=1}^{n} P(Ai)P(B|Ai).
Proof: The following diagram illustrates the situation.

Theorem of total probability


We need to add the probabilities from all the paths from Start to B. The
probability of a path is computed by multiplying the probabilities on the
arrows along it.

Now let's write down the formal proof.

Since A1 ∪ ⋯ ∪ An = Ω,

hence B = B ∩ Ω = (B ∩ A1) ∪ ⋯ ∪ (B ∩ An).

Also, since the Ai's are disjoint, the B ∩ Ai's are disjoint as well.

So P(B) = ∑_{i=1}^{n} P(B ∩ Ai) = ∑_{i=1}^{n} P(Ai)P(B|Ai), as required. [QED]
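
Here is a minimal numeric sketch of the theorem in R; the three-part partition and the conditional probabilities are made-up values for illustration.

pA = c(0.2, 0.3, 0.5)                           # P(A1), P(A2), P(A3)
pB.given.A = c(0.1, 0.4, 0.7)                   # P(B|Ai)
sum(pA * pB.given.A)                            # P(B) by the theorem: 0.49

n = 100000                                      # Monte Carlo check
stage1 = sample(3, n, replace=TRUE, prob=pA)    # which Ai occurred
B = runif(n) < pB.given.A[stage1]               # did B occur, given Ai
mean(B)                                         # close to 0.49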
Rejection sampling
Suppose that ∅ ≠ A ⊆ B are finite sets. You have a list of all elements
of B, but you do not have a list of all elements of A. However, given any
element of B you can check whether it is in A or not. In this case how can
you draw one element randomly from A?

One way is to use rejection sampling. In this technique you draw one
element of B randomly. If it is in A, then stop and output that element.
Else, you again draw a random element from B (with replacement), and
continue like this. This procedure terminates with probability 1, and the
output is a random sample from A.

::

EXERCISE 3: How can you choose among 5 friends with equal probability
using only a fair die? The following R code gives a hint.

repeat {
  x = sample(6,1)      # roll a fair die
  if (x<=5) break      # accept 1-5; reject a 6 and roll again
}

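A possible completion of the hint (the friend names here are hypothetical): keep rolling until the die shows something in 1-5, then pick the friend with that number. The accepted roll is uniform on {1, ..., 5}, so each friend is chosen with probability 1/5.

friends = c('Ann','Bob','Cam','Dev','Eli')   # made-up names
repeat {
  x = sample(6, 1)
  if (x <= 5) break
}
friends[x]                                   # the chosen friend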

Bayes' theorem
Multi-stage random experiments are all around us. Many processes in
nature occur step by step, and each step involves some randomness.
Often the last layer of randomness is due to measurement error.
Bayes' theorem is a way to "remove" this last layer and look deeper.

The theorem of total probability lets us move forward along the arrows,
while Bayes' theorem lets us move backwards.

Bayes' theorem (version 1)

Let A, B be any two events with P(A), P(B) > 0. Then

P(A|B) = P(A)P(B|A) / (P(A)P(B|A) + P(Aᶜ)P(B|Aᶜ)).

Proof: First think of the formula in terms of the following diagram. The
denominator is the probability of reaching B from Start. The numerator
is the probability of only the red path.

The proof is very simple:

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B|A)/P(B) = P(A)P(B|A) / (P(A)P(B|A) + P(Aᶜ)P(B|Aᶜ)),

as required. [QED]
Bayes' theorem (version 2)
Let A1, . . . , An be mutually exclusive and exhaustive events. Let B
be any event. We assume P(A1), . . . , P(An), P(B) > 0. Then for
any k = 1, . . . , n,

P(Ak|B) = P(Ak)P(B|Ak) / ∑_{i=1}^{n} P(Ai)P(B|Ai).

::

EXERCISE 4: Look at the following diagram and write down the proof.

More general form of Bayes' theorem

The main idea behind Bayes' theorem goes beyond these two versions.
Whenever you can draw an arrow diagram connecting events, and know
all the labelling probabilities, you can apply Bayes' theorem.

Use of Bayes' theorem


EXAMPLE 3: I live in a locality where burglary is uncommon. The
chance that a burglar breaks into my house is 0.1. I have a dog that is
highly likely to bark (say, with 0.95 probability) if a burglar enters.
However, otherwise my dog is a quiet one. If there is no burglar around,
he barks with probability only 0.01. I hear my dog bark. What is the
chance that a burglar has entered?

SOLUTION: Let A = {burglar has entered} and B = {dog barks}.
We are given that

P(A) = 0.1, P(B|A) = 0.95, P(B|Aᶜ) = 0.01.

So we get the following diagram.

We want to find P(A|B). To apply Bayes' theorem we need to find P(B).

P(B) = P(A) · P(B|A) + P(Aᶜ) · P(B|Aᶜ)
     = 0.1 × 0.95 + (1 − 0.1) × 0.01
     = 0.104

Now apply Bayes' theorem to get

P(A|B) = (0.1 × 0.95) / 0.104 ≈ 0.913.

Diagrammatically, you can think like this. To find P(B), consider all
paths from Start to B. Multiply the probabilities along each path and add.
Thus P(B) = 0.1 × 0.95 + 0.9 × 0.01 = ⋯ Similarly, to find P(A ∩ B),
add the probabilities of all the paths from Start to B through A.

Here P(A ∩ B) = 0.1 × 0.95.

So now you can find P(A|B) = P(A ∩ B)/P(B). ■
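
A quick numeric check of this example in R (all values from the text):

pA = 0.1; pB.A = 0.95; pB.notA = 0.01
pB = pA*pB.A + (1-pA)*pB.notA    # total probability: 0.104
pA*pB.A / pB                     # Bayes' theorem: about 0.913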

This is an example of a two-stage random experiment. The first stage is
whether a burglar enters or not. The second stage is whether the dog
barks or not.

As in the above example, a typical problem starts by telling you the
unconditional probability of the first stage, and the conditional
probability of the second stage given the first. Only the outcome of the
second stage is observed, and the problem is to find the conditional
probability of the first stage given the outcome of the second stage.

The same approach is applicable to any similar multistage experiment.

Urn Models
What are they?
An urn model is a multistage random experiment. It consists of one or
more boxes (called urns), each containing coloured balls (balls are all
distinct, even balls having the same colour). Balls are drawn at random
(using SRSWR or SRSWOR) and depending on the outcome, some balls
are added/removed/transferred. Then again a few balls are drawn, and
so on. Here is one example.
EXAMPLE 4: An urn contains 3 red and 3 green balls. One ball is drawn
at random, its colour noted, and returned to the urn. Then another ball
of the same colour is added to the urn. Then the same process is
repeated again and again. The possibilities grow like this:

Typical questions of interest here are:

1. What is the probability that at the 10-th stage we shall have 12 red
and 4 green balls?
2. What is the probability that the ball drawn at stage n is red?
3. Given that we have exactly 6 red balls at the 9-th stage, what is the
(conditional) probability that we had exactly 4 red balls at the 6-th
stage?

All such questions may be answered by using the theorem of total


probability and Bayes' theorem. By the way, one of the above three
questions may be answered immediately. Which one? What is the
answer?

The above urn model is an example of the Polya urn model, where in
general we start with a red and b green balls, and at each stage a random
ball is selected, replaced, and c more balls of its colour are added.
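
Here is a minimal simulation sketch of the Polya urn in R, with a = b = 3 and c = 1 as in Example 4. It estimates the probability that the ball drawn at stage n is red (n = 5 and 10000 repetitions are arbitrary choices).

polya.red = function(n, a=3, b=3, c=1) {
  red = a; total = a + b
  for (i in 1:n) {
    is.red = (runif(1) < red/total)   # draw a ball at random
    if (is.red) red = red + c         # replace it and add c of its colour
    total = total + c
  }
  is.red                              # colour of the n-th draw
}
mean(replicate(10000, polya.red(5)))  # stays near 3/6 = 0.5

The estimate stays near 0.5 no matter which stage n you choose.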

Why care?
You may see this link for further discussion. Some real life scenarios can
be mathematically treated as urn models.
We shall discuss two such examples.

EXAMPLE 5: Most people form their opinions based on random
personal experience, instead of a carefully planned overall survey of
the situation. Polya's urn model is a simple version of this, as the
following story shows.

An American lady comes to India. She has heard about the unhygienic
conditions prevailing here, and is apprehensive about flu. Well, as luck
would have it, on her way from the airport she meets a man suffering
from flu. "Oh my," she shudders, "so the rumour about flu is not
unfounded, it seems!" The very next day her city tour is cancelled,
because the guide is down with flu. "What a terrible country this is!", the
lady starts to worry, "It is full of flu!" So imagine her panic when on the
third day she learns that a waiter in the hotel has caught the disease.

Now here is the story of another American visitor to our country. He is
also apprehensive of flu. But on the first day he does not meet any flu
case. "Maybe this fear of flu in India is a rumour after all," he thinks
with some relief at the end of the day. The next day passes, and still he
does not meet a single person with flu. He is now quite confident that the
apprehension about flu is not serious. When yet another day further
supports his optimistic belief, he starts thinking that the expensive flu
vaccine he took back home was a waste of money.

Which of these two viewpoints is reasonable? Neither. They both
formed their own ideas based on their personal random experience. The
true prevalence of flu in India is the same for both of them, but their
personal beliefs about it are drastically different.

Polya's urn model captures this idea. A red ball means fear of flu, a
green ball means the opposite. Initially they were equal in number. The
lady met a flu case on day 1 (i.e., randomly selected a red ball), and her
fear deepened (one more red ball added). The man did not meet any flu
case on day 1 (green ball selected), so his courage increased (one more
green ball added). Yet, what is the chance of selecting a red ball at stage
1? It is still the same as at stage 0 (i.e., the true prevalence rate of flu has
not changed from stage 0).

This model also demonstrates a common phenomenon: once you
randomly select balls of a certain colour in the first few stages, the
(conditional) probability of selecting more balls of that colour increases.
Indeed, people who have met more good people in their childhood tend
to see more good people around them. Similarly, people who have met
more bad people during their childhood are more likely to find faults
with everybody.

However, one must understand that the real situation is far too complex
to be captured adequately by Polya's urn model. ■

Here is another real life situation captured by urn models.

EXAMPLE 6: In the Ehrenfest model of heat exchange physicists
consider two connected containers with k particles distributed between
them. At each step a particle is chosen at random and transferred to the
other container. The question is: what is the distribution of particles at
the n-th stage? This may be thought of as follows: one urn contains k
balls, some of which are red and the rest green. A ball is drawn at
random, removed, and another ball of the opposite colour is added.
Here red balls play the role of particles in the first container, and green
balls those in the other. ■
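
A small R sketch of the Ehrenfest urn (the choices k = 10 and 1000 steps are arbitrary):

k = 10
red = k/2                    # start with half the balls red
for (step in 1:1000) {
  if (runif(1) < red/k) {
    red = red - 1            # a red ball is drawn: it becomes green
  } else {
    red = red + 1            # a green ball is drawn: it becomes red
  }
}
red                          # particles in the first container after 1000 steps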

Fallacies regarding conditional probability
Conditional probabilities are often used wrongly in our everyday life.
Here are three examples.
Mistaking P (A|B) for P (B|A)

Parents of most prospective candidates for ISI admission wonder: "Does
a particular coaching centre increase the chance of admission to the
ISI?" Stated in terms of probabilities, this is a question involving P(A|B),
where A is the event that a (randomly selected) student gets admitted to
the ISI, and B is the event that the student went to that coaching centre.

Most parents go about guessing P(A|B) as follows. They enquire of
successful students from the previous years whether they had studied at
that coaching centre or not. When they hear that 90% of the successful
students came from that centre, they are impressed by its performance.

Is this decision logically valid?

No, what the parents learned from their survey was that P(B|A) is
large. This does not imply in any way that P(A|B) is large. They should
have surveyed the coaching-centre goers and figured out the proportion
that got admitted. This proportion could have been (and most often is)
microscopically low.
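
A hypothetical numeric illustration (all counts invented) of how P(B|A) can be large while P(A|B) is tiny:

# 10000 candidates; 100 admitted (A); 5000 attended the centre (B);
# 90 of the 100 admitted students came from the centre
90/100     # P(B|A) = 0.9   -- what the parents' survey measures
90/5000    # P(A|B) = 0.018 -- the quantity they actually care about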

Simpson's Paradox
Suppose that A1, A2 and B are three events such that
P(A1|B) < P(A2|B) and also P(A1|Bᶜ) < P(A2|Bᶜ).

Can you conclude from this that P(A1) < P(A2)? (Think before
clicking here.)

Now consider the following real-life data set.

It is about the number of death penalties given in murder cases. The
cases have been classified by three factors:

the race of the victim (i.e., the person murdered): white or black
the race of the defendant (i.e., the person accused): white or black
whether death penalty was given: yes or no.

Here is the table. The victim-wise yes/no counts are the actual data
(this is the Florida death-penalty data reported by Radelet and Pierce,
with the counts restored here to match the percentages quoted below);
the remaining numbers are derived from them.

                                Death penalty: Yes    No    % yes
Victim white, white defendant                   53   414    11.3
Victim white, black defendant                   11    37    22.9
Victim black, white defendant                    0    16     0.0
Victim black, black defendant                    4   139     2.8
All victims,  white defendant                   53   430    11.0
All victims,  black defendant                   15   176     7.9

For example, the 11.3 is obtained as 53/(53 + 414). The "all victims"
rows are obtained by adding the two victim-wise rows; for example,
414 + 16 = 430.

Now consider the cases where the victim is white (the first two rows of
the table). Notice that 11.3% of white defendants got a death penalty,
while for black defendants the percentage is 22.9%. Thus if

A1 denotes the event "white defendant gets death penalty",

A2 denotes the event "black defendant gets death penalty",

B denotes the event "the victim is white",

then we infer P(A1|B) < P(A2|B).

Again, focusing on the victim-black rows we get a similar observation
(0.0 < 2.8). So we infer P(A1|Bᶜ) < P(A2|Bᶜ).

So we combine these to conclude P(A1) < P(A2). Thus, it seems that
the victim's race does not matter: a white defendant is always less likely
to get a death penalty.

So let's ignore the victim's race. This basically means adding the
victim-wise rows to get the "all victims" rows. A similar argument based
on these combined rows, however, seems to indicate P(A1) > P(A2), since
11.0 > 7.9.
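
The arithmetic behind the paradox can be checked directly in R, using the counts tabulated above:

yes = c(53, 11, 0, 4)        # death penalties: (w,w), (b,w), (w,b), (b,b) as (defendant, victim)
no  = c(414, 37, 16, 139)
round(100*yes/(yes+no), 1)   # 11.3 22.9 0.0 2.8 -- within each victim group
round(100*(yes[1]+yes[3])/(yes[1]+yes[3]+no[1]+no[3]), 1)   # 11.0 -- white defendant, combined
round(100*(yes[2]+yes[4])/(yes[2]+yes[4]+no[2]+no[4]), 1)   #  7.9 -- black defendant, combined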

What went wrong? This is called Simpson's paradox and often crops up
in practice.

(Think before clicking here.)

Monty Hall problem


This is based on a popular TV reality show.

The host of the program shows you three closed doors. You know
that a random one of these hides a car (considered a prize), the
remaining two doors hide goats (considered valueless). You are to
guess which door has the car. If you guess correctly, then you get
the car. Once you choose a door, the host opens some other door
and shows that there is a goat behind it. Now you are given an
option to switch to the other closed door. Should you switch?
Remember that the host knows the contents behind each door and
will always show you a door with a goat.

You can play this game online here.

Here are two ways to think about this, both natural but leading to
opposite conclusions:

1. Whether your original selection was right or wrong, there is always at
least one other door hiding a goat. So the host will always open such a
door. There is no extra info in it. Thus, nothing can be gained by switching.
2. Earlier you had three doors and knew nothing about their contents.
Now you at least know the content behind one door. In light of this
extra information, switching is justified.

The confusion remains even if you do some conditional probability
computations. Let's label the door you chose originally with the
number 1. Also let's label with the number 2 the door opened by the
host. The remaining door is labelled 3.

Here the sample space is {1, 2, 3}, the numbers denoting the possible
positions of the car. The unconditional probabilities were 1/3 each. The
conditional probabilities are 1/2, 0, 1/2.
Does the confusion go away now? Unfortunately, no:

1. Since 1/2 > 1/3, you should switch.
2. But the conditional probabilities of both doors 1 and 3 are 1/2. So
nothing is to be gained by switching.

How to resolve the paradox? You might like to simulate the situation
using R. Allegedly, the famous mathematician Paul Erdős was not
convinced about the correct answer until he was shown a computer
simulation!

car = sample(3,1000,rep=T)   # car position in each of 1000 plays
host = c(3,2,3)              # door kept closed, indexed by the car's position
other = host[car]            # the door you would switch to
sum(car==1)                  # number of wins if you always stay
sum(car==other)              # number of wins if you always switch

Here is an explanation of the code. We shall play the game 1000
times. Each time we freshly randomize the position of the car. This is
done in the first line of the code. We need a strategy for the host.
Remember that the door you selected first is called door 1. So the host's
strategy is like a function that maps the car's true position to the door to
be kept closed. If the car is not behind door 1, then the host has only one
choice. If the car is behind door 1, then the host can open either 2 or 3.
Here, w.l.o.g., we are keeping 3 closed. So the function is
host[1] = 3, host[2] = 2 and host[3] = 3. In other words, the strategy
is the array (3, 2, 3).
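
A variant sketch: if instead the host chooses randomly between doors 2 and 3 whenever the car is behind door 1, the conclusion is unchanged:

car = sample(3, 10000, replace=TRUE)
keep = ifelse(car==1, sample(2:3, 10000, replace=TRUE), car)  # door kept closed
mean(car==1)      # staying wins about 1/3 of the time
mean(car==keep)   # switching wins about 2/3 of the time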

Problems for practice


::

EXERCISE 5: Is it true that P(A|B) + P(Aᶜ|B) = 1? Is it true that
P(A|B) + P(A|Bᶜ) = 1?

[Hint]
::

EXERCISE 6: "It is possible to have events A, B such that P(A|B) = 1
but P(B|A) ≠ 1." Disprove or provide an example.

[Hint]
::

EXERCISE 7: "It is possible to have events A, B such that
P(A|B) > 0.99 but P(B|A) < 0.01." Disprove or provide an example
for this statement.

[Hint]
::

EXERCISE 8: Let u_{2n} denote the probability that a random path of
length 2n starting from (0, 0) passes through (2n, 0). Also, let u_0 = 1.
Let v_{2n} denote the probability that a random path of length 2n starting
from (0, 0) returns to 0 for the first time at 2n. Then show, without using
the explicit forms of u_{2n} and v_{2n}, that

v_2 u_{2n-2} + ⋯ + v_{2n} u_0 = u_{2n}.

[Corrected an error pointed out by Krishnam Baregama.]

[Hint]
::

EXERCISE 9: Suppose P(A ∩ B) > 0. Show that P(A|B) = P(B|A) if and
only if P(A) = P(B).

[Hint]
::

EXERCISE 10: Modern digital communication relies on transmitting 0's
and 1's from one device to another. Suppose that device A transmits a 0
with probability 0.4 and a 1 with probability 0.6. The communication
channel is noisy, so if a 1 is transmitted, it may get corrupted to a 0 in 5%
of the cases. If a 0 is transmitted, it may be corrupted into a 1 in 1% of the
cases. Given that device B has received a 1, what is the chance that it is
uncorrupted?

[Hint]
::

EXERCISE 11: A doctor diagnoses a disease correctly in 90% of cases. If
the diagnosis is wrong, the patient dies with probability 50%. Even with a
correct diagnosis the patient dies in 10% of cases. Given that a patient has
died, find the conditional probability that the diagnosis was correct.
[Hint]
::

EXERCISE 12: Two fair dice are rolled. What is the conditional
probability that at least one shows a 6 given that the dice show different
numbers?

[Hint]
::

EXERCISE 13: If two fair dice are rolled, what is the conditional
probability that the first one shows 6 given that the sum of the outcomes
of the dice is i? Compute for all possible values of i.

[Hint]
::

EXERCISE 14: Here is part of a Ludo board.

What is the probability that the counter will arrive at 10 in exactly two
moves? Assume that the die shows i with probability p_i for i = 1, . . . , 6.

Let T be a 15 × 15 matrix with (i, j)-th entry p_{j-i} whenever
j − i ∈ {1, . . . , 6} and 0 else. Show that the probability that the counter
arrives at 14 (starting from 1) in exactly 3 moves equals the (1, 14)-th
entry of T³.

[Hint]
::

EXERCISE 15: Let A = ((p_ij)) be an n × n matrix where each p_ij ≥ 0 and
for each i we have ∑_j p_ij = 1. (Such a matrix is called a stochastic
matrix.) We have a ludo board with n positions.

The matrix governs the random motion of a counter jumping back and
forth over this board in the following way: if the counter is at i then it
moves to j with probability p_ij. (If i = j, then the counter stays put.) All
moves are independent. Show that the probability of the counter moving
from i to j in exactly k moves is the (i, j)-th entry of the matrix Aᵏ.

[Hint]
::

EXERCISE 16: We have N + 1 urns, labelled 0, 1, . . . , N . The urn with


label k contains k red and N − k green balls. One urn is selected at
random, and an SRSWR of size n is drawn. All the n balls are found to
be red. One more ball is drawn from the same urn. Find the conditional
probability that this ball is also red.

[Hint]
::

EXERCISE 17:

[Hint]
::

EXERCISE 18:
[Hint]
::

EXERCISE 19:

[Hint]
::

EXERCISE 20:

[Hint]
::

EXERCISE 21:

[Hint]
::

EXERCISE 22:
[Hint]
::

EXERCISE 23:

[Hint]
::

EXERCISE 24:

[Hint]
::

EXERCISE 25:
[Hint]
::

EXERCISE 26:

[Hint]
::

EXERCISE 27:

[Hint]
::

EXERCISE 28:
[Hint]
::

EXERCISE 29:

[Hint]
::

EXERCISE 30:

[Hint]
::

EXERCISE 31:
[Hint]
::

EXERCISE 32:

[Hint]
::

EXERCISE 33:

[Hint]
::

EXERCISE 34:

[Hint]
::
EXERCISE 35:

[Hint]
::

EXERCISE 36:

[Hint]
::

EXERCISE 37:

[Hint]
::
EXERCISE 38:

[Hint]
::

EXERCISE 39:

[Hint]
::

EXERCISE 40:

[Hint]
::

EXERCISE 41:
[Hint]
::

EXERCISE 42:

[Hint]
::

EXERCISE 43:

[Hint]
::
EXERCISE 44:

[Hint]
::

EXERCISE 45:

In other words, we have a random experiment that can output
1, 2, . . . , m with probabilities p_1, . . . , p_m. The experiment is run n
times. What is the chance that the last outcome is different from all the
earlier ones?

[Hint]
::

EXERCISE 46:
[Hint]
::

EXERCISE 47:

[Hint]
::

EXERCISE 48:
[Hint]
::

EXERCISE 49:

[Hint]
::
EXERCISE 50:

[Hint]
::

EXERCISE 51: Let a, b, c ∈ ℕ. Suppose that we start with a red and b
green balls in an urn. We draw a ball at random, note its colour, replace
it, and add c more balls of that colour. We continue this process again and
again. What is the probability that the ball drawn at the n-th stage will
be red? Does the probability depend on n?

[Hint]
::

EXERCISE 52: Same set up as in the last problem. Fix two natural
numbers m < n. What is the probability that the ball drawn at stage m
is green and the ball drawn at stage n is red? Does the answer depend
on m and n?

[Hint]
