UNIT 4
Bayesian and Computational Learning
Dr Raghavendra S
Associate Professor & Coordinator
BTech(CSE) in AIML
Mission: CHRIST is a nurturing ground for an individual's holistic development to make effective contribution to the society in a …
Vision: Excellence and Service
Core Values: Faith in God | Moral Uprightness | Love of Fellow Beings | Social Responsibility | Pursuit of …
CHRIST
Deemed to be University
UNIT 4 - Bayesian and Computational Learning
●Bayes Theorem
●Concept Learning
●Maximum Likelihood
●Minimum Description Length Principle
●Bayes Optimal Classifier
●Gibbs Algorithm
●Naïve Bayes Classifier
●Bayesian Belief Network
●EM Algorithm.
BAYES THEOREM
Bayesian Learning
● Bayesian learning uses probability to model the data and to measure the uncertainty in the prediction.
● It uses prior knowledge to make predictions.
● It can be used for both classification and regression problems.
● Each observed training example can incrementally increase or decrease the estimated probability that a hypothesis is correct.
● It combines prior knowledge with the observed data.
Bayes theorem gives the conditional probability of an event A, given that another event B has occurred:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Where,
P(A|B) : Posterior probability of A given B
P(B|A) : Likelihood, the probability of B given A
P(A) : Prior probability of event A
P(B) : Prior (marginal) probability of event B
Algorithmic Steps
1. Compute Prior Probability of A
2. Compute Prior Probability of B
3. Compute Likelihood Probability of B given A
4. Find the Posterior Probability of A given B using steps 1, 2 and 3
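The four algorithmic steps can be sketched in Python as below. The function name `posterior` and the numbers in the usage line are hypothetical, chosen only to illustrate the calculation.

```python
def posterior(prior_a: float, prior_b: float, likelihood_b_given_a: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    prior_a              -> step 1: P(A)
    prior_b              -> step 2: P(B)
    likelihood_b_given_a -> step 3: P(B|A)
    """
    if prior_b <= 0:
        raise ValueError("P(B) must be positive")
    # Step 4: combine the three quantities
    return likelihood_b_given_a * prior_a / prior_b


# Hypothetical values: P(A) = 0.3, P(B) = 0.5, P(B|A) = 0.8
print(posterior(0.3, 0.5, 0.8))  # approximately 0.48
```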
Bayes Theorem Derivation
This is Bayes Theorem.
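Since the derivation itself appears only as a figure, the standard derivation from the definition of conditional probability is written out below as a sketch:

```latex
P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0
```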
Example
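h: Hypothesis
D: Training data
P(D) is common to both hypotheses, so it can be ignored when comparing them.
Choose the hypothesis with the maximum P(D|h)·P(h).
Conclusion: the new patient does not have cancer.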
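A minimal sketch of this MAP decision, assuming the commonly used textbook values P(cancer) = 0.008, P(+|cancer) = 0.98 and P(+|¬cancer) = 0.03 for a patient with a positive test. The slide's own figures are not reproduced in this text, so treat these numbers as assumptions.

```python
# Assumed values (not taken from the slide): priors and positive-test likelihoods
priors = {"cancer": 0.008, "no_cancer": 0.992}
likelihood_positive = {"cancer": 0.98, "no_cancer": 0.03}

# P(D) is the same for both hypotheses, so compare P(D|h) * P(h) only
scores = {h: likelihood_positive[h] * priors[h] for h in priors}
h_map = max(scores, key=scores.get)

print(scores)  # unnormalized posteriors
print(h_map)   # hypothesis with the largest P(D|h) * P(h)
```

Dividing each score by their sum recovers the normalized posteriors if they are needed; the decision itself does not change.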
CONCEPT LEARNING
Concept Learning
● The central problem of concept learning is inducing a general function from specific training examples.
● Concept learning can be formulated as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.
● We start from a predefined set of potential hypotheses and identify the one hypothesis that best fits the training examples.
Concept Learning: Case Study
In which we heuristically search for the best solution with respect to the given training set
Illustration: Determine when Tom enjoys his sport under the given conditions
Possible values for each attribute
Adding 2 more possibilities to each attribute ('?' = any value is acceptable, 'ϕ' = no value is acceptable), we get:
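The sizes of the instance and hypothesis spaces can be counted as below. This assumes the standard EnjoySport attributes: Sky with 3 values (Sunny, Cloudy, Rainy) and AirTemp, Humidity, Wind, Water and Forecast with 2 values each, consistent with the values listed on the following slides.

```python
from math import prod

# Assumed number of values per attribute:
# Sky, AirTemp, Humidity, Wind, Water, Forecast
values_per_attribute = [3, 2, 2, 2, 2, 2]

# Distinct instances: one concrete value per attribute
instances = prod(values_per_attribute)

# Syntactically distinct hypotheses: each attribute may also be '?' or 'ϕ'
syntactic = prod(v + 2 for v in values_per_attribute)

# Semantically distinct hypotheses: every hypothesis containing 'ϕ'
# classifies all instances as negative, so they collapse into one
semantic = 1 + prod(v + 1 for v in values_per_attribute)

print(instances, syntactic, semantic)  # 96 5120 973
```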
Substitute each ? with all of its possible values and determine the decision. Example:
Sunny, Warm, Normal/High, Strong, Warm/Cool, Same/Change
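The substitution step can be sketched with `itertools.product`: each `?` in the hypothesis ⟨Sunny, Warm, ?, Strong, ?, ?⟩ is replaced by every value of its attribute, using the value lists shown in the example above. The helper name `expand` is hypothetical.

```python
from itertools import product

# Value lists per attribute (Sky, AirTemp, Humidity, Wind, Water, Forecast);
# only the attributes marked '?' need more than one value here
domains = [
    ["Sunny"], ["Warm"], ["Normal", "High"],
    ["Strong"], ["Warm", "Cool"], ["Same", "Change"],
]
hypothesis = ["Sunny", "Warm", "?", "Strong", "?", "?"]

def expand(h, domains):
    """Return every concrete instance covered by hypothesis h."""
    choices = [dom if c == "?" else [c] for c, dom in zip(h, domains)]
    return list(product(*choices))

covered = expand(hypothesis, domains)
print(len(covered))  # 2 x 2 x 2 = 8 instances
for instance in covered:
    print(instance)
```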
FIND-S Algorithm: Finding a Maximally Specific Hypothesis
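FIND-S can be sketched as follows. The four training examples are the classic EnjoySport examples from the textbook literature, assumed here because the slides' table is an image; 'ϕ' stands for the most specific (empty) constraint.

```python
# Assumed training data: (Sky, AirTemp, Humidity, Wind, Water, Forecast), EnjoySport
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]

def find_s(examples):
    n = len(examples[0][0])
    h = ["ϕ"] * n  # start with the most specific hypothesis
    for x, label in examples:
        if label != "Yes":
            continue  # FIND-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] == "ϕ":
                h[i] = value  # first positive example: copy its value
            elif h[i] != value:
                h[i] = "?"    # conflicting values: generalize to '?'
    return h

print(find_s(examples))
```

On these assumed examples the result is ⟨Sunny, Warm, ?, Strong, ?, ?⟩, the same hypothesis expanded in the substitution example above.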
Maximum Likelihood & Least-Squared Error
Maximum Likelihood and Least-Squared Error Hypothesis
● Let us assume that the target variable is normally distributed. If the target values are normally distributed, we can use the probability density function of the normal distribution:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Here, μ = mean, x = the value whose density is evaluated and σ = standard deviation (assumed constant).
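● Since each p(d_i | h) in the maximum likelihood expression below is normally distributed, we can replace it by the normal density, so that x = d_i and μ = h(x_i):
$$h_{ML} = \arg\max_{h \in H} \prod_{i=1}^{m} p(d_i \mid h)$$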
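● From the above replacement we get:
$$h_{ML} = \arg\max_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(d_i - h(x_i))^2}{2\sigma^2}}$$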
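● Taking the logarithm of the RHS (ln is monotonic, so the argmax does not change), we get:
$$h_{ML} = \arg\max_{h \in H} \sum_{i=1}^{m} \left[ \ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{(d_i - h(x_i))^2}{2\sigma^2} \right]$$
Taking the logarithm reduces the complexity: the product is converted into a summation.
Formulas used:
ln(xy) = ln(x) + ln(y)
ln(e^x) = x · ln(e) = x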
● Discarding the constant term, we get:
$$h_{ML} = \arg\max_{h \in H} \sum_{i=1}^{m} -\frac{(d_i - h(x_i))^2}{2\sigma^2}$$
● Maximizing a negative quantity is the same as minimizing the corresponding positive quantity, so converting argmax to argmin we get:
$$h_{ML} = \arg\min_{h \in H} \sum_{i=1}^{m} \frac{(d_i - h(x_i))^2}{2\sigma^2}$$
● Finally, discarding the constant factor 1/(2σ²), we get:
$$h_{ML} = \arg\min_{h \in H} \sum_{i=1}^{m} (d_i - h(x_i))^2$$
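This result can be checked numerically, assuming Gaussian noise with a fixed σ. Over a grid of linear hypotheses h(x) = w·x, the w with the highest log-likelihood is also the w with the smallest sum of squared errors. The data points, σ and the grid are made up for illustration.

```python
import math

# Hypothetical training data (x_i, d_i)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ds = [0.1, 1.9, 4.2, 5.8, 8.1]
sigma = 1.0  # assumed constant noise standard deviation

def log_likelihood(w):
    # sum of ln N(d_i; mu = h(x_i), sigma^2) with h(x) = w * x
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (d - w * x) ** 2 / (2 * sigma**2)
        for x, d in zip(xs, ds)
    )

def squared_error(w):
    return sum((d - w * x) ** 2 for x, d in zip(xs, ds))

grid = [i / 100 for i in range(0, 401)]  # candidate slopes 0.00 .. 4.00
w_ml = max(grid, key=log_likelihood)     # maximum likelihood hypothesis
w_ls = min(grid, key=squared_error)      # least-squared error hypothesis

print(w_ml, w_ls)  # the same slope
# Closed-form least-squares slope for comparison: sum(x*d) / sum(x*x)
print(sum(x * d for x, d in zip(xs, ds)) / sum(x * x for x in xs))
```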
MLE in the case of discrete values
Discrete example: P(Outlook = Sunny | Play = Yes) and P(Outlook = Sunny | Play = No)
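For discrete attributes the maximum likelihood estimate of a conditional probability is a relative frequency. The sketch below assumes the standard 14-day PlayTennis Outlook column, since the slide's table is an image; it is consistent with the 2/9, 9/14 and 5/14 used in the "Try this" exercise later in this unit.

```python
from collections import Counter
from fractions import Fraction

# Assumed data: Outlook and Play for the 14-day PlayTennis table
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

class_counts = Counter(play)               # count of each Play label
pair_counts = Counter(zip(outlook, play))  # count of each (Outlook, Play) pair

def mle(value, label):
    """MLE of P(Outlook = value | Play = label) as a relative frequency."""
    return Fraction(pair_counts[(value, label)], class_counts[label])

print(mle("Sunny", "Yes"), mle("Sunny", "No"))  # 2/9 3/5
```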
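What if the data are continuous (MLE for continuous data)?
When the target is discrete and the attributes are continuous, use the normal density, with the mean μ_y and standard deviation σ_y of the attribute estimated separately for each target class y:
$$P(x \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}}\, e^{-\frac{(x-\mu_y)^2}{2\sigma_y^2}}$$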
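A sketch of the continuous case, assuming the equation referred to is the normal density with a class-specific mean and variance. The temperature readings are hypothetical. The maximum likelihood estimates are the sample mean and the variance computed with divisor n.

```python
import math

# Hypothetical continuous attribute values (e.g. temperature) grouped by class
temps = {"Yes": [21.0, 23.0, 25.0], "No": [28.0, 30.0, 32.0]}

def fit_gaussian(values):
    """MLE of mean and variance (divide by n, not n - 1)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var

def gaussian_pdf(x, mu, var):
    """Normal density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

params = {label: fit_gaussian(vals) for label, vals in temps.items()}
for label, (mu, var) in params.items():
    # class-conditional density of a new reading of 24.0
    print(label, mu, var, gaussian_pdf(24.0, mu, var))
```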
BAYES OPTIMAL CLASSIFIER
(Maximization function)
Bayes Optimal Classifier (Max function)
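The rule illustrated on this slide is usually written as below, where V is the set of possible classifications, H the hypothesis space and D the training data (a sketch of the standard formulation):

```latex
v_{\text{opt}} = \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)
```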
Example: Bayes Optimal Classifier
Max{0.4, 0.6} = 0.6, so the new instance is classified as negative (-ve).
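The 0.4 vs 0.6 comparison can be reproduced under an assumed setup: three hypotheses with posteriors 0.4, 0.3 and 0.3, where the first predicts + and the other two predict -. These numbers are assumptions that give the totals on the slide; the function name `bayes_optimal` is hypothetical.

```python
# Assumed posteriors P(h|D) and each hypothesis's prediction for the new instance
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": "+", "h2": "-", "h3": "-"}
classes = ["+", "-"]

def bayes_optimal(posteriors, predictions, classes):
    # sum over h of P(v|h) * P(h|D); here P(v|h) is 1 if h predicts v, else 0
    totals = {
        v: sum(p for h, p in posteriors.items() if predictions[h] == v)
        for v in classes
    }
    return max(totals, key=totals.get), totals

label, totals = bayes_optimal(posteriors, predictions, classes)
print(totals, label)  # totals of 0.4 for '+' and 0.6 for '-'
```

Note that the single most probable hypothesis (h1) predicts +, while the weighted vote of all hypotheses gives -.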
GIBBS ALGORITHM
(For many hypotheses)
Gibbs Algorithm
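Because the slide body is a figure, here is a sketch of the Gibbs algorithm as it is usually stated: draw one hypothesis at random according to P(h|D), then classify the new instance with that hypothesis alone. The posteriors (0.4, 0.3, 0.3), the predictions and the seed are assumed values for illustration.

```python
import random

# Assumed posteriors P(h|D) and the class each hypothesis predicts
hypotheses = ["h1", "h2", "h3"]
posteriors = [0.4, 0.3, 0.3]
predictions = {"h1": "+", "h2": "-", "h3": "-"}

def gibbs_classify(rng):
    # 1. Sample a single hypothesis according to P(h|D)
    h = rng.choices(hypotheses, weights=posteriors, k=1)[0]
    # 2. Use that hypothesis alone to classify the new instance
    return predictions[h]

rng = random.Random(0)  # fixed seed only for reproducibility
labels = [gibbs_classify(rng) for _ in range(10_000)]
print(labels.count("+") / len(labels))  # close to 0.4 over many runs
```

Gibbs avoids summing over every hypothesis, which makes it much cheaper than the Bayes optimal classifier. Under the assumption that target concepts are drawn according to the prior, its expected error is known to be at most twice that of the Bayes optimal classifier.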
Naive Bayes Classifier
Posterior = (Likelihood × Class Prior) / Predictor Prior, i.e. P(c|x) = P(x|c)·P(c) / P(x)
Case 1: Based on the highest probability
Case 2: Based on the most recent data
EXAMPLE 1: NAÏVE BAYES CLASSIFIER
Example 1:
Using Naïve Bayes theorem, find the probability of the given instance from the given prior and conditional probabilities.
The data has 1 target variable (with given prior probabilities) and 4 attributes (with given conditional probabilities).
Prior probabilities of the target variable
Frequency tables of the conditional probabilities of the attributes with respect to Yes/No
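Generalized Naïve Bayes formula (probability of A given B1, ..., Bn):
$$P(A \mid B_1, \ldots, B_n) = \frac{P(A)\prod_{i=1}^{n} P(B_i \mid A)}{P(B_1, \ldots, B_n)}$$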
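As a sketch of the formula, the snippet scores each class by P(class) × ∏ P(attribute value | class) and normalizes. The conditional probabilities are the standard PlayTennis estimates and the instance ⟨Sunny, Cool, High, Strong⟩ is assumed, since the slide's tables are images.

```python
from fractions import Fraction as F

# Assumed priors and conditional probabilities (PlayTennis-style tables)
priors = {"Yes": F(9, 14), "No": F(5, 14)}
cond = {
    "Yes": {"Sunny": F(2, 9), "Cool": F(3, 9), "High": F(3, 9), "Strong": F(3, 9)},
    "No": {"Sunny": F(3, 5), "Cool": F(1, 5), "High": F(4, 5), "Strong": F(3, 5)},
}
instance = ["Sunny", "Cool", "High", "Strong"]

def naive_bayes_scores(instance):
    """Return P(class) * prod P(value | class) for each class."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for value in instance:
            score *= cond[c][value]  # naive conditional-independence assumption
        scores[c] = score
    return scores

scores = naive_bayes_scores(instance)
total = sum(scores.values())  # plays the role of the denominator P(B1, ..., Bn)
for c, s in scores.items():
    print(c, float(s), float(s / total))
print("Prediction:", max(scores, key=scores.get))
```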
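Try this: Find the probability of playing when the weather is Sunny.
Bayes theorem problem formulation:
$$P(\text{Yes} \mid \text{Sunny}) = \frac{P(\text{Sunny} \mid \text{Yes})\,P(\text{Yes})}{P(\text{Sunny})} = \frac{\frac{2}{9} \times \frac{9}{14}}{\frac{5}{14}} = 0.4$$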
Example 2: (3 Targets, 3 Attributes)
Using the Naïve Bayes algorithm, find the probability of a given fruit being Orange, Banana or Others, from the prior probabilities of the 3 fruit classes and the conditional probabilities of its 3 attributes (Yellow, Sweet and Long).

Fruit     Yellow  Sweet  Long  Total
Orange    350     450    0     650
Banana    400     300    350   400
Others    50      100    50    150
Total     800     850    400   1200

The Total column gives the prior probabilities (e.g. P(Orange) = 650/1200), and each attribute count divided by its row total gives a conditional probability (e.g. P(Yellow | Orange) = 350/650).
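P(Yellow, Sweet, Long | Orange) = P(Y|O) x P(S|O) x P(L|O) = 0.54 x 0.69 x 0 = 0
Similarly, determine the likelihood of the fruit being Banana and Others:
P(Yellow, Sweet, Long | Banana) = P(Y|B) x P(S|B) x P(L|B) = 1 x 0.75 x 0.875 ≈ 0.66
P(Yellow, Sweet, Long | Others) = P(Y|Ot) x P(S|Ot) x P(L|Ot) = 0.33 x 0.67 x 0.33 ≈ 0.074
Multiplying by the priors P(Orange) = 650/1200, P(Banana) = 400/1200 and P(Others) = 150/1200 gives scores of 0, ≈ 0.219 and ≈ 0.0093.
Hence the given fruit is classified as Banana, which has the highest probability.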
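The same computation, including the priors, can be checked with exact fractions using only the counts from the table above.

```python
from fractions import Fraction as F

# Counts from the table: fruit -> (Yellow, Sweet, Long, Total)
counts = {
    "Orange": (350, 450, 0, 650),
    "Banana": (400, 300, 350, 400),
    "Others": (50, 100, 50, 150),
}
grand_total = sum(c[3] for c in counts.values())  # 1200

scores = {}
for fruit, (yellow, sweet, long_, total) in counts.items():
    # Naive Bayes likelihood: product of per-attribute conditional probabilities
    likelihood = F(yellow, total) * F(sweet, total) * F(long_, total)
    prior = F(total, grand_total)
    scores[fruit] = likelihood * prior
    print(fruit, round(float(likelihood), 3), round(float(scores[fruit]), 4))

print("Prediction:", max(scores, key=scores.get))
```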
Joint Probability Distribution
(For all combinations of variables)
Joint Probability Distribution
X is not true
Y is not true
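A joint distribution over two Boolean variables X and Y can be stored as a table of four entries, one per combination; marginals and conditionals then follow by summing and dividing. The probabilities below are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X, Y) over two Boolean variables
joint = {
    (True, True): F(2, 10),
    (True, False): F(3, 10),   # X true, Y not true
    (False, True): F(1, 10),   # X not true, Y true
    (False, False): F(4, 10),  # X not true, Y not true
}
assert sum(joint.values()) == 1  # a valid distribution sums to 1

p_x = sum(p for (x, _), p in joint.items() if x)  # marginal P(X)
p_y_given_x = joint[(True, True)] / p_x           # conditional P(Y | X)

print(p_x, p_y_given_x)
# A full joint table over n Boolean variables needs 2**n entries
print([2 ** n for n in range(1, 6)])
```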
Limitations of Joint Probability Distribution
Bayesian Belief Network
Also called Bayesian Network, Belief Network or Probabilistic Network
Bayesian Networks
● A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies using a directed acyclic graph (DAG).
● It is also called a Bayes network, belief network, decision network, or Bayesian model.
● Bayesian networks are probabilistic because they are built from a probability distribution and use probability theory for prediction and anomaly detection.
● Real-world applications are probabilistic in nature, and a Bayesian network can be used to represent the relationships between multiple events.
● A Bayesian network can be built from data and expert opinion, and it consists of two parts:
1. A directed acyclic graph
2. A table of conditional probabilities
Bayesian Networks - Example
● A Bayesian network graph is made up of nodes and arcs (directed links), where:
Each node corresponds to a random variable, and a variable can be continuous or discrete.
Arcs (directed arrows) represent causal relationships or conditional probabilities between random variables. These directed links connect pairs of nodes in the graph.
A link means that one node directly influences the other node; if there is no directed link between two nodes, neither directly influences the other (they may still be dependent through other nodes).
• In the above diagram, A, B, C, and D are random variables represented by the nodes of the network graph.
• If we are considering node B, which is connected with node A by a directed arrow, then node A is called the parent of node B.
Bayesian Networks
● Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
● A Bayesian network is based on the joint probability distribution and conditional probability.
Joint probability distribution
● If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.
● Using the chain rule, P[x1, x2, x3, ..., xn] can be written in terms of conditional probabilities as:
$$P[x_1, x_2, \ldots, x_n] = P[x_1 \mid x_2, \ldots, x_n]\, P[x_2, \ldots, x_n]$$
$$= P[x_1 \mid x_2, \ldots, x_n]\, P[x_2 \mid x_3, \ldots, x_n] \cdots P[x_{n-1} \mid x_n]\, P[x_n]$$
● In general, in a Bayesian network, for each variable Xi (with the variables ordered so that every parent comes before its children) we can write:
$$P(X_i \mid X_{i-1}, \ldots, X_1) = P(X_i \mid \text{Parents}(X_i))$$
Example: Compute the probability for the DAG as a product of its conditional probabilities:
P = 0.7 x 0.4 x 0.6 x 0.3 = 0.0504
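A sketch of how such a product arises from the factorization P(X1, ..., Xn) = ∏ P(Xi | Parents(Xi)). The DAG (A → B, A → C, B and C → D) and all CPT entries are hypothetical; they were chosen only so that the four factors for the all-true assignment equal 0.7, 0.4, 0.6 and 0.3 as above.

```python
# Hypothetical DAG: each node maps to its list of parents
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

# Hypothetical CPTs: P(node = True | parent values), keyed by tuples of parent values
cpt = {
    "A": {(): 0.7},
    "B": {(True,): 0.4, (False,): 0.2},
    "C": {(True,): 0.6, (False,): 0.5},
    "D": {(True, True): 0.3, (True, False): 0.8,
          (False, True): 0.1, (False, False): 0.05},
}

def joint_probability(assignment):
    """P(assignment) = product over nodes of P(node | parents(node))."""
    prob = 1.0
    for node, pars in parents.items():
        key = tuple(assignment[p] for p in pars)
        p_true = cpt[node][key]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob

print(joint_probability({"A": True, "B": True, "C": True, "D": True}))  # ~0.0504
```

Only 1 + 2 + 2 + 4 = 9 numbers are needed here, compared with 2**4 - 1 = 15 for a full joint table over four Boolean variables.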
EM Algorithm
EM Algorithm
Expectation Maximization for Unsupervised Learning
EM Algorithm
Clustering: get the best clusters
EM Flow Chart
EM Algorithm: Example 1
Problem definition
• Let A & B be two coins
• ϴ1: Probability of getting a head with coin A
• ϴ2: Probability of getting a head with coin B
• Find the final values of ϴ1 & ϴ2 from 5 experiments, in each of which one of the coins (A or B) is chosen at random and tossed 10 times
This indicates that whenever we toss coin A there is an 80% chance of getting a HEAD, and whenever we toss coin B there is a 45% chance of getting a HEAD.
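When the coin labels are known, each ϴ is simply the fraction of heads over all tosses of that coin. The sketch below assumes the five experiments used coins B, A, A, B, A, with the head/tail counts that appear in the EM table later in this unit; this assumed labelling reproduces the 80% and 45% figures.

```python
from fractions import Fraction

# Assumed labelled experiments: (coin, heads, tails), 10 tosses each
experiments = [("B", 5, 5), ("A", 9, 1), ("A", 8, 2), ("B", 4, 6), ("A", 7, 3)]

def mle_theta(experiments, coin):
    """Fraction of heads over all tosses made with the given coin."""
    heads = sum(h for c, h, t in experiments if c == coin)
    tosses = sum(h + t for c, h, t in experiments if c == coin)
    return Fraction(heads, tosses)

theta_a = mle_theta(experiments, "A")  # 24 / 30
theta_b = mle_theta(experiments, "B")  # 9 / 20
print(float(theta_a), float(theta_b))  # 0.8 0.45
```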
In the above scenario, 5 experiments are done and in each experiment we know which coin was selected. If the experiments are conducted without knowing the coin labels, it is difficult to identify the coin: we cannot fill the table and calculate the theta values.
To solve this, we make use of the EM algorithm.
We consider the same experiments but without the coin labels.
Step 1: Assume initial probabilities ϴ1 & ϴ2
Step 2 (E-step): Using the initial ϴ1 & ϴ2 and the head & tail counts of each experiment, compute the probability that each coin produced that experiment
Step 3: Compute L(H), L(T), the expected number of heads and tails, for each coin
Step 4 (M-step): Compute the new ϴ1 & ϴ2
Repeat Steps 2 to 4 with the new ϴ values until convergence is reached
Initial Theta A = 0.6
Initial Theta B = 0.5

        # of   # of                                              Coin A  Coin A  Coin B  Coin B
        Heads  Tails  L(A)         L(B)         P(A)  P(B)       Heads   Tails   Heads   Tails
Exp 1   5      5      0.000796262  0.000976563  0.45  0.55       2.2     2.2     2.8     2.8
Exp 2   9      1      0.004031078  0.000976563  0.80  0.20       7.2     0.8     1.8     0.2
Exp 3   8      2      0.002687386  0.000976563  0.73  0.27       5.9     1.5     2.1     0.5
Exp 4   4      6      0.000530842  0.000976563  0.35  0.65       1.4     2.1     2.6     3.9
Exp 5   7      3      0.00179159   0.000976563  0.65  0.35       4.5     1.9     2.5     1.1
Total                                                            21.3    8.6     11.7    8.4

New Theta A = 0.713012
New Theta B = 0.581339
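The first iteration in the table can be reproduced with the sketch below. It follows Steps 2 to 4: E-step weights from the coin likelihoods, expected head/tail counts, then new ϴ values. The binomial coefficient is omitted because it is identical for both coins and cancels. The function name `em_step`, the equal coin priors, the tolerance and the iteration cap are illustrative assumptions.

```python
# Head/tail counts of the five experiments (coin labels unknown)
experiments = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]

def em_step(theta_a, theta_b, experiments):
    """One E-step + M-step for the two-coin problem."""
    heads_a = tails_a = heads_b = tails_b = 0.0
    for h, t in experiments:
        # E-step: likelihood of this experiment under each coin, L(A) and L(B)
        like_a = theta_a ** h * (1 - theta_a) ** t
        like_b = theta_b ** h * (1 - theta_b) ** t
        p_a = like_a / (like_a + like_b)  # P(A), assuming equal coin priors
        p_b = 1 - p_a                     # P(B)
        # Expected heads/tails attributed to each coin
        heads_a += p_a * h
        tails_a += p_a * t
        heads_b += p_b * h
        tails_b += p_b * t
    # M-step: new estimates from the expected counts
    return heads_a / (heads_a + tails_a), heads_b / (heads_b + tails_b)

theta_a, theta_b = 0.6, 0.5  # Step 1: initial guesses, as in the table
theta_a, theta_b = em_step(theta_a, theta_b, experiments)
print(round(theta_a, 6), round(theta_b, 6))  # first iteration

# Repeat Steps 2 to 4 until the estimates stop changing
for _ in range(100):
    new_a, new_b = em_step(theta_a, theta_b, experiments)
    converged = abs(new_a - theta_a) < 1e-9 and abs(new_b - theta_b) < 1e-9
    theta_a, theta_b = new_a, new_b
    if converged:
        break
print(round(theta_a, 3), round(theta_b, 3))
```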
Unit 4 Summary
●Bayes Theorem (Derivation)
●Concept Learning
●Maximum Likelihood (Minimization)
●Minimum Description Length Principle
●Bayes Optimal Classifier (Maximization)
●Gibbs Algorithm (n Hypotheses)
●Naïve Bayes Classifier (Maximization)
●Bayesian Belief Network (Conditional Probability)
●EM Algorithm (Clustering)