Bayesian Learning Lecture 1

Bayesian Learning involves using Bayes Theorem to determine the most probable hypothesis based on observed training data and prior probabilities. The Naïve Bayes Classifier is a classification technique that assumes all predictors are independent, allowing for efficient predictions based on maximum a posteriori estimates. While it is easy to implement and performs well with categorical data, it has limitations such as zero probability issues with unseen categories and the unrealistic assumption of independence among predictors.


Bayesian Learning

→ Goal: to determine the best hypothesis from the hypothesis space H, given the observed training data.
→ Bayes' Theorem is used
→ to find the most probable hypothesis
→ given the data D
→ given initial knowledge about the prior probabilities of the various hypotheses in H

Notations
→ P(h)
→ Initial probability that the hypothesis h holds, before we have observed the training data.
→ Prior probability of h, i.e., the probability that h is a correct hypothesis.
→ If no such prior knowledge is available, simply assign the same prior probability to each candidate hypothesis.
→ P(D)
→ Prior probability that the training data D will be observed, given no knowledge about which hypothesis holds.
→ P(D|h)
→ Probability of observing data D, given some world in which hypothesis h holds.
→ P(h|D)
→ Posterior probability of h
→ Probability that h holds, given the observed training data D
→ Reflects the influence of the training data.

Bayes Theorem
→ P(h|D) = P(D|h) P(h) / P(D)
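
As a quick numerical illustration of the theorem (with made-up numbers, not taken from the lecture), the posterior follows directly from the three quantities defined above. A minimal Python sketch:

# Minimal sketch of Bayes' Theorem with hypothetical values.
# P(h)   : prior probability of hypothesis h
# P(D|h) : likelihood of the data D under h
# P(D)   : prior probability of observing D (a normalising constant)

def posterior(p_h, p_d_given_h, p_d):
    """Return P(h|D) = P(D|h) * P(h) / P(D)."""
    return p_d_given_h * p_h / p_d

# Hypothetical numbers, for illustration only
print(posterior(p_h=0.3, p_d_given_h=0.8, p_d=0.5))  # -> 0.48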

Naïve Bayes Classifier


→ Naive (Stupid!) Bayes is a classification technique based on Bayes’ Theorem.
→ Assumes all predictors (features/attributes) are independent of each other, i.e., there is no inter-relation among them.
→ Each of the attributes should be considered for classification/prediction, contributing equally to the outcome.
→ Each instance x is described by a conjunction of attribute values.
→ The target function f(x) can take on any value from some finite set V.
→ Given a set of training examples of the target function
→ and a new instance described by the tuple of attribute values ⟨a1, a2, …, an⟩,
→ the learner is asked to predict the target value, or classification, for this new instance.
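
As a concrete picture of this setup, an instance and the target set could be represented as below (a sketch; the variable names are illustrative, and the attribute names anticipate the PlayTennis example later in the lecture):

# An instance x is a tuple (conjunction) of attribute values <a1, ..., an>,
# and the target function f(x) takes values from a finite set V.
attributes = ("Outlook", "Temperature", "Humidity", "Wind")
x = ("sunny", "cool", "high", "strong")
V = {"yes", "no"}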

Bayesian approach

→ To assign the most probable target value v_MAP (maximum a posteriori)
→ Given the attribute values ⟨a1, a2, …, an⟩ that describe the instance
→ v_MAP = argmax_{v_j ∈ V} P(v_j | a1, a2, …, an)

→ Pick the output value with the maximum posterior probability.


Applying Bayes' Theorem: P(h|D) = P(D|h) P(h) / P(D)

→ v_MAP = argmax_{v_j ∈ V} P(v_j | a1, a2, …, an)
        = argmax_{v_j ∈ V} P(a1, a2, …, an | v_j) P(v_j) / P(a1, a2, …, an)
        = argmax_{v_j ∈ V} P(a1, a2, …, an | v_j) P(v_j)
→ The denominator P(a1, a2, …, an) is the same for every v_j, so it can be dropped from the argmax.

→ Estimate P(v_j)
→ by counting how many times each target value v_j occurs in the training data.
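
For instance, the class priors can be estimated from the training labels with a simple frequency count (a sketch; the label list below just reproduces the 9 "yes" / 5 "no" split used in the PlayTennis example later):

from collections import Counter

# Hypothetical training labels: 9 "yes" and 5 "no", as in the lecture's example
labels = ["yes"] * 9 + ["no"] * 5

counts = Counter(labels)
priors = {v: counts[v] / len(labels) for v in counts}
print(priors)  # {'yes': 0.642..., 'no': 0.357...}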

Naïve Bayes Classifier Approach

→ Assume that the attribute values are conditionally independent, given the target value.
→ That is, the probability of observing the conjunction a1, a2, …, an is just the product of the probabilities for the individual attributes:
→ P(a1, a2, …, an | v_j) = ∏_i P(a_i | v_j)
→ Substituting in the previous equation, the target value output by the Naive Bayes classifier is
→ v_NB = argmax_{v_j ∈ V} P(v_j) ∏_i P(a_i | v_j)
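
A minimal Python sketch of this decision rule (the dictionary layout and names are illustrative, not prescribed by the notes): the classifier multiplies the prior by each per-attribute conditional probability and returns the class with the largest product.

# Naive Bayes prediction rule:
#   v_NB = argmax_{v in V} P(v) * prod_i P(a_i | v)
# `priors` and `cond_probs` are assumed to have been estimated from the
# training data by counting (see the prior-estimation sketch above).

def predict(instance, priors, cond_probs):
    """instance: dict of attribute -> value, e.g. {"Wind": "strong", ...}"""
    best_value, best_score = None, -1.0
    for v, prior in priors.items():
        score = prior
        for attr, a in instance.items():
            # P(attr = a | class = v); pairs never seen in training default to 0.0
            score *= cond_probs.get((attr, a, v), 0.0)
        if score > best_score:
            best_value, best_score = v, score
    return best_value

With priors and conditional probabilities estimated from the PlayTennis data, a call such as predict({"Outlook": "sunny", "Temperature": "cool", "Humidity": "high", "Wind": "strong"}, priors, cond_probs) would reproduce the worked example in the next section.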

Example

→ Outlook = sunny
Temperature = cool
Humidity = high
Wind = strong
→ To predict the target value (yes or no) of the target concept PlayTennis for this new instance.
→ v_NB = argmax_{v_j ∈ {yes, no}} P(v_j) P(Out = sunny | v_j) P(Temp = cool | v_j) P(Hum = high | v_j) P(Wind = strong | v_j)
→ Calculating v_NB requires 10 probabilities (2 class priors and 8 conditional probabilities) that can be estimated from the training data.
→ From 14 training examples
→ P(PlayTennis = yes) = 9/14 = 0.64
→ P(PlayTennis = no) = 5/14 = 0.36
→ Conditional probabilities for the observed attribute values
→ P(Wind = strong | PlayTennis = yes) = 3/9 = 0.33
  P(Wind = strong | PlayTennis = no) = 3/5 = 0.60
→ P(Hum = high | PlayTennis = yes) = 3/9 = 0.33
  P(Hum = high | PlayTennis = no) = 4/5 = 0.80
→ P(Out = sunny | PlayTennis = yes) = 2/9 = 0.22
  P(Out = sunny | PlayTennis = no) = 3/5 = 0.60
→ P(Temp = cool | PlayTennis = yes) = 3/9 = 0.33
  P(Temp = cool | PlayTennis = no) = 1/5 = 0.20
→ Using these probability estimates
→ P(yes) P(sunny | yes) P(cool | yes) P(high | yes) P(strong | yes)
  = 0.64 × 0.22 × 0.33 × 0.33 × 0.33 = 0.005
→ P(no) P(sunny | no) P(cool | no) P(high | no) P(strong | no)
  = 0.36 × 0.60 × 0.20 × 0.80 × 0.60 = 0.0207
→ Thus the naïve Bayes classifier assigns the target value PlayTennis = no to this new instance, based on the probability estimates learned from the training data.
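
The arithmetic above can be checked in a few lines of Python (a sketch using the rounded probabilities given in the notes):

# Rounded estimates from the training data, as listed above
p_yes = 0.64 * 0.22 * 0.33 * 0.33 * 0.33   # P(yes) * product of conditionals
p_no  = 0.36 * 0.60 * 0.20 * 0.80 * 0.60   # P(no)  * product of conditionals

print(round(p_yes, 4))                  # 0.0051
print(round(p_no, 4))                   # 0.0207
print("yes" if p_yes > p_no else "no")  # -> "no"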
Pros and Cons of Naive Bayes

→ Pros:
→ It is easy and fast to predict the class of a test data set.
→ It also performs well in multi-class prediction.
→ When the assumption of independence holds, a Naive Bayes classifier performs better than other models such as logistic regression, and it needs less training data.
→ It performs well with categorical (discrete) input variables compared to numerical (continuous) variables.
→ Cons:
→ If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction (illustrated in the sketch after this list).
→ This is often known as the "zero-frequency" problem.
→ Another limitation of Naive Bayes is the assumption of independent predictors.
→ In real life, it is almost impossible to get a set of predictors that are completely independent.
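
A small sketch of the zero-frequency issue, reusing the prediction rule sketched earlier (the counts are hypothetical; assume the value "overcast" was never observed together with the class "no" during training):

# Zero-frequency illustration (hypothetical values): if a test instance
# contains an attribute value never seen with a class during training, the
# product P(v) * prod_i P(a_i | v) collapses to 0 for that class, regardless
# of how strong the evidence from the other attributes is.

cond_probs = {
    ("Outlook", "overcast", "yes"): 4 / 9,
    # ("Outlook", "overcast", "no") was never observed -> treated as 0.0
}
priors = {"yes": 0.64, "no": 0.36}

p_no = priors["no"] * cond_probs.get(("Outlook", "overcast", "no"), 0.0)
print(p_no)  # 0.0 -- the "no" class can never win for this instance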
