Naive Bayes
Naive Bayes is a supervised machine learning algorithm based on Bayes' theorem. It is a probabilistic classifier, meaning it makes predictions based on probabilities. It is used for spam detection, sentiment analysis, text classification, etc.
Naive: It is called naive because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features.
Example: Movie Genre Classification
Suppose you want to classify movies into genres based on two features:
the presence of keywords in the movie's description and the director of the
movie.
In a "naive" approach, you might assume that the presence of keywords
(e.g., action, romance, comedy) and the director's name are independent
factors when determining a movie's genre. In other words, you treat these
features as if they don't influence each other.
For instance:
If a movie's description contains the keyword "action," you might
immediately assume it's an action movie, without considering the director.
If a movie is directed by a famous director known for making romantic
films, you might classify it as a romance movie, regardless of the
keywords in the description.
The "naive" aspect here is that you're assuming no correlation or
interaction between keywords and the director's influence on the movie's
genre.
Bayes: It is called Bayes because it relies on the principle of Bayes' theorem. First, let's discuss conditional probability.
Conditional probability refers to the probability of an event occurring given that another event has already occurred. In text classification, this means predicting the probability of a particular class (Spam/Not Spam) given the presence of certain features (words) in a document.
Example: rolling two dice together.

Conditional probability:

P(A|B) = P(A ∩ B) / P(B), given P(B) ≠ 0

Rolling two dice gives a sample space of 36 equally likely outcomes. Let A be the event that the first die shows 5, and B the event that the two dice sum to at most 10:

P(A) = P(D1 = 5) = 6/36 = 1/6
P(B) = P(D1 + D2 ≤ 10) = 33/36

Question: what is the probability that D1 = 5, given that D1 + D2 ≤ 10?

A ∩ B requires D1 = 5 and D2 ≤ 5, which leaves 5 outcomes, so P(A ∩ B) = 5/36, and:

P(D1 = 5 | D1 + D2 ≤ 10) = P(A ∩ B) / P(B) = (5/36) / (33/36) = 5/33
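These numbers can be sanity-checked by brute-force enumeration. Below is a minimal Python sketch (not part of the original notes) that counts outcomes directly:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

# Event A: first die shows 5.  Event B: the sum is at most 10.
A = {(d1, d2) for d1, d2 in outcomes if d1 == 5}
B = {(d1, d2) for d1, d2 in outcomes if d1 + d2 <= 10}

p_B = Fraction(len(B), len(outcomes))            # 33/36
p_A_and_B = Fraction(len(A & B), len(outcomes))  # 5/36

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
print(p_A_and_B / p_B)  # 5/33
```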
Bayes Theorem for spam filtering
But what does this theorem have to do with our spam filter? We want to find out the likelihood that a specific message is spam. But a message consists of multiple words. In order to find the combined probability of the words, we first have to find the probability of each separate word being a spam word. This is also known as the 'spaminess' of a word, and we can calculate it using a special case of Bayes' theorem where the event is a binary variable.
P(S|W) = P(W|S) * P(S) / (P(W|S) * P(S) + P(W|H) * P(H))
where,
- P(S|W) is the probability that a message is spam, knowing that a specific word W appears in it;
- P(W|S) is the probability that the specific word appears in spam messages;
- P(S) is the overall probability that any given message is spam;
- P(W|H) is the probability that the specific word appears in ham messages;
- P(H) is the overall probability that any given message is ham.
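As a concrete illustration, here is a minimal Python sketch of this word-spaminess formula. The word "offer", its counts, and the helper name `spaminess` are made up for illustration; only the formula itself comes from the notes above.

```python
def spaminess(p_w_given_spam, p_w_given_ham, p_spam, p_ham):
    """P(S|W): probability a message is spam given that word W appears in it,
    computed with the binary-event form of Bayes' theorem."""
    numerator = p_w_given_spam * p_spam
    denominator = p_w_given_spam * p_spam + p_w_given_ham * p_ham
    return numerator / denominator

# Hypothetical counts: "offer" appears in 30 of 40 spam messages
# and in 5 of 60 ham messages; 40 of 100 messages are spam.
p_s_given_w = spaminess(
    p_w_given_spam=30 / 40,
    p_w_given_ham=5 / 60,
    p_spam=40 / 100,
    p_ham=60 / 100,
)
print(f"P(S|W) = {p_s_given_w:.3f}")  # ≈ 0.857
```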
Bayes Theorem

P(A|B) = P(B|A) * P(A) / P(B), valid given P(B) ≠ 0

In words: posterior = (likelihood * prior) / evidence, where P(A|B) is the posterior, P(B|A) is the likelihood, P(A) is the prior, and P(B) is the evidence.

Derivation from conditional probability:

P(A|B) = P(A ∩ B) / P(B)    ...(1)
P(B|A) = P(B ∩ A) / P(A), and since A ∩ B = B ∩ A,
P(A ∩ B) = P(B|A) * P(A)    ...(2)

Substituting (2) into (1):

P(A|B) = P(B|A) * P(A) / P(B)
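As a quick check (not in the original notes), the theorem reproduces the dice result from the earlier section. There, P(B|A) = P(D1 + D2 ≤ 10 | D1 = 5) = 5/6 (with D1 = 5, the sum stays at most 10 for the five values D2 ≤ 5), P(A) = 1/6, and P(B) = 33/36, so:

P(A|B) = (5/6 * 1/6) / (33/36) = (5/36) * (36/33) = 5/33

which matches the direct conditional-probability computation.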
Mathematical Intuition

Let the features be X = (x1, x2, x3, ..., xn) and let y be the class variable. By Bayes' theorem:

P(y|X) = P(X|y) * P(y) / P(X)

Expanding the features:

P(y|x1, ..., xn) = P(x1, ..., xn | y) * P(y) / P(x1, ..., xn)

By the conditional independence (naive) assumption, the likelihood factorizes into per-feature terms:

P(y|x1, ..., xn) = P(x1|y) * P(x2|y) * ... * P(xn|y) * P(y) / P(x1, ..., xn)

The denominator is the same for every class, so we pick the class y that maximizes P(y) * P(x1|y) * ... * P(xn|y).

Example: classifying short messages as Spam or Ham, given the following training sentences.

Sentence | Output
Send us your password | Spam
Send us your review | Spam
Review your password | Ham
Review us | Ham
Send us password | Spam
Send us your account | Spam
The prior probabilities of Spam and Ham (4 of the 6 training sentences are Spam, 2 are Ham) are:
P(Spam) = 4/6
P(Ham) = 2/6
Probability of every word in Spam and Ham:

Vocabulary | Spam | Ham
password | 2/4 | 1/2
review | 1/4 | 2/2
send | 3/4 | 1/2
us | 3/4 | 1/2
your | 3/4 | 1/2
account | 1/4 | 0/2
Now a new message arrives and we have to check whether it is spam or ham:
"review us now"
Conditional Probability
The word "now" is not in the vocabulary, so it is ignored. Over the vocabulary (password, review, send, us, your, account), the message is encoded as the binary vector (0, 1, 0, 1, 0, 0): a present word contributes its word probability and an absent word contributes one minus it.

P(review us now | Spam) = P(0,1,0,1,0,0 | Spam)
= (1 - 2/4) * (1/4) * (1 - 3/4) * (3/4) * (1 - 3/4) * (1 - 1/4)
= (1/2) * (1/4) * (1/4) * (3/4) * (1/4) * (3/4)
= 9/2048

P(review us now | Ham) = P(0,1,0,1,0,0 | Ham)
= (1 - 1/2) * (2/2) * (1 - 1/2) * (1/2) * (1 - 1/2) * (1 - 0/2)
= (1/2) * 1 * (1/2) * (1/2) * (1/2) * 1
= 1/16
Apply Bayes' theorem:

P(Spam | review us now)
= P(review us now | Spam) * P(Spam) / [P(review us now | Spam) * P(Spam) + P(review us now | Ham) * P(Ham)]
= (9/2048 * 4/6) / (9/2048 * 4/6 + 1/16 * 2/6)
= (3/1024) / (3/1024 + 1/48) = 9/73 ≈ 0.123

P(Ham | review us now)
= P(review us now | Ham) * P(Ham) / [P(review us now | Ham) * P(Ham) + P(review us now | Spam) * P(Spam)]
= (1/16 * 2/6) / (1/16 * 2/6 + 9/2048 * 4/6)
= (1/48) / (1/48 + 3/1024) = 64/73 ≈ 0.877
From the predicted probabilities, we can say that our message is Ham (0.877 > 0.123).
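The whole worked example can be reproduced in a few lines of Python. This is a minimal sketch of the Bernoulli-style computation above, with the word probabilities hard-coded from the table (no smoothing, matching the notes):

```python
from fractions import Fraction as F

# Word probabilities from the table: P(word | class).
p_word = {
    "spam": {"password": F(2, 4), "review": F(1, 4), "send": F(3, 4),
             "us": F(3, 4), "your": F(3, 4), "account": F(1, 4)},
    "ham":  {"password": F(1, 2), "review": F(2, 2), "send": F(1, 2),
             "us": F(1, 2), "your": F(1, 2), "account": F(0, 2)},
}
prior = {"spam": F(4, 6), "ham": F(2, 6)}

def score(message, label):
    """Unnormalized P(label) * P(message | label), Bernoulli style:
    present vocabulary words contribute P(w|label), absent ones 1 - P(w|label).
    Words outside the vocabulary (e.g. 'now') are ignored."""
    words = set(message.lower().split())
    likelihood = F(1)
    for w, p in p_word[label].items():
        likelihood *= p if w in words else 1 - p
    return prior[label] * likelihood

msg = "review us now"
scores = {label: score(msg, label) for label in ("spam", "ham")}
total = sum(scores.values())
for label, s in scores.items():
    print(label, float(s / total))  # spam ≈ 0.123, ham ≈ 0.877
```

Running it prints spam ≈ 0.123 and ham ≈ 0.877, matching the hand computation.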
Advantages:
1. Easy to understand and implement.
2. Computationally efficient and requires a small amount of training data.
3. Works well with high-dimensional data.
4. Less prone to overfitting.
5. It can be used in online learning, where the model can be updated with
new data without the need for retraining from scratch.
Disadvantages:
1. It assumes features are independent. In reality, many real-world datasets have correlated features, which can lead to suboptimal results.
2. When a feature doesn't appear in the training data for a particular class, it assigns a probability of zero, which can zero out the whole product and lead to incorrect classifications (see the smoothing sketch after this list).
3. Due to its simplicity, Naive Bayes may not capture complex relationships in the data as well as more advanced models like decision trees or neural networks.
4. It can be sensitive to imbalanced datasets.
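Regarding the zero-frequency problem in point 2, the standard remedy is Laplace (add-one) smoothing. A minimal sketch, reusing the worked example above, where "account" appeared in 0 of 2 ham messages:

```python
from fractions import Fraction as F

def smoothed_prob(word_count, class_count, alpha=1):
    """Laplace-smoothed P(word | class) for binary (present/absent) features:
    add alpha to the count and 2 * alpha to the denominator, so no word
    ever gets probability exactly 0 or 1."""
    return F(word_count + alpha, class_count + 2 * alpha)

# Unsmoothed: P(account | Ham) = 0/2 = 0, which zeroes out any product.
# Smoothed:   (0 + 1) / (2 + 2) = 1/4.
print(smoothed_prob(0, 2))  # 1/4
```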