Naive Bayes Algorithm Notes

Naive Bayes

Naive Bayes is a supervised machine learning algorithm based on Bayes' theorem. It is a probabilistic classifier, meaning it predicts based on probability. It is used for spam detection, sentiment analysis, text classification, etc.

Naive: It is called naive because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features.

Example: Movie Genre Classification
Suppose you want to classify movies into genres based on two features: the presence of keywords in the movie's description and the director of the movie. In a "naive" approach, you might assume that the presence of keywords (e.g., action, romance, comedy) and the director's name are independent factors when determining a movie's genre. In other words, you treat these features as if they don't influence each other. For instance:
- If a movie's description contains the keyword "action," you might immediately assume it's an action movie, without considering the director.
- If a movie is directed by a famous director known for making romantic films, you might classify it as a romance movie, regardless of the keywords in the description.
The "naive" aspect here is that you're assuming no correlation or interaction between keywords and the director's influence on the movie's genre.

Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.

Conditional Probability
First, let's discuss conditional probability. It refers to the probability of an event occurring given that another event has already occurred:

P(A|B) = P(A ∩ B) / P(B),   given P(B) ≠ 0

In text classification, this means predicting the probability of a particular class (Spam / Not Spam) given the presence of certain features (words) in a document.

Example with dice: roll two dice together, so there are 36 equally likely outcomes.
- P(D1 = 5) = 6/36 = 1/6
- P(D1 + D2 ≤ 10) = 33/36 (all outcomes except the three with sum 11 or 12)
- P(D1 = 5 ∩ D1 + D2 ≤ 10) = 5/36 (D1 = 5 together with D2 ≤ 5)
Therefore:

P(D1 = 5 | D1 + D2 ≤ 10) = (5/36) / (33/36) = 5/33
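The dice result above can be checked by enumerating all 36 outcomes. A minimal sketch in Python (variable names are my own):

```python
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of rolling two dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

# Event B: the sum of the dice is at most 10.
b = [o for o in outcomes if o[0] + o[1] <= 10]

# Event A ∩ B: the first die shows 5 AND the sum is at most 10.
a_and_b = [o for o in b if o[0] == 5]

# Conditional probability P(A|B) = P(A ∩ B) / P(B).
p_b = Fraction(len(b), len(outcomes))              # 33/36 = 11/12
p_a_and_b = Fraction(len(a_and_b), len(outcomes))  # 5/36
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)  # 5/33
```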
Bayes Theorem for spam filtering
But what does this theorem have to do with our spam filter? We want to find out the likelihood that a specific message is spam. A message consists of multiple words, so in order to find the combined probability of the words we first have to find the probability of each separate word being a spam word. This is also known as the 'spaminess' of a word, and we can calculate it using a special case of Bayes' theorem where the event is a binary variable:

P(S|W) = P(W|S) · P(S) / (P(W|S) · P(S) + P(W|H) · P(H))

where:
- P(S|W) is the probability that a message is spam, knowing that a specific word appears in it;
- P(W|S) is the probability that the specific word appears in spam messages;
- P(S) is the overall probability that any given message is spam;
- P(W|H) is the probability that the specific word appears in ham messages;
- P(H) is the overall probability that any given message is ham.

Deriving Bayes' theorem
Bayes' theorem states:

P(A|B) = P(B|A) · P(A) / P(B),   given P(B) ≠ 0

where P(A) is the prior, P(B|A) is the likelihood, and P(B) is the evidence. It follows directly from conditional probability:

P(A|B) = P(A ∩ B) / P(B)                          ... (1)
P(B|A) = P(A ∩ B) / P(A), so P(A ∩ B) = P(B|A) · P(A)   ... (2)

Substituting (2) into (1):

P(A|B) = P(B|A) · P(A) / P(B)

Naive Bayes formulation
For features x1, x2, ..., xn and a class variable y:

P(y | x1, ..., xn) = P(y) · P(x1, ..., xn | y) / P(x1, ..., xn)

By the conditional independence assumption, the likelihood factorizes:

P(x1, ..., xn | y) = P(x1|y) · P(x2|y) · ... · P(xn|y)

so the posterior is proportional to:

P(y | x1, ..., xn) ∝ P(y) · ∏ P(xi | y)

Worked example: spam filtering
Training data:

Sentence              | Output
Send us your password | Spam
Send us your review   | Spam
Review your password  | Ham
Review us             | Ham
Send us password      | Spam
Send us your account  | Spam

Prior probabilities of Spam and Ham:
P(Spam) = 4/6
P(Ham) = 2/6
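The per-word 'spaminess' formula can be written directly as a small function. A sketch using exact fractions (the function name is my own; the example values for the word "password" are taken from the worked example's word-probability table):

```python
from fractions import Fraction

def spaminess(p_w_s, p_s, p_w_h, p_h):
    """P(S|W): probability a message is spam given that word W appears.

    p_w_s = P(W|S), p_s = P(S), p_w_h = P(W|H), p_h = P(H).
    """
    numerator = p_w_s * p_s
    return numerator / (numerator + p_w_h * p_h)

# Word "password": P(W|S) = 2/4, P(S) = 4/6, P(W|H) = 1/2, P(H) = 2/6.
p = spaminess(Fraction(2, 4), Fraction(4, 6), Fraction(1, 2), Fraction(2, 6))
print(p)  # 2/3
```

Using Fraction instead of float keeps the arithmetic exact, which makes the hand-computed results easy to verify.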
Probability of every word in Spam and Ham (fraction of messages of each class containing the word):

Vocabulary | Spam | Ham
password   | 2/4  | 1/2
review     | 1/4  | 2/2
send       | 3/4  | 1/2
us         | 3/4  | 1/2
your       | 3/4  | 1/2
account    | 1/4  | 0/2

Now a new message arrives and we have to check whether it is spam or ham: "review us now".

Conditional probabilities, treating each vocabulary word as a binary present/absent feature ("now" is not in the vocabulary and is ignored):

P(review us now | Spam) = (1 − 2/4) · (1/4) · (1 − 3/4) · (3/4) · (1 − 3/4) · (1 − 1/4) = 9/2048
P(review us now | Ham) = (1 − 1/2) · (2/2) · (1 − 1/2) · (1/2) · (1 − 1/2) · (1 − 0/2) = 1/16

Apply Bayes' theorem:

P(Spam | review us now)
= P(review us now | Spam) · P(Spam) / [P(review us now | Spam) · P(Spam) + P(review us now | Ham) · P(Ham)]
= (9/2048 · 4/6) / (9/2048 · 4/6 + 1/16 · 2/6)
≈ 0.1233

P(Ham | review us now)
= P(review us now | Ham) · P(Ham) / [P(review us now | Ham) · P(Ham) + P(review us now | Spam) · P(Spam)]
= (1/16 · 2/6) / (1/16 · 2/6 + 9/2048 · 4/6)
≈ 0.8767

From the predicted probabilities, we can say that our message is Ham.

Advantages:
1. Easy to understand and implement.
2. Computationally efficient and requires a small amount of training data.
3. Works well with high-dimensional data.
4. Less prone to overfitting.
5. It can be used in online learning, where the model can be updated with new data without retraining from scratch.

Disadvantages:
1. It assumes features are independent. In reality, many real-world datasets have correlated features, which can lead to suboptimal results.
2. When a feature doesn't appear in the training data for a particular class, it assigns a probability of zero, which can lead to incorrect classifications (the zero-frequency problem, usually handled with Laplace smoothing).
3. Due to its simplicity, Naive Bayes may not capture complex relationships in the data as well as more advanced models like decision trees or neural networks.
4. It can be sensitive to imbalanced datasets, since the class priors dominate the prediction.
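The worked example can be reproduced end to end. A minimal Bernoulli-style sketch (variable and function names are my own; the per-word probabilities and priors are those used in the example above):

```python
from fractions import Fraction as F

# P(word present | class) for each vocabulary word, per class.
spam_p = {"password": F(2, 4), "review": F(1, 4), "send": F(3, 4),
          "us": F(3, 4), "your": F(3, 4), "account": F(1, 4)}
ham_p = {"password": F(1, 2), "review": F(2, 2), "send": F(1, 2),
         "us": F(1, 2), "your": F(1, 2), "account": F(0, 2)}
p_spam, p_ham = F(4, 6), F(2, 6)

def likelihood(word_probs, message):
    """Product over the vocabulary: p if the word is present, 1 - p if absent.

    Words outside the vocabulary (e.g. "now") are ignored.
    """
    present = set(message.lower().split())
    result = F(1)
    for word, p in word_probs.items():
        result *= p if word in present else 1 - p
    return result

msg = "review us now"
l_spam = likelihood(spam_p, msg)  # 9/2048
l_ham = likelihood(ham_p, msg)    # 1/16

# Bayes' theorem: normalize by the total evidence.
evidence = l_spam * p_spam + l_ham * p_ham
post_spam = l_spam * p_spam / evidence  # ≈ 0.1233
post_ham = l_ham * p_ham / evidence     # ≈ 0.8767 -> classified as Ham
print(float(post_spam), float(post_ham))
```

Note that the posteriors come out exactly as 9/73 and 64/73, which sum to 1 as they must.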
