Document Vector Table Question 2

Uploaded by

jjmanavalan09

Create a document vector table using the bag of words algorithm for the following corpus.


Document 1: We can use health chatbots for treating stress.
Document 2: We can use NLP to create chatbots and we will be making health chatbots now!!
Document 3: Health chatbots cannot replace human counsellors now
Answer:
I. Text Normalisation: In text normalisation, the text goes through several steps that reduce it to a simpler, standard form.
a. Sentence Segmentation: Under sentence segmentation, the whole corpus is divided into sentences.
We can use health chatbots for treating stress.
We can use NLP to create chatbots and we will be making health chatbots now!!
Health chatbots cannot replace human counsellors now
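The segmentation step can be sketched in Python as follows. This is only an illustration: it assumes the three documents are joined into one string and that a sentence ends at '.', '!' or '?' followed by whitespace. The names corpus and segment_sentences are made up for this example.

```python
import re

# Assumption: the corpus is the three documents joined into one string.
corpus = (
    "We can use health chatbots for treating stress. "
    "We can use NLP to create chatbots and we will be making health chatbots now!! "
    "Health chatbots cannot replace human counsellors now"
)

def segment_sentences(text):
    """Split at whitespace that directly follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

sentences = segment_sentences(corpus)
for sentence in sentences:
    print(sentence)
```

A rule this simple would wrongly split after abbreviations such as "Dr."; it is enough for this corpus only.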

b. Tokenisation: Under tokenisation, every word, number and special character is considered separately, and each of them becomes a separate token.
We, can, use, health, chatbots, for, treating, stress, .
We, can, use, NLP, to, create, chatbots, and, we, will, be, making, health, chatbots, now, !, !
Health, chatbots, cannot, replace, human, counsellors, now
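One possible way to tokenise a sentence like this is a short regular expression. The function name tokenise is an assumption for this sketch, and the pattern is just one reasonable choice, not the only one.

```python
import re

def tokenise(sentence):
    # \w+ matches a run of letters, digits or underscores (a word or number);
    # [^\w\s] matches a single special character such as '.' or '!'.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenise(
    "We can use NLP to create chatbots and we will be making health chatbots now!!"
)
print(tokens)
```

Each '!' comes out as its own token, which matches the token list shown for Document 2.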

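c. Removal of Stop Words: Stop words are words that occur very frequently in the corpus but add little meaning to it. They are removed here, along with the punctuation tokens. The words that remain across the three documents are:
We, use, health, chatbots, treating, stress, NLP, create, making, now, cannot, replace, human, counsellors, and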
d. Converting into Common Case: After the stop words are removed, the whole text is converted to a single case, preferably lower case.
we, use, health, chatbots, treating, stress, nlp, create, making, now, cannot, replace, human, counsellors, and
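Steps c and d can be sketched as two small functions. The stop-word set below is an assumption: it holds only the five words that disappear between the token lists and the filtered list in this answer (real stop-word lists are much longer and would normally include "we" and "and" too). The function names are made up for the example.

```python
# Assumed stop-word list, inferred from this worked answer only.
STOP_WORDS = {"can", "for", "to", "will", "be"}

def remove_stop_words(tokens):
    # Keep alphanumeric tokens only (this drops '.' and '!') and skip stop words.
    return [t for t in tokens if t.isalnum() and t.lower() not in STOP_WORDS]

def to_lower(tokens):
    # Convert every remaining token to lower case.
    return [t.lower() for t in tokens]

doc1_tokens = ["We", "can", "use", "health", "chatbots", "for", "treating", "stress", "."]
filtered = remove_stop_words(doc1_tokens)
print(filtered)            # ['We', 'use', 'health', 'chatbots', 'treating', 'stress']
print(to_lower(filtered))  # ['we', 'use', 'health', 'chatbots', 'treating', 'stress']
```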
e. Stemming/Lemmatisation: Stemming and lemmatisation are alternatives to each other, as both serve the same purpose: removing affixes so that each word is reduced to its base form.
we, use, health, chatbot, treat, stress, nlp, create, make, now, cannot, replace, human, counsellor, and
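For a corpus this small, the base forms can be illustrated with a plain lookup table. The table BASE_FORMS and the function to_base_form are hypothetical and cover only the affixed words in this exercise; a real stemmer or lemmatiser applies general rules and may return slightly different forms.

```python
# Hypothetical lookup table: only the affixed words found in this corpus.
BASE_FORMS = {
    "chatbots": "chatbot",
    "treating": "treat",
    "making": "make",
    "counsellors": "counsellor",
}

def to_base_form(tokens):
    # Words without an entry are treated as already being in base form.
    return [BASE_FORMS.get(t, t) for t in tokens]

doc3 = ["health", "chatbots", "cannot", "replace", "human", "counsellors", "now"]
print(to_base_form(doc3))
# ['health', 'chatbot', 'cannot', 'replace', 'human', 'counsellor', 'now']
```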
II. Create a Dictionary: List all the unique words that occur across the three documents.
we  use  health  chatbot  treat  stress  nlp  create  make
now  cannot  replace  human  counsellor  and
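Building the dictionary amounts to collecting each word once. In the sketch below the normalised token lists are typed in by hand from the steps above, and build_dictionary is a made-up name. Words are kept in order of first appearance, so "and" lands in a different position than in the list above; the order does not matter as long as the same order is used for every document vector.

```python
# Normalised tokens of the three documents (after steps a to e).
docs = [
    ["we", "use", "health", "chatbot", "treat", "stress"],
    ["we", "use", "nlp", "create", "chatbot", "and", "we", "make",
     "health", "chatbot", "now"],
    ["health", "chatbot", "cannot", "replace", "human", "counsellor", "now"],
]

def build_dictionary(documents):
    # Collect each word once, in order of first appearance.
    dictionary = []
    for doc in documents:
        for word in doc:
            if word not in dictionary:
                dictionary.append(word)
    return dictionary

dictionary = build_dictionary(docs)
print(len(dictionary))  # 15
print(dictionary)
```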
III. Create a Document Vector for Document 1: For each word in the document, if it matches a word in the dictionary, put a 1 under that word. If the same word appears again, increment the previous value by 1. If a dictionary word does not occur in the document, put a 0 under it.
we  use  health  chatbot  treat  stress  nlp  create  make
1   1    1       1        1      1       0    0       0

now  cannot  replace  human  counsellor  and
0    0       0        0      0           0
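The rule above translates almost word for word into a loop. This is a minimal sketch: DICTIONARY lists the 15 words in the order used in section II, and document_vector is a name chosen for the example.

```python
DICTIONARY = ["we", "use", "health", "chatbot", "treat", "stress", "nlp",
              "create", "make", "now", "cannot", "replace", "human",
              "counsellor", "and"]

def document_vector(doc_tokens, dictionary):
    vector = [0] * len(dictionary)      # start with 0 under every word
    for word in doc_tokens:
        if word in dictionary:          # the word matches the dictionary
            vector[dictionary.index(word)] += 1   # 1 the first time, then +1
    return vector

doc1 = ["we", "use", "health", "chatbot", "treat", "stress"]
print(document_vector(doc1, DICTIONARY))
# [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```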
IV. Create Document Vectors for All 3 Documents.
            we  use  health  chatbot  treat  stress  nlp  create  make
Document 1  1   1    1       1        1      1       0    0       0
Document 2  2   1    1       2        0      0       1    1       1
Document 3  0   0    1       1        0      0       0    0       0

            now  cannot  replace  human  counsellor  and
Document 1  0    0       0        0      0           0
Document 2  1    0       0        0      0           1
Document 3  1    1       1        1      1           0
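The whole table can also be produced in one pass. The sketch below uses Counter from the Python standard library to do the counting; DICTIONARY, DOCS and vector_table are names assumed for this example, and the token lists are the normalised documents from section I. Counts follow the increment rule from section III, so a word that occurs twice in a document gets 2.

```python
from collections import Counter

DICTIONARY = ["we", "use", "health", "chatbot", "treat", "stress", "nlp",
              "create", "make", "now", "cannot", "replace", "human",
              "counsellor", "and"]

DOCS = {
    "Document 1": ["we", "use", "health", "chatbot", "treat", "stress"],
    "Document 2": ["we", "use", "nlp", "create", "chatbot", "and", "we",
                   "make", "health", "chatbot", "now"],
    "Document 3": ["health", "chatbot", "cannot", "replace", "human",
                   "counsellor", "now"],
}

def vector_table(docs, dictionary):
    table = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)        # word -> number of occurrences
        # A Counter returns 0 for words that never occurred in the document.
        table[name] = [counts[word] for word in dictionary]
    return table

table = vector_table(DOCS, DICTIONARY)
for name, row in table.items():
    print(name, row)
```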
