NLP - Worksheet Solved

The document provides steps for processing text data for natural language processing tasks. It describes segmenting text into sentences and tokens, removing stopwords and special characters, stemming and lemmatizing words, creating bag-of-words representations by counting word frequencies in documents, and calculating term frequency-inverse document frequency (tf-idf) to measure how important words are to documents. The document contains an example text corpus and walks through each processing step on the data.

Uploaded by

lomej22442

NLP Worksheet

Text processing, bag of words, tf-idf activity

Suppose you have obtained this information and would like to analyse it. Let’s start by making it ready for the
computer!

Corpus
Document 1: We can use health chatbots for treating stress.
Document 2: We can use NLP to create chatbots and we will be making health chatbots now!
Document 3: Health Chatbots cannot replace human counsellors now. Yay! >#< @!@!nLtAY

Step 1: Sentence Segmentation

No. Sentence

1 We can use health chatbots for treating stress.

2 We can use NLP to create chatbots and we will be making health chatbots now!
3 Health Chatbots cannot replace human counsellors now. Yay! >#< @!@!nLtAY
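
Sentence segmentation can be sketched with a regular expression. This is a minimal, naive approach that assumes a sentence ends at `.`, `!` or `?` followed by whitespace; the function name `split_sentences` is illustrative, and noisy text such as Document 3 would need more careful handling.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# Documents 1 and 2 joined into one running text.
text = (
    "We can use health chatbots for treating stress. "
    "We can use NLP to create chatbots and we will be making health chatbots now!"
)

sentences = split_sentences(text)
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```

The punctuation mark stays attached to its sentence because the pattern only consumes the whitespace after it.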

Step 2: Tokenization
Separate your sentences into tokens. How many tokens do you have?

Tokens

We, can, use, health, chatbots, for, treating, stress, ., We, can, use, NLP, to, create, chatbots,
and, we, will, be, making, health, chatbots, now, !, Health, Chatbots, cannot, replace, human,
counsellors, now, ., Yay, !, >#<, @!@!nLtAY

Number of tokens: 37
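
The tokenization step can be sketched as follows. The pattern is an assumption chosen to reproduce the token list above: words become tokens, each sentence-ending mark becomes its own token, and any other run of symbols (such as `>#<`) stays together.

```python
import re

corpus = [
    "We can use health chatbots for treating stress.",
    "We can use NLP to create chatbots and we will be making health chatbots now!",
    "Health Chatbots cannot replace human counsellors now. Yay! >#< @!@!nLtAY",
]

# A word, OR one sentence-ending mark, OR any other run of non-space characters.
TOKEN_PATTERN = re.compile(r"\w+|[.!?]|\S+")

tokens = [tok for doc in corpus for tok in TOKEN_PATTERN.findall(doc)]
print(tokens)
print(len(tokens))  # 37
```

Counting per document gives 9, 16 and 12 tokens respectively.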
Step 3: Remove stopwords, special characters, numbers
List out the stopwords, special characters, and numbers that you want to remove!

Stopwords, special characters, and numbers

we, can, use, for, to, and, we, will, be, now, ., !, >#<, @!@!nLtAY
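
A possible way to apply this removal list in code is shown below. It is a sketch that assumes the stopword set is exactly the one listed above and that anything not purely alphabetic counts as a special character or number; the helper name `keep` is illustrative.

```python
# Stopwords taken from the list above.
STOPWORDS = {"we", "can", "use", "for", "to", "and", "will", "be", "now"}

# A few tokens from the corpus, including punctuation and noise.
tokens = ["We", "can", "use", "health", "chatbots", "for", "treating", "stress", ".",
          "Yay", "!", ">#<", "@!@!nLtAY"]

def keep(token: str) -> bool:
    """Keep purely alphabetic tokens that are not stopwords (case-insensitive)."""
    return token.isalpha() and token.lower() not in STOPWORDS

filtered = [tok for tok in tokens if keep(tok)]
print(filtered)  # ['health', 'chatbots', 'treating', 'stress', 'Yay']
```

Comparing in lower case means "We" is removed even though the stopword list only contains "we".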

Step 4: Converting text to a common case

Which text do you need to modify? What is the modified form?

Modified form

health chatbots, treating stress, nlp, create chatbots, making health chatbots, health chatbots
cannot replace human counsellors, yay

Step 5: Stemming
List out the stem words.

Stem words

health, chatbot, treat, stress, nlp, creat, chatbot, make, health, chatbot, health, chatbot,
cannot, replac, human, counsellor, yay
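
Stemming chops suffixes by rule, which is why stems such as "creat" and "replac" are not real words. The toy function below illustrates the idea on the corpus words. It is a deliberately small sketch, not the Porter algorithm or any library's stemmer, and its three rules are assumptions picked to fit this corpus only.

```python
VOWELS = set("aeiou")

def toy_stem(word: str) -> str:
    """Very small suffix stripper for illustration; NOT a full stemmer."""
    w = word.lower()
    # Rule 1: drop a plural "s", but keep words ending in "ss" (e.g. "stress").
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    # Rule 2: drop "ing"; restore "e" on a short consonant-vowel-consonant stem.
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
        if len(w) == 3 and w[0] not in VOWELS and w[1] in VOWELS and w[2] not in VOWELS:
            w += "e"  # "mak" -> "make"
    # Rule 3: otherwise drop a final "e" from longer words.
    elif w.endswith("e") and len(w) > 4:
        w = w[:-1]
    return w

words = ["chatbots", "treating", "stress", "create", "making", "replace", "counsellors"]
print([toy_stem(w) for w in words])
# ['chatbot', 'treat', 'stress', 'creat', 'make', 'replac', 'counsellor']
```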

Step 6: Lemmatization
List out the root words/ lemma.

Lemma

health, chatbot, treat, stress, nlp, create, chatbot, make, health, chatbot, health, chatbot,
can, replace, human, counsellor, yay
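
Lemmatization, unlike stemming, maps each word to a real dictionary form. The sketch below uses a tiny hand-made lookup table to show the difference. The table `LEMMAS` is hypothetical and only covers a few corpus words; a real lemmatizer relies on a full vocabulary.

```python
# Hypothetical hand-made lookup table, for illustration only.
LEMMAS = {
    "chatbots": "chatbot",
    "treating": "treat",
    "making": "make",
    "counsellors": "counsellor",
}

def lemmatize(word: str) -> str:
    """Return the dictionary form if it is known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)

words = ["create", "chatbots", "making", "replace", "counsellors"]
print([lemmatize(w) for w in words])
# ['create', 'chatbot', 'make', 'replace', 'counsellor']
```

Note that "create" and "replace" stay intact here, whereas stemming shortens them to "creat" and "replac".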
Final data
List out the final, processed data.

Processed data

Congratulations, you’ve managed to process the data!

Bag of words
Step 1: Collect data and process it
For this exercise, we use the sentences without processing them, so that they are easier to read.

No. Sentence

1 We can use health chatbots for treating stress

2 We can use NLP to create chatbots and we will be making health chatbots now

3 Health chatbots cannot replace human counsellors now

Step 2: Create dictionary

Make a list of all the different words in the text.

Dictionary

dictionary = ['we', 'can', 'use', 'health', 'chatbots', 'for', 'treating', 'stress', 'nlp', 'to', 'create',
'and', 'will', 'be', 'making', 'now', 'cannot', 'replace', 'human', 'counsellors']
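
The dictionary can also be built programmatically. This sketch assumes the words are lower-cased and split on spaces, and it keeps them in order of first appearance so the result matches the list above.

```python
sentences = [
    "We can use health chatbots for treating stress",
    "We can use NLP to create chatbots and we will be making health chatbots now",
    "Health chatbots cannot replace human counsellors now",
]

dictionary = []
for sentence in sentences:
    for word in sentence.lower().split():
        # Add each distinct word once, in order of first appearance.
        if word not in dictionary:
            dictionary.append(word)

print(len(dictionary))  # 20
print(dictionary)
```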

Step 3: Create document vectors

Use the next page to create your document vector!
document_vectors = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # sentence 1
    [2, 1, 1, 1, 2, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # sentence 2
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],  # sentence 3
]

Each vector has the same length as the dictionary, and each element corresponds to the
count of a word in the sentence. For example, the first element of the first vector is 1, which
means that the word ‘we’ appears once in the first sentence. The last element of the last vector
is 1, which means that the word ‘counsellors’ appears once in the last sentence.
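
The counting can be checked with a short script. It is a sketch that assumes lower-cased, space-separated words and uses `collections.Counter` for the per-sentence counts. Because the vectors hold counts, 'we' and 'chatbots' each score 2 in sentence 2, where they occur twice.

```python
from collections import Counter

dictionary = ["we", "can", "use", "health", "chatbots", "for", "treating", "stress",
              "nlp", "to", "create", "and", "will", "be", "making", "now",
              "cannot", "replace", "human", "counsellors"]

sentences = [
    "We can use health chatbots for treating stress",
    "We can use NLP to create chatbots and we will be making health chatbots now",
    "Health chatbots cannot replace human counsellors now",
]

document_vectors = []
for sentence in sentences:
    counts = Counter(sentence.lower().split())
    # One entry per dictionary word; Counter returns 0 for absent words.
    document_vectors.append([counts[word] for word in dictionary])

for vector in document_vectors:
    print(vector)
```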
Tf-idf
You’ve obtained your bag of words. Now let’s continue with the tf-idf!

Step 1 - 3: Count the number of documents in which each word appears at least once, and write that
number next to the word in your vocabulary. This is the word's document frequency. Draw your
own table for this!

Example of a document frequency:

Word                 aman  and  anil  are  stressed  went  to  a  therapist  download  health  chatbot
Document frequency   2     1    2     1    1         2     2   2  1          1         1       1

Your document frequency:

Each entry is 1 if the word appears in that document and 0 otherwise. The document frequency of a word is the sum of its row.

Word         Document 1  Document 2  Document 3
we           1           1           0
can          1           1           0
use          1           1           0
health       1           1           1
chatbots     1           1           1
for          1           0           0
treating     1           0           0
stress       1           0           0
nlp          0           1           0
to           0           1           0
create       0           1           0
and          0           1           0
will         0           1           0
be           0           1           0
making       0           1           0
now          0           1           1
cannot       0           0           1
replace      0           0           1
human        0           0           1
counsellors  0           0           1
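
Document frequency can be computed by counting, for each word, how many documents contain it. The sketch below assumes lower-cased, space-separated text; converting each document to a set ensures a word is counted once per document however often it repeats.

```python
from collections import Counter

docs = [
    "we can use health chatbots for treating stress",
    "we can use nlp to create chatbots and we will be making health chatbots now",
    "health chatbots cannot replace human counsellors now",
]

document_frequency = Counter()
for doc in docs:
    # set() removes repeats, so each document adds at most 1 per word.
    document_frequency.update(set(doc.split()))

print(document_frequency["we"])      # 2
print(document_frequency["health"])  # 3
print(document_frequency["for"])     # 1
```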

Step 4: Get your inverse document frequency.

Example of an inverse document frequency:

Word                        aman  and  anil  are  stressed  went  to   a    therapist  download  health  chatbot
Inverse document frequency  3/2   3/1  3/2   3/1  3/1       3/2   3/2  3/2  3/1        3/1       3/1     3/1

Your inverse document frequency:

Word      Document 1  Document 2  Document 3
we        0.025       0.016       0
can       0.025       0.016       0
use       0.025       0.016       0
health    0           0           0
chatbots  0           0           0
for       0.068       0           0
treating  0.068       0           0
stress    0.068       0           0
nlp       0           0.043       0
to        0           0.043       0
create    0           0.043       0
and       0           0.043       0
will      0           0.043       0
be        0           0.043       0
making    0           0.043       0
now       0           0.016       0.016
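
The fractions in the example (3/2, 3/1) suggest that inverse document frequency is taken here as the total number of documents divided by the document frequency, with the logarithm applied later in Step 5. Under that assumption, a minimal sketch using exact fractions looks like this:

```python
from collections import Counter
from fractions import Fraction

docs = [
    "we can use health chatbots for treating stress",
    "we can use nlp to create chatbots and we will be making health chatbots now",
    "health chatbots cannot replace human counsellors now",
]

# Document frequency: number of documents containing each word.
df = Counter(word for doc in docs for word in set(doc.split()))

# Inverse document frequency, assumed to be N / df (no logarithm yet).
idf = {word: Fraction(len(docs), count) for word, count in df.items()}

print(idf["we"])      # 3/2
print(idf["for"])     # 3
print(idf["health"])  # 1
```

Words that occur in every document, such as "health" and "chatbots", get the smallest value, 3/3 = 1.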
Step 5: Get your tf-idf
Example of a tf-idf:

After log operation:

Your tf-idf:
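
One common way to finish the calculation is tf-idf = tf × log10(N / df). The sketch below assumes tf is the raw count of the word in the document and that the logarithm is base 10. Worksheets that divide tf by document length, or use another base, will produce different numbers.

```python
import math
from collections import Counter

docs = [
    "we can use health chatbots for treating stress",
    "we can use nlp to create chatbots and we will be making health chatbots now",
    "health chatbots cannot replace human counsellors now",
]

tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)
df = Counter(word for words in tokenized for word in set(words))

def tf_idf(words: list[str]) -> dict[str, float]:
    """tf-idf of every word in one document: raw count * log10(N / df)."""
    counts = Counter(words)
    return {w: c * math.log10(n_docs / df[w]) for w, c in counts.items()}

scores = [tf_idf(words) for words in tokenized]
print(round(scores[0]["for"], 3))  # 0.477
print(round(scores[1]["we"], 3))   # 0.352
print(scores[2]["health"])         # 0.0
```

A word found in every document scores 0 because log10(3/3) = 0, so it does not help tell the documents apart.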
