NLP Worksheet: Text Processing, Bag of Words, Tf-Idf Activity

The document provides instructions for performing various natural language processing (NLP) tasks on a corpus of 3 health-related documents, including sentence segmentation, tokenization, removing stopwords/special characters, stemming, lemmatization, bag-of-words modeling, and calculating term frequency-inverse document frequency (tf-idf). The tasks are broken down into clear steps but no examples or results are provided.

Uploaded by

Gayathri . M

NLP Worksheet

Text processing, bag of words, tf-idf activity

Suppose you have obtained this information and would like to analyse it. Let’s start by making it ready for the computer!

Corpus
Document 1: We can use health chatbots for treating stress.
Document 2: We can use NLP to create chatbots and we will be making health chatbots now!
Document 3: Health Chatbots cannot replace human counsellors now. Yay >< !! @1nteLA!4Y

Step 1: Sentence Segmentation

No. Sentence
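
One way to try this step in Python is sketched below. It assumes a sentence ends at `.`, `!` or `?` followed by whitespace; the regular expression and the variable names are illustrative choices, not part of the worksheet. Under this simple rule the noisy tail of Document 3 is split into fragments of its own.

```python
import re

corpus = [
    "We can use health chatbots for treating stress.",
    "We can use NLP to create chatbots and we will be making health chatbots now!",
    "Health Chatbots cannot replace human counsellors now. Yay >< !! @1nteLA!4Y",
]

# Assumed rule: a sentence boundary is whitespace that follows '.', '!' or '?'.
boundary = re.compile(r"(?<=[.!?])\s+")

sentences = []
for document in corpus:
    sentences.extend(boundary.split(document))

# Print a numbered list, mirroring the "No. / Sentence" table.
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```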

Step 2: Tokenization
Separate your sentences into tokens. How many tokens do you have?

Tokens

Number of tokens: ________
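
As a rough sketch of tokenization, the snippet below treats every run of letters or digits as one token and every punctuation mark as a token of its own. That rule is an assumption; other tokenizers may count differently, so your own total can legitimately differ.

```python
import re

sentence = "We can use health chatbots for treating stress."

# Assumed rule: a token is a run of letters/digits, or a single punctuation mark.
tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(tokens)
print("Number of tokens:", len(tokens))
```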

Step 3: Remove stopwords, special characters, numbers
List out the stopwords, special characters, and numbers that you want to remove!

Stopwords, special characters, and numbers
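
The filtering in this step could look roughly like the sketch below. The stopword set is a hypothetical, hand-picked list for illustration only; which words count as stopwords is your decision in the worksheet. Tokens that contain digits or symbols are dropped by the `isalpha()` check.

```python
import re

# Hypothetical stopword list for illustration; choose your own for the worksheet.
STOPWORDS = {"we", "can", "for", "to", "and", "will", "be", "now"}

text = "Health Chatbots cannot replace human counsellors now. Yay >< !! @1nteLA!4Y"
tokens = re.findall(r"\w+|[^\w\s]", text)

# Keep a token only if it is purely alphabetic and not a stopword.
kept = [t for t in tokens if t.isalpha() and t.lower() not in STOPWORDS]
removed = [t for t in tokens if t not in kept]

print("kept:", kept)
print("removed:", removed)
```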

Step 4: Converting text to a common case

Which text do you need to modify? What is the modified form?

Modified form
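
A small sketch of this step, assuming lowercase is chosen as the common case: it lists only the tokens that actually change, which answers "which text do you need to modify?" for a sample of tokens from Document 2.

```python
tokens = ["We", "can", "use", "NLP", "to", "create", "chatbots"]

# Map each token that changes under lowercasing to its lowercase form.
modified = {t: t.lower() for t in tokens if t != t.lower()}

# The full token list in a common (lower) case.
lowercased = [t.lower() for t in tokens]

print(modified)
print(lowercased)
```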

Step 5: Stemming
List out the stem words.

Stem words
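
To illustrate the idea of stemming, here is a toy suffix stripper. It is a deliberately simplified rule set written for this worksheet's words, not the Porter algorithm or any library's stemmer, and the function name `toy_stem` is made up. Note that a stem does not have to be a real word.

```python
def toy_stem(word):
    """Strip a few common suffixes. A toy rule set, not the Porter algorithm."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]          # treating -> treat, making -> mak
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]          # chatbots -> chatbot, but stress stays
    return word

words = ["health", "chatbots", "treating", "stress", "making", "counsellors"]
stems = [toy_stem(w) for w in words]
print(stems)
```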

Step 6: Lemmatization
List out the root words (lemmas).

Lemma
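
Lemmatization maps each word to its dictionary form, so it needs vocabulary knowledge rather than suffix rules. The sketch below stands in for that knowledge with a tiny hand-written lookup table; the table and the name `toy_lemma` are hypothetical, and a real lemmatizer would consult a full dictionary instead.

```python
# Hypothetical hand-written lookup; a real lemmatizer consults a full dictionary.
LEMMA_TABLE = {
    "chatbots": "chatbot",
    "treating": "treat",
    "making": "make",
    "counsellors": "counsellor",
}

def toy_lemma(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMA_TABLE.get(word, word)

words = ["health", "chatbots", "treating", "stress", "making", "counsellors"]
lemmas = [toy_lemma(w) for w in words]
print(lemmas)
```

Unlike a stem, every lemma here is a real word: "making" becomes "make".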

Final data
List out the final, processed data.

Processed data

Congratulations, you’ve managed to process the data!

Bag of words
Step 1: Collect data and process it
For this exercise, we can use the unprocessed sentences so that they are easier to read.

No.  Sentence
1    We can use health chatbots for treating stress
2    We can use NLP to create chatbots and we will be making health chatbots now
3    Health chatbots cannot replace human counsellors now

Step 2: Create dictionary

Make a list of all the different words in the text.

Dictionary

Step 3: Create document vectors

Use the next page to create your document vector!
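
Steps 2 and 3 can be sketched in Python as follows. The sketch assumes words are lowercased and split on spaces, so "We" and "we" count as the same dictionary entry; the dictionary order (first appearance) is also just one possible choice.

```python
sentences = [
    "We can use health chatbots for treating stress",
    "We can use NLP to create chatbots and we will be making health chatbots now",
    "Health chatbots cannot replace human counsellors now",
]

# Assumption: lowercase everything and split on whitespace.
docs = [s.lower().split() for s in sentences]

# Step 2: dictionary of unique words, in order of first appearance.
dictionary = []
for doc in docs:
    for word in doc:
        if word not in dictionary:
            dictionary.append(word)

# Step 3: one vector per document, counting each dictionary word.
vectors = [[doc.count(word) for word in dictionary] for doc in docs]

print(dictionary)
for vector in vectors:
    print(vector)
```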

Tf-idf
You’ve obtained your bag of words. Now let’s continue with the tf-idf!

Steps 1-3: Count the number of documents in which each word appears at least once, and write that number next to the word in your vocabulary to get its document frequency. Draw your own table for this!

Example of a document frequency:

aman  and  anil  are  stressed  went  to   a    therapist  download  health  chatbot
2     1    2     1    1         2     2    2    1          1         1       1
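
The counting rule from Steps 1-3 could be sketched like this for the three worksheet sentences. It assumes lowercased, space-separated words; each document is turned into a set first so that a word repeated inside one document is still counted only once.

```python
from collections import Counter

sentences = [
    "We can use health chatbots for treating stress",
    "We can use NLP to create chatbots and we will be making health chatbots now",
    "Health chatbots cannot replace human counsellors now",
]
docs = [s.lower().split() for s in sentences]

# Document frequency: the number of documents that contain the word at least once.
df = Counter()
for doc in docs:
    df.update(set(doc))

for word, count in df.items():
    print(word, count)
```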

Your document frequency:

Step 4: Get your inverse document frequency.

Example of an inverse document frequency:

aman  and  anil  are  stressed  went  to   a    therapist  download  health  chatbot
3/2   3/1  3/2   3/1  3/1       3/2   3/2  3/2  3/1        3/1       3/1     3/1

Your inverse document frequency:

Step 5: Get your tf-idf
Example of a tf-idf:

After log operation:

Your tf-idf:
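
Putting Steps 4 and 5 together, a possible sketch is shown below. It follows the worksheet's convention of taking the inverse document frequency as (total documents) / (document frequency) and then applying a log. The base of the logarithm is not stated in the worksheet, so base 10 is an assumption here, as are the variable names.

```python
import math
from collections import Counter

sentences = [
    "We can use health chatbots for treating stress",
    "We can use NLP to create chatbots and we will be making health chatbots now",
    "Health chatbots cannot replace human counsellors now",
]
docs = [s.lower().split() for s in sentences]
n_docs = len(docs)

# Document frequency: in how many documents does each word occur?
df = Counter()
for doc in docs:
    df.update(set(doc))

# Step 4: inverse document frequency as a plain ratio, e.g. 3/2 = 1.5.
idf = {word: n_docs / count for word, count in df.items()}

# Step 5: tf-idf = term frequency * log10(idf), computed per document.
tfidf = []
for doc in docs:
    tf = Counter(doc)
    tfidf.append({word: tf[word] * math.log10(idf[word]) for word in tf})

for number, scores in enumerate(tfidf, start=1):
    print(number, {word: round(score, 3) for word, score in scores.items()})
```

A word that occurs in every document, such as "health", has an idf of 3/3 = 1 and therefore a tf-idf of 0, which is how tf-idf plays down words that do not distinguish one document from another.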
