EXPERIMENT NO: 3
Aim: Apply various other text preprocessing techniques for any given text: Stop
Word Removal, Lemmatization / Stemming.
I. Abstract
This experiment demonstrates fundamental text preprocessing techniques in Natural Language
Processing (NLP), specifically focusing on stop word removal, lemmatization, and stemming.
These techniques are vital for cleaning and simplifying textual data before applying advanced
analysis or machine learning models. Stop word removal filters out common, insignificant
words, thereby reducing noise in the text. Lemmatization converts words to their dictionary base
form by considering grammatical context, while stemming reduces words to their root form
using heuristic rules. By applying these methods to a sample sentence, the experiment highlights
how each technique impacts tokenized text, aiding in better understanding and representation of
data. These preprocessing steps are essential in enhancing the accuracy and efficiency of NLP
systems.
II. Introduction
1. Stop Word Removal: Stop words are commonly used words in a language that carry little
semantic meaning on their own, such as "the," "is," "in," "and," etc. Removing stop words from a
text can reduce the size of the data and focus on the more significant words that carry the
essential meaning. This step is crucial in text preprocessing, especially in natural language
processing (NLP), as it helps in simplifying the data and improving the efficiency of machine
learning models.
2. Lemmatization: Lemmatization is a text normalization technique that reduces words to their
base or root form, known as the lemma. For instance, the words "running," "ran," and "runs"
would all be reduced to the lemma "run." Lemmatization considers the context of the word and
uses a dictionary to find the correct base form. This process helps in reducing the complexity of
the text data and ensuring that different forms of a word are treated as a single entity, thus
improving the accuracy of NLP tasks.
3. Stemming: Stemming is another text normalization technique that reduces words to their root
form by removing suffixes. Unlike lemmatization, stemming does not consider the context or use
a dictionary, and therefore, it may produce non-existent words. For example, "running,"
"runner," and "ran" might all be reduced to "run" or "runn." While stemming is faster and
simpler than lemmatization, it is often less accurate.
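A quick sketch with NLTK's PorterStemmer shows this behavior. The stemmer needs no downloaded dictionary, since it only applies suffix-stripping rules, and it can produce non-words such as "studi":

```python
from nltk.stem import PorterStemmer

# Rule-based suffix stripping; no corpus or dictionary required
stemmer = PorterStemmer()
for word in ["running", "runner", "studies"]:
    print(word, "->", stemmer.stem(word))
```

Note that "studies" becomes "studi", which is not an English word, illustrating why stemming trades accuracy for speed compared with lemmatization.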
III. Implementation
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

# Download necessary NLTK data (run once)
# nltk.download('punkt')
# nltk.download('punkt_tab')
# nltk.download('stopwords')
# nltk.download('wordnet')

def preprocess_text(text):
    print(f"Sentence: \"{text}\"")

    # Tokenization with explicit language
    tokens = word_tokenize(text, language='english')
    print(f"Tokens: {tokens}")

    # Stop word removal: drop stop words and non-alphabetic tokens
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in tokens
                       if word.lower() not in stop_words and word.isalpha()]
    print(f"After stop word removal: {filtered_tokens}")

    # Lemmatization
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = [lemmatizer.lemmatize(word) for word in filtered_tokens]
    print(f"Lemmatization: {lemmatized_tokens}")

    # Stemming
    stemmer = PorterStemmer()
    stemmed_tokens = [stemmer.stem(word) for word in filtered_tokens]
    print(f"Stemming: {stemmed_tokens}")

# Example text: a tongue twister containing several stop words
text = "How can a clam cram in a clean cream can?"
preprocess_text(text)
Output:
IV. Conclusion
Stop word removal, lemmatization, and stemming are essential preprocessing techniques in NLP that help simplify and normalize text data. Stop word removal eliminates low-information words, while lemmatization and stemming reduce words to their base forms, making the text easier to analyze and improving the performance of machine learning models. These techniques are foundational for preparing text data for further analysis or model training.