NLP with Python using NLTK
What is NLP (Natural Language Processing)?
• NLP (Natural Language Processing) is a field at the intersection of
computer science, artificial intelligence (AI), and linguistics. It enables
computers to understand, interpret, and generate human language.
• You’ve used NLP if you've:
– Spoken to Alexa, Siri, or Google Assistant
– Typed something and used autocorrect or autocomplete
– Seen spam filters in your email
– Used chatbots or language translation tools
• Popular NLP libraries:
– NLTK (Natural Language Toolkit) – beginner-friendly
– spaCy – fast and industrial-strength
– TextBlob – simple, useful for sentiment analysis
– transformers (by HuggingFace) – for deep learning-based NLP (e.g., BERT, GPT)
What is NLTK?
• NLTK (Natural Language Toolkit) is a powerful Python library
used for working with human language data (text). It provides
easy-to-use tools and resources to process, analyze, and
understand natural language.
Text Preprocessing
• Install NLTK: pip install nltk
• import nltk
• nltk.download('punkt')
• nltk.download('stopwords')
• nltk.download('wordnet')
• Tokenization: Breaking text into words or sentences.
• Stopwords: Common words (like "the", "is") that are removed before
analysis.
import nltk
nltk.download('punkt')      # tokenizer models
nltk.download('stopwords')  # stopword lists
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
text = "NLP is fun and powerful!"
tokens = word_tokenize(text)  # ['NLP', 'is', 'fun', 'and', 'powerful', '!']
# Drop common stopwords such as "is" and "and"
filtered = [w for w in tokens if w.lower() not in stopwords.words('english')]
print(filtered)  # ['NLP', 'fun', 'powerful', '!']
This removes unimportant words so that your analysis focuses on meaningful
content.
Tokenization & Stopwords
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
text = "NLTK is a powerful Python library for NLP."
tokens = word_tokenize(text)
filtered = [word for word in tokens if word.lower() not in stopwords.words('english')]
print(filtered)
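Running this prints ['NLTK', 'powerful', 'Python', 'library', 'NLP', '.']: the stopwords
"is", "a", and "for" are gone, but the punctuation token '.' survives, since stopword
filtering does not remove punctuation.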
Stemming & Lemmatization
• Stemming: Strips suffixes ("playing" → "play").
• Lemmatization: Reduces to dictionary form ("better" → "good").
• Both are used to normalize text. Lemmatization is more accurate but
slower.
import nltk
nltk.download('wordnet')
from nltk.stem import PorterStemmer, WordNetLemmatizer
stemmer = PorterStemmer()
print(stemmer.stem("playing"))  # play
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("playing", pos='v'))  # play
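The difference shows up on irregular forms; a quick check, reusing the objects above:
print(stemmer.stem("better"))                   # better (no suffix rule applies)
print(lemmatizer.lemmatize("better", pos='a'))  # good (looked up in WordNet)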
POS Tagging & Named Entity Recognition
• POS Tagging: Labels each word (noun, verb, etc.)
• NER: Detects entities like names and places.
import nltk
from nltk.tokenize import word_tokenize
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
sentence = "Steve Jobs founded Apple in California."
tokens = word_tokenize(sentence)
tags = nltk.pos_tag(tokens)     # [('Steve', 'NNP'), ('Jobs', 'NNP'), ('founded', 'VBD'), ...]
ner_tree = nltk.ne_chunk(tags)  # groups tagged tokens into named-entity chunks
print(ner_tree)
• This helps in identifying structure and important entities in a
sentence.
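The printed tree marks each detected entity; the output looks roughly like the following,
though chunk labels can vary between NLTK versions:
(S (PERSON Steve/NNP) (PERSON Jobs/NNP) founded/VBD (GPE Apple/NNP) in/IN (GPE California/NNP) ./.)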
Text Classification (Naive Bayes)
• Text Classification: Predicts labels for input text (e.g.,
sentiment).
• Naive Bayes: A simple probabilistic classifier.
from nltk.classify import NaiveBayesClassifier
from nltk.tokenize import word_tokenize

def format_sentence(sent):
    # Bag-of-words features: every word in the sentence maps to True
    return {word: True for word in word_tokenize(sent.lower())}

train = [(format_sentence("I love this movie"), 'pos'),
         (format_sentence("I hate this product"), 'neg')]
classifier = NaiveBayesClassifier.train(train)
print(classifier.classify(format_sentence("love product")))
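Two training sentences make this a toy example; a real classifier needs far more labeled
data. NLTK can also report which word features carried the most weight:
classifier.show_most_informative_features()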
TF-IDF (Term Frequency-Inverse Document Frequency)
• TF-IDF is a statistical measure used to evaluate how important
a word is in a document relative to a collection of documents
(called a corpus).
• Formula:
• TF-IDF(t, d) = TF(t, d) × IDF(t)
• TF (Term Frequency): How often term t appears in document d
– TF(t, d) = (No. of times t appears in d) / (Total terms in d)
• IDF (Inverse Document Frequency): How rare the term is across all
documents
– IDF(t) = log(Total number of documents / Number of documents containing t)
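A quick worked example (using the natural log; implementations differ in log base and
smoothing): if a term appears 3 times in a 100-word document, TF = 3/100 = 0.03. If it
occurs in 2 of the corpus's 10 documents, IDF = log(10/2) ≈ 1.61, so
TF-IDF ≈ 0.03 × 1.61 ≈ 0.048.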
Why Use TF-IDF?
• Words like “the”, “is”, “and” appear in all documents and carry little
meaning.
• TF-IDF downweights common words and upweights rare, important
ones.
• Example:
• If the word “excellent” appears 3 times in a review but rarely in other
reviews, it will get a high TF-IDF score, showing it's significant for that
specific document.
from sklearn.feature_extraction.text import TfidfVectorizer
docs = ["I love NLP", "NLP is fun and useful", "I love machine learning"]
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(docs)
print(tfidf_matrix.toarray())              # one row per document, one column per term
print(vectorizer.get_feature_names_out())  # the vocabulary, in column order
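The vocabulary comes out alphabetically: ['and' 'fun' 'is' 'learning' 'love' 'machine'
'nlp' 'useful']. Note that scikit-learn's default tokenizer drops single-character
tokens, so "I" never enters the vocabulary.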
Word2Vec
• Word2Vec is a technique to convert words into vectors
(numbers) so that a machine can understand their meaning
based on context. It’s used in NLP for tasks like similarity
detection, text classification, and more.
• Word2Vec trains a shallow neural network to learn word
embeddings using one of two models:
– CBOW (Continuous Bag of Words) – Predicts a word from its
surrounding context.
– Skip-Gram – Predicts context from the target word (works better with
small data).
– Words with similar meanings end up having similar vectors.
Install Required Library
• pip install gensim
from gensim.models import Word2Vec
# Example corpus
sentences = [
["i", "love", "nlp"],
["nlp", "is", "fun"],
["i", "enjoy", "machine", "learning"]
]
# Train Word2Vec model
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1) # sg=1 uses skip-gram
# Get vector for a word
print("Vector for 'nlp':")
print(model.wv['nlp'])
# Find similar words
print("\nWords similar to 'nlp':")
print(model.wv.most_similar('nlp'))
Explanation of Parameters
• vector_size: Dimension of word embeddings (usually 50–300)
• window: Context window size (how many words to the left/right
to consider)
• min_count: Ignores words that appear fewer than this many times
• sg: 1 for skip-gram, 0 for CBOW
• After training on a large corpus, model.wv['king'] -
model.wv['man'] + model.wv['woman'] gives a
vector close to 'queen' (see the sketch after this list).
• Why Use Word2Vec?
– Captures semantic relationships between words.
– Great for text classification, sentiment analysis,
chatbot development, etc.
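The analogy needs a model trained on a large corpus; the three-sentence corpus above is
far too small. A minimal sketch using gensim's downloadable pretrained Google News
vectors (roughly a 1.6 GB download):
import gensim.downloader as api
wv = api.load('word2vec-google-news-300')  # pretrained KeyedVectors
# king - man + woman ≈ queen
print(wv.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))
# [('queen', 0.71...)]  (approximate similarity score)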
Sentiment Analysis
• Sentiment Analysis is the process of
identifying and classifying emotions or
opinions in text — typically as:
– Positive
– Negative
– Neutral
• It's widely used in:
– Product reviews
– Social media monitoring
– Customer feedback analysis
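A minimal sketch using NLTK's built-in VADER analyzer, a rule-based sentiment model well
suited to short, informal text:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("I love this product!"))
# e.g. {'neg': 0.0, 'neu': 0.31, 'pos': 0.69, 'compound': 0.67}
# By convention, compound > 0.05 reads as positive and < -0.05 as negative.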