Exercise 2

The document provides a step-by-step guide on using the NLTK library for natural language processing. It covers installation, text searching, vocabulary counting, frequency distribution, and collocation analysis. Each section includes code snippets and expected outputs to illustrate the functionality.


PROGRAM:

1. Installing NLTK:
pip install nltk
2. Importing NLTK and downloading data:
import nltk
nltk.download('punkt')
nltk.download('stopwords')

OUTPUT:
True
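The 'stopwords' corpus downloaded above supplies lists of common function words (such as "is" and "the") that are usually filtered out before analysis. A minimal sketch of that filtering step, using a small illustrative stopword set in place of the full `nltk.corpus.stopwords.words('english')` list:

```python
# Illustrative stopword set; in practice you would use
# nltk.corpus.stopwords.words('english'), which the
# nltk.download('stopwords') call above makes available.
stop_words = {"is", "and", "with", "a", "the"}

tokens = "natural language processing with NLTK is fun".split()

# Keep only tokens that are not stopwords (case-insensitive check).
filtered = [t for t in tokens if t.lower() not in stop_words]
print(filtered)  # ['natural', 'language', 'processing', 'NLTK', 'fun']
```

Filtering stopwords before the vocabulary and frequency steps below keeps the counts focused on content words.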

3. Searching text:
text = "natural language processing with NLTK is fun and educational."
word = "NLTK"
if word in text:
    print(f"'{word}' found in the text!")
else:
    print(f"'{word}' not found in the text.")

OUTPUT:
'NLTK' found in the text!

4. Counting Vocabulary:
from nltk.tokenize import word_tokenize

text = "natural language processing with NLTK is fun and educational."
tokens = word_tokenize(text)
vocabulary = set(tokens)
print(f"Vocabulary size: {len(vocabulary)}")

OUTPUT:
Vocabulary size: 10
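Note that set() is case-sensitive, so "NLTK" and "nltk" would count as two separate vocabulary items. A small sketch of how lower-casing before counting changes the size (plain str.split() stands in for word_tokenize here, so no downloaded data is needed):

```python
text = "NLTK is fun and nltk is useful"
tokens = text.split()

raw_vocab = set(tokens)                       # 'NLTK' and 'nltk' kept distinct
norm_vocab = set(t.lower() for t in tokens)   # case-folded before counting

print(len(raw_vocab))   # 6
print(len(norm_vocab))  # 5
```

Whether to case-fold depends on the task: it shrinks the vocabulary, but loses the distinction between, say, "US" and "us".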

5. Frequency Distribution:
from nltk import FreqDist

fdist = FreqDist(tokens)
print(f"Most Common Words: {fdist.most_common(5)}")

OUTPUT:
Most Common Words: [('natural', 1), ('language', 1), ('processing', 1), ('with', 1), ('NLTK', 1)]
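Every token in the sample sentence appears exactly once, so all counts are 1 and most_common(5) simply returns the first five tokens. FreqDist is a subclass of collections.Counter, so its ranking behaviour on text with repeated words can be sketched with the standard library alone:

```python
from collections import Counter

# On text with repeats, most_common ranks tokens by frequency.
tokens = "to be or not to be".split()
counts = Counter(tokens)
print(counts.most_common(2))  # [('to', 2), ('be', 2)]
```

The same call on an fdist built from a longer corpus would surface its genuinely frequent words rather than an arbitrary first five.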

6. Collocation:
import nltk
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

text = "Natural Language Processing with NLTK is fun and educational."
tokens = nltk.word_tokenize(text.lower())
finder = BigramCollocationFinder.from_words(tokens)
bigrams = finder.nbest(BigramAssocMeasures.likelihood_ratio, 5)
print(f"Top 5 bigrams: {bigrams}")

OUTPUT:
Top 5 bigrams: [('and', 'educational'), ('educational', '.'), ('fun', 'and'), ('is', 'fun'), ('language', 'processing')]
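BigramCollocationFinder scores adjacent token pairs; in the one-sentence sample every pair occurs exactly once, so the likelihood-ratio scores tie and the top five are not very meaningful. The pairing itself (though not the scoring) can be sketched with plain Python:

```python
tokens = "natural language processing with nltk".split()

# Adjacent pairs -- the raw material BigramCollocationFinder scores.
bigrams = list(zip(tokens, tokens[1:]))
print(bigrams)
# [('natural', 'language'), ('language', 'processing'),
#  ('processing', 'with'), ('with', 'nltk')]
```

On a real corpus, pairs that co-occur far more often than chance (e.g. "New York") would score highly, which is what makes collocation analysis useful.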
