PROGRAM:
1. Installing NLTK:
pip install nltk
2. Importing NLTK and downloading data:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
OUTPUT:
True
3. Searching text:
text="natural language processing with NLTK is fun and educational."
word="NLTK"
if word in text:
    print(f"'{word}' found in the text!")
else:
    print(f"'{word}' not found in the text!")
OUTPUT:
'NLTK' found in the text!
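The membership test above is case-sensitive, so searching for "nltk" in lowercase would report "not found". A minimal sketch of a case-insensitive variant (plain Python, no NLTK needed):

```python
text = "natural language processing with NLTK is fun and educational."
word = "nltk"

# Lowercase both sides before the membership test so case differences are ignored
found = word.lower() in text.lower()
print(f"'{word}' {'found' if found else 'not found'} in the text!")
```

This prints `'nltk' found in the text!` even though the text spells it "NLTK".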
4. Counting Vocabulary:
from nltk.tokenize import word_tokenize
text="natural language processing with NLTK is fun and educational."
tokens=word_tokenize(text)
vocabulary=set(tokens)
print(f"vocabulary size:{len(vocabulary)}")
OUTPUT:
vocabulary size:10
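Note that `set(tokens)` treats "Natural" and "natural" as different vocabulary items and counts the trailing "." as a token. A sketch of a normalized count (lowercasing and stripping punctuation, using only the standard library rather than NLTK's tokenizer):

```python
import string

text = "Natural language processing with NLTK is fun and educational."

# Lowercase each word and strip surrounding punctuation before counting
tokens = [w.strip(string.punctuation).lower() for w in text.split()]
tokens = [t for t in tokens if t]  # drop any empty strings left after stripping

vocabulary = set(tokens)
print(f"vocabulary size: {len(vocabulary)}")  # 9: punctuation no longer counts
```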
5. Frequency Distribution:
from nltk import FreqDist
fdist=FreqDist(tokens)
print(f"Most Common Words:{fdist.most_common(5)}")
OUTPUT:
Most Common Words:[('natural', 1), ('language', 1), ('processing', 1), ('with', 1), ('NLTK', 1)]
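Because every word in the sample sentence occurs once, all counts above are 1. To see `FreqDist` distinguish frequencies, a text with repeats is needed; the same behavior can be sketched with the standard library's `collections.Counter`, which `FreqDist` is built on:

```python
from collections import Counter

# A sentence with repeated words so the counts differ
tokens = "to be or not to be that is the question".split()

fdist = Counter(tokens)
print(fdist.most_common(3))  # 'to' and 'be' each appear twice
```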
6. Collocation:
import nltk
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
text="Natural Language Processing with NLTK is fun and educational."
tokens=nltk.word_tokenize(text.lower())
finder=BigramCollocationFinder.from_words(tokens)
bigrams=finder.nbest(BigramAssocMeasures.likelihood_ratio,5)
print(f"Top 5 bigrams:{bigrams}")
OUTPUT:
Top 5 bigrams:[('and', 'educational'), ('educational', '.'), ('fun', 'and'), ('is', 'fun'), ('language', 'processing')]
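Under the hood, a bigram finder pairs each token with its successor and counts the pairs. A minimal sketch of that step with plain Python (no scoring measure, just raw pair counts):

```python
from collections import Counter

tokens = "natural language processing with nltk is fun".split()

# Pair each token with the token that follows it
bigrams = list(zip(tokens, tokens[1:]))

counts = Counter(bigrams)
print(counts.most_common(3))
```

`BigramCollocationFinder` adds association scoring (such as the likelihood ratio used above) on top of these raw counts to rank which pairs co-occur more often than chance.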