Natural Language Processing Lab Manual
1. Implement Python program to perform the following tasks on text
a) Tokenization b) Stop word Removal
2. Implement Python program to implement Porter stemmer algorithm for
stemming.
3. Implement Python Program for
a) Word Analysis b) Word Generation
4. Create a sample list of at least 5 words with ambiguous senses and
implement WSD using Python
5. Install NLTK tool kit and perform stemming
6. Create a sample list of at least 10 words for POS tagging and find the POS for any
given word
7. Write a Python program to
a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing
8. Using NLTK package to convert audio file to text and text file to audio files.
1. Implement Python program to perform the following tasks on text
a) Tokenization b) Stop Word Removal
a. Python Program to Perform Tokenization:
import nltk
nltk.download('punkt') # Download necessary NLTK data files
from nltk.tokenize import word_tokenize, sent_tokenize
# Example text
text = "Natural language processing (NLP) is a field of artificial intelligence
that helps computers understand, interpret, and manipulate human language.
It enables tasks like speech recognition, sentiment analysis, and machine
translation."
# Sentence Tokenization
sentences = sent_tokenize(text)
print("Sentence Tokenization:")
print(sentences)
print()
# Word Tokenization
words = word_tokenize(text)
print("Word Tokenization:")
print(words)
Explanation:
Sentence Tokenization: The sent_tokenize function breaks the text into
individual sentences.
Word Tokenization: The word_tokenize function splits the text into
individual words (or tokens).
For example, for a short text such as "Hello! My name is Guru. How can I assist you today?", the outputs would be:
Word Tokenization: ['Hello', '!', 'My', 'name', 'is', 'Guru', '.', 'How', 'can', 'I', 'assist', 'you', 'today', '?']
Sentence Tokenization: ['Hello!', 'My name is Guru.', 'How can I assist you today?']
b. Python Program to Perform Stop word Removal
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# Download NLTK stopwords and punkt tokenizer models
nltk.download('punkt')
nltk.download('stopwords')
def remove_stopwords(text):
    # Tokenize the input text into words
    words = word_tokenize(text)
    # Get the set of stopwords in English
    stop_words = set(stopwords.words('english'))
    # Remove stopwords from the tokenized words
    filtered_words = [word for word in words if word.lower() not in stop_words]
    # Join the filtered words back into a string
    return ' '.join(filtered_words)
# Example text
text = "This is an example sentence where we will remove stop words."
# Remove stopwords
filtered_text = remove_stopwords(text)
print("Original Text:", text)
print("Filtered Text:", filtered_text)
Explanation:
word_tokenize: Tokenizes the input text into individual words.
stopwords.words('english'): Provides a list of common stop words in English
(like "the", "is", etc.).
A list comprehension is used to filter out words that are found in the
stop_words list.
The filtered_words are then joined back into a string without the stop words.
Output:
Original Text: This is an example sentence where we will remove stop words.
Filtered Text: example sentence remove stop words .
2. Implement Python program to implement Porter stemmer algorithm
for stemming.
# Stemming
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
# Download NLTK tokenizer and stemmer models
nltk.download('punkt')
def stem_text(text):
    # Initialize the Porter Stemmer
    porter_stemmer = PorterStemmer()
    # Tokenize the text into words
    words = word_tokenize(text)
    # Apply stemming to each word
    stemmed_words = [porter_stemmer.stem(word) for word in words]
    # Join the stemmed words back into a single string
    stemmed_text = ' '.join(stemmed_words)
    return stemmed_text
# Example text
text = ("NLTK is a leading platform for building Python programs to work with human "
        "language data.")
# Perform stemming
stemmed_text = stem_text(text)
# Print stemmed text
print(stemmed_text)
OUTPUT:
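The expected output is roughly as follows (the space before the final period comes from joining the tokens with spaces):
nltk is a lead platform for build python program to work with human languag data .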
3. Implement Python Program for
a) Word Analysis b) Word Generation
Word Analysis:
Analyze character frequency in a given text.
Analyze word frequency and length.
Word Generation:
Generate random words based on character frequency.
Use a Markov Chain approach for more context-aware word generation.
import random
from collections import Counter, defaultdict

class WordAnalyzer:
    def __init__(self, text):
        self.text = text
        self.word_list = self.text.split()
        self.char_freq = self.analyze_char_frequency()
        self.word_freq = self.analyze_word_frequency()

    def analyze_char_frequency(self):
        return Counter(self.text.replace(" ", ""))

    def analyze_word_frequency(self):
        return Counter(self.word_list)

    def analyze_word_lengths(self):
        return Counter(len(word) for word in self.word_list)

    def display_analysis(self):
        print("Character Frequency:")
        for char, freq in self.char_freq.items():
            print(f" {char}: {freq}")
        print("\nWord Frequency:")
        for word, freq in self.word_freq.items():
            print(f" {word}: {freq}")
        print("\nWord Length Frequency:")
        for length, freq in self.analyze_word_lengths().items():
            print(f" {length} characters: {freq}")

class WordGenerator:
    def __init__(self, char_freq, word_list):
        self.char_freq = char_freq
        self.word_list = word_list
        self.transition_matrix = self.build_transition_matrix()

    def build_transition_matrix(self):
        matrix = defaultdict(Counter)
        for word in self.word_list:
            for i in range(len(word) - 1):
                matrix[word[i]][word[i + 1]] += 1
        for char, transitions in matrix.items():
            total = sum(transitions.values())
            for next_char in transitions:
                transitions[next_char] /= total
        return matrix

    def generate_word(self, length):
        if not self.char_freq:
            return ""
        start_char = random.choice(list(self.char_freq.keys()))
        word = start_char
        for _ in range(length - 1):
            if start_char not in self.transition_matrix:
                break
            next_char = random.choices(
                list(self.transition_matrix[start_char].keys()),
                list(self.transition_matrix[start_char].values())
            )[0]
            word += next_char
            start_char = next_char
        return word

    def random_word(self, length):
        return ''.join(random.choices(
            list(self.char_freq.keys()),
            weights=list(self.char_freq.values()),
            k=length
        ))

if __name__ == "__main__":
    # Example Text
    text = "hello world this is a simple example of word analysis and generation"
    # Word Analysis
    analyzer = WordAnalyzer(text)
    analyzer.display_analysis()
    # Word Generation
    generator = WordGenerator(analyzer.char_freq, analyzer.word_list)
    print("\nGenerated Words:")
    print("Markov Chain Based:", generator.generate_word(6))
    print("Random Word:", generator.random_word(6))
1. Class: WordAnalyzer
This class is responsible for analyzing a given text. It extracts meaningful statistics
about the words and characters in the text.
Methods:
1. __init__(self, text)
o Initializes the WordAnalyzer class with a text input.
o Splits the text into words (self.word_list) and computes:
Character frequencies: How often each character appears.
Word frequencies: How often each word appears.
2. analyze_char_frequency(self)
o Returns a Counter object with the frequency of each character in the
text (excluding spaces).
3. analyze_word_frequency(self)
o Returns a Counter object with the frequency of each unique word in
the text.
4. analyze_word_lengths(self)
o Analyzes the lengths of all the words in the text.
o Returns a Counter object where the keys are word lengths (in
characters) and values are their frequencies.
5. display_analysis(self)
o Prints the analysis results in a readable format, including:
Character frequency
Word frequency
Word length frequency
2. Class: WordGenerator
This class is responsible for generating new words based on the analyzed data.
Initialization:
Takes two inputs:
1. char_freq: A dictionary of character frequencies from WordAnalyzer.
2. word_list: The list of words from the text.
It also builds a transition matrix:
o Tracks how likely one character is to follow another, based on the input
text.
Methods:
1. build_transition_matrix(self)
o Creates a Markov Chain-like transition matrix for characters.
o For each character in the words, it calculates:
The frequency of every possible "next character."
It then normalizes these frequencies into probabilities (see the worked example after this list).
o Example:
For the word hello, the transitions would be:
h -> e
e -> l
l -> l
l -> o
2. generate_word(self, length)
o Uses the transition matrix to generate a word of a specified length.
o Starts with a random character and iteratively adds the next character
based on probabilities in the transition matrix.
3. random_word(self, length)
o Generates a completely random word of the specified length using
character frequencies.
o Characters are chosen independently of one another.
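Worked example of the normalization step: in the sample text, the character 'l' is followed by 'l', 'o', 'd', and 'y' once each and by 'e' twice, for 6 transitions in total, so after normalization P(e | l) = 2/6 ≈ 0.33 and each of the other four next characters gets probability 1/6 ≈ 0.17.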
3. Main Script
This section ties everything together and demonstrates how to use the classes.
Steps:
1. Input text:
o The text is provided as a string: "hello world this is a simple example of
word analysis and generation".
2. Analyze the text:
o The WordAnalyzer class computes the following:
Frequency of each character (e.g., h appears twice, e appears 6 times, etc.).
Frequency of each word (e.g., hello appears once, world appears
once, etc.).
Distribution of word lengths (e.g., 5-character words appear
twice, etc.).
o Results are printed via display_analysis.
3. Generate new words:
o The WordGenerator class creates words using two approaches:
Markov Chain-based (generate_word):
Uses the character transition matrix to create more
context-aware words.
Random character-based (random_word):
Uses character frequency to randomly pick characters,
independent of sequence.
Example Outputs
Word Analysis:
Character Frequency:
h: 2
e: 6
l: 6
o: 5
w: 2
r: 3
d: 3
t: 2
i: 5
s: 5
a: 6
m: 2
p: 2
x: 1
f: 1
n: 4
y: 1
g: 1
Word Frequency:
hello: 1
world: 1
this: 1
is: 1
a: 1
simple: 1
example: 1
of: 1
word: 1
analysis: 1
and: 1
generation: 1
Word Length Frequency:
5 characters: 2
4 characters: 2
2 characters: 2
1 characters: 1
...
Word Generation:
Generated Words:
Markov Chain Based: hellon
Random Word: inaso
4. Create a sample list of at least 5 words with ambiguous senses and
implement WSD using Python
Word Sense Disambiguation (WSD) is a task in natural language processing (NLP) to
determine the correct sense of a word in context when the word has multiple
meanings. Below is a sample list of 5 ambiguous words and a Python
implementation of WSD using the Lesk algorithm, which is a popular algorithm for
WSD.
Sample List of Ambiguous Words
1. Bank (e.g., river bank, financial institution)
2. Plant (e.g., factory, living organism)
3. Bat (e.g., flying mammal, sports equipment)
4. Light (e.g., illumination, not heavy)
5. Mouse (e.g., computer device, animal)
Python Implementation
Below is the code that uses the Lesk algorithm from the nltk library for WSD:
from nltk.corpus import wordnet
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize
# Sample sentences with ambiguous words
sentences = [
    "The bank of the river was flooded after the heavy rain.",
    "He deposited the money in the bank yesterday.",
    "The plant in the garden is blooming.",
    "The plant was shut down due to maintenance.",
    "A bat flew out of the cave at dusk.",
    "He hit the ball with his bat during the game.",
    "The light from the lamp was very bright.",
    "This box is very light and easy to carry.",
    "I saw a mouse running across the floor.",
    "He clicked the button on the mouse to open the file."
]
# Function to perform Word Sense Disambiguation
def disambiguate_sentence(sentence, target_word):
    # Tokenize the sentence
    tokens = word_tokenize(sentence)
    # Use the Lesk algorithm to determine the sense of the target word
    sense = lesk(tokens, target_word)
    return sense
# Ambiguous words and their sentences
ambiguous_words = ["bank", "plant", "bat", "light", "mouse"]
# Disambiguating senses
for sentence in sentences:
    for word in ambiguous_words:
        if word in sentence:
            sense = disambiguate_sentence(sentence, word)
            print(f"Sentence: {sentence}")
            print(f"Word: {word}")
            if sense:
                print(f"Sense: {sense.name()}")
                print(f"Definition: {sense.definition()}")
            else:
                print("Sense: Not found.")
            print("-" * 50)
Explanation of the Code
1. Input Sentences: Each sentence contains one of the ambiguous words.
2. Tokenization: The nltk.tokenize.word_tokenize function splits sentences into
tokens.
3. Lesk Algorithm: The nltk.wsd.lesk function is used to find the appropriate
WordNet sense of the word based on the sentence context.
4. Output: For each ambiguous word, its sense and definition are printed.
Sample Output
For the sentence "The bank of the river was flooded after the heavy rain.", the
output might be:
Sentence: The bank of the river was flooded after the heavy rain.
Word: bank
Sense: bank.n.01
Definition: sloping land (especially the slope beside a body of water)
Ensure you have the necessary NLTK resources downloaded using:
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('punkt')
5. Install NLTK tool kit and perform stemming
Step 1: Install NLTK
1. Using pip: Open your terminal or command prompt and run:
pip install nltk
2. Verify Installation: After installation, you can verify it by running the
following command in Python:
import nltk
print(nltk.__version__) # Check if it's installed and print the version
Step 2: Download NLTK Data
NLTK provides various datasets and models for natural language processing tasks.
To download the required data, follow these steps:
1. Open a Python shell or script.
2. Run:
import nltk
nltk.download('punkt') # Required for tokenization
nltk.download('wordnet') # Optional: Required for lemmatization
Step 3: Perform Stemming
Stemming is the process of reducing words to their root form. NLTK provides several
stemming algorithms. Here's an example using the Porter Stemmer:
Code Example:
from nltk.stem import PorterStemmer
# Initialize the stemmer
stemmer = PorterStemmer()
# List of words to stem
words = ["running", "jumps", "easily", "happiness"]
# Perform stemming
stemmed_words = [stemmer.stem(word) for word in words]
print("Original Words:", words)
print("Stemmed Words:", stemmed_words)
Output:
Original Words: ['running', 'jumps', 'easily', 'happiness']
Stemmed Words: ['run', 'jump', 'easili', 'happi']
Optional: Using Lancaster Stemmer
If you want a more aggressive stemmer, use the Lancaster Stemmer:
from nltk.stem import LancasterStemmer
# Initialize the Lancaster Stemmer
lancaster_stemmer = LancasterStemmer()
# Perform stemming
stemmed_words = [lancaster_stemmer.stem(word) for word in words]
print("Lancaster Stemmed Words:", stemmed_words)
6. Create a sample list of at least 10 words for POS tagging and find the POS for any
given word
Sample List of Words
1. Run
2. Beautiful
3. Cat
4. Slowly
5. Play
6. Happiness
7. Beneath
8. Quickly
9. Book
10. Intelligent
POS Tags for the Words
Here’s the POS (Part of Speech) tagging for each word:
Word          POS
Run           Verb/Noun
Beautiful     Adjective
Cat           Noun
Slowly        Adverb
Play          Verb/Noun
Happiness     Noun
Beneath       Preposition
Quickly       Adverb
Book          Noun/Verb
Intelligent   Adjective
Function to Get POS for a Given Word
You can implement a simple function in Python to find the POS of any word based
on this list.
# Sample POS Dictionary
pos_dict = {
    "run": ["Verb", "Noun"],
    "beautiful": ["Adjective"],
    "cat": ["Noun"],
    "slowly": ["Adverb"],
    "play": ["Verb", "Noun"],
    "happiness": ["Noun"],
    "beneath": ["Preposition"],
    "quickly": ["Adverb"],
    "book": ["Noun", "Verb"],
    "intelligent": ["Adjective"]
}
# Function to find POS
def get_pos(word):
    word = word.lower()
    return pos_dict.get(word, "Word not found in the list")
# Example Usage
word = input("Enter a word: ")
pos = get_pos(word)
print(f"POS for '{word}': {pos}")
7. Write a Python program to
a. Perform Morphological Analysis using NLTK library
b. Generate n-grams using NLTK N-Grams library
c. Implement N-Grams Smoothing
a. Perform Morphological Analysis using NLTK library
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import wordnet
from nltk import pos_tag
# Download necessary NLTK data
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('omw-1.4')
# Function to get WordNet POS tag
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return None
# Input text for analysis
text = "The quick brown fox jumps over the lazy dog."
# Tokenize the text
tokens = word_tokenize(text)
print("Tokens:", tokens)
# Part-of-speech tagging
pos_tags = pos_tag(tokens)
print("\nPOS Tags:", pos_tags)
# Initialize Stemmer and Lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
# Perform stemming and lemmatization
print("\nMorphological Analysis:")
for word, tag in pos_tags:
    stem = stemmer.stem(word)
    wordnet_pos = get_wordnet_pos(tag) or wordnet.NOUN  # Default to noun
    lemma = lemmatizer.lemmatize(word, pos=wordnet_pos)
    print(f"Word: {word:12} | Stem: {stem:12} | Lemma: {lemma:12} | POS: {tag}")
Explanation:
1. Tokenization: The text is broken into individual words using word_tokenize.
2. Part-of-Speech Tagging: Each token is tagged with its part of speech using
pos_tag.
3. Stemming: The PorterStemmer reduces words to their root form.
4. Lemmatization: The WordNetLemmatizer reduces words to their base form,
considering the context (POS tags).
5. WordNet POS Conversion: The get_wordnet_pos function maps POS tags
from pos_tag to WordNet-compatible tags for accurate lemmatization.
Output:
For the input sentence "The quick brown fox jumps over the lazy dog.", the program
produces tokens, POS tags, stems, and lemmas.
Example output:
Tokens: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]
Morphological Analysis:
Word: The | Stem: the | Lemma: the | POS: DT
Word: quick | Stem: quick | Lemma: quick | POS: JJ
Word: brown | Stem: brown | Lemma: brown | POS: NN
Word: fox | Stem: fox | Lemma: fox | POS: NN
Word: jumps | Stem: jump | Lemma: jump | POS: VBZ
Word: over | Stem: over | Lemma: over | POS: IN
Word: the | Stem: the | Lemma: the | POS: DT
Word: lazy | Stem: lazi | Lemma: lazy | POS: JJ
Word: dog | Stem: dog | Lemma: dog | POS: NN
Word: . | Stem: . | Lemma: . | POS: .
b. Generate n-grams using NLTK N-Grams library
import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
# Ensure necessary NLTK data is downloaded
nltk.download('punkt')
def generate_ngrams(text, n):
    """
    Generate n-grams from a given text.

    Args:
        text (str): The input text to process.
        n (int): The size of n-grams to generate (e.g., 2 for bigrams, 3 for trigrams).

    Returns:
        list: A list of n-grams, each represented as a tuple.
    """
    # Tokenize the text into words
    tokens = word_tokenize(text)
    # Generate n-grams
    n_grams = list(ngrams(tokens, n))
    return n_grams

# Example usage
if __name__ == "__main__":
    sample_text = "This is a simple example to generate n-grams using NLTK."
    n = 2  # For bigrams; change this value for different n-grams
    n_grams = generate_ngrams(sample_text, n)
    print(f"{n}-grams:")
    for gram in n_grams:
        print(gram)
How It Works:
1. Tokenization: The text is tokenized into words using NLTK's word_tokenize.
2. Generating N-Grams: The nltk.util.ngrams function is used to generate n-
grams.
3. Output: The resulting n-grams are displayed as tuples.
Example Output:
For the input text: "This is a simple example to generate n-grams using NLTK." and
n=2:
2-grams:
('This', 'is')
('is', 'a')
('a', 'simple')
('simple', 'example')
('example', 'to')
('to', 'generate')
('generate', 'n-grams')
('n-grams', 'using')
('using', 'NLTK')
('NLTK', '.')
c. Implement N-Grams Smoothing
from collections import defaultdict
import math
class NGramModel:
    def __init__(self, n):
        self.n = n  # The 'n' in n-grams
        self.ngram_counts = defaultdict(int)
        self.context_counts = defaultdict(int)
        self.vocabulary = set()

    def train(self, corpus):
        """
        Trains the model on the provided corpus.
        :param corpus: A list of sentences, where each sentence is a list of words.
        """
        for sentence in corpus:
            sentence = ['<s>'] * (self.n - 1) + sentence + ['</s>']
            for i in range(len(sentence) - self.n + 1):
                ngram = tuple(sentence[i:i + self.n])
                context = ngram[:-1]
                word = ngram[-1]
                self.ngram_counts[ngram] += 1
                self.context_counts[context] += 1
                self.vocabulary.add(word)

    def probability(self, context, word, smoothing=True):
        """
        Calculates the probability of a word given its context.
        :param context: A tuple of words representing the context.
        :param word: The word whose probability is to be calculated.
        :param smoothing: Whether to apply Laplace smoothing.
        :return: The probability of the word given the context.
        """
        ngram = context + (word,)
        if smoothing:
            # Laplace (Add-One) Smoothing
            numerator = self.ngram_counts[ngram] + 1
            denominator = self.context_counts[context] + len(self.vocabulary)
        else:
            # No smoothing
            numerator = self.ngram_counts[ngram]
            denominator = self.context_counts[context]
        return numerator / denominator if denominator > 0 else 0.0

    def generate_sentence(self, max_length=20):
        """
        Generates a sentence using the model.
        :param max_length: The maximum length of the sentence to generate.
        :return: A generated sentence as a list of words.
        """
        context = ('<s>',) * (self.n - 1)
        sentence = []
        for _ in range(max_length):
            word_probabilities = {word: self.probability(context, word)
                                  for word in self.vocabulary}
            next_word = max(word_probabilities, key=word_probabilities.get)
            if next_word == '</s>':
                break
            sentence.append(next_word)
            context = context[1:] + (next_word,)
        return sentence

# Example usage
if __name__ == "__main__":
    corpus = [
        ["the", "cat", "sat"],
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "barked"]
    ]
    ngram_model = NGramModel(2)  # Bigram model
    ngram_model.train(corpus)
    print("Probability of 'cat' given 'the':", ngram_model.probability(('the',), 'cat'))
    print("Probability of 'sat' given 'cat':", ngram_model.probability(('cat',), 'sat'))
    print("Generated sentence:", " ".join(ngram_model.generate_sentence()))
Key Features:
1. Training on Corpus: The train method processes a corpus of sentences and
builds counts for n-grams and contexts.
2. Laplace Smoothing: Adds one to the numerator and adjusts the
denominator by adding the vocabulary size to avoid zero probabilities.
3. Sentence Generation: Generates sentences using the trained model by
choosing the most probable word iteratively.
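For instance, with the toy corpus above and the bigram model (one '<s>' pad per sentence), the context ('the',) occurs 4 times, the bigram ('the', 'cat') occurs 2 times, and the vocabulary contains 8 distinct words (including '</s>'), so the first probability printed is (2 + 1) / (4 + 8) = 0.25; similarly, P('sat' | 'cat') = (2 + 1) / (2 + 8) = 0.3.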
8. Using NLTK package to convert audio file to text and text file to audio files
The NLTK (Natural Language Toolkit) package is primarily used for natural language
processing tasks such as tokenization, stemming, lemmatization, and sentiment
analysis. However, it does not have built-in support for converting audio to text or
vice versa. For these tasks, you can use other specialized libraries:
1. Converting Audio to Text:
o Use SpeechRecognition, a Python library for speech-to-text conversion.
2. Converting Text to Audio:
o Use gTTS (Google Text-to-Speech) or pyttsx3 for text-to-speech
conversion.
Here's an example script for both tasks:
Requirements
Install the required libraries using pip:
pip install SpeechRecognition gTTS pydub
Script
import os
import speech_recognition as sr
from gtts import gTTS
# Function to convert audio to text
def audio_to_text(audio_file_path, text_file_path):
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file_path) as source:
        audio_data = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio_data)
        with open(text_file_path, 'w') as file:
            file.write(text)
        print(f"Transcription saved to {text_file_path}")
    except sr.UnknownValueError:
        print("Audio is not clear enough to transcribe.")
    except sr.RequestError as e:
        print(f"Error with the SpeechRecognition service: {e}")

# Function to convert text to audio
def text_to_audio(text_file_path, output_audio_file_path):
    with open(text_file_path, 'r') as file:
        text = file.read()
    tts = gTTS(text=text, lang='en')
    tts.save(output_audio_file_path)
    print(f"Audio saved to {output_audio_file_path}")

# Example usage
audio_file = "example_audio.wav"  # Replace with your audio file
text_file = "output_text.txt"
output_audio_file = "output_audio.mp3"

# Convert audio to text
audio_to_text(audio_file, text_file)

# Convert text to audio
text_to_audio(text_file, output_audio_file)
Explanation
1. Audio to Text:
o Uses speech_recognition to transcribe speech from an audio file (e.g.,
WAV).
o Saves the transcription to a text file.
2. Text to Audio:
o Reads the content of a text file.
o Uses gTTS to convert the text to speech and saves it as an audio file
(e.g., MP3).
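Since the explanation above also mentions pyttsx3 as an offline text-to-speech option, here is a minimal sketch of the same text-to-audio step using it (install with pip install pyttsx3; the output file name is just an example, and the audio format actually written depends on the platform's speech engine):
import pyttsx3

def text_to_audio_offline(text_file_path, output_audio_file_path):
    # Read the text to be spoken
    with open(text_file_path, 'r') as file:
        text = file.read()
    # Initialize the offline TTS engine and queue the speech for writing to a file
    engine = pyttsx3.init()
    engine.save_to_file(text, output_audio_file_path)
    engine.runAndWait()  # Blocks until the file has been written
    print(f"Audio saved to {output_audio_file_path}")

# Example usage, reusing the text file produced above
text_to_audio_offline("output_text.txt", "output_audio_offline.wav")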