NLP Lab Manual

The Natural Language Processing Lab Manual (AIP-101) outlines a course designed to teach students various NLP programming techniques using NLTK and spaCy. The manual includes a series of experiments focusing on tasks such as tokenization, N-gram modeling, word relationships, and text preprocessing, with specific objectives and outcomes for each experiment. By the end of the course, students will have developed practical skills in NLP application development and text analysis.

Natural Language Processing Lab Manual (AIP-101)

L:T:P: 0:0:2 Credits: 1

Course Outcomes

At the end of the course, the student will be able to:

1. Use the NLTK and spaCy toolkits for NLP programming.
2. Analyze various corpora for developing programs.
3. Develop various pre-processing techniques for a given corpus.
4. Develop programming logic using NLTK functions.
5. Build applications using various NLP techniques for a given corpus.

List of Programs

Experiment 1: Installation and exploration of the features of the NLTK and spaCy tools. Download the Word Cloud package and a few corpora.

Objective

- To install and configure the NLTK and spaCy libraries.
- To explore and utilize basic features such as downloading corpora and generating word clouds.

Outcomes

- Understand the process of installing and setting up NLP libraries in Python.
- Learn how to download and use corpora for further NLP tasks.
- Explore the concept of word clouds and how to generate them for corpus analysis.
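
A typical setup can be sketched with the following shell commands (assuming pip and a standard Python environment; the model and corpus names shown are common choices, not requirements of the manual):

```shell
# Install the three libraries used throughout the lab
pip install nltk spacy wordcloud

# Download a small English pipeline for spaCy
python -m spacy download en_core_web_sm

# Download a few NLTK corpora and tokenizer data
python -c "import nltk; nltk.download('punkt'); nltk.download('gutenberg'); nltk.download('stopwords')"
```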

Experiment 2: (i) Write a program to implement word, sentence, and paragraph tokenizers.
(ii) Check how many words there are in any corpus. Also, check how many distinct words there are.

Objective

- To implement word, sentence, and paragraph tokenization using NLTK and spaCy.
- To analyze the number of total and distinct words in a given corpus.

Outcomes

- Learn how tokenization works in NLP and how to break text into smaller units.
- Gain knowledge about counting word frequencies and understanding the diversity of vocabulary in a corpus.

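
As a sketch of Experiment 2, here is a plain-Python version using regular expressions (the sample corpus is made up; in practice nltk.word_tokenize and nltk.sent_tokenize would replace the helper functions):

```python
import re

def word_tokenize(text):
    # Simple word tokenizer: runs of letters/apostrophes
    return re.findall(r"[A-Za-z']+", text)

def sent_tokenize(text):
    # Naive sentence splitter: break after ., !, ? followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def para_tokenize(text):
    # Paragraphs are separated by one or more blank lines
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p]

corpus = "NLP is fun. NLP is useful!\n\nThis is a new paragraph."
words = word_tokenize(corpus)
print("total words:", len(words))
print("distinct words:", len(set(w.lower() for w in words)))
```
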
Experiment 3: (i) Write a program to implement both user-defined and pre-defined functions to generate (a) uni-grams, (b) bi-grams, (c) tri-grams, and (d) N-grams.
(ii) Write a program to calculate the highest probability of a word (w2) occurring after another word (w1).

Objective

- To implement various N-gram models to generate unigrams, bigrams, trigrams, and general N-grams.
- To calculate the conditional probability of one word occurring after another.

Outcomes

- Understand N-gram models and their application in NLP.
- Learn how to calculate and work with word probabilities for language modeling.
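
The user-defined side of Experiment 3 can be sketched as follows (the toy sentence is for illustration; nltk.util.ngrams provides the pre-defined equivalent):

```python
from collections import Counter

def ngrams(tokens, n):
    # User-defined N-gram generator; nltk.util.ngrams behaves similarly
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat near the cat".split()
unigrams, bigrams, trigrams = ngrams(tokens, 1), ngrams(tokens, 2), ngrams(tokens, 3)

def most_likely_next(tokens, w1):
    # Highest P(w2 | w1) = count(w1, w2) / count(w1)
    pairs = Counter(b for b in ngrams(tokens, 2) if b[0] == w1)
    (_, w2), count = pairs.most_common(1)[0]
    return w2, count / sum(pairs.values())

print(most_likely_next(tokens, "the"))
```
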

Experiment 4: (i) Write a program to identify word collocations.
(ii) Write a program to print all words beginning with a given sequence of letters.
(iii) Write a program to print all words longer than four characters.

Objective

- To identify word collocations and patterns in a corpus.
- To extract words based on specific criteria (prefix and word length).

Outcomes

- Learn how to find common word pairs or collocations.
- Develop skills to filter words based on given patterns or word-length constraints.

Experiment 5: (i) Write a program to identify the mathematical expression in a given sentence.
(ii) Write a program to identify the different components of an email address.

Objective

- To write regular expressions that identify mathematical expressions and the components of email addresses.

Outcomes

- Learn to use regular expressions for pattern matching.
- Identify structured data within unstructured text, such as email components and mathematical expressions.
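
Both tasks reduce to pattern matching with Python's re module; the patterns below are simplified sketches (real email grammar is far more permissive):

```python
import re

# (i) Find simple arithmetic expressions such as "3 + 4 * 2" in a sentence
sentence = "The answer to 3 + 4 * 2 appears in chapter 5."
math_expr = re.findall(r"\d+(?:\s*[-+*/]\s*\d+)+", sentence)

# (ii) Split an email address into username, domain, and top-level domain
email = "student.name@university.edu"
m = re.match(r"^([\w.+-]+)@([\w-]+(?:\.[\w-]+)*)\.(\w+)$", email)
username, domain, tld = m.groups()
```
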

Experiment 6: (i) Write a program to identify all antonyms and synonyms of a word.
(ii) Write a program to find hyponymy, homonymy, and polysemy for a given word.
Objective

- To identify synonyms and antonyms using lexical databases like WordNet.
- To explore word relationships such as hyponymy, homonymy, and polysemy.

Outcomes

- Understand how to find relationships between words using NLP libraries.
- Explore word semantics and develop an understanding of lexical ambiguity.
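
Since WordNet itself requires an NLTK corpus download, the idea can be sketched with a tiny hand-built lexicon (all entries are illustrative, not real WordNet data); the commented lines show the actual NLTK calls:

```python
# Toy stand-in for WordNet. With NLTK installed and the corpus downloaded:
#   from nltk.corpus import wordnet
#   wordnet.synsets("good")           # senses (polysemy)
#   synset.lemmas()[0].antonyms()     # antonyms
#   synset.hyponyms()                 # more specific terms
lexicon = {
    "good": {
        "synonyms": {"fine", "respectable", "beneficial"},
        "antonyms": {"bad", "evil"},
    },
    "dog": {
        "hyponyms": {"puppy", "poodle", "terrier"},  # kinds of dog
    },
    "bank": {
        # Polysemy/homonymy: one surface form, several unrelated senses
        "senses": ["financial institution", "river edge"],
    },
}

def relations(word, relation):
    # Look up a relation for a word; empty set when unknown
    return lexicon.get(word, {}).get(relation, set())
```
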

Experiment 7: (i) Write a program to find all the stop words in any given text.
(ii) Write a function that finds the 50 most frequently occurring words of a text
that are not stopwords.

Objective

- To identify and remove stop words from a given text.
- To analyze the frequency distribution of non-stop words.

Outcomes

- Learn how to filter stop words from text data.
- Understand word frequency analysis and its importance in text mining.
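
A plain-Python sketch with a deliberately small inline stop-word list (NLTK's list, nltk.corpus.stopwords.words('english'), is much longer):

```python
import re
from collections import Counter

# Tiny illustrative stop-word list
STOPWORDS = {"the", "a", "an", "is", "in", "of", "and", "to", "it"}

def top_non_stopwords(text, n=50):
    # Lowercase, tokenize, drop stop words, rank by frequency
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

text = "The cat sat in the hat and the cat ate the hat."
print(top_non_stopwords(text, 2))
```
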

Experiment 8: Write a program to implement various stemming techniques and prepare a chart with the performance of each method.

Objective

- To implement and compare different stemming techniques (e.g., Porter Stemmer, Lancaster Stemmer).

Outcomes

- Understand stemming and its role in text preprocessing.
- Evaluate the performance of various stemming algorithms based on accuracy and efficiency.
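
The comparison can be sketched with two toy suffix-stripping stemmers and a hand-chosen set of reference stems (real experiments would use nltk.stem.PorterStemmer and nltk.stem.LancasterStemmer):

```python
def light_stem(word):
    # Conservative: strip only a few suffixes, keep a minimum stem length
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def aggressive_stem(word):
    # Aggressive: strip the longest matching suffix from a bigger list
    for suffix in ("ational", "ization", "ness", "ing", "ed", "es", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

words = ["running", "jumped", "cats", "happiness", "organization"]
gold = ["runn", "jump", "cat", "happi", "organ"]  # hand-chosen reference stems

for name, stem in (("light", light_stem), ("aggressive", aggressive_stem)):
    stems = [stem(w) for w in words]
    acc = sum(s == g for s, g in zip(stems, gold)) / len(gold)
    print(f"{name}: {stems} accuracy={acc:.2f}")
```
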

Experiment 9: Write a program to implement various lemmatization techniques and prepare a chart with the performance of each method.

Objective

- To implement lemmatization techniques and compare their effectiveness.

Outcomes

- Understand lemmatization and its importance in reducing words to their base form.
- Compare lemmatization techniques based on performance.

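
A dictionary-based sketch of lemmatization (the lookup table is a toy stand-in for nltk.stem.WordNetLemmatizer, which needs the WordNet corpus):

```python
# Minimal lemma lookup table; a real lemmatizer consults WordNet and PoS tags
LEMMA_TABLE = {
    "ran": "run", "running": "run",
    "better": "good", "geese": "goose",
    "was": "be", "children": "child",
}

def lemmatize(word):
    # Fall back to the surface form when the word is not in the table
    return LEMMA_TABLE.get(word, word)

print([lemmatize(w) for w in ["geese", "ran", "better", "cat"]])
```

Unlike stemming, this always returns a valid dictionary word, which is why lemmatization is preferred when downstream tasks need real lemmas.
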
Experiment 10: (i) Write a program to implement Conditional Frequency
Distributions (CFD) for any corpus.
(ii) Find all the four-letter words in any corpus. With the help of a frequency
distribution (FreqDist), show these words in decreasing order of frequency.
(iii) Define a conditional frequency distribution over the names corpus that
allows you to see which initial letters are more frequent for males versus
females.

Objective

- To implement conditional frequency distributions and analyze a text corpus.
- To explore frequency distributions and patterns in text data.

Outcomes

- Understand the concept of conditional frequency distributions and their applications.
- Analyze the frequency of words and patterns in specific corpora.
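
A conditional frequency distribution is just a mapping from conditions to frequency counts, which can be sketched in plain Python (the name list is made up; nltk.ConditionalFreqDist and the names corpus would be used in practice):

```python
from collections import Counter, defaultdict

# (iii) CFD over a toy names list: condition = gender, event = initial letter
names = [("male", "John"), ("male", "James"), ("female", "Jane"),
         ("female", "Mary"), ("female", "Julia")]

cfd = defaultdict(Counter)
for gender, name in names:
    cfd[gender][name[0]] += 1

# (ii) Four-letter words from a toy corpus, in decreasing order of frequency
tokens = "this is a test this word that word this".split()
four_letter = Counter(w for w in tokens if len(w) == 4)
ranked = [w for w, _ in four_letter.most_common()]
print(ranked)
```
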

Experiment 11: (i) Write a program to implement Part-of-Speech (PoS) tagging for any corpus.
(ii) Write a program to identify which word has the greatest number of distinct tags. What are they, and what do they represent?
(iii) Write a program to list tags in order of decreasing frequency. What do the 20 most frequent tags represent?
(iv) Write a program to identify which tags nouns are most commonly found after. What do these tags represent?

Objective

- To implement part-of-speech tagging and analyze word classifications.
- To explore the distribution of PoS tags and understand their significance.

Outcomes

- Understand part-of-speech tagging and its application in NLP.
- Analyze the distribution and frequency of different PoS tags in a corpus.
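
The tag-analysis parts can be sketched over a small hand-tagged corpus (a stand-in for nltk.corpus.brown.tagged_words(), which needs a corpus download; the tags follow the Penn Treebank convention):

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus of (word, tag) pairs
tagged = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("fast", "RB"),
          ("run", "NN"), ("the", "DT"), ("run", "VB"), ("dog", "NN")]

# (ii) Which word has the greatest number of distinct tags?
tags_per_word = defaultdict(set)
for word, tag in tagged:
    tags_per_word[word].add(tag)
most_ambiguous = max(tags_per_word, key=lambda w: len(tags_per_word[w]))

# (iii) Tags in decreasing order of frequency
tag_freq = Counter(tag for _, tag in tagged).most_common()

# (iv) Which tags immediately precede nouns?
before_noun = Counter(t1 for (_, t1), (_, t2) in zip(tagged, tagged[1:])
                      if t2.startswith("NN"))
```
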

Experiment 12: Write a program to implement TF-IDF for any corpus.

Objective

- To implement Term Frequency-Inverse Document Frequency (TF-IDF) and use it for text vectorization.

Outcomes

- Understand how TF-IDF is used for text representation.
- Implement the TF-IDF algorithm for text classification or retrieval.

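
One common TF-IDF variant can be written directly from its definition (libraries such as scikit-learn apply additional smoothing, so scores will differ slightly):

```python
import math

# Toy corpus: each document is a list of tokens
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(term, doc, docs):
    # tf: relative frequency in this document
    tf = doc.count(term) / len(doc)
    # idf: log of (number of documents / documents containing the term)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df)
    return tf * idf

score_cat = tf_idf("cat", docs[0], docs)   # rare term, high weight
score_the = tf_idf("the", docs[0], docs)   # common term, low weight
```

Note how "the", despite appearing twice in the first document, scores lower than "cat" because it occurs in most documents.
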
Experiment 13: Write a program to implement chunking and chinking for any
corpus.

Objective

- To implement chunking and chinking for identifying specific structures in text, such as noun phrases or verb phrases.

Outcomes

- Understand the concepts of chunking and chinking in text analysis.
- Learn how to extract syntactic structures from text data.
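
The chunk-then-chink idea can be sketched without a parser: collect runs of noun-phrase tags, then remove ("chink") the determiners. nltk.RegexpParser expresses the same idea with a grammar such as NP: {&lt;DT&gt;?&lt;JJ&gt;*&lt;NN&gt;}:

```python
tagged = [("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
          ("dog", "NN")]

CHUNK_TAGS = {"DT", "JJ", "NN"}  # tags allowed inside a noun-phrase chunk

def np_chunks(tagged):
    chunks, current = [], []
    for word, tag in tagged:
        if tag in CHUNK_TAGS:
            current.append((word, tag))   # chunking: grow the current run
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    # Chinking: remove determiners from inside each chunk
    return [[w for w, t in chunk if t != "DT"] for chunk in chunks]

print(np_chunks(tagged))
```
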

Experiment 14: (i) Write a program to find all the misspelled words in a paragraph.
(ii) Write a program to prepare a table with the frequency of misspelled words for any given text.

Objective

- To identify and correct spelling errors in text.
- To analyze misspelled words and their frequency distribution.

Outcomes

- Learn to detect and handle spelling mistakes in NLP tasks.
- Develop skills in error analysis and frequency distribution.
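
Misspelling detection can be sketched as vocabulary lookup (the vocabulary here is a tiny illustrative set; a real checker would load a full dictionary word list):

```python
import re
from collections import Counter

# Toy known-word vocabulary
VOCAB = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def misspelled(text):
    # Any token not in the vocabulary is flagged, with its frequency
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in VOCAB)

# Print a small frequency table of misspellings
table = misspelled("The qick brown fox jumpz over the qick dog.")
for word, freq in table.most_common():
    print(f"{word:<10}{freq}")
```
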

Experiment 15: Write a program to implement all the NLP pre-processing techniques required to perform further NLP tasks.

Objective

- To implement common pre-processing techniques such as tokenization, stemming, lemmatization, stop-word removal, etc.

Outcomes

- Gain a comprehensive understanding of the essential pre-processing steps in NLP.
- Develop a pipeline for preparing text for downstream NLP tasks.
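
A minimal end-to-end pipeline combining several of the steps above (the stop-word list and suffix stripper are toy stand-ins for NLTK's stop-word corpus and stemmers):

```python
import re

STOPWORDS = {"the", "a", "an", "is", "and", "of", "to"}

def light_stem(word):
    # Very small suffix stripper standing in for a real stemmer
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()                                  # normalization
    tokens = re.findall(r"[a-z]+", text)                 # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]   # stop-word removal
    return [light_stem(t) for t in tokens]               # stemming

print(preprocess("The dogs were running to the parks."))
```

Each stage is a separate function-like step, so individual components (e.g., swapping the stemmer for a lemmatizer) can be replaced without touching the rest of the pipeline.
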
