ML4291 NATURAL LANGUAGE PROCESSING L T P C
2 0 2 3
COURSE OBJECTIVES:
To understand basics of linguistics, probability and statistics
To study statistical approaches to NLP and understand sequence labeling
To outline different parsing techniques associated with NLP
To explore semantics of words and semantic role labeling of sentences
To understand discourse analysis, question answering and chatbots
UNIT I INTRODUCTION 6
Natural Language Processing – Components – Basics of Linguistics, Probability and
Statistics – Words – Tokenization – Morphology – Finite State Automata
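Illustrative sketch for Unit I (not part of the prescribed syllabus): tokenization and a finite state automaton can be tried out with the Python standard library alone. The sketch below assumes a simple regular-expression tokenizer and a deterministic automaton for the toy "sheep language" (b, then two or more a's, then !). The function names, state numbering and transition table are hypothetical choices made for this example.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

# Transition table of a deterministic finite state automaton.
# Keys are (state, input symbol); a missing key means "reject".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of additional a's
    (3, "!"): 4,
}
ACCEPTING_STATES = {4}

def accepts(string):
    """Return True if the automaton ends in an accepting state."""
    state = 0
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING_STATES

if __name__ == "__main__":
    print(tokenize("Hello, world!"))
    for candidate in ["baa!", "baaaa!", "ba!", "baa"]:
        print(candidate, accepts(candidate))
```

A table-driven automaton keeps the recognizer loop independent of the language being recognized, so only the transition table changes when a different pattern is modelled.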
UNIT II STATISTICAL NLP AND SEQUENCE LABELING 6
N-grams and Language models – Smoothing – Text classification – Naïve Bayes classifier –
Evaluation – Vector Semantics – TF-IDF – Word2Vec – Evaluating Vector Models – Sequence
Labeling – Part of Speech – Part of Speech Tagging – Named Entities – Named Entity Tagging
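Illustrative sketch for Unit II (not part of the prescribed syllabus): a bigram language model with add-one (Laplace) smoothing estimates P(word | prev) as (count(prev, word) + 1) / (count(prev) + V), where V is the vocabulary size. The three-sentence corpus, the function names and the choice to count the boundary markers <s> and </s> in V are assumptions made for this example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with boundary markers."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus, invented for illustration only.
corpus = ["i like nlp", "i like parsing", "you like nlp"]
unigrams, bigrams = train_bigram(corpus)

if __name__ == "__main__":
    print(bigram_prob("like", "nlp", unigrams, bigrams))      # seen bigram
    print(bigram_prob("like", "you", unigrams, bigrams))      # unseen bigram
```

Smoothing gives the unseen bigram ("like", "you") a small non-zero probability, while the probabilities of all words following "like" still sum to one.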
UNIT III CONTEXTUAL EMBEDDING 6
Constituency – Context Free Grammar – Lexicalized Grammars – CKY Parsing – Earley's
algorithm – Evaluating Parsers – Partial Parsing – Dependency Relations – Dependency Parsing –
Transition Based – Graph Based
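Illustrative sketch for Unit III (not part of the prescribed syllabus): CKY parsing fills a table bottom-up, combining adjacent spans with the binary rules of a grammar in Chomsky Normal Form. The sketch below is a recognizer only (it answers yes or no and does not build trees), and the tiny grammar and lexicon are assumptions made for this example.

```python
from collections import defaultdict

# Toy grammar in Chomsky Normal Form: (left child, right child) -> parents.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Toy lexicon: word -> possible part-of-speech categories.
LEXICAL_RULES = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}

def cky_recognize(words, start="S"):
    """Return True if the grammar derives the word sequence from `start`."""
    n = len(words)
    if n == 0:
        return False
    # table[(i, j)] holds the non-terminals that derive words[i:j].
    table = defaultdict(set)
    for i, word in enumerate(words):
        table[(i, i + 1)] = set(LEXICAL_RULES.get(word, ()))
    for span in range(2, n + 1):              # span length
        for i in range(n - span + 1):         # span start
            j = i + span                      # span end
            for k in range(i + 1, j):         # split point
                for left in table[(i, k)]:
                    for right in table[(k, j)]:
                        table[(i, j)] |= BINARY_RULES.get((left, right), set())
    return start in table[(0, n)]

if __name__ == "__main__":
    print(cky_recognize("the dog chased the cat".split()))
    print(cky_recognize("dog the chased".split()))
```

The three nested loops over span length, start position and split point give the cubic running time in sentence length that is characteristic of CKY.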
UNIT IV COMPUTATIONAL SEMANTICS 6
Word Senses and WordNet – Word Sense Disambiguation – Semantic Role Labeling –
Proposition Bank – FrameNet – Selectional Restrictions – Information Extraction – Template
Filling
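Illustrative sketch for Unit IV (not part of the prescribed syllabus): the simplified Lesk approach to word sense disambiguation picks the sense whose dictionary gloss shares the most content words with the context of the target word. The two-sense inventory and the stop word list below are hand-written toy stand-ins for WordNet glosses and a real stop list, and all names are hypothetical.

```python
# Toy sense inventory standing in for WordNet glosses (invented for illustration).
SENSES = {
    "bank": {
        "financial": "an institution that accepts deposits and lends money",
        "river": "sloping land beside a body of water such as a river",
    }
}
# Small hand-picked stop word list, also an assumption of this sketch.
STOP_WORDS = {"a", "an", "and", "as", "at", "beside", "of", "on",
              "such", "that", "the"}

def content_words(text):
    """Lower-case, whitespace-tokenize and drop stop words."""
    return set(text.lower().split()) - STOP_WORDS

def simplified_lesk(word, context):
    """Return the sense label whose gloss overlaps most with the context."""
    context_set = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_set & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

if __name__ == "__main__":
    print(simplified_lesk("bank", "she sat on the bank of the river"))
    print(simplified_lesk("bank", "he deposits money at the bank"))
```

When no gloss overlaps with the context, this sketch falls back to the first sense listed, which mirrors the common practice of defaulting to the most frequent sense.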
UNIT V DISCOURSE ANALYSIS AND SPEECH PROCESSING 6
Discourse Coherence – Discourse Structure Parsing – Centering and Entity Based Coherence
– Question Answering – Factoid Question Answering – Classical QA Models – Chatbots and
Dialogue systems – Frame-based Dialogue Systems – Dialogue-State Architecture
TOTAL : 30 PERIODS
SUGGESTED ACTIVITIES:
1. Probability and Statistics for NLP Problems
2. Carry out Morphological Tagging and Part-of-Speech Tagging for a sample text
3. Design a Finite State Automaton for more Grammatical Categories
4. Problems associated with Vector Space Model
5. Hand Simulate the working of an HMM
6. Examples for different types of word sense disambiguation
7. Give the design of a Chatbot
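Illustrative sketch for activity 5 (not part of the prescribed syllabus): a hand simulation of an HMM can be checked against a short Viterbi decoder, which finds the most probable hidden state sequence for a sequence of observations. The weather states, ice-cream-count observations and all probabilities below are toy values invented for this example, and the function assumes a non-empty observation sequence.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, probability of that path)."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s][observations[t]]
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model parameters (assumptions, for illustration only).
STATES = ["HOT", "COLD"]
START_P = {"HOT": 0.8, "COLD": 0.2}
TRANS_P = {"HOT": {"HOT": 0.6, "COLD": 0.4},
           "COLD": {"HOT": 0.5, "COLD": 0.5}}
EMIT_P = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4},
          "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

if __name__ == "__main__":
    path, prob = viterbi([3, 1, 3], STATES, START_P, TRANS_P, EMIT_P)
    print(path, prob)
```

Printing the intermediate `best` table column by column gives the same trellis that a hand simulation would produce, which makes it straightforward to compare the two.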
PRACTICAL EXERCISES: 30 PERIODS
1. Download NLTK and its packages. Use it to print the tokens and the sentences in a
document.
2. Define custom stop words and remove them, along with all standard stop words, from a
given document using the NLTK or spaCy package.
3. Implement a stemmer and a lemmatizer program.
4. Implement a simple Part-of-Speech Tagger
5. Write a program to calculate the TF-IDF of documents and find the cosine similarity between any
two documents.
6. Use NLTK to implement a dependency parser.
7. Implement a semantic language processor that uses WordNet for semantic tagging.
8. Project (in pairs): Your project must use NLP concepts and apply them to some data.
a. Your project may be a comparison of several existing systems, or it may propose a new
system in which case you still must compare it to at least one other approach.
b. You are free to use any third-party ideas or code that you wish as long as it is publicly
available.
c. You must properly provide references to any work that is not your own in the write-up.
d. Project proposal: You must turn in a brief project proposal that describes the idea behind
your project. You should also briefly describe the software you will need to write and the
papers (2-3) you plan to read.
List of Possible Projects
1. Sentiment Analysis of Product Reviews
2. Information extraction from News articles
3. Customer support bot
4. Language identifier
5. Media Monitor
6. Paraphrase Detector
7. Identification of Toxic Comment
8. Spam Mail Identification
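Illustrative sketch for practical exercise 5 (not part of the prescribed syllabus): TF-IDF weights and cosine similarity can be computed with the standard library before moving to a toolkit. The sketch assumes whitespace tokenization, term frequency normalized by document length and the unsmoothed IDF log(N / df). Library implementations often use different variants, so their numbers will not match these exactly. The sample documents and function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """Return one sparse {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Sample documents, invented for illustration only.
DOCS = ["the cat sat on the mat",
        "the cat lay on the rug",
        "dogs chase cats"]

if __name__ == "__main__":
    vectors = tf_idf_vectors(DOCS)
    print(cosine_similarity(vectors[0], vectors[1]))
    print(cosine_similarity(vectors[0], vectors[2]))
```

Because "cats" and "cat" are different tokens here, the first and third documents share no terms. Adding the stemmer or lemmatizer from exercise 3 as a preprocessing step would change that.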
COURSE OUTCOMES:
CO1: Understand basics of linguistics, probability and statistics associated with NLP
CO2: Implement a Part-of-Speech Tagger
CO3: Design and implement a sequence labeling problem for a given domain
CO4: Implement semantic processing tasks and simple document indexing and searching
system using the concepts of NLP
CO5: Implement a simple chatbot using dialogue system concepts
TOTAL : 60 PERIODS
REFERENCES
1. Daniel Jurafsky and James H. Martin, “Speech and Language Processing: An Introduction to
Natural Language Processing, Computational Linguistics and Speech Recognition” (Prentice
Hall Series in Artificial Intelligence), 2020
2. Jacob Eisenstein, “Natural Language Processing”, MIT Press, 2019
3. Samuel Burns, “Natural Language Processing: A Quick Introduction to NLP with Python and
NLTK”, 2019
4. Christopher D. Manning and Hinrich Schütze, “Foundations of Statistical Natural Language
Processing”, MIT Press, 1999
5. Nitin Indurkhya, Fred J. Damerau, “Handbook of Natural Language Processing”, Second
edition, Chapman & Hall/CRC: Machine Learning & Pattern Recognition, Hardcover, 2010
6. Deepti Chopra, Nisheeth Joshi, “Mastering Natural Language Processing with Python”,
Packt Publishing Limited, 2016
7. Mohamed Zakaria Kurdi, “Natural Language Processing and Computational Linguistics:
Speech, Morphology and Syntax (Cognitive Science)”, ISTE Ltd., 2016
8. Atefeh Farzindar, Diana Inkpen, “Natural Language Processing for Social Media
(Synthesis Lectures on Human Language Technologies)”, Morgan and Claypool Life
Sciences, 2015