Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 1
Syllabus
Module 1:
Introduction: Knowledge in Speech and Language Processing - Ambiguity - Models and Algorithms - Language, Thought, and Understanding - The State of the Art and the Near-Term Future - Regular Expressions - Basic Regular Expression Patterns - Disjunction, Grouping, and Precedence - Using an FSA to Recognize Sheeptalk - Formal Languages.
Text Books:
Daniel Jurafsky and James H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition", Prentice Hall, 2nd edition, 2008.
Reference Books:
1. Roland R. Hausser, "Foundations of Computational Linguistics: Human-Computer Communication in Natural Language", Paperback, MIT Press, 2011.
2. Christopher D. Manning and Hinrich Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press.
Module 1:
Introduction
Topic: Introduction
NLP
Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on the interaction
between computers and humans through natural language. NLP enables computers to understand,
interpret, and generate human language in a way that is both meaningful and valuable.
Why Study NLP?
Ubiquity of Language: Language is a fundamental
medium of communication among humans. NLP allows
machines to understand and process this
communication, enabling a wide range of applications.
Real-World Applications: NLP is used in various real-world applications such as virtual assistants, sentiment analysis, language translation, information retrieval, and more.
Data Explosion: The digital age has led to an
explosion of textual data. NLP provides the tools to
extract insights and information from this data.
Brief History of NLP
Early Foundations (1950s-1970s)
1950s: The field of AI is born, and early attempts
at machine translation (MT) using rule-based
systems.
1960s: ELIZA, a computer program capable of simulating human conversation, is developed by Joseph Weizenbaum.
1970s: Rule-based approaches dominate NLP, but they struggle with the complexity and ambiguity of language.
Note: ELIZA simulated conversation by using a pattern matching
and substitution methodology that gave users an illusion of
understanding on the part of the program
Brief History of NLP
Statistical NLP (1980s-2000s)
1980s: Introduction of statistical methods,
Hidden Markov Models (HMMs), and
probabilistic context-free grammars.
1990s: The use of large corpora and the development of the Penn Treebank revolutionize NLP. Introduction of part-of-speech tagging and syntactic parsing.
2000s: More sophisticated statistical models like
Conditional Random Fields (CRFs) and word
embeddings (Word2Vec, GloVe) emerge. Shift
towards data-driven approaches.
Brief History of NLP
Deep Learning and Modern NLP (2010s-Present)
2010s: Deep Learning redefines NLP with neural network
architectures like Recurrent Neural Networks (RNNs) and
Convolutional Neural Networks (CNNs).
2013: Introduction of Word2Vec by Mikolov et al., which learns word embeddings from large text corpora.
2014: "Sequence to Sequence" models enable breakthroughs in
machine translation.
2018: Transformers, exemplified by the BERT model, revolutionize
NLP tasks by learning contextualized word representations.
Present: State-of-the-art models like GPT-3.5 achieve remarkable
performance across a wide range of NLP tasks using massive
amounts of data and computation.
NLP-Rule based
Rule-based Natural Language Processing (NLP) is an approach to language processing that relies on a
set of predefined rules and patterns to analyze and extract information from text data. It contrasts with
machine learning-based NLP, which uses algorithms and models to learn patterns and make predictions
from data.
Rule: If a text contains a date in the format "dd/mm/yyyy" or "dd-mm-yyyy," extract it.
Example Text: "The project deadline is 25/09/2023,
and the meeting is scheduled for 30-09-2023."
Rule-Based NLP Output:
Extracted Date: "25/09/2023"
Extracted Date: "30-09-2023"
NLP- Statistical model based
Statistical model-based Natural Language Processing (NLP) relies on the use of statistical techniques
and machine learning algorithms to analyze and understand text data. Unlike rule-based NLP, which relies
on predefined rules and patterns, statistical model-based NLP learns patterns and relationships from data.
Task: Text Classification
Statistical Model: Support Vector Machine (SVM)
Example: Sentiment Analysis
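A minimal sketch of SVM-based sentiment analysis, assuming scikit-learn is available; the tiny training set below is invented purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

train_texts = ["I loved this movie", "great acting and plot",
               "terrible, a waste of time", "I hated every minute"]
train_labels = ["pos", "pos", "neg", "neg"]

vectorizer = CountVectorizer()            # bag-of-words features
X_train = vectorizer.fit_transform(train_texts)
clf = LinearSVC()                         # linear SVM classifier
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["what a great film"])
print(clf.predict(X_test))                # e.g. ['pos']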
NLP-Penn Treebank based
The Penn Treebank is a widely used dataset in Natural Language Processing (NLP) that provides
annotated syntactic and structural information for English text. It uses a tree structure to represent the
grammatical and syntactic relationships within sentences. One common application of Penn Treebank-
based NLP is parsing sentences to analyze their grammatical structure.
Task: Sentence Parsing
"The quick brown fox jumps over the lazy dog."
Tokenization: The sentence is first tokenized into individual words and punctuation marks. In this case, the sentence is tokenized as follows:
["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]
Part-of-Speech (POS) Tagging: Each token is assigned a POS tag that represents its grammatical category (e.g., noun, verb, adjective). Here is the example sentence with POS tags:
[("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"), ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"), (".", ".")]
NLP-Penn Treebank based
Parsing: The Penn Treebank-based NLP system uses syntactic rules and information to parse the
sentence into a tree structure that represents its grammatical and syntactic relationships. The resulting
parse tree for the example sentence might look like this:
(S
  (NP (DT The) (JJ quick) (JJ brown) (NN fox))
  (VP (VBZ jumps) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))))
  (. .))
In this parse tree, "S" represents the sentence, "NP" represents a noun phrase, "VP" represents a verb
phrase, "DT" represents a determiner, "JJ" represents an adjective, "NN" represents a noun, "VBZ"
represents a verb, and "IN" represents a preposition. The tree structure captures the hierarchical
relationships between the words in the sentence.
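The bracketed Penn Treebank notation above can be loaded and inspected programmatically, for example with NLTK's Tree class (assuming NLTK is installed):

from nltk import Tree

parse = Tree.fromstring(
    "(S (NP (DT The) (JJ quick) (JJ brown) (NN fox)) "
    "(VP (VBZ jumps) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog)))) (. .))")
print(parse.label())      # S
print(parse.leaves())     # the words of the sentence, in order
parse.pretty_print()      # renders the tree as ASCII art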
NLP-CRFs
Conditional Random Fields (CRFs) are a popular machine learning model used in Natural Language
Processing (NLP) for sequence labeling tasks, such as named entity recognition (NER), part-of-speech
tagging (POS), and chunking. CRFs are particularly effective at capturing dependencies between adjacent
labels in a sequence.
Example Sentence:
"Apple Inc. is headquartered in Cupertino, California."
Tokens: ["Apple", "Inc.", "is", "headquartered", "in", "Cupertino", ",", "California", "."]
Label Sequence (NER Tags): ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "I-LOC", "I-LOC", "O"]
In this example, the labels indicate the following:
"B-ORG": Beginning of an organization name.
"I-ORG": Inside an organization name.
"B-LOC": Beginning of a location name.
"I-LOC": Inside a location name.
"O": Represents words that are not part of any named entity
NLP-State of art
Word2Vec, Sequence-to-Sequence (Seq2Seq), and Transformers are all important techniques in
Natural Language Processing (NLP), but they serve different purposes and have different characteristics.
Let's compare them based on several key aspects:
Objective: Word2Vec is used for word embedding; Seq2Seq models are designed for sequence-to-sequence tasks such as machine translation (MT) and text summarization (TS); Transformers were initially designed for seq2seq but have become fundamental to NLP.
Model Architecture: Word2Vec is a shallow neural network (e.g., CBOW); Seq2Seq uses an encoder and a decoder built from RNNs or LSTMs; Transformers use a self-attention mechanism and feed-forward neural networks (FFNN).
Training: Word2Vec is trained on a large corpus; Seq2Seq on parallel input and target sequences; Transformers on massive corpora with self-supervised pretraining followed by fine-tuning.
Parallelism: Word2Vec is inherently parallelizable; Seq2Seq is less parallelizable; Transformers are highly parallelizable.
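As a concrete anchor for the Word2Vec column, here is a minimal embedding-training sketch, assuming gensim 4.x (the vector_size parameter was named size in older gensim releases); the toy corpus is invented:

from gensim.models import Word2Vec

corpus = [["the", "king", "rules", "the", "kingdom"],
          ["the", "queen", "rules", "the", "kingdom"],
          ["dogs", "bark"], ["cats", "meow"]]

model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)  # sg=0: CBOW
print(model.wv["king"].shape)         # (50,)
print(model.wv.most_similar("king"))  # nearest neighbours in embedding space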
Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 2
Module 1:
Introduction
Topic: Introduction
Applications of NLP
Communication With Machines
Applications of NLP
Conversational Agents: Building AI systems that can engage in natural-sounding conversations with users. Used in customer support, virtual companions, and mental health apps. Conversational agents contain:
● Speech recognition
● Language analysis
● Dialogue processing
● Information retrieval
● Text to speech
Question Answering: Developing systems that can understand and answer questions posed in natural language. Used in chatbots, virtual assistants, and information retrieval.
Text Generation: Creating human-like text using models like OpenAI's GPT-3. Applications range from creative writing to chatbots.
Applications of NLP
Machine Translation: Automatically translating text from one language to another. Google Translate and other translation services heavily rely on NLP techniques.
Sentiment Analysis: Analyzing text to determine the sentiment (positive, negative, neutral) expressed by the author. Applications include brand monitoring, customer feedback analysis, and social media sentiment tracking.
Information Retrieval: Improving search engines by understanding user queries and retrieving relevant information from a large dataset.
Named Entity Recognition (NER): Identifying entities like names, dates, locations, and more within a text. Used in information extraction, chatbots, and language translation.
Level Of Linguistic Knowledge
1. Phonetics and Phonology
At this level, NLP systems consider the sounds of speech. It involves understanding the
phonemes (distinct speech sounds) and the rules governing their pronunciation, as well as the
intonation patterns and stress in spoken language.
2. Morphology
Morphology deals with the internal structure of words and how they are formed from smaller units called morphemes. Morphological analysis helps in tasks like stemming (reducing words to their base form) and lemmatization (reducing words to their dictionary form).
3. Syntax
Syntax involves the rules governing the structure of sentences. It includes understanding how
words combine to form phrases and sentences, and the relationships between different parts of
speech. Parsing techniques are used to analyze sentence structure.
Level Of Linguistic Knowledge
4. Semantics
Semantics is the study of meaning in language. NLP systems at this level aim to understand the
meaning of individual words, phrases, and sentences. This can involve tasks like word sense
disambiguation (determining the correct meaning of a word based on context) and semantic role
labeling (identifying the roles of words in a sentence, e.g., subject, object).
5. Pragmatics
Pragmatics refers to the use of language in context. It involves understanding implied meaning,
indirect speech acts, and the intentions behind statements. This level is crucial for understanding
sarcasm, irony, and other forms of figurative language.
6. Discourse
Discourse refers to the structure and organization of connected text or speech. NLP systems at this level
consider how sentences relate to each other and form coherent paragraphs or dialogues. Coreference
resolution (identifying which words refer to the same entity) is an important task in discourse analysis.
Why NLP is Hard?
1. Ambiguity
2. Scale
3. Sparsity
4. Variation
5. Expressivity
6. Unmodeled Variables
7. Unknown representations
Ambiguity
Ambiguity at multiple levels
Word senses: bank (finance or river ?)
Part of speech: chair (noun or verb ?)
Syntactic structure: I can see a man with a telescope
Multiple: I made her duck
Semantic: Time flies like an arrow; fruit flies like a banana
Phonological: "I scream, you scream, we all scream for ice cream." (The words "I scream" and "ice cream" sound alike.)
Ambiguity
These different meanings are caused by a number of ambiguities. First, the words duck and her are morphologically or syntactically ambiguous in their part-of-speech: duck can be a verb or a noun, while her can be a dative pronoun or a possessive pronoun. Second, the word make is semantically ambiguous; it can mean create or cook. Third, the verb make is syntactically ambiguous in a different way: make can be transitive, that is, taking a single direct object, or it can be ditransitive, that is, taking two objects, meaning that the first object (her) was made into the second object (duck). Finally, make can take a direct object and a verb, meaning that the object (her) was caused to perform the verbal action (duck). Furthermore, in a spoken sentence there is an even deeper kind of ambiguity; the first word could have been eye or the second word maid.
Ambiguity
We often introduce the models and
algorithms we present throughout the book
as ways to resolve or disambiguate these
ambiguities. For example, deciding whether
duck is a verb or a noun can be solved by
part-of-speech tagging. Deciding whether 16
make means “create” or “cook” can be
solved by word sense disambiguation.
Resolution of part-of-speech and word
sense ambiguities are two important kinds of
lexical disambiguation
Note: Word Sense Disambiguation (WSD) is a natural language
processing (NLP) task that focuses on determining the correct meaning or
sense of a word in a given context.
Scale
Scale in NLP refers to the challenges and opportunities posed by the vast amounts of linguistic data
available for analysis. The scale of data in NLP presents both technical and computational challenges,
but it also enables the development of more sophisticated models and applications.
Challenges of Scale
Data Collection: Gathering and annotating large-scale linguistic data is resource-intensive and time-consuming.
Computational Resources: Processing and analyzing massive datasets require significant
computational power and memory.
Model Complexity: More data often leads to larger and more complex models, which may require
specialized hardware and efficient training techniques.
Noise and Quality: As datasets grow, ensuring data quality becomes crucial, as noise can negatively
impact model performance.
Scale
Opportunities of Scale
Improved Models: Large datasets enable the training
of more accurate and robust NLP models that can
capture subtle linguistic nuances.
Generalization: Models trained on extensive data have the potential to generalize better across various domains and languages.
Transfer Learning: Pretrained models on massive
datasets can be fine-tuned for specific tasks, reducing
the need for extensive task-specific data.
Multilingualism: Large-scale data allows models to
learn from multiple languages, enabling multilingual
applications.
Sparsity
Sparsity is a common challenge in Natural Language Processing (NLP) that arises due to the vast and
diverse nature of human language. In NLP, sparsity refers to the phenomenon where the data space is
extremely large, but the actual data available for any specific point in that space is very limited. This can
have significant implications for various NLP tasks and models.
Causes of Sparsity in NLP
Vocabulary Size: Natural languages have extensive vocabularies with numerous words, many of which are rare or domain-specific. The majority of words appear infrequently in any given text corpus.
Long Tail Distribution: The frequency distribution of words follows a "long tail" pattern, where a few
common words appear frequently, while the majority of words occur rarely.
Named Entities: Entities like names, locations, dates, and specialized terms are sparse in most text data.
Word Combinations: The number of possible word combinations is astronomically large, but most of these
combinations are never observed in real-world text
Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 3
Module 1:
Introduction
Topic: Regular Expression
Variation
Suppose we train a part of speech tagger or a parser on the Wall Street Journal
What will happen if we try to use this tagger/parser for social media?
“ikr smh he asked fir yo last name so he can add u on fb lololol”
POS Tagging
Expressivity
Not only can one form have different meanings (ambiguity) but the same meaning can be expressed with
different forms:
Unmodeled Variables
World knowledge
I dropped the glass on the floor and it broke
I dropped the hammer on the glass and it broke
Unmodeled Representation
Unmodeled representations in NLP refer to aspects of language and meaning that are not fully
captured by existing language models, resulting in situations where models struggle to understand the
nuances and complexities of human communication. Here are some examples of unmodeled
representations:
Example: "She's as busy as a bee." Example: "He's the Einstein of our group."
In this metaphor, the phrase "busy as a bee" implies This expression assumes knowledge about 14
that she is very industrious, but this meaning is not who Einstein was and what he symbolizes.
directly related to bees being busy insects. A model lacking this cultural context might
miss the intended comparison.
Example: "Oh great, another flat tire!"
This statement might be used in a situation where
someone is frustrated about a recurring problem, and
the words imply sarcasm despite the literal words
expressing annoyance.
Factors Changing NLP Landscape
1. Increases in computing power
2. The rise of the web, then the social web
3. Advances in machine learning
4. Advances in understanding of language in social
context
Regular Expressions
Regular expressions (regex) are powerful tools used in Natural Language Processing
(NLP) to match and manipulate text patterns. They provide a concise and flexible way to
search, extract, and manipulate textual data.
Imagine you needed to search a string for a term, such as "phone":
“phone” in “Is the phone here?”
>>> True
Imagine you needed to search for a phone number, "91-98765-43210"; we can do the same:
“91-98765-43210” in “Her phone number is 91-98765-43210”
>>> True
Regular Expression
But what if you don't know the exact number, or you need to find all the phone numbers in the text? We need to use regular expressions to search through the document for this pattern.
Regular expressions allow for pattern searching in a text document.
r’\d{2}-\d{5}-\d{5}’
\d is the placeholder pattern code for a single digit
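Putting the pattern to work (a small self-contained demo; the second number is invented for illustration):

import re

text = "Call 91-98765-43210 or 91-12345-67890 for details."
print(re.findall(r"\d{2}-\d{5}-\d{5}", text))
# ['91-98765-43210', '91-12345-67890']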
Regular Expressions: Disjunctions
Letters inside square brackets []
Ranges [A-Z]
CSA4006-Dr. Anirban Bhowmick
Regular Expressions: Negation in
Disjunction
Negations [^Ss]: the caret means negation only when it is first inside []
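A quick demo of disjunction, ranges, and negation (the sample strings are illustrative):

import re

print(re.findall(r"[wW]oodchuck", "Woodchuck or woodchuck"))  # either case
print(re.findall(r"[A-Z]", "Drenched Blossoms"))              # ['D', 'B']
print(re.findall(r"[^Ss]", "Ssss!"))                          # ['!'] - anything but S or s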
Regular Expression
Pattern: colou?r (optional previous char) matches: color, colour
Pattern: oo*h! (0 or more of previous char) matches: oh!, ooh!, oooh!, ooooh!
Pattern: o+h! (1 or more of previous char) matches: oh!, ooh!, oooh!, ooooh!
Pattern: baa+ matches: baa, baaa, baaaa, baaaaa
Pattern: beg.n (any character between beg and n) matches: begin, begun, beg3n
Regular Expressions: Anchors ^ $
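The anchors match positions rather than characters: ^ matches the start of a line (or string) and $ matches the end. A small illustrative demo:

import re

line = "The dog chased the cat"
print(re.findall(r"^The", line))   # ['The'] - only at the start
print(re.findall(r"cat$", line))   # ['cat'] - only at the end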
Advanced Operators
A range of numbers can also be specified: /{n,m}/ specifies from n to m occurrences of the previous char or expression, while /{n,}/ means at least n occurrences of the previous expression.
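For example (illustrative strings):

import re

print(re.findall(r"ba{2,3}!", "ba! baa! baaa! baaaa!"))  # ['baa!', 'baaa!']
print(re.findall(r"ba{2,}!", "ba! baa! baaaa!"))         # ['baa!', 'baaaa!']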
Error
Find me all instances of the word “the” in a text.
the ----- misses capitalized examples (The)
[tT]he ----- incorrectly returns other or theology
[^a-zA-Z][tT]he[^a-zA-Z] ----- matches the correct one
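The three attempts compared on a sample sentence (illustrative):

import re

text = "The other one, then the theology one."
print(re.findall(r"the", text))       # hits 'other', 'then', 'theology'; misses 'The'
print(re.findall(r"[tT]he", text))    # now catches 'The', but still the false hits
print(re.findall(r"[^a-zA-Z][tT]he[^a-zA-Z]", text))
# [' the '] - only the standalone word (it would still miss a line-initial 'The')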
The process we just went through was based on fixing two kinds of errors:
Matching strings that we should not have matched (there, then, other): false positives (Type I)
Not matching things that we should have matched (The): false negatives (Type II)
In NLP we are always dealing with these kinds of errors. Reducing the error rate for an application often involves two antagonistic efforts:
Increasing accuracy or precision (minimizing false positives)
Increasing coverage or recall (minimizing false negatives)
Regular Expressions
Lecture 11b
Larry Ruzzo
Outline
• Some string tidbits
• Regular expressions and pattern matching
Strings Again
'abc', "abc", '''abc''', and r'abc'
all denote the same three characters: a b c
Strings Again
'abc\n', "abc\n", and
'''abc
'''
all denote four characters: a b c newline
r'abc\n' denotes five characters: a b c \ n
Why so many?
' vs " lets you put the other kind inside
’’’ lets you run across many lines
all 3 let you show “invisible” characters (via \n, \t, etc.)
r’...’ (raw strings) can’t do invisible stuff, but avoid problems
with backslash
open('C:\new\text.dat') vs
open('C:\\new\\text.dat') vs
open(r'C:\new\text.dat')
RegExprs are
Widespread
• shell file name patterns (limited)
• unix utility “grep” and relatives
• try “man grep” in terminal window
• perl
• TextWrangler
• Python
Patterns in Text
• Pattern-matching is frequently useful
• Identifier: A letter followed by >= 0 letters or digits.
count1 number2go, not 4runner
• TATA box: TATxyT where x or y is A
TATAAT TATAgT TATcAT, not TATCCT
• Number: >=1 digit, optional decimal point, exponent.
3.14 6.02E+23, not 127.0.0.1
Regular Expressions
• A language for simple patterns, based on 4 simple
primitives
• match single letters
• this OR that
• this FOLLOWED BY that
• this REPEATED 0 or more times
• A specific syntax (fussy, and varies among pgms...)
• A library of utilities to deal with them
• Key features: Search, replace, dissect
Regular Expressions
• Do you absolutely need them in Python?
• No, everything they do, you could do yourself
• BUT pattern-matching is widely needed,
tedious and error-prone. RegExprs give you a
flexible, systematic, compact, automatic way to
do it. A common language for specifications.
• In truth, it’s still somewhat error-prone, but in
a different way.
Examples
(details later)
• Identifier: letter followed by ≥0 letters or digits.
[a-z][a-z0-9]* i count1 number2go
• TATA box: TATxyT where x or y is A
TAT(A.|.A)T TATAAT TATAgT TATcAT
• Number: one or more digits with optional
decimal point, exponent.
\d+\.?\d*(E[+-]?\d+)? 3.14 6.02E+23
Another Example
Repressed binding sites in regular Python
# assume we have a genome sequence in string variable myDNA
for index in range(0, len(myDNA)-20):
    if ((myDNA[index] == "A" or myDNA[index] == "G") and
        (myDNA[index+1] == "A" or myDNA[index+1] == "G") and
        (myDNA[index+2] == "A" or myDNA[index+2] == "G") and
        (myDNA[index+3] == "C") and
        (myDNA[index+4] == "C") and
        # ... and on and on!
        (myDNA[index+19] == "C" or myDNA[index+19] == "T")):
        print "Match found at ", index
        break
Example
re.findall(r"[AG]{3,3}CATG[TC]{4,4}[AG]{2,2}C[AT]TG[CT][CG][TC]", myDNA)
RegExprs in Python
http://docs.python.org/library/re.html
Simple RegExpr Testing
>>> import re
>>> str1 = 'what foot or hand fell fastest'
>>> re.findall(r'f[a-z]*', str1)
['foot', 'fell', 'fastest']
>>> str2 = "I lack e's successor"
>>> re.findall(r'f[a-z]*', str2)
[]
Definitely recommend trying this with examples to follow, & more
Returns list of all matching substrings.
Exercise: change it to find strings
starting with f and ending with t
Exercise: In honor of the
winter Olympics, “-ski-ing”
• download & save war_and_peace.txt
• write py program to read it line-by-line, use re.findall to see whether current line contains one or more proper names ending in "...ski"; print each.
• mine begins:
['Bolkonski']
['Bolkonski']
['Bolkonski']
['Bolkonski']
['Bolkonski']
['Razumovski']
['Razumovski']
['Bolkonski']
['Spasski']
...
['Nesvitski', 'Nesvitski']
RegExpr Syntax
They're strings
Most punctuation is special; it needs to be escaped by backslash (e.g., "\." instead of ".") to get non-special behavior
So, "raw" string literals (r'C:\new\.txt') are generally recommended for regexprs, unless you double your backslashes judiciously
Patterns “Match” Text
Pattern: TAT(A.|.A)T matches in text: RATATaAT TAT!
Pattern: [a-z][a-z0-9]* matches in text: count1
RegExpr Semantics, 1
Characters
RegExprs are patterns; they "match" sequences
of characters
Letters, digits (& escaped punctuation like ‘\.’)
match only themselves, just once
r'TATAAT' matches in: 'ACGTTATAATGGTATAAT'
RegExpr Semantics, 2
Character Groups
Character groups [abc], [a-zA-Z], [^0-9] also
match single characters, any of the characters
in the group.
Shortcuts (2 of many):
. (just a dot) matches any character (except newline)
\s ≡ [ \n\t\r\f\v] ("s" for "space")
r'T[AG]T[^GC].T' matches in: 'ACGTTGTAATGGTATnCT'
Matching one of several alternatives
• Square brackets mean that any of the listed characters will do
• [ab] means either ”a” or ”b”
• You can also give a range:
• [a-d] means ”a” ”b” ”c” or ”d”
• Negation: caret means ”not”
[^a-d] # anything but a, b, c or d
RegExpr Semantics, 3:
Concatenation, Or, Grouping
You can group subexpressions with parens
If R, S are RegExprs, then
RS matches the concatenation of strings matched
by R, S individually
R | S matches the union–either R or S
r'TAT(A.|.A)T' matches in: 'TATCATGTATACTCCTATCCT' (where?)
RegExpr Semantics, 4
Repetition
If R is a RegExpr, then
R* matches 0 or more consecutive strings
(independently) matching R
R+ 1 or more
R{n} exactly n
R{m,n} any number between m and n, inclusive
R? 0 or 1
Beware precedence (* > concat > |)
r'TAT(A.|.A)*T' matches in: 'TATCATGTATACTATCACTATT' (where?)
RegExprs in Python
By default
Case sensitive, line-oriented (\n treated specially)
Matching is generally “greedy”
Finds longest version of earliest starting match
Next “findall()” match will not overlap
r".+\.py" "Two files: hw3.py and upper.py."
r"\w+\.py" "Two files: hw3.py and UPPER.py."
Exercise 3
Suppose “filenames” are upper or lower case
letters or digits, starting with a letter, followed
by a period (“.”) followed by a 3 character
extension (again alphanumeric). Scan a list of
lines or a file, and print all “filenames” in it,
without their extensions. Hint: use paren
groups.
Solution 3
import sys
import re
filename = sys.argv[1]
filehandle = open(filename,"r")
filecontents = filehandle.read()
myrule = re.compile(
r"([a-zA-Z][a-zA-Z0-9]*)\.[a-zA-Z0-9]{3}")
#Finds skidoo.bar amidst 23skidoo.barber; ok?
match = myrule.findall(filecontents)
print match
Basics of regexp construction
• Letters and numbers match themselves
• Normally case sensitive
• Watch out for punctuation–most of it has special meanings!
Wild cards
• ”.” means ”any character”
• If you really mean ”.” you must use a backslash
• WARNING:
– backslash is special in Python strings
– It’s special again in regexps
– This means you need too many backslashes
– We will use ”raw strings” instead
– Raw strings look like r"ATCGGC"
Using . and backslash
• To match file names like ”hw3.pdf” and ”hw5.txt”:
hw.\....
Zero or more copies
• The asterisk repeats the previous character 0 or more times
• ”ca*t” matches ”ct”, ”cat”, ”caat”, ”caaat” etc.
• The plus sign repeats the previous character 1 or more times
• ”ca+t” matches ”cat”, ”caat” etc. but not ”ct”
Repeats
• Braces are a more detailed way to indicate repeats
• A{1,3} means at least one and no more than three A’s
• A{4,4} means exactly four A’s
simple testing
>>> import re
>>> string = 'what foot or hand fell fastest'
>>> re.findall(r'f[a-z]*', string)
['foot', 'fell', 'fastest']
Practice problem 1
• Write a regexp that will match any string that starts with ”hum” and
ends with ”001” with any number of characters, including none, in
between
• (Hint: consider both ”.” and ”*”)
Practice problem 2
• Write a regexp that will match any Python (.py) file.
• There must be at least one character before the ”.”
• ”.py” is not a legal Python file name
• (Imagine the problems if you imported it!)
Using the regexp
First, compile it:
import re
myrule = re.compile(r".+\.py")
print myrule
<_sre.SRE_Pattern object at 0xb7e3e5c0>
The result of compile is a Pattern object which represents your regexp
Using the regexp
Next, use it:
mymatch = myrule.search(myDNA)
print mymatch
None
mymatch = myrule.search(someotherDNA)
print mymatch
<_sre.SRE_Match object at 0xb7df9170>
The result of match is a Match object which represents the result.
All of these objects! What can they do?
Functions offered by a Pattern object:
• match()–does it match the beginning of my string? Returns None or a
match object
• search()–does it match anywhere in my string? Returns None or a
match object
• findall()–does it match anywhere in my string? Returns a list of
strings (or an empty list)
• Note that findall() does NOT return a Match object!
All of these objects! What can they do?
Functions offered by a Match object:
• group()–return the string that matched
group()–the whole string
group(1)–the substring matching 1st parenthesized sub-pattern
group(1,3)–tuple of substrings matching 1st and 3rd parenthesized
sub-patterns
• start()–return the starting position of the match
• end()–return the ending position of the match
• span()–return (start,end) as a tuple
A practical example
Does this string contain a legal Python filename?
import re
myrule = re.compile(r".+\.py")
mystring = "This contains two files, hw3.py and uppercase.py."
mymatch = myrule.search(mystring)
print mymatch.group()
This contains two files, hw3.py and uppercase.py
# not what I expected! Why?
Matching is greedy
• My regexp matches ”hw3.py”
• Unfortunately it also matches ”This contains two files, hw3.py”
• And it even matches ”This contains two files, hw3.py and uppercase.py”
• Python will choose the longest match
• I could break my file into words first
• Or I could specify that no spaces are allowed in my match
A practical example
Does this string contain a legal Python filename?
import re
myrule = re.compile(r"[^ ]+\.py")
mystring = "This contains two files, hw3.py and uppercase.py."
mymatch = myrule.search(mystring)
print mymatch.group()
hw3.py
allmymatches = myrule.findall(mystring)
print allmymatches
['hw3.py', 'uppercase.py']
Practice problem 3
• Create a regexp which detects legal Microsoft Word file names
• The file name must end with ”.doc” or ”.DOC”
• There must be at least one character before the dot.
• We will assume there are no spaces in the names
• Print out a list of all the legal file names you find
• Test it on testre.txt (on the web site)
Practice problem 4
• Create a regexp which detects legal Microsoft Word file names that do
not contain any numerals (0 through 9)
• Print out the start location of the first such filename you encounter
• Test it on testre.txt
Practice problem
• Create a regexp which detects legal Microsoft Word file names that do
not contain any numerals (0 through 9)
• Print out the "base name", i.e., the file name after stripping off the .doc extension, of each such filename you encounter. Hint: use parenthesized sub-patterns.
• Test it on testre.txt
Practice problem 1 solution
Write a regexp that will match any string that starts with ”hum” and ends
with ”001” with any number of characters, including none, in between
myrule = re.compile(r"hum.*001")
Practice problem 2 solution
Write a regexp that will match any Python (.py) file.
myrule = re.compile(r".+\.py")
# if you want to find filenames embedded in a bigger
# string, better is:
myrule = re.compile(r"[^ ]+\.py")
# this version does not allow whitespace in file names
Practice problem 3 solution
Create a regexp which detects legal Microsoft Word file names, and use it
to make a list of them
import sys
import re
filename = sys.argv[1]
filehandle = open(filename,"r")
filecontents = filehandle.read()
myrule = re.compile(r"[^ ]+\.[dD][oO][cC]")
matchlist = myrule.findall(filecontents)
print matchlist
Practice problem 4 solution
Create a regexp which detects legal Microsoft Word file names which do
not contain any numerals, and print the location of the first such filename
you encounter
import sys
import re
filename = sys.argv[1]
filehandle = open(filename,"r")
filecontents = filehandle.read()
myrule = re.compile(r"[^ 0-9]+\.[dD][oO][cC]")
match = myrule.search(filecontents)
print match.start()
Regular expressions summary
• The re module lets us use regular expressions
• These are fast ways to search for complicated strings
• They are not essential to using Python, but are very useful
• File format conversion uses them a lot
• Compiling a regexp produces a Pattern object which can then be used
to search
• Searching produces a Match object which can then be asked for
information about the match
Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 4
Module 1:
Introduction
Topic: Regular Expression
Regular Expression
Split the string at every white-space character:
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)
['The', 'rain', 'in', 'Spain']

Split the string at the first white-space character only:
txt = "The rain in Spain"
x = re.split(r"\s", txt, 1)
print(x)
['The', 'rain in Spain']

Replace all white-space characters with the digit "9":
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)
The9rain9in9Spain
Regular Expression
Write a Python program that removes all HTML tags from an HTML document. Create a function that takes an HTML string as input and returns the text content without any HTML tags. Use regular expressions to accomplish this, taking into account different tag attributes and formats.

import re

html_text = """
<!DOCTYPE html>
<html>
<head>
<title>Sample HTML Document</title>
</head>
<body>
<h1>Welcome to my website!</h1>
<p>This is a sample HTML document.</p>
</body>
</html>
"""

html_tag_pattern = r'<[^>]*>'
clean_text = re.sub(html_tag_pattern, '', html_text)
print(clean_text)
Regular Expression
Write a Python program that validates a list of email addresses. Create a function that takes a list of email addresses as input and returns a list of valid email addresses. Use regular expressions to validate each email address according to common email address patterns.

import re

def validate_email_addresses(email_list):
    # Regular expression pattern for a valid email address
    email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    # List to store valid email addresses
    valid_emails = []
    for email in email_list:
        if re.match(email_pattern, email):
            valid_emails.append(email)
    return valid_emails

# Example usage:
email_list = [
    "[email protected]",
    "[email protected]",
    "invalid-email",
    "another@example",
]
valid_emails = validate_email_addresses(email_list)
print("Valid Email Addresses:")
for email in valid_emails:
    print(email)
Finite-state Automata
The regular expression is more than just a convenient
metalanguage for text searching. First, a regular
expression is one way of describing a finite-state
automaton (FSA). Finite-state automata are the
theoretical foundation of a good deal of the
computational work. Any regular expression can be
implemented as a finite-state automaton.
Symmetrically, any finite-state automaton can be
described with a regular expression. Second, a
regular expression is one way of characterizing a
particular kind of formal language called a regular
language. Both regular expressions and finite-state
automata can be used to describe regular languages.
A third equivalent method of characterizing the regular languages is the regular grammar.
Finite-state Automata
Finite automata are simple abstract machines used to recognize patterns. Finite automata are also known as finite-state machines. A finite automaton is a mathematical model of a system with discrete inputs, outputs, states, and a set of transitions from state to state that occur on input alphabet symbols. In simple words, it has a set of states and rules for moving from one state to the next, depending on the input symbol.
Q: Finite set of states represented by vertices.
Σ: set of Input Symbols.
𝑞0 : Initial state represented by empty incoming arc.
F: set of Final States represented by double circle.
δ: Transition Function represented by arcs.
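A direct Python rendering of this five-tuple (a sketch; the toy machine below, which accepts strings of a's and b's ending in "ab", is invented for illustration):

Q = {"q0", "q1", "q2"}          # finite set of states
sigma = {"a", "b"}              # input alphabet
q0 = "q0"                       # initial state
F = {"q2"}                      # set of final states
delta = {                       # transition function as a dictionary
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}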
Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 5
Module 1:
Introduction
Topic: Regular Expression
Determinism and Non-Determinism
Deterministic: A Deterministic Finite Automaton (DFA) is a
mathematical model and computational device used to recognize
and accept a set of strings over a finite alphabet. It is a type of
finite state machine characterized by its deterministic nature,
meaning that for each state and input symbol, there is exactly
one defined transition to another state.
Non-deterministic: There is a choice of several transitions that
can be taken given a current state and input symbol. (The
machine doesn’t specify how to make the choice.)
Potential solutions:
• Save backup states at each choice point
• Look-ahead in the input before making choice
• Pursue alternatives in parallel
• Determinize our NFSAs (and then minimize)
Using an FSA to Recognize Sheeptalk
Let's begin with the "sheep language". We can define the sheep language as any string from the following (infinite) set:
baa!
baaa!
baaaa!
baaaaa!
baaaaaa!
...
Directed graph with labeled nodes and arc transitions
Five states: q0 the start state, q4 the final state, 5
transitions
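Since any FSA has an equivalent regular expression, the sheep language is exactly the pattern baa+! (a quick Python check):

import re

for s in ["baa!", "baaaa!", "ba!", "baaa"]:
    print(s, bool(re.fullmatch(r"baa+!", s)))
# baa! True, baaaa! True, ba! False, baaa False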
Formally
State Transition Table for Sheeptalk:
State q0: on b go to q1
State q1: on a go to q2
State q2: on a go to q3
State q3: on a stay in q3; on ! go to q4
State q4: accepting state
Recognition and Rejection
The machine starts in the start state (q0), and iterates the following process: Check the next letter of
the input. If it matches the symbol on an arc leaving the current state, then cross that arc, move to the
next state, and also advance one symbol in the input. If we are in the accepting state (q4) when we
run out of input, the machine has successfully recognized an instance of sheeptalk. If the machine
never gets to the final state, either because it runs out of input, or it gets some input that doesn’t match
an arc, or if it just happens to get stuck in some non-final state, we say the machine rejects or fails to accept the input.
(Figure: the tape metaphor, showing a rejected input.)
D-Recognize
The algorithm is called D-RECOGNIZE for
“deterministic recognizer”. D-RECOGNIZE
begins by setting the variable index to the
beginning of the tape, and current-state to
the machine’s initial state. D-RECOGNIZE
then enters a loop that drives the rest of the
algorithm. It first checks whether it has
reached the end of its input. If so, it either
accepts the input (if the current state is an
accept state) or rejects the input
(if not).
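A Python sketch of D-RECOGNIZE for the sheeptalk automaton; the dictionary encoding of the transition table is this sketch's choice, not the book's pseudocode:

TRANSITIONS = {
    "q0": {"b": "q1"},
    "q1": {"a": "q2"},
    "q2": {"a": "q3"},
    "q3": {"a": "q3", "!": "q4"},
    "q4": {},
}
ACCEPT = {"q4"}

def d_recognize(tape):
    current_state = "q0"
    for symbol in tape:                              # advance along the tape
        if symbol not in TRANSITIONS[current_state]:
            return False                             # no legal transition: reject
        current_state = TRANSITIONS[current_state][symbol]
    return current_state in ACCEPT                   # end of input: accept state?

print(d_recognize("baaa!"))   # True
print(d_recognize("abc"))     # False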
D-Recognize
Before examining the beginning of the tape, the machine is in
state q0. Finding a b
on input tape, it changes to state q1 as indicated by the contents
of transition-table[q0,b]. It then finds an a and switches to state q2,
another a puts it in state q3, a third a leaves it in state q3, where it
reads the “!”, and switches to state q4. Since there is no more
input, the End of input condition at the beginning of the loop is
satisfied for the first time and the machine halts in q4. State q4 is
an accepting state, and so the machine has accepted the string
baaa! as a sentence in the sheep language. The algorithm will fail
whenever there is no legal transition for a given combination of
state and input. The input abc will fail to be recognized since there
is no legal transition out of state q0 on the input a. Even if the
automaton had allowed an initial a it would have certainly failed on
c, since c isn’t even in the sheeptalk alphabet! We can think of
these “empty” elements in the table as if they all pointed at one
“empty” state, which we might call the fail state or sink state.
Formal Language
A formal language is a set of strings, each string composed of symbols from a finite
symbol-set called an alphabet (the same alphabet used above for defining an
automaton!). The alphabet for the sheep language is the set ∑ = {a,b, !}. Given a model m
(such as a particular FSA), we can use L(m) to mean “the formal language characterized
by m". So the formal language defined by our sheeptalk automaton m is the infinite set:
L(m) = {baa!, baaa!, baaaa!, baaaaa!, ...}
The usefulness of an automaton for defining a language is that it can express an infinite
set (such as this one above) in a closed form. Formal languages are not the same as
natural languages, which are the kind of languages that real people speak. In fact, a
formal language may bear no resemblance at all to a real language (e.g., a formal
language can be used to model the different states of a soda machine). But we often use
a formal language to model part of a natural language, such as parts of the phonology,
morphology, or syntax. The term generative grammar is sometimes used in linguistics to
mean a grammar of a formal language; the origin of the term is this use of an automaton
to define a language by generating all possible strings.
Another Example
We can also have a higher-level alphabet consisting of words. In this way we can write finite-state automata that model facts about word combinations. For example, suppose we wanted to build an FSA that modeled the subpart of English dealing with amounts of money. Such a formal language would model the subset of English consisting of phrases like ten cents, three dollars, one dollar thirty-five cents, and so on.
Example
"Fifty one dollars twenty two cents" traces the path q0 → q1 → q2 → q4 → q5 → q6 → q7 through the money FSA.
Module 2:
Morphology And Finite-State Transducers:
Inflectional Morphology - Derivational Morphology - Finite-State Morphological Parsing - The Lexicon and Morphotactics - Morphological Parsing with Finite-State Transducers - Combining FST Lexicon and Rules - Lexicon-free FSTs: The Porter Stemmer - Human Morphological Processing - Speech Sounds and Phonetic Transcription - The Phoneme and Phonological Rules
Introduction
Morphological parsing is a linguistic process that involves breaking down words into their constituent
morphemes. Morphemes are the smallest units of meaning in a language and can be individual words or
meaningful parts of words, such as prefixes, suffixes, and roots. Morphological parsing is an essential
aspect of linguistic analysis, especially in languages with complex inflectional and derivational
morphology, like many Indo-European languages.
A morphological parsing system must be able to distinguish between orthographic rules and morphological rules.
Orthographic rules are general rules used when breaking a word into its stem and modifiers. An
example would be: singular English words ending with -y, when pluralized, end with -ies. Contrast this
to morphological rules which contain corner cases to these general rules. Both of these types of rules
are used to construct systems that can do morphological parsing
Morphological rules tell us the plural of goose is formed by changing the vowel.
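Both kinds of rules can be sketched together in a few lines of Python: the orthographic -y/-ies rule as a regular expression, and the morphological corner cases as a toy, purely illustrative exception lexicon:

import re

IRREGULAR = {"goose": "geese", "mouse": "mice"}   # illustrative lexicon

def pluralize(noun):
    if noun in IRREGULAR:                     # morphological rule: lexical lookup
        return IRREGULAR[noun]
    if re.search(r"[^aeiou]y$", noun):        # orthographic rule: consonant + y
        return re.sub(r"y$", "ies", noun)
    return noun + "s"                         # default regular rule

print(pluralize("butterfly"))  # butterflies
print(pluralize("goose"))      # geese
print(pluralize("cat"))        # cats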
Morphemes
Morphemes: Morphemes are the smallest units of meaning in a language.
For example, the word fox consists of a single morpheme (the morpheme fox), while the word cats consists of two: the morpheme cat and the morpheme -s.
Types of Morpheme:
One morpheme: nation
Two morphemes: national (nation, -al)
Three morphemes: nationalize (nation, -al, -ize)
Four morphemes: denationalize (de-, nation, -al, -ize)
Morphemes
Morphemes can be classified into two main categories:
Free Morphemes (stem): These are complete words that can stand alone and carry meaning on their
own (e.g., "book," "run").
Bound Morphemes (affixes): These are meaningful units that cannot stand alone and must be attached
to a free morpheme to convey meaning. Bound morphemes include prefixes (e.g., "un-" in "undo"),
suffixes (e.g., "-ed" in "walked"), and infixes (inserted inside a word, like in some Tagalog verb forms).
Bound morphemes give meaning when added to another morpheme: -s in walks, re- in replay, -er in cheaper, im- in impossible, en- in enlighten, un- in unable.
Free morphemes stand alone as words, e.g., girl, cat, dog, little, book, bag.
Prefixes: impossible, reply, unhappy, confirm, compress
Suffixes: passion, ambition, unity, walking
Circumfixes: enlighten, embolden
Concatenative Morphology & Non Concatenative
Morphology
Prefixes and suffixes are often called concatenative morphology since a word is composed of a
number of morphemes concatenated together
Circumfixes (Not in English)
Eg: In German, for example
The past participle of some verbs formed by adding ge to the beginning of the stem and t to the
end
so the past participle of the verb sagen (to say) is gesagt (said).
A number of languages have extensive non concatenative morphology, in which morphemes are
combined in more complex ways
Another kind of non concatenative morphology is called templatic morphology or root and pattern
morphology This is very common in Arabic, Hebrew, and other Semitic languages
Non Concatenative Morphology
In Hebrew, for example, a verb is constructed using two components: a root, usually consisting of three consonants and carrying the basic meaning, and a template, which gives the ordering of consonants and vowels and specifies more semantic information about the resulting verb, such as the semantic voice (e.g., active, passive, middle).
The Hebrew tri-consonantal root lmd, meaning 'learn' or 'study', can be combined with:
the active voice CaCaC template to produce the word lamad, 'he studied';
the intensive CiCeC template to produce the word limed, 'he taught';
the intensive passive CuCaC template to produce the word lumad, 'he was taught'.
Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 6
Morphemes
Two broad classes of ways to form words from morphemes:
– Inflection: the combination of a word stem with a grammatical morpheme, usually resulting in a word of the same class as the original stem, and usually filling some syntactic function like agreement. For example, English has the inflectional morpheme -s for marking the plural on nouns, and the inflectional morpheme -ed for marking the past tense on verbs. The meaning of the resulting word is easily predictable.
– Derivation: the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning hard to predict exactly. For example, the verb computerize can take the derivational suffix -ation to produce the noun computerization.
Inflection
In English, only nouns, verbs, and sometimes adjectives can be inflected, and the number of affixes is quite small.
English nouns have only two kinds of inflection: an affix that marks plural and an affix that marks possessive. For example, many (but not all) English nouns can either appear in the bare stem or singular form, or take a plural suffix. Examples of the regular plural suffix -s (also spelled -es) are cat/cats and thrush/thrushes; mouse/mice and ox/oxen are irregular plurals.
Inflection
The irregular verbs are those that have some more or less idiosyncratic forms of inflection.
Inflection
An irregular verb can inflect in the past form (also called the preterite) by changing its vowel (eat/ate), or its vowel and some consonants (catch/caught), or with no change at all (cut/cut).
The -s form is used in the habitual present to distinguish the 3rd-person singular ending (She jogs every Tuesday) from the other choices of person and number (I/you/we/they jog every Tuesday).
The stem form is used in the infinitive, and also after certain other verbs (I'd rather walk home, I want to walk home).
The -ing participle is used when the verb is treated as a noun, called a gerund use, e.g., Fishing is fine if you live near water.
The -ed participle is used in the perfect construction (He's eaten lunch already) or the passive construction (The verdict was overturned yesterday).
Inflection
A single consonant letter is doubled before adding the -ing and -ed suffixes (beg/begging/begged).
If the final letter is c, the doubling is spelled ck (picnic/picnicking/picnicked).
If the base ends in a silent e, it is deleted before adding -ing and -ed (merge/merging/merged).
Just as for nouns, the -s ending is spelled -es after verb stems ending in -s (toss/tosses), -z (waltz/waltzes), -sh (wash/washes), -ch (catch/catches), and sometimes -x (tax/taxes).
Also like nouns, verbs ending in y preceded by a consonant change the y to i (try/tries).
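These spelling rules are concrete enough to sketch in code. Below is a rough Python illustration (my own sketch, not from the textbook): add_es and add_suffix are invented helper names, and the consonant-doubling test is simplified, since real doubling also depends on stress.

def add_es(stem):
    # -es after sibilants: toss/tosses, waltz/waltzes, wash/washes,
    # catch/catches, tax/taxes
    if stem.endswith(("s", "z", "sh", "ch", "x")):
        return stem + "es"
    # y preceded by a consonant changes to i: try/tries
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in "aeiou":
        return stem[:-1] + "ies"
    return stem + "s"

def add_suffix(stem, suffix):
    # suffix is "ing" or "ed"
    if stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + suffix          # silent-e deletion: merge -> merging
    if stem.endswith("c"):
        return stem + "k" + suffix         # picnic -> picnicking
    if (len(stem) >= 3 and stem[-1] not in "aeiouwxy"
            and stem[-2] in "aeiou" and stem[-3] not in "aeiou"):
        return stem + stem[-1] + suffix    # doubling: beg -> begging (simplified)
    return stem + suffix

for stem in ["toss", "waltz", "try", "walk"]:
    print(add_es(stem))
for stem in ["beg", "picnic", "merge"]:
    print(add_suffix(stem, "ing"), add_suffix(stem, "ed"))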
Derivational Morphology
Derivation in English is quite complex. It is the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning hard to predict exactly.
A very common kind of derivation in English is the formation of new nouns, often from verbs or adjectives. This process is called nominalization.
For example, the suffix -ation produces nouns from verbs, often verbs ending in the suffix -ize (computerize/computerization).
Adjectives can also be derived from nouns and verbs.
Derivational Morphology
Derivation in English is more complex than inflection because:
– It is generally less productive: a nominalizing affix like -ation cannot be added to absolutely every verb (*eatation).
– There are subtle and complex meaning differences among nominalizing suffixes. For example, sincerity has a subtle difference in meaning from sincereness.
Morphological parsing
Breaking down words into components and building a structured representation.
– English:
● cats → cat +N +Pl
● caught → catch +V +Past
– Spanish:
● vino (came) → venir +V +Perf +3P +Sg
● vino (wine) → vino +N +Masc +Sg
Importance:
● Information retrieval: normalize verb tenses, plurals, grammar cases.
● Machine translation: translation based on the stem.
Finite-State Morphological Parsing
Parsing English morphology
Finite-State Morphological Parsing
We need at least the following to build a morphological parser:
1. Lexicon: the list of stems and affixes, together with basic information about them (Noun stem or Verb stem, etc.).
2. Morphotactics: the model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes inside a word, e.g., the rule that the English plural morpheme follows the noun rather than preceding it.
3. Orthographic rules: these spelling rules are used to model the changes that occur in a word, usually when two morphemes combine (e.g., the y→ie spelling rule changes city + -s to cities).
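To make the lexicon-plus-morphotactics idea concrete, here is a minimal Python sketch (my own illustration, not from the textbook); the tiny lexicon is invented, and orthographic rules are deliberately left out:

reg_nouns = {"fox", "cat", "dog"}
irreg_sg_nouns = {"goose", "mouse"}
irreg_pl_nouns = {"geese", "mice"}

def accepts_noun(word):
    # Accept a bare regular stem, an irregular singular or plural form,
    # or a regular stem followed by the plural -s morpheme.
    if word in reg_nouns or word in irreg_sg_nouns or word in irreg_pl_nouns:
        return True
    if word.endswith("s") and word[:-1] in reg_nouns:
        return True
    return False

for w in ["cats", "goose", "geese", "gooses", "foxs"]:
    print(w, accepts_noun(w))
# 'gooses' is correctly rejected; 'foxs' is accepted because this
# sketch has no orthographic rules yet. The e-insertion rule that
# produces fox -> foxes is exactly what item 3 above supplies.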
The Lexicon and Morphotactics
A lexicon is a repository for words.
– The simplest one would consist of an explicit list of every word of the language. Inconvenient or impossible!
– Computational lexicons are usually structured with
• a list of each of the stems and
• affixes of the language, together with a representation of the morphotactics telling us how they can fit together.
– The most common way of modeling morphotactics is the finite-state automaton.
Lecture : 7
Module 2:
Morphology And Finite-State Transducers:
Inflectional Morphology -Derivational
Morphology- Finite-State Morphological Parsing-
The Lexicon and Morphotactics - Morphological
Parsing with Finite-State Transducers-
Combining FST Lexicon and Rules- Lexicon-free
FSTs: The Porter Stemmer- Human
Morphological Processing- Speech Sounds
and Phonetic Transcription- The Phoneme and
Phonological Rules
Module 2:
Morphology
Topic: FSA and FST
The Lexicon and Morphotactics
English derivational morphology is more complex than English inflectional morphology, so automata for modeling English derivation tend to be quite complex.
– Some are even based on CFGs.
• Example: a small part of the morphosyntax of English adjectives.
In a Finite State Automaton (FSA), epsilon (ε) transitions are used to represent transitions between states without consuming any input symbol. These transitions are also known as "null" or "empty" transitions. When two states have an epsilon transition between them, you can move from one state to the other without reading any input symbol.
The Lexicon and Morphotactics
FSA #1 recognizes all the listed adjectives, but also ungrammatical forms like unbig, redly, and realest.
• Thus #1 is revised to become #2.
• This complexity is expected of English derivation.
The Lexicon and Morphotactics
We can now use these FSAs to solve the problem of morphological recognition:
– determining whether an input string of letters makes up a legitimate English word or not.
– We do this by taking the morphotactic FSAs and plugging each "sub-lexicon" into the FSA.
– The resulting FSA can then be defined at the level of the individual letter.
Finite-State Transducers (FST)
An FST is a type of FSA which maps between two sets of symbols.
● It is a two-tape automaton that recognizes or generates pairs of strings, one from each tape.
● An FST defines relations between sets of strings.
Given the input, for example, cats, we would like to produce cat +N +PL.
• Two-level morphology, by Koskenniemi (1983):
– represents a word as a correspondence between a lexical level, representing a simple concatenation of the morphemes making up the word, and
– the surface level, representing the actual spelling of the final word.
• Morphological parsing is implemented by building mapping rules that map letter sequences like cats on the surface level into morpheme and feature sequences like cat +N +PL on the lexical level.
Finite-State Transducers (FST)
The automaton we use for performing the mapping between these two levels is the finite-state transducer, or FST.
– A transducer maps between one set of symbols and another;
– an FST does this via a finite automaton.
• Thus an FST can be seen as a two-tape automaton which recognizes or generates pairs of strings.
• The FST has a more general function than an FSA:
– an FSA defines a formal language;
– an FST defines a relation between sets of strings.
• Another view of an FST: a machine that reads one string and generates another.
FST
FST as recognizer:
– a transducer that takes a pair of strings as input and outputs accept if the string pair is in the string-pair language, and reject if it is not.
FST as generator:
– a machine that outputs pairs of strings of the language; thus the output is a yes or no, and a pair of output strings.
FST as translator:
– a machine that reads a string and outputs another string.
FST as set relater:
– a machine that computes relations between sets.
FST
A formal definition of an FST (based on the Mealy machine extension to a simple FSA):
– Q: a finite set of N states q0, q1, ..., qN.
– Σ: a finite alphabet of complex symbols. Each complex symbol is composed of an input-output pair i:o, with one symbol i from an input alphabet I and one symbol o from an output alphabet O; thus Σ ⊆ I × O. I and O may each also include the epsilon symbol ε.
– q0: the start state.
– F: the set of final states, F ⊆ Q.
– δ(q, i:o): the transition function or transition matrix between states. Given a state q ∈ Q and a complex symbol i:o ∈ Σ, δ(q, i:o) returns a new state q′ ∈ Q. δ is thus a relation from Q × Σ to Q.
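The definition translates almost directly into code. A minimal sketch in Python (my own illustration; the toy transducer, its state numbers, and the accepts helper are invented for the example):

# Minimal FST used as a recognizer of string pairs (illustrative toy).
# delta maps (state, input symbol, output symbol) -> next state.
# EPS ('') lets one tape advance while the other stands still.
EPS = ""

# Toy transducer relating surface "cats" to lexical "cat+N+PL".
delta = {
    (0, "c", "c"): 1,
    (1, "a", "a"): 2,
    (2, "t", "t"): 3,
    (3, EPS, "+N"): 4,     # emit +N without consuming input
    (4, "s", "+PL"): 5,    # read plural -s, emit +PL
}
finals = {5}

def accepts(pair, state=0, i=0, j=0):
    """True if the (surface, lexical) pair is in the transducer's relation."""
    surface, lexical = pair
    if i == len(surface) and j == len(lexical) and state in finals:
        return True
    for (q, a, b), q2 in delta.items():
        if q != state:
            continue
        if a != EPS and not surface.startswith(a, i):
            continue
        if b != EPS and not lexical[j:].startswith(b):
            continue
        if accepts(pair, q2, i + len(a), j + len(b)):
            return True
    return False

print(accepts(("cats", "cat+N+PL")))  # True
print(accepts(("cats", "cat+V")))     # False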
FST
• FSAs are isomorphic to regular languages; FSTs are isomorphic to regular relations.
• Regular relations are sets of pairs of strings, a natural extension of regular languages, which are sets of strings.
• FSTs are closed under union, but generally they are not closed under difference, complementation, and intersection.
• Two useful closure properties of FSTs:
– Inversion: if T maps from I to O, then the inverse of T, T^(-1), maps from O to I.
– Composition: if T1 is a transducer from I1 to O1 and T2 a transducer from O1 to O2, then T1 · T2 maps from I1 to O2.
• Inversion is useful because it makes it easy to convert an FST-as-parser into an FST-as-generator.
• Composition is useful because it allows us to take two transducers that run in series and replace them with one more complex transducer:
– (T1 · T2)(S) = T2(T1(S))
FST
The composition of [a:b]+ with [b:c]+ to produce [a:c]+.
Lecture : 8
Module 2: Morphology
Topic: FSA and FST
FST
A transducer T_num for English nominal number inflection
FST
The transducer T_stems, which maps roots to their root class
FST
A fleshed-out English nominal inflection FST: T_lex = T_num · T_stems
Orthographic Rules and FSTs
These spelling changes can be thought of as taking as input a simple concatenation of morphemes and producing as output a slightly modified concatenation of morphemes.
Orthographic Rules and FSTs
We note that concatenating the morphemes works to parse words like "dog", "cat", or "fox", but this simple method does not work when there is a spelling change, as when "foxes" is to be parsed into the lexical form "fox +N +PL" or "cats" into "cat +N +PL". This requires the introduction of spelling rules (also called orthographic rules). To account for the spelling rules, we introduce another tape, called the intermediate tape, which carries a slightly modified concatenation of morphemes, thus going from two-level to three-level morphology. Such a rule maps from the intermediate tape to the surface tape. For plural nouns, the e-insertion rule states: "insert e on the surface tape just when the lexical tape has a morpheme ending in x, s, or z and the next morpheme is -s". Examples are box to boxes and fox to foxes. The rule is stated as
ε → e / {x, s, z}^ __ s#
This is Chomsky and Halle notation: a rule of the form a → b / c __ d means "rewrite a as b when it occurs between c and d". Since the symbol ε is null, replacing it means inserting something. The symbol ^ indicates a morpheme boundary and # a word boundary; morpheme boundaries are deleted by including the symbol ^:ε in the default pairs for the transducer.
Orthographic Rules and FSTs
● Lexical: fox +N +Pl
● Intermediate: fox^s#
● Surface: foxes
The transducer for the e-insertion rule
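As a rough approximation, the e-insertion rule can be sketched as a string rewrite on the intermediate tape (my own illustration in Python, not the actual transducer; the function name is invented):

import re

def e_insertion(intermediate):
    # Insert e between a morpheme ending in x, s, or z and a following
    # -s morpheme, then delete the ^ and # boundary symbols.
    surface = re.sub(r"([xsz])\^s#", r"\1es#", intermediate)  # fox^s# -> foxes#
    return surface.replace("^", "").replace("#", "")          # drop boundaries

for form in ["fox^s#", "cat^s#", "waltz^s#"]:
    print(form, "->", e_insertion(form))
# fox^s# -> foxes, cat^s# -> cats, waltz^s# -> waltzes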
Combining FST Lexicon and Rules
By running these multi-level FSTs in sequence between the different tapes, together with parallel transducers for the spelling rules, we are able to parse words whose morphological analysis is simple.
However, consider the sentence "The police books the right culprit". From the rules above it is not clear whether the lexical parser's output for "books" should be "book +N +PL" or "book +V +3SG". For a human, however, it is not difficult to infer that it is the second. The difficulty comes from the ambiguity of the word, which may be a noun or a verb depending on its position in the sentence. This type of ambiguity is called lexical (part-of-speech) ambiguity, and resolving it is a disambiguation task.
Lexicon-Free FSTs: The Porter Stemmer
• Widely used in information retrieval.
• One of the most widely used stemming algorithms is the simple and efficient Porter (1980) algorithm, which is based on a series of simple cascaded rewrite rules, e.g.:
– ATIONAL → ATE (e.g., relational → relate)
– ING → ε if the stem contains a vowel (e.g., motoring → motor)
• Problem:
– Not perfect: errors of commission and of omission.
• Experiments have been made:
– some improvement with smaller documents;
– any improvement is quite small.
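NLTK ships an implementation of the Porter algorithm, so the cascaded rules can be tried directly (usage sketch; exact outputs depend on the implementation version, so they are hedged in the comments):

from nltk.stem import PorterStemmer  # pip install nltk

stemmer = PorterStemmer()
for word in ["relational", "motoring", "caresses", "organization", "university"]:
    print(word, "->", stemmer.stem(word))
# 'motoring' -> 'motor' via ING -> eps; 'relational' goes through
# ATIONAL -> ATE and further steps. Errors of commission/omission
# remain: e.g., related words do not always map to the same stem.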
Human Morphological Processing
Psychological studies ask how multi-morphemic words are represented in the minds of speakers of English.
For example, consider the word walk and its inflected forms walks and walked. Are all three in the human lexicon? Or merely walk, along with -ed and -s? How about the word happy and its derived forms happily and happiness?
The full listing hypothesis proposes that all words of a language are listed in the mental lexicon without any internal morphological structure.
• Morphological structure is simply an epiphenomenon: walk, walks, walked, happy, and happily are all listed separately in the lexicon.
The minimum redundancy hypothesis suggests that only the constituent morphemes are represented in the lexicon, and that when processing walks (whether for reading, listening, or talking) we must access both morphemes (walk and -s) and combine them.
Human Morphological Processing
Some of the earliest evidence that the human lexicon represents at least some morphological structure comes from speech errors, e.g., "easy enoughly" (for "easily enough").
More recent experimental evidence suggests that neither the full listing nor the minimum redundancy hypothesis may be completely true. Instead, it is possible that some, but not all, morphological relationships are mentally represented.
For example, one line of experiments found that derived forms (e.g., happily) are stored separately from their stem (happy), but that regularly inflected forms are not distinct in the lexicon from their stems.
Marslen-Wilson et al. (1994) found that spoken derived words can prime their stems, but only if the meaning of the derived form is closely related to the stem.
• For example, government primes govern, but department does not prime depart.
SPEECH SOUNDS AND PHONETIC TRANSCRIPTION
• The fundamental insights and algorithms necessary to understand modern speech recognition and speech synthesis technology, and the related branch of linguistics called computational phonology.
• Core tasks:
– speech recognition: input an acoustic waveform, output a string of words;
– text-to-speech synthesis: input a sequence of text words, output an acoustic waveform.
A speech recognition system needs to have a pronunciation for every word it can recognize, and a text-to-speech system needs to have a pronunciation for every word it can say.
Contd.
• The science of phonetics aims to describe all the sounds of all the world's languages:
– acoustic phonetics focuses on the physical properties of the sounds of language;
– auditory phonetics focuses on how listeners perceive the sounds of language;
– articulatory phonetics focuses on how the vocal tract produces the sounds of language.
Phonetic alphabets: the pronunciation part of the field of phonetics.
Phonological rules: the systematic ways in which sounds are realized differently in different environments.
Computational phonology: the study of computational mechanisms for modeling phonological rules.
Phonological learning: how phonological rules can be automatically induced by machine learning algorithms.
IPA and ARPABET: Vowels
The International Phonetic Alphabet (IPA) and the ARPABET are two systems used to represent the sounds of spoken language. They provide a standardized way to transcribe the sounds of speech, which is useful for linguists, phoneticians, and language learners. The slide shows IPA and ARPABET transcriptions for the English vowels.
IPA and ARPABET: Consonants
The Vocal Organs
Articulatory phonetics is the study of how phones are produced, as the various organs in the mouth, throat, and nose modify the airflow from the lungs.
Sound is produced by the rapid movement of air. Most sounds in human languages are produced by expelling air from the lungs through the windpipe (technically, the trachea) and then out the mouth or nose.
As it passes through the trachea, the air passes through the larynx, commonly known as the Adam's apple or voice box. The larynx contains two small folds of muscle, the vocal folds (often referred to non-technically as the vocal cords), which can be moved together or apart. The space between these two folds is called the glottis.
Vocal Organs
Most speech sounds are produced by pushing air through the vocal cords.
– Glottis = the opening between the vocal cords
– Larynx = 'voice box'
– Pharynx = tubular part of the throat above the larynx
– Oral cavity = mouth
– Nasal cavity = nose and the passages connecting it to the throat and sinuses
Phones are divided into two main classes:
– Consonants are made by restricting or blocking the airflow in some way, and may be voiced or unvoiced.
– Vowels have less obstruction, are usually voiced, and are generally louder and longer-lasting than consonants.
• Both kinds of sounds are formed by the motion of air through the mouth, throat, or nose.
Lecture : 9
Module 2: Morphology
Consonants: Place of Articulation
Consonants are sounds produced with some restriction or closure in the vocal tract.
• Consonants are classified based in part on where in the vocal tract the airflow is being restricted (the place of articulation).
• The major places of articulation are bilabial, labiodental, interdental, alveolar, palatal, velar, uvular, and glottal.
Consonants: Place of Articulation
1.Bilabial: The airflow is obstructed by bringing both lips together.
Example: /p/ in "pat," /b/ in "bat," /m/ in "mat."
2.Labiodental: The airflow is obstructed by placing the upper teeth against the lower lip.
Example: /f/ in "fan," /v/ in "van."
3.Interdental: The airflow is obstructed by placing the tip of the tongue between the teeth.
Example: /θ/ in "think," /ð/ in "this."
4.Alveolar: The airflow is obstructed by raising the front part of the tongue to the alveolar ridge, which is the bony
ridge just behind the upper front teeth.
Example: /t/ in "top," /d/ in "dog," /s/ in "sock."
5.Alveopalatal (or Palatoalveolar): The airflow is obstructed by raising the front part of the tongue to the area just
behind the alveolar ridge.
Example: /ʃ/ in "shoe," /ʒ/ in "measure," /tʃ/ in "cheese," /dʒ/ in "judge."
6.Palatal: The airflow is obstructed by raising the middle part of the tongue to the hard palate, which is the roof of the
mouth right behind the alveolar ridge.
Example: /j/ in "yes," /ʎ/ in some dialects of Spanish.
7.Velar: The airflow is obstructed by raising the back part of the tongue to the soft part of the palate (the velum).
Example: /k/ in "cat," /g/ in "go," /ŋ/ in "sing."
8.Glottal: The airflow is obstructed by closing or nearly closing the space between the vocal cords in the larynx.
Example: /h/ in "hat," the glottal stop /ʔ/ in some dialects, as in "uh-oh."
Consonants: Manner of Articulation
Consonants can also be classified by their manner of articulation, which describes how the airflow is
obstructed or modified as they are produced. Here are some common manners of articulation for
consonants with examples:
Plosive (or Stop): These consonants are produced by a complete closure of the vocal tract, causing a
momentary halt in the airflow before releasing it.
Example: /p/ in "pat," /b/ in "bat," /t/ in "top," /d/ in "dog," /k/ in "cat," /g/ in "go.“
Fricative: Fricatives are produced by narrowing the vocal tract, creating turbulent airflow and a continuous,
hissing sound.
Example: /f/ in "fan," /v/ in "van," /s/ in "sock," /z/ in "zebra," /ʃ/ in "shoe," /ʒ/ in "measure."
Affricate: Affricates begin with a stop-like closure and then transition into a fricative sound.
Example: /tʃ/ in "cheese," /dʒ/ in "judge."
Contd.
Nasal: Nasal consonants are produced by lowering the velum (soft part of the roof of the mouth),
allowing air to flow through the nasal cavity.
Example: /m/ in "mat," /n/ in "net," /ŋ/ in "sing."
Liquid: Liquids involve a relatively free airflow, with slight constriction in the vocal tract.
Lateral Liquid: /l/ in "let."
Retroflex Liquid: /ɹ/ in "red" (Note: The pronunciation of this sound can vary regionally.)
Glide (Semivowel): Glides are produced with a slight constriction in the vocal tract but are more
vowel-like in nature.
Example: /j/ in "yes," /w/ in "we."
Approximant: Approximants have a less constricted airflow than fricatives but more than glides.
Example: /ɹ/ in "red" (in some dialects), /ʋ/ in some languages.
These are the main manners of articulation for consonants.
Vowels
Vowels are classified by how high or low the tongue is, whether the tongue is in the front or back of the mouth, and whether or not the lips are rounded.
High vowels: [i] [ɪ] [u] [ʊ]
Mid vowels: [e] [ɛ] [o] [ə] [ʌ] [ɔ]
Low vowels: [æ] [a]
Front vowels: [i] [ɪ] [e] [ɛ] [æ]
Central vowels: [ə] [ʌ]
Back vowels: [u] [ʊ] [ɔ] [o] [a]
Lecture : 10
Module 3:
Syntax Parsing: Tagsets for English - Part of
Speech Tagging- Rule based Part-of-speech
Tagging- Stochastic Part-of speech Tagging-
Transformation-Based Tagging- Context-Free
Grammars for English - Context-Free Rules and
Trees- The Noun Phrase. The Verb Phrase and
Subcategorization- Grammar Equivalence
&Normal Form- Finite State & Context-Free
Grammars.
Module 3: Syntax
Parsing
Topic: Introduction
Tagsets for English
There is a small number of popular tagsets for English. The choice of tagset depends on the nature of the application:
– small tagsets (more general);
– large tagsets (finer tags).
Some of the widely used part-of-speech tagsets are:
– the 45-tag Penn Treebank tagset;
– the 87-tag tagset used for the Brown corpus;
– the medium-sized 61-tag C5 tagset;
– the 146-tag C7 tagset.
Each word is assigned a tag from the chosen tagset.
Some Common Tagsets (English)
Penn Treebank Tagset:
This is one of the most widely used tagsets in natural language processing and linguistics. It
was developed for the Penn Treebank project and includes tags like NN (Noun), VB (Verb),
JJ (Adjective), RB (Adverb), and more. It's known for its detailed granularity.
Universal POS Tagset:
The Universal POS Tagset is designed to be more cross-linguistic and universal, making it
easier to work with multilingual data. It includes tags like NOUN, VERB, ADJ, ADV, and
others, providing a simpler and more consistent set of labels compared to the Penn Treebank
Tagset.
Brown Corpus Tagset:
The Brown Corpus is a well-known linguistic corpus, and it has its own tagset. It includes tags
like N (Noun), V (Verb), ADJ (Adjective), and others. It's used primarily for linguistic research.
CLAWS Tagset:
The CLAWS (Constituent Likelihood Automatic Word-tagging System) Tagset is designed to
be a detailed and linguistically motivated tagset. It includes a wide range of tags to capture
grammatical and syntactic information.
Some Common Tagsets (English)
Lancaster-Oslo/Bergen (LOB) Tagset:
The LOB Corpus is another linguistic corpus, and it has its own tagset. It's used primarily in
corpus linguistics and includes tags like NN (Noun), VB (Verb), JJ (Adjective), and more.
Medical Subject Headings (MeSH) Tagset:
This tagset is specific to the medical domain and is used for indexing and categorizing
medical texts. It includes tags like A1.4.1 (Anatomy), D1.1.1 (Diseases), and others.
OntoNotes Tagset:
The OntoNotes project developed a tagset for annotating a wide range of linguistic
information, including part of speech, named entities, and syntactic structures. It's used in
various natural language processing tasks.
Google Universal Dependencies Tagset:
Google's Universal Dependencies project aims to provide universal grammatical relations
and dependency labels for multiple languages, including English. It includes tags like NOUN,
VERB, ADJ, and more, similar to the Universal POS Tagset.
POS tagging
Part-of-speech tagging (or just tagging for short) is the process of assigning a part of speech or other lexical class marker to each word in a corpus.
Tags are also usually applied to punctuation markers; thus tagging for natural language is the same process as tokenization for computer languages, although tags for natural languages are much more ambiguous.
Even in simple examples, automatically assigning a tag to each word is not trivial. For example, book is ambiguous: it has more than one possible usage and part of speech. It can be a verb (as in book that flight or to book the suspect) or a noun (as in hand me that book, or a book of matches). Similarly, that can be a determiner (as in Does that flight serve dinner) or a complementizer (as in I thought that your flight was earlier).
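As a quick illustration, an off-the-shelf tagger such as NLTK's can be run on exactly these examples (usage sketch; the downloads are NLTK's standard data packages, though newer NLTK versions may name the tagger resource slightly differently, and the tags actually returned can vary with the model):

import nltk
nltk.download("punkt")                       # tokenizer data
nltk.download("averaged_perceptron_tagger")  # tagger model

for sent in ["Book that flight .", "Hand me that book ."]:
    print(nltk.pos_tag(nltk.word_tokenize(sent)))
# Ideally 'book' comes out VB (verb) in the first sentence and
# NN (noun) in the second; sentence-initial capitalization can
# trip the tagger, which is part of why tagging is nontrivial.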
Multiple POS
Words often have more than one POS. Consider back:
The back door → JJ
On my back → NN
Olga's not looking forward to going back to school in September. → RB
Promised to back the bill → VB
The POS tagging problem is to determine the POS tag for a particular instance of a word: to resolve these ambiguities, choosing the proper tag for the context. Part-of-speech tagging is thus one of the many disambiguation tasks.
How Hard is POS Tagging? Measuring Ambiguity
Methods for POS Tagging
Rule-based tagging uses hand-written rules:
– ENGTWOL (ENGlish TWO Level analysis).
Stochastic tagging uses probabilistic sequence models:
– HMM (Hidden Markov Model) tagging;
– MEMMs (Maximum Entropy Markov Models).
Transformation-based tagging uses rules learned automatically.
Rule-based POS tagging
The first stage uses a dictionary to assign each word a list of potential parts of speech.
The second stage uses large lists of hand-written disambiguation rules to winnow down this list to a single part of speech for each word.
These taggers are knowledge-driven:
– the rules are built manually;
– the information is coded in the form of rules;
– the number of rules is limited, approximately around 1,000;
– smoothing and language modeling are defined explicitly in rule-based taggers.
Rule-based POS tagging
Rule-based POS taggers can be relatively simple to implement and are often used as a starting point for more complex machine-learning-based taggers. However, they can be less accurate and less efficient than machine-learning-based taggers, especially for tasks with large or complex datasets.
Here is an example of how a rule-based POS tagger might work. Define a set of rules for assigning POS tags to words, for example:
– If the word ends in "-tion," assign the tag "noun."
– If the word ends in "-ment," assign the tag "noun."
– If the word is all uppercase, assign the tag "proper noun."
– If the word is a verb ending in "-ing," assign the tag "verb."
Then iterate through the words in the text and apply the rules to each word in turn (see the sketch below). For example:
– "Nation" would be tagged as "noun" based on the first rule.
– "Investment" would be tagged as "noun" based on the second rule.
– "UNITED" would be tagged as "proper noun" based on the third rule.
– "Running" would be tagged as "verb" based on the fourth rule.
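A minimal sketch of these four rules in Python (illustrative only; the rule order and the fallback tag are choices I have made, not part of the slide):

def rule_based_tag(word):
    # Apply the four example rules in order; fall back to 'unknown'.
    if word.lower().endswith("tion"):
        return "noun"
    if word.lower().endswith("ment"):
        return "noun"
    if word.isupper():
        return "proper noun"
    if word.lower().endswith("ing"):
        return "verb"
    return "unknown"

for w in ["Nation", "Investment", "UNITED", "Running"]:
    print(w, "->", rule_based_tag(w))
# Nation -> noun, Investment -> noun, UNITED -> proper noun, Running -> verb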
ENGTWOL: a rule-based tagger
– Uses a two-level lexicon transducer.
– Uses hand-crafted rules (about 1,100 rules).
Process: start with a dictionary.
ENGTWOL: a rule-based tagger
Example rule: eliminate VBN if VBD is an option when VBN|VBD follows "<start> PRP".
Stochastic Tagging: Probabilistic Sequence Models
HMM (Hidden Markov Model) tagging is a stochastic technique for POS tagging. Hidden Markov models are known for their applications to reinforcement learning and temporal pattern recognition such as speech, handwriting, and gesture recognition, musical score following, partial discharges, and bioinformatics.
Let us consider an example proposed by Dr. Luis Serrano and find out how an HMM selects an appropriate tag sequence for a sentence.
Training data:
Mary Jane can see Will
Spot will see Mary
Will Jane spot Mary?
Mary will pat Spot
HMM-POS Tagging
Words   Noun   Modal   Verb
Mary    4      0       0
Jane    2      0       0
Will    1      3       0
Spot    2      0       1
Can     0      1       0
See     0      0       2
Pat     0      0       1
HMM-POS Tagging
Now let us divide each column by the total number of appearances of that tag. For example, 'noun' appears nine times in the above sentences, so we divide each entry in the noun column by 9. We get the following table after this operation:

Words   Noun   Modal   Verb
Mary    4/9    0       0
Jane    2/9    0       0
Will    1/9    3/4     0
Spot    2/9    0       1/4
Can     0      1/4     0
See     0      0       2/4
Pat     0      0       1/4

These are the emission probabilities.
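These counts and ratios are easy to reproduce. A small sketch (my own code) that derives the emission probabilities from the hand-tagged toy corpus, with words lowercased:

from collections import Counter, defaultdict

# The toy corpus, hand-tagged: N = noun, M = modal, V = verb.
tagged = [
    [("mary","N"), ("jane","N"), ("can","M"), ("see","V"), ("will","N")],
    [("spot","N"), ("will","M"), ("see","V"), ("mary","N")],
    [("will","M"), ("jane","N"), ("spot","V"), ("mary","N")],
    [("mary","N"), ("will","M"), ("pat","V"), ("spot","N")],
]

tag_counts = Counter(t for sent in tagged for _, t in sent)
emit = defaultdict(Counter)
for sent in tagged:
    for word, tag in sent:
        emit[tag][word] += 1

# P(word | tag) = count(word, tag) / count(tag)
for tag in emit:
    for word, c in emit[tag].items():
        print(f"P({word} | {tag}) = {c}/{tag_counts[tag]}")
# e.g. P(mary | N) = 4/9, P(will | M) = 3/4, P(see | V) = 2/4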
HMM-POS Tagging
Next, we have to calculate the transition probabilities, so define two more tags, <S> and <E>. <S> is placed at the beginning of each sentence and <E> at the end, as shown in the figure below.

        N   M   V   <E>
<S>     3   1   0   0
N       1   3   1   4
M       1   0   3   0
V       4   0   0   0

In the figure, we can see that the <S> tag is followed by the N tag three times, so the first entry is 3. The modal tag follows <S> just once, so the second entry is 1. The rest of the table is filled in the same manner.
Next, we divide each entry in a row by the total number of occurrences of the tag in question. For example, the modal tag is followed by some other tag four times, so we divide each element in the modal row by four.
HMM-POS Tagging
        N     M     V     <E>
<S>     3/4   1/4   0     0
N       1/9   3/9   1/9   4/9
M       1/4   0     3/4   0
V       4/4   0     0     0
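The transition probabilities can be derived from the same toy corpus. This sketch continues the previous snippet, reusing its tagged list (my own code):

from collections import Counter, defaultdict  # as in the previous sketch

# Transition counts, with <S> and <E> padding each sentence.
trans = defaultdict(Counter)
for sent in tagged:
    tags = ["<S>"] + [t for _, t in sent] + ["<E>"]
    for prev, nxt in zip(tags, tags[1:]):
        trans[prev][nxt] += 1

# P(next_tag | prev_tag) = count(prev, next) / count(prev, anything)
for prev in trans:
    total = sum(trans[prev].values())
    for nxt, c in trans[prev].items():
        print(f"P({nxt} | {prev}) = {c}/{total}")
# e.g. P(N | <S>) = 3/4, P(V | M) = 3/4, P(N | V) = 4/4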
HMM-POS Tagging
Take a new sentence and tag it with (deliberately) wrong tags. Let the sentence 'Will can spot Mary' be tagged as:
Will as a modal
Can as a verb
Spot as a noun
Mary as a noun
Now calculate the probability of this sequence being correct in the following manner.
The probability that the tag Modal (M) comes after the tag <S> is 1/4, as seen in the table. Also, the probability that the word Will is a modal is 3/4. In the same manner, we calculate every probability in the graph. The product of these probabilities is the likelihood that this sequence is right. Since the tags are not correct, the product is zero:
1/4 * 3/4 * 3/4 * 0 * 1 * 2/9 * 1/9 * 4/9 * 4/9 = 0
HMM-POS Tagging
When the words are correctly tagged, we get a probability greater than zero, as shown below. Calculating the product of these terms, we get:
3/4 * 1/9 * 3/9 * 1/4 * 3/4 * 1/4 * 1 * 4/9 * 4/9 = 0.00025720164
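Both products are easy to verify with exact arithmetic (the factor lists below are transcribed from the two calculations above):

from fractions import Fraction as F
from math import prod

wrong   = [F(1,4), F(3,4), F(3,4), F(0), F(1), F(2,9), F(1,9), F(4,9), F(4,9)]
correct = [F(3,4), F(1,9), F(3,9), F(1,4), F(3,4), F(1,4), F(1), F(4,9), F(4,9)]

print(float(prod(wrong)))    # 0.0
print(float(prod(correct)))  # ~0.00025720164 (exactly 1/3888)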
HMM-POS Tagging
For our example, considering just the three POS tags we have mentioned, 81 different combinations of tags can be formed for a four-word sentence. In this case, calculating the probabilities of all 81 combinations seems achievable. But when the task is to tag a longer sentence and all the POS tags of the Penn Treebank project are taken into consideration, the number of possible combinations grows exponentially, and this task seems impossible to achieve. Now let us visualize these 81 combinations as paths and, using the transition and emission probabilities, mark each vertex and edge as shown below.
HMM-POS Tagging
The next step is to delete all the vertices and edges with probability zero; the vertices which do not lead to the endpoint are also removed.
Now there are only two paths that lead to the end. Let us calculate the probability associated with each path:
<S>→N→M→N→N→<E> = 3/4 * 1/9 * 3/9 * 1/4 * 1/4 * 2/9 * 1/9 * 4/9 * 4/9 = 0.00000846754
<S>→N→M→V→N→<E> = 3/4 * 1/9 * 3/9 * 1/4 * 3/4 * 1/4 * 1 * 4/9 * 4/9 = 0.00025720164
Clearly, the probability of the second sequence is much higher, and hence the HMM is going to tag each word in the sentence according to this sequence.
Lecture : 11
Module 3: Syntax Parsing
Topic: Introduction
Optimizing HMM with Viterbi Algorithm
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states (called the Viterbi path) that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMMs).
In the previous section, we optimized the HMM and brought our calculations down from 81 paths to just two. Now we will optimize the HMM further by using the Viterbi algorithm. Let us use the same example as before and apply the Viterbi algorithm to it.
Optimizing HMM with Viterbi Algorithm
Consider the encircled vertex in the example. There are two mini-paths leading to this vertex, each with its own probability. We discard the mini-path having the lower probability and keep the other. The same procedure is done for all the states in the graph.
Optimizing HMM with Viterbi Algorithm
As we can see in the figure, the probabilities of all paths leading to a node are calculated, and we remove the edges or paths which have the lower probability cost. Also, you may notice some nodes having a probability of zero; such nodes have no edges attached to them, as all the paths to them have zero probability. The graph obtained after computing the probabilities of all paths leading to a node is shown below.
Optimizing HMM with Viterbi Algorithm
To get the optimal path, we start from the end and trace backward; since each state now has only one incoming edge, this gives us a single path.
As you may have noticed, this algorithm returns only one path, as compared to the previous method, which suggested two paths. Thus, by using this algorithm, we save a lot of computation.
After applying the Viterbi algorithm, the model tags the sentence as follows:
Will as a noun
Can as a modal
Spot as a verb
Mary as a noun
These are the right tags, so we conclude that the model can successfully tag the words with their appropriate POS tags. A compact implementation is sketched below.
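A compact Viterbi sketch for this toy HMM (my own code; the probability tables are the ones computed above, written as exact fractions, and unlisted word/tag pairs default to probability zero):

from fractions import Fraction as F

tags = ["N", "M", "V"]
# Transition probabilities P(next | prev), from the table above.
A = {
    "<S>": {"N": F(3,4), "M": F(1,4), "V": F(0)},
    "N":   {"N": F(1,9), "M": F(3,9), "V": F(1,9), "<E>": F(4,9)},
    "M":   {"N": F(1,4), "M": F(0),   "V": F(3,4), "<E>": F(0)},
    "V":   {"N": F(4,4), "M": F(0),   "V": F(0),   "<E>": F(0)},
}
# Emission probabilities P(word | tag), from the table above.
B = {
    "N": {"mary": F(4,9), "jane": F(2,9), "will": F(1,9), "spot": F(2,9)},
    "M": {"will": F(3,4), "can": F(1,4)},
    "V": {"spot": F(1,4), "see": F(2,4), "pat": F(1,4)},
}

def viterbi(words):
    # best[t] = (probability, tag path) of the best path ending in tag t
    best = {t: (A["<S>"].get(t, F(0)) * B[t].get(words[0], F(0)), [t])
            for t in tags}
    for w in words[1:]:
        best = {
            t: max(
                ((p * A[prev].get(t, F(0)) * B[t].get(w, F(0)), path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda x: x[0],
            )
            for t in tags
        }
    # Fold in the transition to the end-of-sentence tag <E>.
    prob, path = max(
        ((p * A[t].get("<E>", F(0)), path) for t, (p, path) in best.items()),
        key=lambda x: x[0],
    )
    return path, float(prob)

print(viterbi(["will", "can", "spot", "mary"]))
# Expected: (['N', 'M', 'V', 'N'], ~0.00025720164)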
Bi-gram statistical tagger
Transformation Based (Brill) Tagging
A hybrid approach:
– like rule-based taggers, this tagging is based on rules;
– like (most) stochastic taggers, the rules are automatically induced from hand-tagged data.
Basic idea: do a quick-and-dirty job first, and then use learned rules to patch things up. This overcomes the pure rule-based approach's problems of being too expensive, too slow, too tedious, etc.
It is an instance of Transformation-Based Learning: combine rules and statistics. Start with a dumb statistical system and patch up the typical mistakes it makes.
How dumb? Assign the most frequent tag (unigram) to each word in the input.
Process
1. Choose a Baseline Tagger:
To start, you need a baseline POS tagger that assigns initial tags to words in a sentence. Common
baseline taggers include Hidden Markov Models (HMMs) or rule-based taggers.
2. Collect Training Data:
You need labeled training data, which consists of sentences with the correct POS tags for each word. This
data is used to learn transformation rules.
3. Initialize Tag Assignments:
Apply the baseline tagger to a sentence and assign initial POS tags to each word.
4. Generate Transformation Rules:
The core of the Brill tagging process involves learning transformation rules from the training data. These
rules are typically in the form of "if-then" statements that specify how to modify or correct POS tags. Rules
are learned based on observed tagging errors in the training data.
Example transformation rule: "If a noun is followed by 'to,' change the tag of 'to' to 'TO'."
Process
5. Apply Transformation Rules:
Iterate through the sentence and apply transformation rules to modify the POS tags generated by the
baseline tagger.
6. Evaluate the Updated Tags:
After applying a set of transformation rules to a sentence, evaluate the updated POS tags. If the tagging
accuracy improves, keep the updated tags; otherwise, revert to the previous tagging.
7. Repeat:
Continue applying transformation rules and evaluating the tagging accuracy until a stopping criterion is
met, such as reaching a maximum number of iterations or achieving a desired level of accuracy.
8. Finalize Tags:
Once the iterative process is complete, the final POS tags are used as the output for the sentence.
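A toy sketch of this loop (my own illustration; the unigram baseline dictionary and the single patch rule are invented for the example rather than learned from data):

# Baseline: most-frequent tag per word (unigram), as in the slides.
most_frequent = {"can": "MD", "book": "NN", "that": "DT", "flight": "NN",
                 "they": "PRP"}

def baseline_tag(words):
    return [most_frequent.get(w, "NN") for w in words]

def apply_patch(tagged):
    # One hand-written transformation: if a word tagged NN follows
    # a personal pronoun (PRP), retag it as a verb (VB).
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == "NN" and out[i - 1][1] == "PRP":
            out[i] = (word, "VB")
    return out

words = ["they", "book", "that", "flight"]
tagged = list(zip(words, baseline_tag(words)))
print(tagged)               # baseline: 'book' wrongly tagged NN
print(apply_patch(tagged))  # patched:  'book' retagged VB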
Syntax
By syntax, we mean various aspects of how words are strung together to form components of sentences and how those components are strung together to form sentences. The word syntax comes from the Greek sýntaxis, meaning "setting out together" or "arrangement".
• that and after year last (not a well-formed string of English)
• I saw you yesterday (well-formed and meaningful)
• colorless green ideas sleep furiously (well-formed but nonsensical)
Syntax is the kind of implicit knowledge of your native language that you had mastered by the time you were 3 or 4 years old without explicit instruction, not necessarily the type of rules you were later taught in school.
Why should you care?
– Grammar checkers
– Question answering
– Information extraction
– Machine translation
Constituency
The idea: groups of words may behave as a single unit or phrase, called a constituent, e.g., the noun phrase:
– Kermit the frog
– they
– December twenty-sixth
– the reason he is running for president
Sentences have parts, some of which appear to have subparts. These groupings of words that go together we will call constituents.
These units form coherent classes that behave in similar ways; for example, we can say that noun phrases can come before verbs.
Constituent Phrases
For constituents, we usually name them as phrases based on the word that heads the constituent:
– the man from Amherst is a Noun Phrase (NP) because the head man is a noun;
– extremely clever is an Adjective Phrase (AP) because the head clever is an adjective;
– down the river is a Prepositional Phrase (PP) because the head down is a preposition;
– killed the rabbit is a Verb Phrase (VP) because the head killed is a verb.
Note that a word is a constituent (a little one), and sometimes words also act as phrases. In
Joe grew potatoes.
Joe and potatoes are both nouns and noun phrases.
Evidence constituency exists
1. Constituents appear in similar environments (e.g., before a verb):
Kermit the frog comes on stage
They come to Massachusetts every summer
December twenty-sixth comes after Christmas
The reason he is running for president comes out only now
But not each individual word in the constituent:
*The comes out... *is comes out... *for comes out...
2. The constituent can be placed in a number of different locations. Constituent = prepositional phrase on December twenty-sixth:
On December twenty-sixth I'd like to fly to Florida.
I'd like to fly on December twenty-sixth to Florida.
I'd like to fly to Florida on December twenty-sixth.
But not split apart:
*On December I'd like to fly twenty-sixth to Florida.
*On I'd like to fly December twenty-sixth to Florida.
Lecture : 12
Module 3: CFG
Topic: Introduction
Context-free grammar
The most common way of modeling constituency:
CFG = Context-Free Grammar = Phrase-Structure Grammar = BNF = Backus-Naur Form.
The idea of basing a grammar on constituent structure dates back to Wilhelm Wundt (1890), but it was not formalized until Chomsky (1956) and, independently, Backus (1959).
A CFG consists of:
– Terminals: we'll take these to be words.
– Non-terminals: the constituents in a language, like noun phrase, verb phrase, and sentence.
– Rules: equations that consist of a single non-terminal on the left and any number of terminals and non-terminals on the right.
CFG as a 4-Tuple
G = (T, N, S, R)
– T is a set of terminals (the lexicon).
– N is a set of non-terminals.
– S is the start symbol (one of the non-terminals).
– R is a set of rules/productions of the form X → γ, where X is a non-terminal and γ is a sequence of terminals and non-terminals (which may be empty).
A grammar G generates a language L.
CFG
G = (T, N, S, R)
T = {that, this, a, the, man, book, flight, meal, include, read, does}
N = {S, NP, NOM, VP, Det, Noun, Verb, Aux}
S = S
R = {
S → NP VP          Det → that | this | a | the
S → Aux NP VP      Noun → book | flight | meal | man
S → VP             Verb → book | include | read
NP → Det NOM       Aux → does
NOM → Noun
NOM → Noun NOM
VP → Verb
VP → Verb NP
}
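This grammar can be written down directly in NLTK and used to parse the derivation example that follows (usage sketch; CFG.fromstring and ChartParser are standard NLTK APIs):

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP | Aux NP VP | VP
NP -> Det NOM
NOM -> Noun | Noun NOM
VP -> Verb | Verb NP
Det -> 'that' | 'this' | 'a' | 'the'
Noun -> 'book' | 'flight' | 'meal' | 'man'
Verb -> 'book' | 'include' | 'read'
Aux -> 'does'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the man read this book".split()):
    print(tree)
# (S (NP (Det the) (NOM (Noun man)))
#    (VP (Verb read) (NP (Det this) (NOM (Noun book)))))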
CFG Example
A derivation of "The man read this book":
S → NP VP
→ Det NOM VP
→ The NOM VP
→ The Noun VP
→ The man VP
→ The man Verb NP
→ The man read NP
→ The man read Det NOM
→ The man read this NOM
→ The man read this Noun
→ The man read this book
Parse tree
Parse trees for "The man read this book" and "I prefer a morning flight"
CFGs can capture recursion
Example of the seemingly endless recursion of embedded prepositional phrases:
PP → Prep NP
NP → Noun PP
[S The mailman ate his [NP lunch [PP with his friend [PP from the cleaning staff [PP of the building [PP at the intersection [PP on the north end [PP of town]]]]]]]].
Grammaticality
A CFG defines a formal language = the set of all sentences (strings of words) that can be derived by the grammar.
Sentences in this set are said to be grammatical.
Sentences outside this set are said to be ungrammatical.
Parsing
Parsing is the process of taking a string and a grammar and returning one (or multiple) parse tree(s) for that string.
It is analogous to running a finite-state transducer with a tape; it's just more powerful: there are languages we can capture with CFGs that we can't capture with finite-state machines.
A recognizer is a program which, given a grammar and a sentence, returns YES if the sentence is accepted by the grammar (i.e., the sentence is in the language) and NO otherwise.
A parser, in addition to doing the work of a recognizer, also returns the set of parse trees for the string.
Top-down parsing and bottom-up parsing are two ways of searching for a parse tree; the most basic difference between them is that top-down parsing starts from the top (root) of the parse tree, while bottom-up parsing starts from the lowest level (the words).
Top-down parsing
Top-down parsing is goal-directed.
A top-down parser starts with a list of constituents to be built. It rewrites the goals in the goal list by matching one against the LHS of the grammar rules and expanding it with the RHS, attempting to match the sentence to be derived.
If a goal can be rewritten in several ways, then there is a choice of which rule to apply (a search problem).
One can use depth-first or breadth-first search, and goal ordering. A small demo follows.
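NLTK's RecursiveDescentParser is a textbook top-down, depth-first parser. A usage sketch, reusing the grammar object built in the CFG example above:

import nltk
# `grammar` is the nltk.CFG object from the earlier sketch.

rd_parser = nltk.RecursiveDescentParser(grammar)
for tree in rd_parser.parse("the man read this book".split()):
    print(tree)
# Note: a recursive-descent parser loops forever on left-recursive
# rules such as NP -> NP PP, which is one of the problems with
# top-down parsing discussed below.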
Top-down parsing example (Breadth-first)
Problems with top-down parsing
Left recursive rules... e.g. NP → NP PP... lead to infinite recursion
Will do badly if there are many different rules for the same LHS. Consider if there are 600 rules
for S, 599 of which start with NP, but one of which starts with a V, and the sentence starts with
a V.
Useless work: expands things that are possible top-down but not there (no bottom-up evidence
for them).
Top-down parsers do well if there is useful grammar-driven control: search is directed by the 23
grammar.
Top-down is hopeless for rewriting parts of speech (pre-terminals) with words (terminals). In
practice that is always done bottom-up as lexical lookup.
Repeated work: anywhere there is common substructure
Bottom-up parsing
Bottom-up parsing is data-directed.
The initial goal list of a bottom-up parser is the string to be parsed.
If a sequence in the goal list matches the RHS of a rule, then this sequence may be replaced by the
LHS of the rule.
Parsing is finished when the goal list contains just the start symbol. If the RHS of several rules match
the goal list, then there is a choice of which rule to apply (search problem)
Can use depth-first or breadth-first search, and goal ordering.
The standard presentation is as shift-reduce parsing
Bottom-up parsing example
[Figure: bottom-up parsing example]
Shift-reduce parsing
[Figure: shift-reduce parsing example]
Shift-reduce parsing: the algorithm
Start with the sentence to be parsed in an input buffer.
• a ”shift” action corresponds to pushing the next input symbol from the buffer onto the stack
• a "reduce" action occurs when we have a rule's RHS on top of the stack; to perform the reduction, we pop the rule's RHS off the stack and replace it with the non-terminal on the LHS of the corresponding rule.
(When either "shift" or "reduce" is possible, choose one arbitrarily.)
If you end up with only the start symbol on the stack, then success!
If you don't, and no further "shift" or "reduce" actions are possible, backtrack.
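A minimal backtracking shift-reduce recognizer for the toy grammar from this module; an illustrative sketch, not an efficient parser. When both shift and reduce are possible it tries reductions first and backtracks on failure:

```python
# Grammar rules as (LHS, RHS) pairs; lexicon maps words to possible categories.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "NOM")), ("NOM", ("Noun",)), ("NOM", ("Noun", "NOM")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
]
LEXICON = {
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"}, "the": {"Det"},
    "flight": {"Noun"}, "meal": {"Noun"}, "man": {"Noun"},
    "book": {"Noun", "Verb"}, "include": {"Verb"}, "read": {"Verb"},
    "does": {"Aux"},
}

def accepts(stack, buffer):
    if not buffer and stack == ["S"]:
        return True                     # success: input consumed, only S left
    for lhs, rhs in RULES:              # try every reduce whose RHS tops the stack
        n = len(rhs)
        if tuple(stack[-n:]) == rhs and accepts(stack[:-n] + [lhs], buffer):
            return True
    if buffer:                          # try shifting the next word's categories
        for tag in LEXICON.get(buffer[0], ()):
            if accepts(stack + [tag], buffer[1:]):
                return True
    return False                        # dead end: backtrack

print(accepts([], "the man read this book".split()))  # True
print(accepts([], "read the this".split()))           # False
```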
Contd.
In a top-down parser, the main decision was which production rule to pick.
In a bottom-up shift-reduce parser there are two decisions:
1. Should we shift another symbol, or reduce by some rule?
2. If reduce, then reduce by which rule?
both of which can lead to the need to backtrack
Problem:
• Unable to deal with empty categories: termination problem, unless rewriting empties as
constituents is somehow restricted (but then it’s generally incomplete)
• Useless work: locally possible, but globally impossible
• Inefficient when there is great lexical ambiguity (grammar-driven control might help here).
Conversely, it is data-directed: it attempts to parse the words that are there
• Repeated work: anywhere there is common substructure.
Noun Phrase
The noun phrase can be viewed as revolving around a head, the central noun in the noun
phrase. The syntax of English allows for both
• Prenominal prehead modifiers
• Post nominal (post head) modifiers
Prenominal prehead modifiers are words or phrases that appear before the noun and modify it.
These modifiers provide additional information about the noun. Here's an example:
The big, red apple: In this noun phrase, "big" and "red" are prenominal prehead modifiers that provide more details about the noun "apple."
Postnominal (post head) modifiers are words or phrases that appear after the noun and
modify it. These modifiers also offer additional information about the noun. Here's an example:
The car with the broken windshield: In this noun phrase, "with the broken windshield" is a
postnominal modifier that provides more information about the noun "car."
Noun Phrase
Noun phrases can begin with a determiner, as follows:
a stop, the flights, that fare, this flight, those flights, any flights, some flight
Word classes that appear in the NP before the determiner are called
predeterminers .
A number of different kinds of word classes can appear in the NP between the determiner and the head noun.
• Cardinal numbers Eg two friends, one stop
• Ordinal numbers include first, second, third etc but also words like next, last,
past, other, and another Eg the first one, the next day, the second leg, the last
flight, the other American flight, any other fares.
• Quantifiers many, few, several occur only with plural count nouns Eg many
fares
• The quantifiers much and a little occur only with noncount nouns
Noun Phrase
Noun phrases can start with determiners...
Determiners can be
Simple lexical items: the, this, a, an, etc.
A car 31
Or simple possessives
John’s car
Or complex recursive versions of that
John’s sister’s husband’s son’s car
Noun Phrase
Adjectives occur after quantifiers but before nouns.
A first class fare
A nonstop flight
The longest layover
The earliest lunch flight
Adjectives can also be grouped into a phrase called an adjective phrase AP.
APs can have an adverb before the adjective
Eg. the least expensive fare
All the options for prenominal modifiers are combined with one rule as follows:
NP → (Det) (Card) (Ord) (Quant) (AP) Nominal
Note the use of parentheses ( ) to mark optional constituents.
Noun Phrase
A head noun can be followed by postmodifiers. Three kinds:
Prepositional phrases
• Flights from Seattle
Non-finite clauses
• Flights arriving before noon
Relative clauses
• Flights that serve breakfast
More PP postmodifier examples:
• any stopovers [for Delta seven fifty one]
• all flights [from Cleveland] [to Newark]
• arrival [in San Jose] [before seven p.m]
• a reservation [on flight six oh six] [from Tampa] [to Montreal]
Here's a new rule to account for PP postmodifiers:
Nominal → Nominal PP
Noun Phrase
• The three most common kinds of non-finite postmodifiers are the gerundive (-ing), -ed, and infinitive forms
• Gerundive postmodifiers are so called because they consist of a verb phrase that begins with the gerundive (-ing) form of the verb
In the following examples, the verb phrases happen to all have only prepositional phrases after the verb.
• any of those (leaving on Thursday)
• any flights (arriving after eleven a.m)
• flights (arriving within thirty minutes of each other)
The use of a new nonterminal GerundVP:
Nominal → Nominal GerundVP
Noun Phrase
Rules for GerundVP constituents can be made by duplicating all of our VP productions, substituting GerundV for V:
• GerundVP → GerundV NP
• GerundVP → GerundV PP
• GerundVP → GerundV
• GerundVP → GerundV NP PP
GerundV can then be defined as:
GerundV → being | preferring | arriving | leaving | …
A postnominal relative clause (more correctly a restrictive relative clause), is a clause that often
begins with a relative pronoun (that and who are the most common)
Agreement
Constraints that hold among various constituents
For example, in English, determiners and the head nouns in NPs have to agree in their
number.
Which of the following cannot be parsed by the rule NP → Det Nominal?
(O) This flight (X) This flights
(O) Those flights (X) Those flight
This rule does not handle agreement! (The rule does not detect whether the agreement is correct or not.)
Problem
Our earlier NP rules are clearly deficient since they don't capture the agreement constraint
NP → Det Nominal
accepts, and assigns correct structures to, grammatical examples (this flight)
But it's also happy with incorrect examples (*these flight)
Such a rule is said to overgenerate
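One standard remedy is to parameterize categories with features that must unify. A minimal sketch with NLTK's feature grammars (assuming nltk is installed; the NUM feature and the tiny lexicon are illustrative):

```python
import nltk
from nltk.parse import FeatureChartParser

# Det, N and V must agree on the NUM feature, so *"this flights" is rejected.
fg = nltk.grammar.FeatureGrammar.fromstring("""
% start S
S -> NP[NUM=?n] V[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'those'
N[NUM=sg] -> 'flight'
N[NUM=pl] -> 'flights'
V[NUM=sg] -> 'departs'
V[NUM=pl] -> 'depart'
""")
parser = FeatureChartParser(fg)
print(len(list(parser.parse("this flight departs".split()))))  # 1 parse
print(len(list(parser.parse("this flights depart".split()))))  # 0 parses
```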
THE VERB PHRASE AND SUBCATEGORIZATION
The verb phrase consists of the verb and a number of other constituents (arguments)
But, even though there are many valid VP rules in English, not all verbs are
allowed to participate in all those VP rules
We can subcategorize the verbs in a language according to the sets of VP rules
that they participate in
This is a modern take on the traditional notion of transitive/intransitive
Modern grammars may have 100s of such classes
Contd.
Sneeze: John sneezed
Find: Please find [a flight to NY] NP
Give: Give [me] NP [a cheaper fare] NP
Help: Can you help [me] NP [with a flight] PP
Prefer: I prefer [to leave earlier] TO VP
Told: I was told [United has a flight] S
• *John sneezed the book
• *I prefer United has a flight
• *Give with a flight
As with agreement phenomena, we need a way to formally express the constraints!
Contd.
The various rules for VPs overgenerate.
They permit the presence of strings containing verbs and arguments that don't go together
For example: VP → V NP
therefore "sneezed the book" is a VP, since "sneeze" is a verb and "the book" is a valid NP
Grammar Equivalence & Normal Form
A formal language is defined as a (possibly infinite) set of strings of words. Two kinds of grammar equivalence:
Weak equivalence
Strong equivalence
Two grammars are strongly equivalent if they generate the same set of strings and assign the same phrase structure to each sentence (allowing merely for renaming of the non-terminal symbols)
Two grammars are weakly equivalent if they generate the same set of strings but do not assign the same phrase structure to each sentence
It is sometimes useful to have a normal form for grammars, in which each of the productions takes a particular form
For example, a context-free grammar is in Chomsky normal form (CNF) if it is ε-free and each production is either of the form A → B C or A → a
Any grammar can be converted into a weakly equivalent Chomsky normal form grammar
For example, a rule of the form
A → B C D
can be converted into the following two CNF rules
A → B X
X → C D
Lecture : 13
Module 4:
Semantics: Computational Desiderata for Representations- Meaning Structure of Language- First Order Predicate Calculus- Elements of FOPC- The Semantics of FOPC- Syntax-Driven Semantic Analysis- Attachments for a Fragment of English.
Module 4: Semantics
Topic: Introduction
Semantic Analysis
Semantic analysis in natural language processing (NLP) refers to the process of understanding the
meaning of words, phrases, sentences, or even entire documents. It goes beyond syntactic analysis,
which focuses on the grammatical structure of language, to extract the underlying meaning and
context.
Here are some key aspects of semantic analysis in NLP:
Word Sense Disambiguation (WSD): Words often have multiple meanings depending on the context in which they are used. WSD is the task of determining the correct sense of a word in a given context. For example, the word "bank" could refer to a financial institution or the side of a river.
Named Entity Recognition (NER): NER involves identifying and classifying entities such as names of
people, organizations, locations, dates, and other specific terms in a text. This helps in
understanding the key entities and their relationships within a document.
Semantic Role Labeling (SRL): SRL aims to identify the roles of different components of a sentence,
such as the subject, object, and predicate. It helps in understanding the relationships between
entities and their actions in a given context.
Semantic Analysis
Coreference Resolution: This involves determining when two or more expressions in a text refer to the
same entity. For example, in the sentence "John went to the store. He bought some groceries," resolving
the pronoun "He" to refer to "John" requires coreference resolution.
Sentiment Analysis: While often associated more with the emotional aspect of language, sentiment
analysis also involves understanding the underlying meaning of text. It helps determine whether a piece
of text expresses a positive, negative, or neutral sentiment.
Semantic Similarity: This involves measuring the degree of similarity between words, phrases, or
sentences in terms of meaning. It is useful in tasks like information retrieval, document clustering, and
question answering.
Word Embeddings and Vector Representations: Techniques like word embeddings (e.g., Word2Vec,
GloVe, and BERT) represent words in a continuous vector space where semantically similar words are
closer in the vector space. This allows algorithms to capture semantic relationships between words.
Frame Semantics and Ontologies: Understanding the frames or scenarios in which words and phrases
are used can contribute to a deeper understanding of meaning.
Meaning Representation Language
In natural language processing (NLP), meaning representation languages are formal languages or
frameworks used to represent the meaning of linguistic expressions in a structured and interpretable
way. These representations are essential for tasks such as semantic analysis, machine translation,
question answering, and other applications where understanding the meaning of natural language is
crucial.
But unlike parse trees, these representations aren't primarily descriptions of the structure of the inputs.
Consider the following everyday language tasks that require some form of semantic processing
Answering an essay question on an exam
Deciding what to order at a restaurant by reading a menu
Learning to use a new piece of software by reading the manual
Realizing that you’ve been insulted
Following a recipe
Contd.
For example, some of the knowledge of the world needed to perform the above tasks includes:
Answering and grading essay questions requires background knowledge about
The topic of the question
The desired knowledge level of the students
How such questions are normally answered
Learning to use a piece of software by reading a manual
Giving advice about how to do the same
Requires deep knowledge about current computers
The specific software in question
Similar software applications
Knowledge about users in general
Computational Desiderata for
Representation
Computational desiderata refer to the desired properties or characteristics that representations should
possess to effectively capture and model the meaning of language.
To focus this discussion, we will consider in more detail the task of giving advice about restaurants to tourists. In this discussion, we will assume that we have a computer system that accepts spoken language queries from tourists and constructs appropriate responses by using a knowledge base of relevant domain knowledge.
Verifiability
Unambiguous Representations
Canonical Form
Inference and Variables
Expressiveness
Verifiability
Verifiability: The system’s ability to compare representations to facts in memory
The most straightforward way to implement this notion is to make it possible for a system to compare, or match, the representation of the meaning of an input against the representations in its knowledge base, its store of information about its world.
Does Maharani serve vegetarian food?
Serves(Maharani; Vegetarian Food)
Input matched against the knowledge base of facts about a set of restaurants
Matching the input proposition in its knowledge base, it can return an affirmative answer
Otherwise, it must either say No if its knowledge of local restaurants is complete, or say that it does
not know
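A toy sketch of this matching step (the predicate names and facts are hypothetical):

```python
# The knowledge base as a set of ground propositions.
kb = {("Serves", "Maharani", "VegetarianFood"),
      ("Serves", "AyCaramba", "MexicanFood")}

# "Does Maharani serve vegetarian food?" becomes a membership test.
query = ("Serves", "Maharani", "VegetarianFood")
print("Yes" if query in kb else "No / don't know")  # Yes
```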
Unambiguous Representations
The domain of semantics is subject to ambiguity
Single linguistic inputs can legitimately have different meaning representations assigned to them
based on the circumstances in which they occur.
The cat is on the mat
Ambiguity:
The phrase "on the mat" might have multiple interpretations, as it could refer to a physical location or 14
imply a scolding or disciplinary action.
Unambiguous representations are crucial for NLP tasks to enhance the accuracy and reliability of
natural language understanding systems.
Vagueness
A concept closely related to ambiguity is vagueness
Like ambiguity, vagueness can make it difficult to determine what to do with a particular input
based on its meaning representation
Vagueness, however, does not give rise to multiple representations
Consider the following request as an example
I want to eat Italian food
Use of the phrase Italian food may provide enough information for a restaurant advisor to provide reasonable recommendations
It is nevertheless quite vague as to what the user really wants to eat
A vague representation of the meaning of this phrase may be appropriate for some purposes,
while a more specific representation may be needed for other purposes
Canonical Form
The notion that single sentences can be assigned multiple meanings leads to the related phenomenon of
distinct inputs that should be assigned the same meaning representation
Does Maharani have vegetarian dishes?
Do they have vegetarian food at Maharani?
Are vegetarian dishes served at Maharani?
Does Maharani serve vegetarian fare?
All of these requests should be assigned the same canonical meaning representation, so that an answer computed for one serves for all.
Inference and Variables
Can vegetarians eat at Maharani?
The term inference to refer generically to a system’s ability to draw valid conclusions based on the
meaning representation of inputs and its store of background knowledge
It must be possible for the system to draw conclusions about the truth of propositions that are not explicitly represented in the knowledge base, but are nevertheless logically derivable from the propositions that are present
I’d like to find a restaurant where I can get vegetarian food.
In this example, the request does not make reference to any particular restaurant
The user is stating that they would like information about an unknown and unnamed entity that is a
restaurant that serves vegetarian food
Answering this request requires a more complex kind of matching that involves the use of variables
A representation containing such variables as follows
Serves(x; Vegetarian Food)
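Answering such a request amounts to collecting the bindings of x that make the formula true; a toy sketch over the same hypothetical knowledge base:

```python
kb = {("Serves", "Maharani", "VegetarianFood"),
      ("Serves", "AyCaramba", "MexicanFood")}

# Match Serves(x; VegetarianFood): every subject that satisfies the pattern.
bindings = [subj for (pred, subj, obj) in kb
            if pred == "Serves" and obj == "VegetarianFood"]
print(bindings)  # ['Maharani']
```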
Expressiveness
Expressiveness in meaning representation in NLP refers to the ability of a representation system to
capture the richness and diversity of meanings present in natural language. An expressive
representation should be able to convey nuanced relationships, distinctions, and semantic intricacies
inherent in human language. Here's an example to illustrate expressiveness
The conference room echoed with the enthusiastic applause of the audience.
This representation captures the expressiveness of the sentence by not only representing the basic actions and entities but also incorporating additional details about the manner of applause and the specific location of the event. It goes beyond a simple surface-level representation and delves into the nuanced aspects of the sentence's meaning.
Meaning Structure of Language
These include a variety of conventional:
• Form-meaning associations
• Word order regularities
• Tense systems
• Conjunctions and quantifiers
• A fundamental predicate-argument structure
A predicate is a statement about a subject that either is true or false. It expresses a property or a
relation. Predicates often use verbs to convey actions or states.
Examples:
The cat is on the mat.
Predicate: "is on the mat"
Subject: "The cat"
Predicates: Primarily Verbs , VPs , Sentences, sometimes Nouns and NPs
Arguments: Primarily Nouns, Nominals, NPs, PPs.
Meaning Structure of Language
Argument:
An argument is a value that is applied to a function or, in logic, a subject that satisfies a predicate. In
simpler terms, it is what the predicate is about.
Examples:
1.In "The cat is on the mat," "The cat" is the argument of the predicate "is on the mat."
2.In "She likes to read books," "She" is the argument of the predicate "likes to read books." 20
3.In "The sun sets in the west," "The sun" is the argument of the predicate "sets in the west."
Predicates: Primarily Verbs , VPs , Sentences, sometimes Nouns and NPs
Arguments: Primarily Nouns, Nominals, NPs, PPs.
Contd.
These examples can be classified as having one of the three syntactic argument frames
I want Italian food NP want NP
I want to spend less than five dollars NP want Inf VP
I want it to be close by here NP want NP Inf VP
These syntactic frames specify the number, position and syntactic category of the arguments that are
expected.
The frame for the variety of want that appears in Example 1 specifies the following facts
There are two arguments to this predicate.
Both arguments must be NPs.
The first argument is pre verbal and plays the role of the subject.
The second argument is post verbal and plays the role of the direct object.
Contd.
Semantic roles and Semantic restrictions on these roles
The notion of a semantic role can be understood by looking at the similarities among the arguments in
Examples 1 to 4.
The study of roles associated with specific verbs and across classes of verbs is usually referred to as
thematic role or case role
The notion of semantic restrictions arises directly from these semantic roles
Consider the following phrase from the BERP corpus
An Italian restaurant under fifteen dollars
In this example, the meaning representation associated with the preposition under can be seen as having something like the following structure
Under(Italian Restaurant ; $15)
Prepositions can be characterized as two argument predicates where the first argument is an object that
is being placed in some relation to the second argument
Contd.
Another non verb based predicate argument structure example
Make a reservation for this evening for a table for two persons at 8
The predicate argument structure is based on the concept underlying the noun reservation, rather
than make, the main verb in the phrase
This example gives rise to a four-argument predicate structure like the following
Reservation(Hearer; Today; 8PM; 2)
Any useful meaning representation language must be organized in a way that supports the specification of semantic predicate-argument structures
Variable arity predicate argument structures
The semantic labeling of arguments to predicates
The statement of semantic constraints on the fillers of argument roles
Lecture : 14
Module 4: Semantics
Review
Propositional Logic
The simplest, and most abstract logic we can study is called propositional logic.
• Definition: A proposition is a statement that can be either true or false; it must be one or
the other, and it cannot be both.
Examples of propositions:
The fan is on
2 + 3 = 5
Whereas the following are not propositions:
1 + 2
Where is John?
There are two types of Propositions:
Atomic Propositions
Compound propositions
Propositional Logic
Atomic Propositions:
Definition: An atomic proposition is one whose truth or falsity does not depend on the truth or
falsity of any other proposition
Example:
"The Sun is cold“
2+2 is 4
15
Compound Propositions:
Compound propositions are constructed by combining simpler or atomic propositions, using
parenthesis and logical connectives.
Example:
"It is raining today, and street is wet."
"Ankit is a doctor, and his clinic is in Mumbai."
Propositional Logic
Logical Connectives:
Implication: In propositional logic, we have a connective that combines two propositions into a new proposition called the conditional.
If it is raining, then the street is wet.
Let P= It is raining, and Q= Street is wet, so it is
represented as P → Q
Propositional Logic
Biconditional: A sentence such as P ⇔ Q is a biconditional sentence, written "p iff q". Example: I am breathing if and only if I am alive.
P = I am breathing, Q = I am alive; it can be represented as P ⇔ Q.
Definition: If p and q are arbitrary propositions, then the biconditional of p and q is written: p ⇔
q and will be true iff either:
1. p and q are both true; or
2. p and q are both false.
Propositional Logic
We can nest complex formulae as deeply as we want.
• We can use parentheses, i.e. "(" and ")", to disambiguate formulae.
• EXAMPLES. If p, q, r, s and t are atomic propositions, then all of the following are formulae:
p ∧ q ⇒ r
p ∧ (q ⇒ r)
(p ∧ (q ⇒ r)) ∨ s
((p ∧ (q ⇒ r)) ∨ s) ∧ t
EXAMPLE. Suppose we have a valuation υ such that:
υ(p) = F
υ(q) = T
υ(r) = F
Then the truth value of (p ∨ q) ⇒ r is evaluated by:
(F ∨ T) ⇒ F = T ⇒ F = F
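The same evaluation in a few lines of Python, transcribing the truth table of the conditional directly:

```python
# Evaluating (p ∨ q) ⇒ r under the valuation p=F, q=T, r=F.
def implies(a: bool, b: bool) -> bool:
    return (not a) or b  # the conditional is false only when a=T and b=F

p, q, r = False, True, False
print(implies(p or q, r))  # False, matching the hand evaluation above
```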
First Order Predicate Calculus
First-Order Predicate Calculus (FOPC) plays a crucial role in representing and reasoning
about linguistic structures and meanings. It serves as a foundation for semantic analysis
and knowledge representation in NLP systems. Let's delve into a detailed explanation with
an example relevant to NLP.
Allows us to break sentences into predicates, subjects and objects, while also allowing us to
use quantifiers like "all", "each", "some" etc.
Blackburn & Bos make a strong argument for using first-order logic as the meaning
representation.
Powerful, flexible, general.
First Order Predicate Calculus
FOL symbols
○ Constants: john, mary
○ Predicates & relations: man, walks, loves
○ Variables: x, y
○ Logical connectives: ∧ ∨ ¬ →
○ Quantifiers: ∀ ∃
○ Other punctuation: parens, commas
FOL formulae
○ Atomic formulae: loves(john, mary)
○ Connective applications: man(john) ∧ loves(john, mary)
○ Quantified formulae: ∃x (man(x))
Predicates categories
One place: Intransitive verbs, common nouns, adjectives
Dog(x), Happy (x)
Two Place: Transitive verbs, prepositions
Likes(x,y), In(x,y)
Three Place: Ditransitive verbs
Gives(x,y,z)
Quantifier
Quantifiers generate quantification and specify the number of specimen in the universe.
Quantifiers allow us to determine or identify the range and scope of the variable in a logical expression.
There are two types of quantifiers:
Universal quantifier: for all, everyone, everything.
Existential quantifier: for some, at least one.
1. Universal quantifiers
Universal quantifiers specify that the statement within the range is true for everything or every instance of a particular thing.
Universal quantifiers are denoted by a symbol (∀) that looks like an inverted A. In a universal quantifier, we
use →.
If x is a variable, then ∀x can read as:
For all x
For every x
For each x
Example
Every kid likes football ∀x kid(x) → likes(x, football)
Quantifier
2. Existential quantifiers
Existential quantifiers are used to express that the statement within their scope is true for at least one
instance of something.
∃, which looks like a reversed E, is used to represent them. With the existential quantifier we always use the AND (conjunction) connective.
If x is a variable, the existential quantifier will be ∃x, read as:
For some x
There exists an x
For at least one x
Example
Some people like Football. ∃x: people(x) ∧ likes Football(x)
Scope and Free & Bound Variables
∀x[Person(x)] ∧ Happy(x)
(Every x is a person) and x is happy; here the x in Happy(x) is free, i.e. outside the quantifier's scope
Everyone is a person and he is happy
∀x[Person(x) ∧ Happy(x)]
(Every x is a person and every x is happy)
Everyone is happy
Examples
1. Some boys hate football
∃x: boys(x) ∧ hate(x, Football)
2. Every person who buys a Policy is smart
∀x ∀y: (Person(x) ∧ Policy(y) ∧ buys(x, y)) → Smart(x)
3. No person buys an expensive Policy
∀x ∀y: (Person(x) ∧ Policy(y) ∧ expensive(y)) → ¬buys(x, y)
4. Mary loves everyone
∀x: (person(x) → loves(Mary, x))
5. Everyone loves everyone except himself
∀x ∀y: (x ≠ y → loves(x, y))
Scope Ambiguity
Every student loves some teacher
(Every student)x loves (some teacher)y
One way: (every student)x (some teacher)y : x loves y
∀x [student(x) → ∃y [teacher(y) ∧ loves(x, y)]]
Another way: (some teacher)y (every student)x : x loves y
∃y [teacher(y) ∧ ∀x [student(x) → loves(x, y)]]
Variables and Quantifiers
Consider the following example.
A restaurant that serves Mexican food near ICSI.
The following would be a reasonable representation of the meaning of such a phrase.
Restaurant(x) ∧ Serves(x; MexicanFood) ∧ Near(LocationOf(x); LocationOf(ICSI))
Contd.
For example, if AyCaramba is a Mexican restaurant near ICSI, then substituting AyCaramba for x results in the following logical formula
Restaurant(AyCaramba) ∧ Serves(AyCaramba; MexicanFood) ∧ Near(LocationOf(AyCaramba); LocationOf(ICSI))
Based on the semantics of the operator ∧, this sentence will be true if all three of its component atomic formulas are true
Syntax
I only have five dollars and I don't have a lot of time
Have(Speaker; FiveDollars) ∧ ¬Have(Speaker; LotOfTime)
The semantic representation for this example is built up in a straightforward way from the semantics of the individual clauses through the use of the ∧ and ¬ operators
Lecture : 15
Module 4: Semantics
Review
FOPL more examples
Theresa is the mother of John and Mary: mother(Theresa, John) ∧ mother(Theresa, Mary)
John likes oranges but he doesn't like apples: likes(John, oranges) ∧ ¬likes(John, apples)
Mary is studying pharmacy or medicine: studies(Mary, pharmacy) ∨ studies(Mary, medicine)
FOPL more examples
Everyone likes Venice: ∀x likes(x, Venice)
Horses are mammals which are animals: ∀x (horse(x) → mammal(x) ∧ animal(x))
All that John inherited was a book: ∃x (book(x) ∧ inherited(John, x) ∧ ∀y (inherited(John, y) → y = x))
John inherited all of the books: ∀x (book(x) → inherited(John, x))
FOPL more examples
Existential quantifier: ∃x p(x) is read as "there exists one x such that p(x)" or "there is at least one x such that p(x)"
There is at least one bird in the forest: ∃x (bird(x) ∧ in(x, forest))
John and Mary are siblings: siblings(John, Mary)
There is one person who likes salad: ∃x (person(x) ∧ likes(x, salad))
Everyone likes someone and no one likes everyone: ∀x ∃y likes(x, y) ∧ ¬∃x ∀y likes(x, y)
FOPL more examples
The negation connectives and the quantifiers have the highest priority. Then come the connectives of
conjunction and disjunction. After that, implication, and finally the biconditional has the lowest priority.
Similar (equivalent) formulae:
∀x ¬P ≡ ¬∃x P
Example:
Nobody likes John: ∀x ¬like(x, John) ≡ ¬∃x like(x, John)
¬∀x P ≡ ∃x ¬P
Example:
There is at least one person who does not like John: ¬∀x like(x, John) ≡ ∃x ¬like(x, John)
FOPL more examples
Similar (equivalent) formulae:
∀x P ≡ ¬∃x ¬P
Example:
Everyone likes John: ∀x like(x, John) ≡ ¬∃x ¬like(x, John)
∃x P ≡ ¬∀x ¬P
Example:
There is at least one person who likes John: ∃x like(x, John) ≡ ¬∀x ¬like(x, John)
Syntax Driven Semantic Analysis
• How meaning representations are created
• Syntax-driven semantic analysis is a computational approach to semantic analysis that uses static knowledge from the lexicon and the grammar.
• Based on the principle of compositionality: the key idea is that the meaning of a sentence can be composed from the meanings of its parts
• The meaning of a sentence is not based solely on the words that make it up
• It is based on the ordering, grouping, and relations among the words in the sentence
• This analysis is then passed as input to a semantic analyzer to produce a meaning representation
Syntax Driven Semantic Analysis
Franco likes Frasca.
[Figure: parse tree with semantic attachments for "Franco likes Frasca"]
Steps in semantic representation
1. Find the meaning representation corresponding to the verb (nominates)
- it is the verb whose meaning defines the meaning of the whole sentence
- The meaning representation of the verb acts as the template for the meaning representation of the whole sentence
- The NPs are arguments to the verb and are filled into the template based on their roles
2. Find meaning representations for the two NPs
3. Bind the meaning representations of the NPs to the variables in the meaning representation of the verb to get the meaning representation of the whole sentence
Parse tree to Meaning Representation
How is the mapping from parse tree to meaning representation done?
Augment the lexicon and grammar rules with semantic attachment – devise a mapping between
rules of the grammar and rules of semantic representation (rule to rule hypothesis)
An augmented rule can take the form
A → α1 … αn { f(α1.sem, …, αn.sem) }
The text appearing within braces specifies the meaning representation assigned to A as a function of the semantic attachments of A's constituents
Contd.
President nominates speaker
Noun → President {President}
Noun → speaker {speaker}
{President} and {speaker} are the meanings associated with the augmented rules
NP → Noun {Noun.sem}
Verb → nominates {∃e,x,y nomination(e) ∧ nominator(e,x) ∧ nominee(e,y)}
VP → Verb NP {Verb.sem(NP.sem)}
To combine NP.sem and Verb.sem, y has to be replaced with speaker, which is not specified in Verb.sem.
Need to revise the semantic attachment for the verb
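The intended composition can be mimicked with curried functions; an illustrative sketch in which string-building stands in for real logical forms:

```python
# Verb.sem is curried: the VP consumes the object NP first, then the S rule
# supplies the subject NP, mirroring VP -> Verb NP and S -> NP VP.
verb_sem = lambda y: lambda x: (
    f"exists e. nomination(e) & nominator(e,{x}) & nominee(e,{y})")

vp_sem = verb_sem("speaker")   # VP -> Verb NP  {Verb.sem(NP.sem)}
s_sem = vp_sem("President")    # S  -> NP VP    {VP.sem(NP.sem)}
print(s_sem)  # exists e. nomination(e) & nominator(e,President) & nominee(e,speaker)
```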
Compositionality
How do we know how to construct the VP?
love(?, mary) OR love(mary, ?)
How can we specify in which way the bits &
pieces combine?
The meaning of the sentence is constructed
from:
● the meaning of the words (i.e., the lexicon)
● paralleling the syntactic construction (i.e.,
the semantic rules)
Lecture : 16
Module 4: Semantics
Review
Lambda Calculus
Loves(?, Mary)
Add a new operator λ to bind free variables
λx.love(x, mary) : "loves Mary"
Gluing together formulae/terms with function application:
(λx.love(x, mary)) @ john
(λx.love(x, mary))(john)
Lambda Calculus
Lambda calculus is used to combine semantic representations systematically
Lambda calculus is an extension of FOPC
Three rules define how to build all syntactically valid lambda terms: a variable is a term; if M and N are terms, then the application (M N) is a term; if x is a variable and M is a term, then the abstraction λx.M is a term
E.g.: (λx.P(x))(Taj) ⇒ P(Taj)
Replaces the variable x with Taj and removes the λ
With λ-calculus, the VP semantics problem can be solved
Beta reduction
(λx.love(x, mary)) (john)
1. Strip off the λ prefix
(love(x, mary)) (john)
2. Remove the argument
love(x, mary)
3. Replace all occurrences of λ-bound variable by argument
love(john, mary)
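NLTK's logic package performs exactly this beta reduction; a minimal sketch, assuming nltk is installed:

```python
from nltk.sem.logic import Expression

# Apply λx.love(x, mary) to john, then beta-reduce.
e = Expression.fromstring(r'(\x.love(x, mary))(john)')
print(e.simplify())  # love(john,mary)
```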
Rules
Rule 1: If α is a terminal node, then [[α]] is specified in the lexicon
Rule 2: If α is a non-branching node and β is its daughter node, then [[α]] = [[β]]
Rule 3: If α is a branching node, {β, γ} is the set of its daughters, and [[β]] is a function whose domain contains [[γ]], then [[α]] = [[β]]([[γ]])
Lexical Entries:
(i) Proper Names: as is
(ii) Intransitive Verbs: [[dies]] = λx. x dies
(iii) Transitive Verbs: [[loves]] = λy λx. x loves y
Types
Types are an important differentiator for semanticists
e : individuals (proper nouns)
t : truth values (0, 1)
If σ and τ are types, then <σ, τ> is a type as well; it is called a function type
We can write it as f: e → t, or f: D_e → D_t
Types of different parts:
S = t
N = e
VP = {find subject, output a truth value} : <input, output> = <e, t>
V = {find object, find subject, output a truth value} : <e, <e, t>>
Semantic Construction with Lambdas
[Figures: worked lambda-based constructions for adjectives, prepositions, and negation]
Lecture : 17
Module 4: Semantics
Review
Attachments for a Fragment of English
Sentences include declaratives, imperatives, yes/no questions, and wh-questions. Let's start by considering the following examples:
Flight 487 serves lunch : conveys factual information to a hearer
Serve lunch : a request for an action
Does Flight 207 serve lunch? : a request for information
Which flights serve lunch? : a request for information
The meaning representations of these examples all contain propositions concerning the serving of lunch on flights
They differ with respect to the role that these propositions are intended to serve
Contd.
To capture these differences a set of operators is applied to FOPC sentences
Specifically, the following operators will be applied to the FOPC representations
DCL : declaratives
IMP : imperatives
YNQ : yes/no questions
WHQ : wh-questions
• The normal interpretation for a representation headed by the DCL operator would be as a factual
statement to be added to the current knowledge base.
• Imperative sentences begin with a verb phrase and lack an overt subject. Because of the missing subject,
the meaning representation for the main verb phrase will consist of a λ expression with an unbound λ
variable representing this missing subject
Contd.
Simply supply a subject to the λ-expression by applying a final λ-reduction to a dummy constant.
The IMP operator can then be applied to this representation as in the following semantic
attachment.
Imperatives can be viewed as a kind of speech act
Contd.
Yes/no questions consist of a sentence-initial auxiliary verb, followed by a subject noun phrase and then a verb phrase.
The following semantic attachment simply ignores the auxiliary and, with the exception of the YNQ operator, is the same as the corresponding declarative attachment.
Yes/no questions should be thought of as asking whether the propositional part of their meaning is true or false given the knowledge currently contained in the knowledge base.
Contd.
wh-subject-questions ask for specific information about the subject of the sentence rather than
the sentence as a whole.
The following attachment produces a representation that consists of the operator WHQ, the
variable corresponding to the subject of the sentence, and the body of the proposition.
Contd.
Such questions can be answered by returning a set of assignments for the subject variable that
make the resulting proposition true with respect to the current knowledge base.
Finally, consider the following wh non subject question.
How can I go from Minneapolis to Long Beach?
The question is not about the subject of the sentence but rather some other argument, or some aspect of the proposition as a whole.
In this case, the representation needs to provide an indication as to what the question is about.
The following attachment provides this information by providing the semantics of the auxiliary as an
argument to the WHQ operator.
Lecture : 17
Module 5:
Machine Translation And Applications: Basic Issues in Machine Translation- Statistical Translation- Word Alignment- Phrase-based Translation- Synchronous Grammars- Applications of Natural Language Processing: Spell Check- Summarization- Language Translation.
Module 5: MT
What is Machine Translation?
Automatic conversion of text/speech from one natural language to another
Be the change you want to see in the world
वह परिवर्तन बनो जो संसार में देखना चाहते हो
Use cases
Government
● Administrative requirements
● Education
● Security
Enterprise
● Product manuals
● Customer support
Social
● Travel (signboards, food)
● Entertainment (books, movies, videos)
Translation under the hood
● Cross-lingual search
● Cross-lingual summarization
● Building multilingual dictionaries
Any multilingual NLP system will involve some kind of machine translation at some level
History of MT
[Figure: timeline of MT history]
History of MT
Georges Artsrouni and Petr Troyanskii received the first-ever patents for MT-like tools in 1933. These
tools were quite rudimentary, especially in comparison to what we think of when we hear the term
“MT” today. They worked by comparing dictionaries in the source and target language
The first general-purpose electronic computers were not far off on the horizon; in the mid-1940s, developers like Warren Weaver began to theorize about ways they could use computers to automate the translation process.
Early RBMT systems include the Institute Textile de France’s TITUS and Canada’s METEO system,
among others. And while US-based research certainly slowed down after the ALPAC report, it didn’t
come to a complete stop — SYSTRAN, founded in 1968, utilized RBMT as well, working closely with
the US Air Force for Russian-English translation in the 1970s.
In the 1990s, researchers at IBM developed a renewed interest in MT technology, publishing
research on some of the first SMT systems in 1991. Unlike RBMT, SMT doesn’t require developers
to manually input the rules of each language — instead, SMT engines utilize a bilingual corpus of
text to identify patterns in the languages that could be converted into statistical data.
History of MT
And as electronic computers slowly became more of a household item, so too did MT systems.
SYSTRAN launched the first web-based MT tool in 1997, providing lay people — not just
researchers and language service providers — access to an MT tool. Nearly a decade later, in 2006,
Google launched Google Translate, which was powered by SMT from 2007 until 2016.
In 2003, researchers at the University of Montreal developed a language model based on neural
networks, but it wasn’t until 2014, with the development of the sequence-to-sequence (Seq2Seq)
model, that NMT became a formidable rival for SMT.
After that, NMT quickly became the state-of-the-art MT tool; Google Translate adopted it in 2016.
NMT engines use larger corpora than SMT and are more reliable when it comes to translating long
strings of text with complex sentence structures.
Although large language models (LLMs) perform a lot of other functions besides translation, some
thought leaders have presented tools like ChatGPT as the future of localization and, by extension,
MT.
Why should you study Machine Translation?
One of the most challenging problems in Natural Language Processing
Pushes the boundaries of NLP
Involves analysis as well as synthesis
Involves all layers of NLP: morphology, syntax, semantics, pragmatics,
discourse
Theory and techniques in MT are applicable to a wide range of other
problems like transliteration, speech recognition and synthesis
Why is Machine Translation interesting?
Language divergence: the great diversity among the languages of the world.
The central problem of MT is to bridge this language divergence.
Language Divergence
Word order: SOV (Hindi), SVO (English), VSO, OSV
E: Argentina won the last World Cup
H: अजें टीना ने पपछला पवश्व कप जीर्ा था
Free (Hindi) vs rigid (English) word order
पपछला पवश्व कप अजें टीना ने जीर्ा था (correct)
The last World Cup Argentina won (grammatically incorrect)
The last World Cup won Argentina (meaning changes)
Language Divergence (contd.)
Different ways of expressing the same concept
water : पानी, जल, नीर
Language registers
Formal: आप बैठिये Informal: तू बैठ
Standard: मुझे डोसा चाहिए Dakhini: मेरे को डोसा होना
Why is Machine Translation difficult?
● Ambiguity
○ Same word, multiple meanings: मंत्री (minister or chess piece)
○ Same meaning, multiple words: जल, पानी, नीर (water)
● Word Order
○ Underlying deeper syntactic structure
○ Phrase structure grammar?
○ Computationally intensive
● Morphological Richness
○ Identifying basic units of words
Approaches to build MT systems
[Figure: classification of approaches to building MT systems]
Rule-based MT
Rules are written by linguistic experts to analyze the source, generate an intermediate
representation, and generate the target sentence
Depending on the depth of analysis: interlingua or transfer-based MT
Vauquois Triangle
Translation approaches can be classified by the depth of linguistic analysis they perform
[Figure: the Vauquois triangle]
Problems with rule-based MT
Requires linguistic expertise to develop systems
Maintenance of the system is difficult
Difficult to handle ambiguity
Scaling to a large number of language pairs is not easy
Example-based MT
Translation by analogy ⇒ match parts of sentences to known translations and then combine
Input: He buys a book on international politics
1. Phrase fragment matching (data-driven):
he buys | a book | international politics
2. Translation of segments (data-driven):
वह खरीदता है | एक किताब | अंतर्राष्ट्रीय राजनीति
3. Recombination (human-crafted rules/templates):
वह अंतर्राष्ट्रीय राजनीति पर एक किताब खरीदता है
● Partly rule-based, partly data-driven.
● Good methods for matching and large corpora did not exist when proposed.
Lecture : 18
Topic: Statistical Machine Translation
Module 5: MT
Review
SMT
Parallel corpora are available in several language pairs.
Basic idea: use a parallel corpora as a training set of translation examples
Classic example: IBM work on French-English translation, using the Canadian Hansards (1.7 million sentences of 30 words or less in length).
Idea goes back to Warren Weaver (1949): suggested applying statistical and cryptanalytic
techniques to translation.
….one naturally wonders if the problem of translation could conceivably be treated as
a problem in cryptography. When I look at an article in Russian, I say: “This is really
written in English, but it has been coded in some strange symbols. I will now proceed
to decode”
(Warren Weaver, 1949, in a letter to Norbert Wiener)
The Noisy Channel Model
Goal: translation system from French to English
Have a model p(e|f) which estimates conditional probability of any English sentence e
given the French sentence f. Use the training corpus to set the parameters.
A Noisy Channel Model has two components:
p(e): the language model
p(f|e): the translation model
Giving:
p(e|f) = p(e, f) / p(f) = p(e) p(f|e) / Σ_e' p(e') p(f|e')
and
argmax_e p(e|f) = argmax_e p(e) p(f|e)
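A toy decoding decision under this model (all probabilities invented for illustration): the language model rewards fluent word order while the translation model scores fidelity.

```python
# Choose the English candidate e maximizing p(e) * p(f|e).
lm = {"the house": 0.20, "house the": 0.01}  # p(e), language model
tm = {"the house": 0.30, "house the": 0.30}  # p(f|e), translation model

best = max(lm, key=lambda e: lm[e] * tm[e])
print(best)  # 'the house': fluency breaks the tie left by the translation model
```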
NCM
[Figure: the noisy channel model of translation]
SMT
Let’s formalize the translation process
We will model translation using a probabilistic model. Why?
- We would like to have a measure of confidence for the translations we learn
- We would like to model uncertainty in translation
Model: a simplified and idealized understanding of a physical process
SMT
Why use this counter-intuitive way of explaining translation?
● Makes it easier to mathematically represent translation and learn probabilities
● Fidelity and Fluency can be modelled separately
SMT
We have already seen how to learn n-gram language models
Let’s see how to learn the translation model 𝑃(𝒇|𝒆)
To learn sentence translation probabilities,
we first need to learn word-level translation probabilities
That is the task of word alignment
Word Alignment
A common use of aligned texts is the derivation of bilingual dictionaries and terminology databases.
This is usually done in two steps. First the text alignment is extended to a word alignment (unless we are dealing with an approach in which word and text alignment are induced simultaneously).
Then some criterion such as frequency is used to select aligned pairs.
Given a parallel sentence pair, find word-level correspondences.
If we knew the alignments, we could compute P(f|e)
21
EEE1001-Dr. Anirban Bhowmick
Natural Language
Processing
CSA4006
Dr. Anirban Bhowmick
Assistant Professor
VIT Bhopal
Lecture : 19
Topic: Statistical Machine Translation
Syllabus
Module 5:
Machine Translation and Applications: Basic Issues in Machine Translation - Statistical Translation - Word Alignment - Phrase-based Translation - Synchronous Grammars - Applications of Natural Language Processing: Spell Check - Summarization - Language Translation.
IBM Model 1
IBM Model 1 is a statistical machine translation model that aims to align words between a source language and a target language. The model learns the probabilities of word alignments from observed parallel sentences in bilingual corpora. The primary goal is to understand how words in the source language correspond to words in the target language.
Alignments:
[Figure: example word alignments]
[Figures: IBM Model 1, continued]
Alignments in the IBM Models
In IBM Model 1 all alignments a are equally likely. For an English sentence e of length l and a French sentence f of length m:

$$p(a \mid e, m) = \frac{1}{(l+1)^m}$$

(the l + 1 accounts for alignment to a special NULL word). Next step: come up with an estimate for p(f | a, e, m). In Model 1, this is:

$$p(f \mid a, e, m) = \prod_{j=1}^{m} t(f_j \mid e_{a_j})$$

where t(f | e) is the word-level translation probability.
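The t(f|e) parameters are typically estimated with the EM algorithm. Below is a minimal sketch of IBM Model 1 EM training, assuming a tiny invented corpus and ignoring the NULL word for brevity; it is an illustration, not the full textbook procedure.

```python
from collections import defaultdict

# Toy parallel corpus (invented): (foreign, english) sentence pairs.
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

# Initialize t(f|e) uniformly over the foreign vocabulary.
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):  # EM iterations
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            # E-step: in Model 1, P(f aligns to e) is proportional to t(f|e).
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: re-normalize the translation probabilities.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("Haus", "house")], 3))  # converges toward 1.0
```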
IBM Model 1: Example
[Figure: worked example]
Phrase-Based Translation
Phrase-based machine translation is an approach that translates smaller units of text, typically phrases or short sequences of words, rather than translating word by word. This allows more flexibility in capturing linguistic variation and improves overall translation quality.
• Word-based models translate words as atomic units
• Phrase-based models translate phrases as atomic units
The translation process:
• The foreign input is segmented into phrases
• Each phrase is translated into English
• The phrases are reordered
Phrase Translation Table
Main knowledge source: a table of phrase translations and their probabilities.
Example: phrase translations for natuerlich
[Table: English candidate translations of natuerlich with probabilities]
Phrase Translation Table
Phrase translations for den Vorschlag learned from the Europarl corpus:
[Table: English candidate translations of den Vorschlag with probabilities]
The learned entries show:
– lexical variation (proposal vs. suggestions)
– morphological variation (proposal vs. proposals)
– included function words (the, a, ...)
– noise (it)
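In code, a phrase table can be held as a simple mapping from foreign phrases to scored English options; a minimal sketch with invented probabilities:

```python
# Toy phrase table (entries and probabilities invented for illustration).
phrase_table = {
    "den Vorschlag": [("the proposal", 0.6), ("the proposals", 0.2),
                      ("the suggestions", 0.15), ("it", 0.05)],
}

def best_translation(foreign_phrase):
    """Return the highest-probability English phrase for a foreign phrase."""
    options = phrase_table.get(foreign_phrase)
    return max(options, key=lambda x: x[1]) if options else None

print(best_translation("den Vorschlag"))  # ('the proposal', 0.6)
```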
Linguistic Phrases?
The model is not limited to linguistic phrases (noun phrases, verb phrases, prepositional phrases, ...).
• Example of a non-linguistic phrase pair: spass am → fun with the
• The preceding noun often helps with the translation of the preposition
• Experiments show that restricting the model to linguistic phrases hurts quality
Probabilistic Model
[Figure: the phrase-based probabilistic model]
Distance-Based Reordering
[Figure: distance-based reordering example]
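For reference, in Koehn's standard formulation, which these slides appear to follow, the phrase-based model combines the phrase translation probabilities, a distance-based reordering cost, and the language model:

$$e_{\text{best}} = \arg\max_{e} \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\; d(\mathrm{start}_i - \mathrm{end}_{i-1} - 1)\; p_{\mathrm{LM}}(e)$$

with an exponentially decaying reordering cost

$$d(x) = \alpha^{|x|}, \qquad 0 < \alpha < 1$$

so that phrases translated far out of their source order are penalized.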
Learning a Phrase Translation Table
• Three stages:
– word alignment: using the IBM models or another method
– extraction of phrase pairs
– scoring of phrase pairs
Learning a Phrase Translation Table
[Figure: extracting phrase pairs from a word alignment matrix]
All words of the phrase pair have to align to each other.
Scoring Phrase Translations
• Phrase pair extraction: collect all phrase pairs from the data
• Phrase pair scoring: assign probabilities to phrase translations
• Score by relative frequency:

$$\phi(\bar{f} \mid \bar{e}) = \frac{\mathrm{count}(\bar{e}, \bar{f})}{\sum_{\bar{f}'} \mathrm{count}(\bar{e}, \bar{f}')}$$
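A minimal sketch of relative-frequency scoring over extracted phrase pairs; the pairs below are invented for illustration.

```python
from collections import Counter

# Extracted phrase pairs (english_phrase, foreign_phrase), invented for illustration.
pairs = [
    ("the proposal", "den Vorschlag"),
    ("the proposal", "den Vorschlag"),
    ("the proposal", "der Vorschlag"),
]

pair_counts = Counter(pairs)            # count(e, f)
e_counts = Counter(e for e, _ in pairs) # count(e) = sum over f of count(e, f)

# phi(f|e) = count(e, f) / count(e)
phi = {(e, f): c / e_counts[e] for (e, f), c in pair_counts.items()}
print(round(phi[("the proposal", "den Vorschlag")], 3))  # 0.667
```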
Lecture : 20
Topic: Neural Machine Translation
Encoder-Decoder Model
Encoder: takes in the input sentence and produces a fixed-size context vector.
Decoder: takes the context vector and generates the output sentence in the target language.
[Figure: encoder-decoder architecture]
Contd.
The encoder
Layers of recurrent units where, at each time step, an input token is received, relevant information is collected, and a hidden state is produced. The details depend on the type of RNN; in our example, an LSTM, the unit combines the current hidden state and the input and returns an output (which is discarded) and a new hidden state.
The encoder vector
The encoder vector is the last hidden state of the encoder. It tries to capture as much of the useful input information as possible to help the decoder produce good results, and it is the only information from the input that the decoder receives.
The decoder
Layers of recurrent units, e.g., LSTMs, where each unit produces an output at time step t. The hidden state of the first unit is the encoder vector, and each subsequent unit accepts the hidden state from the previous unit. The output is passed through a softmax function to obtain a probability for every token in the output vocabulary.
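A minimal PyTorch sketch of this encoder-decoder structure (not from the slides; the vocabulary sizes, dimensions, and class name are illustrative):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder; vocab sizes and dimensions are illustrative."""
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)  # softmax is applied in the loss

    def forward(self, src_ids, tgt_ids):
        # The per-step encoder outputs are discarded; only the final (h, c) state is kept.
        _, state = self.encoder(self.src_emb(src_ids))
        # The decoder is initialized with the encoder vector (the last hidden state).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq()
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```

A bidirectional encoder, as discussed next, would use nn.LSTM(..., bidirectional=True) and project its doubled hidden state back down for the decoder.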
Problem and Solution
Why? Longer sentences expose the limitations of a single-directional encoder-decoder architecture. Because language consists of tokens and grammar, the problem with this model is that it does not fully capture the complexity of the grammar.
Specifically, when translating the nth word of the source sentence, the RNN considers only the first n words, but grammatically the meaning of a word depends on the sequence of words both before and after it in the sentence.
A solution: the bidirectional LSTM model. A bidirectional model allows us to feed in the context of both past and future words, creating a more accurate encoder output vector.
Bi-LSTM
[Figure: bidirectional LSTM encoder]
But then the challenge becomes: which word in the sequence do we need to focus on?
Attention Mechanism
Overview: the attention mechanism enhances the traditional encoder-decoder architecture by allowing the decoder to "pay attention" to different parts of the source sentence when generating each word in the target sequence.
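A minimal sketch of dot-product attention over the encoder states; this is one common formulation, and the shapes below are illustrative rather than taken from the slides.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 7 encoder states and one decoder state of size 128.
encoder_states = torch.randn(7, 128)   # one vector per source token
decoder_state = torch.randn(128)       # current decoder hidden state

# Dot-product scores -> softmax weights -> weighted sum (the context vector).
scores = encoder_states @ decoder_state   # (7,) one score per source token
weights = F.softmax(scores, dim=0)        # attention distribution over source tokens
context = weights @ encoder_states        # (128,) context vector fed to the decoder
print(weights.sum().item())  # 1.0: the weights form a probability distribution
```

The context vector is recomputed at every decoding step, so the decoder can focus on different source words for each target word it generates.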