CS480 Lecture November 21st
2
Plan for Today
Casual introduction to Machine Learning
Natural Language Understanding / Processing
Basics
3
Main Machine Learning Categories
Supervised learning Unsupervised learning Reinforcement learning
4
What is Reinforcement Learning?
Idea:
Reinforcement learning is inspired by behavioral
psychology. It is based on rewarding / punishing
an algorithm.
5
RL: Agents and Environments
State
Agent
Reward
Action
Environment
6
Reinforcement Learning in Action
7
Reinforcement Learning in Action
Source: https://www.youtube.com/watch?v=x4O8pojMF0w
8
Reinforcement Learning in Action
Source: https://www.youtube.com/watch?v=kopoLzvh5jY
9
Reinforcement Learning in Action
Source: https://www.youtube.com/watch?v=Tnu4O_xEmVk
10
ANN for Simple Game Playing
Diagram: the game state is the network input; UP, DOWN, and JUMP are the outputs.
11
ANN for Simple Game Playing
The current game state is the input. Decisions (UP/DOWN/JUMP) are rewarded/punished.
12
RL: Agents and Environments
State
What’s
inside?
Reward
Action
Environment
13
RL: Agents and Environments
State
Reward
Action
Environment
14
K-Armed Bandit Problem
15
K-Armed Bandit Problem
The K-armed bandit problem is a problem in which
a fixed limited set of resources must be allocated
between competing (alternative) choices in a way
that maximizes their expected gain.
16
K-Armed Bandit Problem
In the problem, each machine provides a random
reward from a probability distribution specific to
that machine that is not known a priori.
17
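The setup above can be simulated directly. The following is a minimal sketch, assuming the hidden win probabilities 0.33, 0.52, and 0.78 from the example that follows; pulling arms uniformly at random lets the observed win rates converge toward the hidden ones.

```python
import random

def pull(arm, win_probs, rng):
    """Pull one arm; reward is 1 with that arm's hidden probability, else 0."""
    return 1 if rng.random() < win_probs[arm] else 0

def run_bandit(win_probs, pulls=10_000, seed=42):
    """Pick arms uniformly at random and track each arm's observed win rate."""
    rng = random.Random(seed)
    wins = [0] * len(win_probs)
    counts = [0] * len(win_probs)
    for _ in range(pulls):
        arm = rng.randrange(len(win_probs))
        counts[arm] += 1
        wins[arm] += pull(arm, win_probs, rng)
    return [w / c for w, c in zip(wins, counts)]

rates = run_bandit([0.33, 0.52, 0.78])
print(rates)  # observed rates approach the hidden probabilities
```

Uniform pulling estimates every arm equally well but wastes pulls on bad arms, which is exactly the tension the exploration/exploitation discussion below addresses.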
K-Armed Bandit Problem
Three machines with current success (win) rates of 33 %, 52 %, and 78 %.
18
K-Armed Bandit
Agent
Reward
Select Arm
Environment
19
Exploration vs. Exploitation
The crucial tradeoff the gambler faces at each trial
is between "exploitation" of the machine that has
the highest expected payoff and "exploration" to
get more information about the expected payoffs
of the other machines.
20
ε-greedy Algorithm
generate random number p ∈ [0, 1]
if (p < ε) // explore: pick a random action
else // exploit: pick the best-known action
end
21
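The ε-greedy pseudocode above can be sketched as a runnable example. This is a minimal implementation assuming the bandit win probabilities from the earlier example and an ε of 0.1; value estimates are incremental sample means.

```python
import random

def epsilon_greedy(win_probs, epsilon=0.1, pulls=20_000, seed=0):
    """epsilon-greedy action selection on a K-armed bandit.

    With probability epsilon explore (random arm); otherwise exploit the arm
    with the highest estimated value so far.
    """
    rng = random.Random(seed)
    k = len(win_probs)
    estimates = [0.0] * k
    counts = [0] * k
    for _ in range(pulls):
        if rng.random() < epsilon:                       # explore
            arm = rng.randrange(k)
        else:                                            # exploit
            arm = max(range(k), key=lambda a: estimates[a])
        reward = 1 if rng.random() < win_probs[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy([0.33, 0.52, 0.78])
print(counts)  # the best arm (index 2) attracts most pulls
```

Unlike uniform pulling, most pulls end up on the best arm while the occasional random pull keeps the estimates of the other arms from going stale.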
Q-Learning Algorithm
Flowchart, described as steps:
1. Set parameters
2. Initialize Q-table
3. Initialize environment simulator
4. Get state
5. Is goal reached? If yes, stop.
6. If not: pick a random action? If yes, pick a random action; if no, reference an action in the Q-table.
7. Apply action to environment
8. Update Q-table, then return to step 4.
Repeat for n iterations.
22
Q-Learning Algorithm
Initialize Q-table:
Set up and initialize (all values set to 0) a table where:
rows represent possible states
columns represent actions
23
Q-Learning Algorithm
Set parameters:
Set and initialize hyperparameters for the Q-learning process. Hyperparameters include:
chance of choosing a random action: a threshold for choosing a random action over an action from the Q-table
learning rate: a parameter that describes how quickly the algorithm should learn from rewards in different states
high: faster learning with erratic Q-table changes
low: gradual learning with possibly more iterations
discount factor: a parameter that describes how valuable future rewards are. It tells the algorithm whether it should seek "immediate gratification" (small) or "long-term reward" (large)
24
Q-Learning Algorithm
Initialize simulator:
Reset the simulated environment to its initial state and place the agent in a neutral state.
25
Q-Learning Algorithm
Get environment state:
Report the current state of the environment. Typically a vector of values representing all relevant variables.
26
Q-Learning Algorithm
Is goal reached?:
Verify if the goal of the simulation has been achieved. It could be decided by the agent arriving in the expected final state or by some simulation parameter.
27
Q-Learning Algorithm
Pick a random action?:
Decide whether the next action should be picked at random or not (if not, it will be selected based on Q-table data).
28
Q-Learning Algorithm
Reference action in Q-table:
The next action decision will be based on data from the Q-table given the current state of the environment.
29
Q-Learning Algorithm
Pick a random action:
Pick any of the available actions at random. Helpful for exploration of the environment.
30
Q-Learning Algorithm
Apply action to environment:
Apply the action to the environment to change it. Each action will have its own reward.
31
Q-Learning Algorithm
Update Q-table:
Update the Q-table given the reward resulting from the recently applied action (feedback from the environment).
32
Q-Learning Algorithm
Stop:
Stop the learning process.
33
Q-Learning Algorithm
Q-table: rows are states 1..n, columns are the available actions; all values start at 0.
Rewards:
Move into car: -100
Move into pedestrian: -1000
Move into empty space: 100
Move into goal: 500
Q-table update (alpha = learning rate, gamma = discount factor):
Q(state, action) = (1 - alpha) * Q(state, action) + alpha * (reward + gamma * max over all actions a' of Q(next state, a'))
The first term keeps part of the current value; the max term is the best value reachable from the next state.
34
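The update rule above can be written as one small function. This is a minimal sketch with a hypothetical 2-state, 4-action table and arbitrary alpha/gamma values for illustration.

```python
# One Q-learning update on a small table: Q[s][a] is the value of taking
# action a in state s. alpha is the learning rate, gamma the discount factor.
def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    best_next = max(Q[next_state])  # max value over all actions in next state
    Q[state][action] = (1 - alpha) * Q[state][action] + alpha * (reward + gamma * best_next)
    return Q[state][action]

# Hypothetical 2-state, 4-action table, all zeros as after initialization.
Q = [[0.0] * 4 for _ in range(2)]
new_value = q_update(Q, state=0, action=2, reward=-100, next_state=1)
print(new_value)  # (1 - 0.5) * 0 + 0.5 * (-100 + 0.9 * 0) = -50.0
```

Note how a single negative reward immediately pushes one cell of the all-zero table negative, which mirrors the table progression on the following slides.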
37
Q-Learning Algorithm
After an update, the table reflects a received reward; e.g. state 1, action 3 now holds -10:
State 1: 0, 0, -10, 0
State 2: 0, 0, 0, 0
...
State n: 0, 0, 0, 0
Rewards:
Move into car: -100
Move into pedestrian: -1000
Move into empty space: 100
Move into goal: 500
38
39
Q-Learning Algorithm
As learning continues, more rewards propagate into the table; e.g. state 1 holds -10 for one action and state 2 now holds -100 for another.
40
Deep Reinforcement Learning
State
Reward
Action
Environment
41
RL: Agents and Environments
State
Agent
Reward
Action
Environment
42
Natural Language Processing (NLP)
Definition:
Natural language processing (NLP) is a subfield of
linguistics, computer science, and artificial intelligence
concerned with the interactions between computers and
human language, in particular how to program computers
to process and analyze large amounts of natural language
data.
Involves:
Speech processing
Natural language understanding
Natural language generation
43
Computers vs Language and Speech
Text processing: engineering practices for transforming,
normalizing, compressing or accessing textual data
Natural language understanding / processing: the study
of methods for exploiting or generating language
represented as text, for practical tasks
Computational linguistics: the use of computational
tools to understand or learn the structure of human
languages
Speech processing: The study of methods for exploiting
or generating language represented as audible
waveforms, for practical tasks
44
Primary Reasons for NLP
To enable human-machine communication
To learn from written sources
most information (80-90% or more) in most organizations is in
natural language (reports, order forms, bulletin boards, email,
web pages, video, audio, etc.) and not in a traditional
database!
most of that information is now digital
Estimate in 1998: ~60%
Now, more than 90%!
45
What Are Key NLP Goals?
Long range perspective:
True understanding of natural language
Deep reasoning about texts
Real-time spoken dialogue / translation
Engineering perspective:
Extract useful facts from documents
Search the web
Better spelling / grammar checking
etc.
46
Core NLP Applications
Language modeling: the task of predicting what the next
word in a sentence will be based on history of previous
words. Its goal is to learn the probability of a sequence
of words appearing in a given language.
Text classification: the task of bucketing the text into a
known set of categories based on its content.
Information extraction: the task of extracting relevant
information from text.
Information retrieval: the task of finding documents /
data relevant to a specific query from a large collection.
47
Core NLP Applications
Conversational agent: the task of building a dialogue
systems that can converse in human languages.
Text summarization: this task aims to create short
summaries of longer documents while preserving the
core content and meaning of the text.
Question answering: the task of building a system that
can answer questions posed in natural language.
Machine translation: the task of converting a piece of
text from one language to another.
Topic modeling: the task of uncovering the topical
structure of a large collection of documents.
48
Selected Real-world NLP Applications
Text classification - general: spam classification; industry-specific: social media analysis
Information extraction - general: calendar event extraction; industry-specific: retail catalog extraction
Conversational agent - general: personal assistants; industry-specific: health record analysis
Information retrieval - general: search engines; industry-specific: financial analysis
Question answering systems - general: Jeopardy!; industry-specific: legal entity extraction
49
NLP tasks from easy to hard:
Spell checking (easy)
Keyword-based information retrieval
Topic modeling
Text classification
Information extraction
Text summarization
Question answering
Machine translation (hard)
50
Natural Language
51
What is Language?
Language is a structured system of communication that
involves complex combinations of its constituent
components, such as characters, words, sentences, etc.
Source: https://www.englishclub.com/pronunciation/phonemic-chart.htm
53
Language: Morphemes and Lexemes
A morpheme is the smallest unit of language that has
meaning. It is a combination of phonemes. Not all
morphemes are words. All prefixes and suffixes are
morphemes.
unbreakable cats
un + break + able cat + s
Lexemes are structural variations of morphemes related
to one another by meaning. For example
run and running
belong to the same lexeme form.
54
Language: Syntax
Syntax is a set of rules used to construct grammatically
correct sentences out of words and phrases in a language.
A common approach to representing sentences is a parse
tree.
Legend:
S for sentence
NP for noun phrase
VP for verb phrase
V for verb
D for determiner, in this instance the definite article "the"
N for noun
55
Language: Context / Semantics
Context / semantics is how various parts of a language
come together to convey a particular meaning. Context
includes long-term references, world knowledge, and
common sense along with the literal meaning of words
and phrases.
56
Blocks of Language | Applications
Context / Semantics (meaning): summarization, topic modeling, sentiment analysis
Syntax (phrases and sentences): parsing, entity extraction, relation extraction
Morphemes and lexemes (words): tokenization, word embeddings, part-of-speech tags
Phonemes (speech and sounds): speech to text, speaker identification, text to speech
58
Key NLP Tasks: Semantics
Named entity recognition (NER): determining the parts of a text
that can be identified and categorized into preset groups.
Examples of such groups include names of people and names of
places.
Word sense disambiguation: giving meaning to a word based on
the context.
Natural language generation: using databases to derive
semantic intentions and convert them into human language.
59
Why is NLP Hard?
Complexity
Ambiguity
“I made her duck”
Common knowledge is required for understanding
Fuzzy and probabilistic
Creativity
Diversity
“Living” / evolving languages
neologisms, etc.
60
AI vs ML vs NLP
AI
ML NLP
DL
61
NLP as a System
62
Basic NLP Spoken Language Pipeline
Understanding: speech analysis → morphological and lexical analysis → parsing → contextual reasoning → application reasoning and execution (e.g. information retrieval, question answering systems)
Generation: utterance planning → syntactic realization → morphological realization → speech synthesis
63
NLP vs. Adjacent Fields
64
Basic NLP Spoken Language Pipeline
65
Common Lexical Categories
Adjective: A word or phrase naming an attribute, added to or grammatically related to a noun to modify or describe it. Example: The quick red fox jumped over the lazy brown dogs.
Adverb: A word or phrase that modifies or qualifies an adjective, verb, or other adverb, or a word group, expressing a relation of place, time, circumstance, manner, cause, degree, etc. Example: The dogs lazily ran down the field after the fox.
Conjunction: A word that joins two words, phrases, or clauses. Example: The quick red fox and the silver coyote jumped over the lazy brown dogs.
Determiner: A modifying word that determines the kind of reference a noun or noun group has, for example a, the, very. Example: The quick red fox jumped over the lazy brown dogs.
Noun: A word used to identify any of the class of people, places, or things, or to name a particular one of these. Example: The quick red fox jumped over the lazy brown dogs.
Preposition: A word governing, and usually preceding, a noun or pronoun and expressing a relation to another word or element in the clause. Example: The quick red fox jumped over the lazy brown dogs.
Verb: A word used to describe an action, state, or occurrence, and forming the main part of the predicate of a sentence, such as hear, become, and happen. Example: The quick red fox jumped over the lazy brown dogs.
All definitions are taken from the New Oxford American Dictionary, 2nd Edition.
66
Morphology
Morphology is the study of the internal structure of words.
67
Common Phrasal Categories
Adjective phrase: The unusually red fox jumped over the exceptionally lazy dogs. The adverbs unusually and exceptionally modify the adjectives red and lazy, respectively, to create adjectival phrases.
Adverb phrase: The dogs almost always ran down the field after the fox. The adverb almost modifies the adverb always to create an adverbial phrase.
Conjunction phrase: The quick red fox as well as the silver coyote jumped over the lazy brown dogs. Though this is somewhat of an exceptional case, the phrase as well as performs the same function as a conjunction such as and.
Noun phrase: The quick red fox jumped over the lazy brown dogs. The noun fox and its modifiers the, quick, and red create a noun phrase, as does the noun dogs and its modifiers the, lazy, and brown.
Prepositional phrase: The quick red fox jumped over the lazy brown dogs. The preposition over and the noun phrase the lazy brown dogs form a prepositional phrase that modifies the verb jumped.
Verb phrase: The quick red fox jumped over the lazy brown dogs. The verb jumped and its modifier the prepositional phrase over the lazy brown dogs form a verb phrase.
68
Parsing
The task of determining the parts of speech,
phrases, clauses, and their relationship to one
another is called parsing.
69
Basic NLP Spoken Language Pipeline
70
Knowledge Levels / Forms for NLP
Phonetic and phonological knowledge: Concerned with how words are related to the sounds that realize them. Such knowledge is crucial for speech-based systems.
Morphological knowledge: Concerned with how words are constructed from the basic meaning units called morphemes.
Syntactic knowledge: Concerned with how words can be put together to form correct sentences and determines what structural role each word plays in the sentence and what phrases are subparts of what other phrases.
Semantic knowledge: Concerned with what the words mean and how these meanings combine in sentences to form sentence meanings. This is the study of context-independent meaning - the meaning a sentence has regardless of the context in which it is used.
Pragmatic knowledge: Concerned with how sentences are used in different situations and how use affects the interpretation of the sentence.
Discourse knowledge: Concerned with how the immediately preceding sentences affect the interpretation of the next sentence. This information is especially important for interpreting pronouns and for interpreting the temporal aspects of the information.
World knowledge: Includes the general knowledge about the structure of the world that language users must have in order to, for example, maintain a conversation. It includes what each language user must know about the other user's beliefs and goals.
71
(English) Syntax
The structure of words and phrases within a sentence:
Different formalisms, coming from the American (phrase structure) and European (dependency grammar) structuralist traditions
Applications:
Part-of-speech tagging
Entity extraction
Syntactic parsing (Context-Free Grammar)
Syntactic parsing (dependencies)
72
Semantics
The representation of meaning in language:
at different levels: lexical, sentential, textual
logical formalisms: reference and truth conditions
Example: ∀x [bird(x) → fly(x)]
Applications:
Word embedding / encoding
Lexical resources
Semantic role labeling
73
Pragmatics
How language is used to achieve specific intentions:
conversational implicatures: how I interpret what you say because of what I assume you are trying to do
Example: "I ate most of your cookies" implies I did not eat all of your cookies
speech acts
Example: "Where does your brother live?" answered with "I do not know where your brother lives"
Applications:
Speech act labeling
Discourse structure parsing
Dialogue systems
74
Structure / Rank Levels for NLP
Discourse / text level: "So, what do you think?" "I disagree..."
Clause / sentence level: S → NP VP
Group / phrase level: NP, VP, PP, AJP
Morpheme level: lectur + er + 's, teach + ing, course + s
75
NLU: Flow of Information
Raw text input (words) → Parsing → syntactic structure and logical form → Contextual interpretation / Reasoning → final meaning → Application
76
Automated Text Processing
The task of automatic processing of text is to extract a
numerical representation of the meaning of that text. This
is the natural language understanding (NLU) part of NLP.
The numerical representation of the meaning of natural
language usually takes the form of a vector called an
embedding.
77
Text Pre-Processing
79
Character Encoding: ASCII
Source: https://www.sciencebuddies.org/science-fair-projects/references/ascii-table
80
Character Encoding: ISO 8859-1 Latin 1
Source: https://visual-integrity.com/iso-8859/
81
Character Encoding: Unicode (Sample)
Source: https://www.vertex42.com/ExcelTips/unicode-symbols.html
82
Basic Pre-Processing: Normalization
Document(s) / text level:
Sentence tokenization / segmentation: text → sentences → tokenized sentence
Removal of punctuation and stop words
Stemming
Lemmatization
Note: depending on the nature of the data, additional pre-processing steps may be required / important.
83
Exercise: Tokenization
http://text-processing.com/demo/tokenize/
84
Pre-processing: Lowercasing
Some applications (e.g. Information Retrieval, search)
reduce all letters to lower case:
users tend to use lower case
possible exception: upper case in mid-sentence?
General Motors
Fed vs. fed
85
Stemming: Before and After
Before: After:
86
Stemming vs. Lemmatization
Stemming Lemmatization
87
Tokenization / Lemmatization Example
Sentence input:
Chaplin wrote, directed, and composed music for most of his films.
Tokenization:
Chaplin wrote , directed , and composed music for most of his films .
Chaplin wrote, directed, and composed music for most of his films.
Lemmatization:
Chaplin write , direct , and compose music for most of he film .
Chaplin wrote, directed, and composed music for most of his films.
88
Exercise: Stemming
http://text-processing.com/demo/stem/
89
Pre-processing: Stop Words
Very common words (articles, prepositions, pronouns,
conjunctions, etc.) that do not add much information (but
take up space) are called stop words and are frequently
filtered out.
Examples in English: an, the, a, for, is
Filtering based on the stop (word) list
generated based on collection frequency
Tools: RegEx + stop list, NLP libraries have their own
stop lists
Careful: sometimes it may lead to removing important
information
90
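The RegEx-plus-stop-list approach above can be sketched in a few lines. This is a minimal example assuming a tiny illustrative stop list; real NLP libraries ship much larger, curated lists.

```python
import re

# A tiny stop list for illustration only; library stop lists are far larger.
STOP_WORDS = {"an", "the", "a", "for", "is", "and", "of", "to", "over"}

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simple RegEx tokenizer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

tokens = tokenize("The quick red fox jumped over the lazy brown dogs.")
print(remove_stop_words(tokens))
# → ['quick', 'red', 'fox', 'jumped', 'lazy', 'brown', 'dogs']
```

As the slide warns, a stop list is a blunt instrument: filtering "is" or "not" can remove information that matters for tasks such as sentiment analysis.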
Additional Pre-processing Steps
Additional normalization
in addition to stemming, lemmatization:
standardizing abbreviations (e.g. expanding them), hyphenations, digit-to-text
conversions (9 to nine), etc.
Language detection
Code mixing
embedding of linguistic units such as phrases, words, and
morphemes of one language into an utterance of another
language
Transliteration
converting between different writing systems
91
Relationships Between Words
92
Lexical Relationships
Lexical relationships are the connections established between one word and
another:
Synonymy is the idea that some words have the same meaning as others
quick is similar to fast
Antonymy is precisely the opposite of synonymy
good is the opposite of bad
Hyponymy is similar to the notion of embeddedness
Female is a hyponym of Human (Female is a more specific concept than Human)
Holonymy and meronymy describe relationships between an object and its
parts:
tree is a holonym of bark (tree has bark)
bark is a meronym of tree (bark is a part of tree)
93
Language Models and Word
Prediction
94
95
(Statistical) Language Model
A (statistical) language model is a probability
distribution over words or word sequences.
In practice, a language model gives the
probability of a certain word sequence being
“valid”.
Validity in this context does not need to mean
grammatical validity at all.
97
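A minimal sketch of such a model: bigram probabilities estimated by counting over a toy corpus. The two-sentence corpus is made up for illustration; real models train on large corpora and smooth the counts so unseen pairs do not get probability zero.

```python
from collections import Counter

# Toy corpus (an assumption for illustration).
corpus = [
    "the fox jumped over the dogs",
    "the dogs ran after the fox",
]

bigrams = Counter()
unigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p_next(word, nxt):
    """P(next | word) estimated as count(word, next) / count(word).

    Here "the" never ends a sentence, so dividing by its raw count is safe;
    a careful model would count only non-final occurrences.
    """
    return bigrams[(word, nxt)] / unigrams[word]

print(p_next("the", "fox"))  # "the" is followed by "fox" in 2 of its 4 occurrences
```

A sequence's probability is then the product of such conditional probabilities, which is how the model scores whether a word sequence is "valid".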
Words: Frequency and Rank
Frequency: the number of occurrences of a
word in the given document or corpus.
Rank: position occupied by a word
within a given document or a corpus.
A word with the highest frequency will
have the highest rank.
98
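Both quantities can be computed with a counter. This is a minimal sketch over a made-up one-line document; rank 1 is the most frequent word.

```python
from collections import Counter

# Toy document (an assumption for illustration).
text = "the fox and the dogs and the fox"
freq = Counter(text.split())

# Words ordered by descending frequency; rank = position in this list + 1.
ranked = [word for word, _ in freq.most_common()]

print(freq["the"])              # frequency of "the": 3
print(ranked.index("the") + 1)  # rank of "the": 1 (most frequent)
```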
Part of Speech Tagging
99
Parts of Speech
Idea:
classify words according to their grammatical
categories
Categories = part of speech, word classes, POS,
POS tags
Basic categories / tags:
noun, verb, pronoun, preposition, adverb,
conjunction, participle, article
10
Parts of Speech: Closed vs. Open
10
Parts of Speech Tagging
Assigning a part-of-speech (POS) to each
word in a text.
Words often have more than one POS.
example: book
VERB: Book that flight
NOUN: Hand me that book
10
Sample Tagged Sentence
There/PRO were/VERB 70/NUM children/NOUN
there/ADV ./PUNC
Preliminary/ADJ findings/NOUN were/AUX
reported/VERB in/ADP today/NOUN ’s/PART
New/PROPN England/PROPN Journal/PROPN
of/ADP Medicine/PROPN
10
Parts of Speech: Tagset Example
Parts of Speech in the Universal Dependencies tagset
10
Parts of Speech Tagging: Motivation
Can be useful for other NLP tasks
Parsing: POS tagging can improve syntactic parsing
MT: reordering of adjectives and nouns (say from Spanish to
English)
Sentiment or affective tasks: may want to distinguish
adjectives or other POS
Text-to-speech (how do we pronounce “lead” or "object"?)
Or linguistic or language-analytic computational tasks
Need to control for POS when studying linguistic change like
creation of new words, or meaning shift
Or control for POS in measuring meaning similarity or
difference
10
Hidden Markov Model
Bigram probability estimates (transition probabilities between tags):
P(ARTICLE | <s>) = 0.71
P(NOUN | <s>) = 0.29
P(NOUN | ARTICLE) = 1.00
P(VERB | NOUN) = 0.43
P(NOUN | NOUN) = 0.13
P(PREPOSITION | NOUN) = 0.44
P(NOUN | VERB) = 0.35
10
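Under this bigram (Markov) model, the probability of a whole tag sequence is the product of the transition probabilities along the chain. A minimal sketch using the estimates from the table above:

```python
# Transition probabilities from the slide's bigram table.
P = {
    ("<s>", "ARTICLE"): 0.71,
    ("<s>", "NOUN"): 0.29,
    ("ARTICLE", "NOUN"): 1.00,
    ("NOUN", "VERB"): 0.43,
    ("NOUN", "NOUN"): 0.13,
    ("NOUN", "PREPOSITION"): 0.44,
    ("VERB", "NOUN"): 0.35,
}

def sequence_probability(tags):
    """Multiply transition probabilities along the chain, starting from <s>."""
    prob = 1.0
    for prev, cur in zip(["<s>"] + tags, tags):
        prob *= P.get((prev, cur), 0.0)  # unseen transitions get probability 0
    return prob

# e.g. ARTICLE NOUN VERB NOUN: 0.71 * 1.00 * 0.43 * 0.35
print(sequence_probability(["ARTICLE", "NOUN", "VERB", "NOUN"]))
```

Scoring all possible tag sequences this way (as on the "All Possible Sequences" slide) and keeping the best one is what the Viterbi algorithm does efficiently.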
Example: All Possible Sequences
flies/V like/V a/V flower/V
107
Spelling Correction
10
Spelling: Real-world Problems
Non-word error detection
graffe instead of giraffe
Isolated-word error correction
Context-dependent error detection and
correction
typos
three instead of there
homophone or near-homophones
dessert instead of desert or piece for peace
10
How Similar are Two Strings?
The user typed “graffe”. Which string is
closest?
graf
graft
grail
giraffe
11
How Similar are Two Strings?
Why? Computational Biology:
Align two sequences of nucleotides:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Resulting alignment:
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
11
How Similar are Two Strings?
The user typed “graffe”. Which string is
closest?
graf: deletions ("f", "e")
graft: deletion and substitution
grail: deletion and substitution
giraffe: correct form (we need to insert "i")
11
Edits with Costs: Edit Distance
Each edit operation can have its cost:
cost(d) = cost(i) = cost(s) = 1
Edit distance = 5
11
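With unit costs for deletion, insertion, and substitution as above, the minimum edit distance can be computed by dynamic programming. A minimal sketch:

```python
# Minimum edit distance (Levenshtein) with unit costs for deletion,
# insertion, and substitution, computed by dynamic programming.
def edit_distance(a, b):
    m, n = len(a), len(b)
    # dist[i][j] = minimum edits to turn a[:i] into b[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        dist[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution / match
    return dist[m][n]

print(edit_distance("graffe", "giraffe"))  # 1: insert "i"
```

Keeping back-pointers in the same table recovers the edit path itself, which is what the alignment on the next slide shows.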
Edit Path
One of the edit paths (we want minimum # of edits):
11
Parsing
11
Parse Tree: Example
Parse tree for: I prefer a morning flight
(S (NP (Pro I))
   (VP (Verb prefer)
       (NP (Det a)
           (Nominal (Nominal (Noun morning))
                    (Noun flight)))))
Grammar rules:
S → NP VP
NP → Pronoun | Proper-Noun | Det Nominal
Nominal → Nominal Noun | Noun
VP → Verb | Verb NP | Verb NP PP | Verb PP
PP → Preposition NP
11
12
Syntactic Parsing: Sentence Tree
Syntactic parsing maps the sentence "I prefer a morning flight" onto its parse tree (S over NP and VP, down to the individual words).
12
Parsing
The task of determining the parts of speech,
phrases, clauses, and their relationship to one
another is called parsing.
12
Ambiguous Grammar
Under a grammar with the rule S → S Op S, the expression 1 + 2 - 3 has two different parse trees: one grouping (1 + 2) - 3 and one grouping 1 + (2 - 3).
12
12
Text Classification
12
What is Classification?
Definition:
Classification is a process of categorizing data into
distinct classes. In practice it means developing a
model that maps input data to a discrete set of
labels / targets. Classification can be:
binary - there are only two classes: yes / no, true /
false, spam / not spam
multi-class - there are multiple classes available,
only one is assigned
multi-label - multiple classes can be assigned
12
Main Machine Learning Categories
Supervised learning Unsupervised learning Reinforcement learning
12
Reality versus Model
Reality (unknown function): y = f(x)
Learning approximates it with a
Model (approximated function): y = h(x)
12
Supervised Learning with ML
Training: input + label → feature extractor → features → machine learning algorithm
Prediction: input → feature extractor → features → classifier model → output label
12
Text Classification: the Idea
General idea: raw text → classifier model → output label
In practice: raw text → pre-processor → feature extractor → features → classifier model → output label
13
Text Classification: Applications
Sentiment / opinion analysis
Spam detection
Gender identification
Authorship identification
Language identification
Assigning subject categories, topics, or genres
…
131
Text Classification: Applications
Spam detection: raw text → classifier model → spam / ham
Sentiment analysis: raw text → classifier model → positive / negative
13
Text Classification: Rule-Based
Rules based on combinations of words or other
features
spam: black-list-address OR (“dollars” AND “you have
been selected”)
Accuracy can be high
If rules carefully refined by expert
But building and maintaining these rules is
expensive
133
Decision Tree: Spam Filter
13
Text Training Set (Auto) Labeling
Training sets can be labeled automatically:
Look for positive (happy) and negative (sad, angry) emoticons to decide the label (Twitter, Facebook, etc.)
Starred reviews: ★★★★ or more stars → positive | ★★★ or less → negative (Amazon, IMDb, etc.)
13
Text Classification: Supervised ML
Various Machine Learning supervised learning
classifier approaches can be employed:
Naïve Bayes
Logistic regression
Neural networks
k-Nearest Neighbors
etc.
136
Text Classification: Feature Extraction
General idea: raw text → classifier model → output label
In practice, text is messy. How do we extract features?
raw text → pre-processor → feature extractor → features → classifier model → output label
13
Bag of Words: the Idea
Each document, regardless of its length, is mapped to a FIXED size feature vector.
13
Bag of Words: the Idea
Some document:
I love this movie! It's sweet, but
with satirical humor. The dialogue
is great and the adventure scenes
are fun... It manages to be
whimsical and romantic while
laughing at the conventions of the
fairy tale genre. I would
recommend it to just about
anyone. I've seen it several times,
and I'm always happy to see it
again whenever I have a friend
who hasn't seen it yet!
vector
141
Bag of Words: Document Vector
Pre-defined Vocabulary:
she want to walk drive fly there or
143
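The document vector above can be built with a simple counter. This is a minimal sketch using the slide's pre-defined vocabulary; each position holds the count of that vocabulary word in the document.

```python
# Bag of words over the slide's pre-defined vocabulary.
VOCABULARY = ["she", "want", "to", "walk", "drive", "fly", "there", "or"]

def bag_of_words(text):
    """Map a document to a fixed-size vector of per-word counts."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in VOCABULARY]

vector = bag_of_words("to walk or to drive there")
print(vector)  # → [0, 0, 2, 1, 1, 0, 1, 1]
```

Note that this naive split would count "wants" as a different word than "want", which is one reason lemmatization is applied before vectorization.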
Document Vectors in Vector Space
145
Classification with k-Nearest Neighbors
146
kNN: Text Classification
147
kNN: Text Classification
Class A
Class B
148
kNN: Text Classification
149
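A minimal kNN classifier over document vectors: find the k training vectors closest to the query and take a majority vote among their labels. The 2-D example vectors below are made up for illustration; real document vectors have one dimension per vocabulary word.

```python
import math
from collections import Counter

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_classify(query, examples, k=3):
    """examples: list of (vector, label). Vote among the k closest vectors."""
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D document vectors forming two clusters.
examples = [
    ([1.0, 1.0], "Class A"), ([1.2, 0.8], "Class A"), ([0.9, 1.1], "Class A"),
    ([5.0, 5.0], "Class B"), ([5.2, 4.9], "Class B"), ([4.8, 5.1], "Class B"),
]
print(knn_classify([1.1, 0.9], examples))  # → Class A
```

For high-dimensional text vectors, cosine similarity is often used instead of Euclidean distance, since it ignores document length.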
Classical NLP vs Deep Learning NLP
Source: https://www.oreilly.com/library/view/python-natural-language/9781787121423/6f015f49-58e9-4dd1-8045-b11e7f8bf2c8.xhtml
150
Sentiment Analysis
151
Spam Detection: Learning
Training set: N feature vectors x1, x2, ..., xN over vocabulary V = {word1, rolex, word3, replica, word5, word6, word7}, each with a label yi ∈ {HAM, SPAM}.
The Naive Bayes classifier estimates:
P(y_i) = N_i / N (class prior: the fraction of training examples labeled y_i)
P(x_j | y_i) = count(x_j, y_i) / Σ_{x ∈ V} count(x, y_i) (likelihood of word x_j within class y_i)
Prediction: argmax over classes y of P(y) Π_j P(x_j | y)
152
Spam Detection: Learning
Training set over vocabulary V = { word1, rolex, word3, replica, word5, word6, word7 }:

x1 = (0 0 1 0 1 1 1)   y1 = HAM
x3 = (0 1 0 1 0 1 1)   y3 = SPAM
x4 = (1 1 1 1 0 0 0)   y4 = HAM
x5 = (1 1 1 1 0 1 1)   y5 = HAM
x6 = (1 1 0 1 0 0 1)   y6 = SPAM
x7 = (1 0 0 1 0 0 1)   y7 = HAM

Naive Bayes Classifier: ŷ = argmax_y P(y) ∏_j P(x_j | y)

P(y = HAM) = N_HAM / N = 5/7
P(y = SPAM) = N_SPAM / N = 2/7
P(x = rolex | y = SPAM) = count(x = rolex, y = SPAM) / Σ_{x ∈ V} count(x, y = SPAM) = 2/8
and so on...

x1, x2, ..., x7 - feature vectors (in bold) | y1, y2, ..., y7 - labels
153
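The counts above can be reproduced directly in code. A sketch: x2 is not shown on the slide, so a HAM row is assumed here to make N = 7 match the slide's totals (that single row is a guess, and it does not affect the SPAM likelihood):

```python
from fractions import Fraction

VOCAB = ["word1", "rolex", "word3", "replica", "word5", "word6", "word7"]

# Rows copied from the slide; x2 is an assumed HAM row (not shown on the slide).
DATA = [
    ([0, 0, 1, 0, 1, 1, 1], "HAM"),   # x1
    ([1, 0, 1, 0, 1, 0, 1], "HAM"),   # x2 (assumed)
    ([0, 1, 0, 1, 0, 1, 1], "SPAM"),  # x3
    ([1, 1, 1, 1, 0, 0, 0], "HAM"),   # x4
    ([1, 1, 1, 1, 0, 1, 1], "HAM"),   # x5
    ([1, 1, 0, 1, 0, 0, 1], "SPAM"),  # x6
    ([1, 0, 0, 1, 0, 0, 1], "HAM"),   # x7
]

def prior(label):
    """P(y) = N_y / N"""
    return Fraction(sum(1 for _, y in DATA if y == label), len(DATA))

def likelihood(word, label):
    """P(word | y) = count(word, y) / sum of count(x, y) over all x in V"""
    j = VOCAB.index(word)
    num = sum(x[j] for x, y in DATA if y == label)
    den = sum(sum(x) for x, y in DATA if y == label)
    return Fraction(num, den)

prior("HAM"), prior("SPAM"), likelihood("rolex", "SPAM")  # 5/7, 2/7, 2/8
```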
Spam Detection: Learning
Training set as on the previous slide; Naive Bayes Classifier: ŷ = argmax_y P(y) ∏_j P(x_j | y)

The class priors P(y_i) can be:
estimated from the data: P(y_i) = N_{y_i} / N
or assumed equiprobable (all classes have equal probability)
or determined by experts in the area
x1, x2, x3, ..., xN-2, xN-1, xN - feature vectors (in bold) | y1, y2, y3, ..., yN-2, yN-1, yN - labels
154
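Putting priors and likelihoods together, prediction is the argmax from the slide, usually computed with log probabilities for numerical stability. A sketch; the likelihood values below are made up for illustration, not estimated from real data:

```python
import math

PRIORS = {"HAM": 5 / 7, "SPAM": 2 / 7}
LIKELIHOODS = {  # P(word | class): hypothetical values for illustration only
    "HAM":  {"word1": 0.20, "rolex": 0.05, "replica": 0.05, "word7": 0.30},
    "SPAM": {"word1": 0.10, "rolex": 0.30, "replica": 0.30, "word7": 0.10},
}

def predict(words):
    """y_hat = argmax_y P(y) * prod_j P(x_j | y), computed in log space."""
    scores = {}
    for y, p in PRIORS.items():
        score = math.log(p)
        for w in words:
            if w in LIKELIHOODS[y]:  # words outside the table are skipped here
                score += math.log(LIKELIHOODS[y][w])
        scores[y] = score
    return max(scores, key=scores.get)
```

Summing logs instead of multiplying probabilities avoids underflow when documents contain many words.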
Sentiment Analysis: Motivation
Movie: is this review positive or negative?
Products: what do people think about the new
iPhone?
Public sentiment: what is consumer confidence?
Politics: what do people think about this
candidate or issue?
Prediction: predict election outcomes or market
trends from sentiment
155
Sentiment Analysis: Twitter Mood
source: https://arxiv.org/pdf/1010.3003.pdf
156
Sentiment Analysis: Tweets
source: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
157
Sentiment Analysis: Text and Polls
source: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1536/1842
158
Scherer Typology of Affective States
Emotion: brief organically synchronized … evaluation of a major event
angry, sad, joyful, fearful, ashamed, proud, elated
Mood: diffuse non-caused low-intensity long-duration change in subjective
feeling
cheerful, gloomy, irritable, listless, depressed, buoyant
Interpersonal stances: affective stance toward another person in a specific
interaction
friendly, flirtatious, distant, cold, warm, supportive, contemptuous
Attitudes: enduring, affectively colored beliefs, dispositions towards objects or
persons
liking, loving, hating, valuing, desiring
Personality traits: stable personality dispositions and typical behavior
tendencies
nervous, anxious, reckless, morose, hostile, jealous
159
Point in Space Based on Distribution
Each word = a vector
not just "good" or "word45"
Similar words are "nearby in semantic space"
We build this space automatically by seeing
which words are nearby in text
160
Vector Semantics: Words as Vectors
Source: Signorelli, Camilo & Arsiwalla, Xerxes. (2019). Moral Dilemmas for Artificial Intelligence: a position paper on an application of
Compositional Quantum Cognition
161
Information Extraction
162
Information Extraction
Information extraction (IE) is the task of
automatically extracting structured information
from unstructured and/or semi-structured
machine-readable documents and other
electronically represented sources.
In most of the cases this activity concerns
processing human language texts by means of
natural language processing (NLP).
Structured data:
Organization Location
BBDO South Atlanta
Georgia-Pacific Atlanta
164
Information Extraction Architecture
raw text → sentence segmentation → sentences → tokenization → tokenized sentences →
POS tagging → POS-tagged sentences → entity detection → chunked sentences →
relation extraction → relations
165
Sample POS-Tagged Sentence
There/PRO were/VERB 70/NUM children/NOUN
there/ADV ./PUNC
Preliminary/ADJ findings/NOUN were/AUX
reported/VERB in/ADP today/NOUN ’s/PART
New/PROPN England/PROPN Journal/PROPN
of/ADP Medicine/PROPN
166
Named Entity Recognition
Named-entity recognition (NER):
also known as: entity identification, entity
chunking, and entity extraction
a subtask of NLP that seeks to locate and classify
named entities mentioned in unstructured text
into pre-defined categories such as:
person names, organizations, locations, medical
codes, time expressions, quantities, monetary values,
percentages, etc.
167
Named Entities
Named entity, in its core usage, means anything that can
be referred to with a proper name.
168
Sample NER-Tagged Text
Citing high fuel prices, [United Airlines]ORG said [Friday]TIME it has
increased fares by [$6]MONEY per round trip on flights to some cities
also served by lower-cost carriers. [American Airlines]ORG, a unit of
[AMR Corp.]ORG, immediately matched the move, spokesman [Tim
Wagner]PER said. [United]ORG, a unit of [UAL Corp.]ORG, said the
increase took effect [Thursday]TIME and applies to most routes where it
competes against discount carriers, such as [Chicago]LOC to [Dallas]LOC
and [Denver]LOC to [San Francisco]LOC.
169
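A tagging like the one above can be approximated, for known names only, with a gazetteer lookup. This is a deliberately naive sketch; real NER systems use statistical sequence models precisely because fixed name lists cannot resolve ambiguity or cover unseen entities:

```python
import re

# Tiny hand-built gazetteer; real systems learn entity boundaries and types from data.
GAZETTEER = {
    "United Airlines": "ORG",
    "American Airlines": "ORG",
    "Tim Wagner": "PER",
    "Chicago": "LOC",
    "Dallas": "LOC",
}

def tag_entities(text):
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), name, label))
    # return entities in the order they appear in the text
    return [(name, label) for _, name, label in sorted(found)]

tag_entities("Tim Wagner said United Airlines added flights from Chicago to Dallas")
```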
Named Entity Recognition
Unstructured data (document):
The fourth Wells account moving to another agency is the packaged paper-products division
of Georgia-Pacific Corp., which arrived at Wells only last fall. Like Hertz and the History
Channel, it is also leaving for an Omnicom-owned agency, the BBDO South unit of BBDO
Worldwide. BBDO South in Atlanta, which handles corporate advertising for Georgia-Pacific,
will assume additional duties for brands like Angel Soft toilet tissue and Sparkle paper towels,
said Ken Haldin, a spokesman for Georgia-Pacific in Atlanta.
Unstructured data (document) AFTER applying the Named Entity Recognition Process:
The fourth [Wells]ORG account moving to another agency is the packaged paper-products
division of [Georgia-Pacific Corp.]ORG, which arrived at [Wells]ORG only last fall. Like [Hertz]ORG
and the History Channel, it is also leaving for an Omnicom-owned agency, the [BBDO
South]ORG unit of [BBDO Worldwide]ORG. [BBDO South]ORG in [Atlanta]LOC, which handles
corporate advertising for Georgia-Pacific, will assume additional duties for brands like Angel
Soft toilet tissue and Sparkle paper towels, said Ken Haldin, a spokesman for [Georgia-
Pacific]ORG in [Atlanta]LOC.
170
Entity Tagging: Challenge
Consider the following two sentences:
Washington was born into slavery on the farm of James Burroughs.
Blair arrived in Washington for what may well be his last state visit
Type ambiguity!
171
Entity Tagging: Challenge
Consider the following two sentences:
[Washington]PER was born into slavery on the farm of James Burroughs.
Blair arrived in [Washington]LOC for what may well be his last state visit
172
Relation Extraction
Relation Extraction is the task of predicting
attributes and relations for entities in a
sentence.
For example, given a sentence mentioning BBDO South and Atlanta, relation extraction can produce the structured fact located-in(BBDO South, Atlanta).
174
Coreference Resolution
Coreference resolution is the task of finding
all expressions that refer to the same entity
in a text.
It is an important step for a lot of higher
level NLP tasks that involve natural language
understanding such as document
summarization, question answering, and
information extraction.
175
Coreference Resolution
Source: https://huggingface.co/coref/
176
Putting it All Together
177
Embeddings as Input Features
Assumption:
“3-word sentences”
178
Embeddings as Input Features
[Diagram: embeddings as input features → layers of learned weights → BINARY answer output; the embedding features themselves are learned from data]
179
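For the assumed 3-word sentences, one simple design concatenates each word's embedding into a single feature vector for the classifier. A sketch; the 2-dimensional embeddings below are invented for illustration only:

```python
# Toy 2-dimensional embeddings, made up for illustration.
EMBEDDINGS = {
    "movie": [0.9, 0.1],
    "great": [0.8, 0.7],
    "boring": [-0.6, 0.4],
}
DIM = 2

def sentence_features(words):
    """Concatenate the embeddings of a fixed-length (3-word) sentence."""
    features = []
    for w in words:
        features.extend(EMBEDDINGS.get(w, [0.0] * DIM))  # zeros if out of vocabulary
    return features

sentence_features(["great", "movie", "tonight"])
```

Because the sentence length is fixed, the feature vector always has 3 × DIM entries, which is what the downstream network expects.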
Neural Language Model
180
Voice Assistant: Alexa
Your home Amazon Alexa Cloud Service
Transcription:
“Play Two Steps Behind by Def Leppard”
Extracted meaning:
Intent: PlayMusic | Artist: Def Leppard | Song: Two Steps Behind
181
Language Models: Application
182
Language Modeling Example
Source: https://transformer.huggingface.co/doc/distil-gpt2
183
Language Models and Transformers
Source: https://deepai.org/machine-learning-model/text-generator
184
GPT
185
Generative Pre-trained Transformer 3
What is it?
Generative Pre-trained Transformer 3 (GPT-3) is an
autoregressive language model that uses deep learning to
produce human-like text. It is the third-generation
language prediction model in the GPT-n series (and the
successor to GPT-2) created by OpenAI, a San Francisco-
based artificial intelligence research laboratory. GPT-3's full
version has a capacity of 175 billion machine learning
parameters.
186
Exercise: Sentiment Analysis
https://monkeylearn.com/sentiment-analysis-
online/
https://text2data.com/Demo
http://text-processing.com/demo/sentiment/
187
Exercise: Speech-to-text
https://cloud.google.com/speech-to-text
188
Text Classification Example
Source: https://text2data.com/Demo
189
Information Extraction Example
Source: https://explosion.ai/demos/displacy-ent
190
Information Retrieval/Search Example
Source: https://www.google.com/
191
Text Summarization Example
Source: https://deepai.org/machine-learning-model/summarization
192
Machine Translation Example
193