CS480 Lecture November 21st

This document provides an overview of reinforcement learning concepts and algorithms for a university-level artificial intelligence course. It discusses key reinforcement learning topics like agents and environments, exploration vs exploitation, the ε-greedy algorithm, and the Q-learning algorithm. The document also provides examples of reinforcement learning applications and how neural networks can be used for simple game playing.


CS 480

Introduction to Artificial Intelligence

November 21, 2023


Announcements / Reminders
- Please follow the Week 13 To Do List instructions (if you haven't already)
- Written Assignment #04 due on Tuesday (11/28/23) 11:59 PM CST
- Programming Assignment #02 due on Monday (11/27/23) 11:59 PM CST
- DON'T EXPECT EXTENSIONS!
- Final Exam date: Thursday 11/30/2023 (last week of classes!)
  - Ignore the date provided by the Registrar

2
Plan for Today
- Casual introduction to Machine Learning
- Natural Language Understanding / Processing Basics

3
Main Machine Learning Categories
Supervised learning: one of the most common techniques in machine learning. It is based on known relationship(s) and patterns within data (for example: the relationship between inputs and outputs). Frequently used types: regression and classification.

Unsupervised learning: involves finding underlying patterns within data. Typically used in clustering data points (similar customers, etc.).

Reinforcement learning: inspired by behavioral psychology. It is based on rewarding or punishing an algorithm. Rewards and punishments are based on the algorithm's actions within its environment.

4
What is Reinforcement Learning?
Idea:
Reinforcement learning is inspired by behavioral psychology. It is based on rewarding or punishing an algorithm.

Rewards and punishments are based on the algorithm's actions within its environment.

5
RL: Agents and Environments
[Diagram: the Agent receives State and Reward from the Environment and sends back an Action.]

6
Reinforcement Learning in Action

7
Reinforcement Learning in Action

Source: https://www.youtube.com/watch?v=x4O8pojMF0w

8
Reinforcement Learning in Action

Source: https://www.youtube.com/watch?v=kopoLzvh5jY

9
Reinforcement Learning in Action

Source: https://www.youtube.com/watch?v=Tnu4O_xEmVk

10
ANN for Simple Game Playing

[Diagram: a feed-forward neural network mapping the Game state (input layer) through two hidden layers to an output layer with UP / DOWN / JUMP decisions.]

11
ANN for Simple Game Playing
The current game state is the input. Decisions (UP/DOWN/JUMP) are rewarded/punished.

[Diagram: the same network, with the Game state as input and UP / DOWN / JUMP as outputs.]

All the weights are corrected using Reinforcement Learning.

12
RL: Agents and Environments
[Diagram: the agent-environment loop again, now asking "What's inside?" the agent box.]

13
RL: Agents and Environments
[Diagram: the agent-environment loop with the agent box left empty.]

14
K-Armed Bandit Problem

15
K-Armed Bandit Problem
The K-armed bandit problem is a problem in which
a fixed limited set of resources must be allocated
between competing (alternative) choices in a way
that maximizes their expected gain.

Each choice's properties are only partially known at


the time of allocation, and may become better
understood as time passes or by allocating
resources to the choice.

16
K-Armed Bandit Problem
In the problem, each machine provides a random reward from a probability distribution specific to that machine, which is not known a priori.

The objective of the gambler is to maximize the


sum of rewards earned through a sequence of lever
pulls.

17
K-Armed Bandit Problem

Bandit/Arm 1: 33% current success (win) rate
Bandit/Arm 2: 52% current success (win) rate
Bandit/Arm 3: 78% current success (win) rate

Which bandit shall we play next?

18
K-Armed Bandit

[Diagram: the Agent selects an arm (Action) and receives a Reward from the Environment.]

19
Exploration vs. Exploitation
The crucial tradeoff the gambler faces at each trial
is between "exploitation" of the machine that has
the highest expected payoff and "exploration" to
get more information about the expected payoffs
of the other machines.

20
ε-greedy Algorithm
generate random number p ∈ [0, 1]

if (p < ε) // explore

    select random arm

else // exploit

    select current best arm

end
21
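The ε-greedy rule above can be sketched in a few lines of Python. This is a minimal sketch: the `q_values` list of per-arm value estimates is a hypothetical stand-in for however the agent tracks arm quality.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Choose an arm index: explore with probability epsilon, else exploit."""
    p = random.random()                          # random number p in [0, 1)
    if p < epsilon:                              # explore
        return random.randrange(len(q_values))   # select random arm
    # exploit: select current best arm (highest estimated value)
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

With epsilon = 0 the agent always exploits; with epsilon = 1 it always explores.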
Q-Learning Algorithm
[Flowchart: Set parameters → Initialize Q-table → Initialize simulator → Get environment state → Is goal reached? If yes: Stop. If no: Pick a random action? If yes: pick a random action; if no: reference action in Q-table. Then apply action to environment, update Q-table, and get the new environment state. Repeat for n iterations.]
22
Q-Learning Algorithm
Initialize Q-table:
Set up and initialize (all values set to 0) a table where:
- rows represent possible states
- columns represent actions

Note that additional states can be added to the table when encountered.

23
Q-Learning Algorithm
Set parameters:
Set and initialize hyperparameters for the Q-learning process.

Hyperparameters include:
- chance of choosing a random action: a threshold for choosing a random action over an action from the Q-table
- learning rate: a parameter that describes how quickly the algorithm should learn from rewards in different states
  - high: faster learning with erratic Q-table changes
  - low: gradual learning with possibly more iterations
- discount factor: a parameter that describes how valuable future rewards are. It tells the algorithm whether it should seek "immediate gratification" (small) or "long-term reward" (large)

24
Q-Learning Algorithm
Initialize simulator:
Reset the simulated environment to its initial state and place the agent in a neutral state.

25
Q-Learning Algorithm
Get environment state:
Report the current state of the environment. Typically a vector of values representing all relevant variables.

26
Q-Learning Algorithm
Is goal reached?:
Verify if the goal of the simulation has been achieved. It could be decided by the agent arriving in the expected final state or by some simulation parameter.

27
Q-Learning Algorithm
Pick a random action?:
Decide whether the next action should be picked at random or not (otherwise it will be selected based on Q-table data). Use the chance of choosing a random action hyperparameter to decide.

28
Q-Learning Algorithm
Reference action in Q-table:
The next action decision will be based on data from the Q-table given the current state of the environment.

29
Q-Learning Algorithm
Pick a random action:
Pick any of the available actions at random. Helpful with exploration of the environment.

30
Q-Learning Algorithm
Apply action to environment:
Apply the action to the environment to change it. Each action will have its own reward.

31
Q-Learning Algorithm
Update Q-table:
Update the Q-table given the reward resulting from the recently applied action (feedback from the environment).

32
Q-Learning Algorithm
Stop:
Stop the learning process.

33
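The whole flowchart condenses into a short tabular Q-learning loop. This is a minimal sketch, not the course's simulator: the `step` function and the 1-D corridor environment below are hypothetical stand-ins for "apply action to environment".

```python
import random

def q_learning(n_states, n_actions, step, n_iters,
               epsilon=0.1, alpha=0.1, gamma=0.6):
    """Tabular Q-learning following the flowchart above."""
    Q = [[0.0] * n_actions for _ in range(n_states)]       # Initialize Q-table
    for _ in range(n_iters):                               # Repeat for n iterations
        state = 0                                          # Initialize simulator
        done = False
        while not done:                                    # Is goal reached?
            if random.random() < epsilon:                  # Pick a random action?
                action = random.randrange(n_actions)
            else:                                          # Reference action in Q-table
                action = max(range(n_actions), key=lambda a: Q[state][a])
            next_state, reward, done = step(state, action)  # Apply action to environment
            best_next = max(Q[next_state])
            Q[state][action] = ((1 - alpha) * Q[state][action]
                                + alpha * (reward + gamma * best_next))  # Update Q-table
            state = next_state
    return Q

# Hypothetical toy environment: a corridor of states 0..4.
# Action 1 moves right, action 0 moves left; reaching state 4 (goal) gives +500,
# every other move costs -1.
def corridor_step(state, action):
    next_state = min(4, max(0, state + (1 if action == 1 else -1)))
    if next_state == 4:
        return next_state, 500, True
    return next_state, -1, False
```

After a few hundred iterations, the greedy policy learned from the Q-table moves right toward the goal.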
Q-Learning Algorithm
Q-table (rows = states 1 to n, one column per available action); all entries initialized to 0:

State 1: 0 0 0 0
State 2: 0 0 0 0
...
State n: 0 0 0 0

Rewards:
Move into car: -100
Move into pedestrian: -1000
Move into empty space: 100
Move into goal: 500

Q-table update rule (alpha = learning rate, gamma = discount factor):
Q(state, action) = (1 − alpha) ∗ Q(state, action) + alpha ∗ (reward + gamma ∗ max over all actions of Q(next state, action))
34
Q-Learning Algorithm
Worked example (alpha = 0.1, gamma = 0.6); all Q-table entries are still 0.

Rewards:
Move into car: -100
Move into pedestrian: -1000
Move into empty space: 100
Move into goal: 500

Action: move east from state 1. Reward: -100 (moved into a car)

Q-table value:
Q(1, east) = (1 − 0.1) ∗ 0 + 0.1 ∗ (−100 + 0.6 ∗ max of Q(2, all actions))
37
Q-Learning Algorithm
The Q-table now stores Q(1, east) = −10 in row 1; row 2 is still all zeros, so the max term below is 0.

Action: move east from state 1. Reward: -100

Q-table value:
Q(1, east) = (1 − 0.1) ∗ 0 + 0.1 ∗ (−100 + 0.6 ∗ 0) = −10
38
Q-Learning Algorithm
Q-table so far: Q(1, east) = −10; all other entries 0.

Action: move south from state 2. Reward: -1000 (moved into a pedestrian)

Q-table value:
Q(2, south) = (1 − 0.1) ∗ 0 + 0.1 ∗ (−1000 + 0.6 ∗ max of Q(3, all actions))
39
Q-Learning Algorithm
The Q-table now stores Q(1, east) = −10 and Q(2, south) = −100.

Action: move south from state 2. Reward: -1000

Q-table value:
Q(2, south) = (1 − 0.1) ∗ 0 + 0.1 ∗ (−1000 + 0.6 ∗ 0) = −100
40
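The two worked updates can be checked with a one-line Python version of the slide's formula. This is a sketch; `max_q_next` stands for the "max of Q(next state, all actions)" term.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.1, gamma=0.6):
    """One Q-learning update: (1 - alpha) * Q + alpha * (reward + gamma * max')."""
    return (1 - alpha) * q_sa + alpha * (reward + gamma * max_q_next)

# Worked examples from the slides (next-state rows are still all zeros):
print(q_update(0, -100, 0))    # Q(1, east)  ≈ -10
print(q_update(0, -1000, 0))   # Q(2, south) ≈ -100
```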
Deep Reinforcement Learning
[Diagram: the agent-environment loop; in deep reinforcement learning the agent is implemented with a deep neural network that takes State and Reward in and produces an Action.]

41
RL: Agents and Environments
[Diagram: the Agent receives State and Reward from the Environment and sends back an Action.]

42
Natural Language Processing (NLP)
Definition:
Natural language processing (NLP) is a subfield of
linguistics, computer science, and artificial intelligence
concerned with the interactions between computers and
human language, in particular how to program computers
to process and analyze large amounts of natural language
data.
Involves:
- Speech processing
- Natural language understanding
- Natural language generation

43
Computers vs Language and Speech
- Text processing: engineering practices for transforming, normalizing, compressing or accessing textual data
- Natural language understanding / processing: the study of methods for exploiting or generating language represented as text, for practical tasks
- Computational linguistics: the use of computational tools to understand or learn the structure of human languages
- Speech processing: the study of methods for exploiting or generating language represented as audible waveforms, for practical tasks
44
Primary Reasons for NLP
- To enable human-machine communication
- To learn from written sources
  - most information (80-90% or more) in most organizations is in natural language (reports, order forms, bulletin boards, email, web pages, video, audio, etc.) and not in a traditional database!
  - most of that information is now digital
    - Estimate in 1998: ~60%
    - Now, more than 90%!
- To advance the scientific understanding of languages and language use

45
What Are Key NLP Goals?
- Long range perspective:
  - True understanding of natural language
  - Deep reasoning about texts
  - Real-time spoken dialogue / translation
- Engineering perspective:
  - Extract useful facts from documents
  - Search the web
  - Better spelling / grammar checking
  - etc.

46
Core NLP Applications
- Language modeling: the task of predicting what the next word in a sentence will be based on the history of previous words. Its goal is to learn the probability of a sequence of words appearing in a given language.
- Text classification: the task of bucketing the text into a known set of categories based on its content.
- Information extraction: the task of extracting relevant information from text.
- Information retrieval: the task of finding documents / data relevant to a specific query from a large collection.

47
Core NLP Applications
- Conversational agent: the task of building a dialogue system that can converse in human languages.
- Text summarization: this task aims to create short summaries of longer documents while preserving the core content and meaning of the text.
- Question answering: the task of building a system that can answer questions posed in natural language.
- Machine translation: the task of converting a piece of text from one language to another.
- Topic modeling: the task of uncovering the topical structure of a large collection of documents.
48
Selected Real-world NLP Applications
Application area:      Text classification   | Information extraction     | Conversational agent   | Information retrieval | Question answering systems
General applications:  Spam classification   | Calendar event extraction  | Personal assistants    | Search engines        | Jeopardy!
Industry-specific:     Social media analysis | Retail catalog extraction  | Health record analysis | Financial analysis    | Legal entity extraction

49
NLP Tasks in Order of Difficulty (easiest to hardest):
- Spell checking
- Keyword-based information retrieval
- Topic modeling
- Text classification
- Information extraction
- Closed domain conversational agent
- Text summarization
- Question answering
- Machine translation
- Open domain conversational agent
50
Natural Language

51
What is Language?
Language is a structured system of communication that involves complex combinations of its constituent components, such as characters, words, sentences, etc.

We can think of human language as composed of four major building blocks:
- phonemes
- morphemes and lexemes
- syntax
- context / semantics
52
Language: Phonemes
Phonemes are the smallest units of sound in a language.
They may not have any meaning by themselves.

Source: https://www.englishclub.com/pronunciation/phonemic-chart.htm

53
Language: Morphemes and Lexemes
A morpheme is the smallest unit of language that has meaning. It is a combination of phonemes. Not all morphemes are words. All prefixes and suffixes are morphemes.

unbreakable = un + break + able        cats = cat + s

Lexemes are structural variations of morphemes related to one another by meaning. For example, run and running belong to the same lexeme form.

54
Language: Syntax
Syntax is a set of rules used to construct grammatically
correct sentences out of words and phrases in a language.
A common approach to representing sentences is a parse
tree.

Legend:
S for sentence
NP for noun phrase
VP for verb phrase
V for verb
D for determiner, in this instance the definite article "the"
N for noun
55
Language: Context / Semantics
Context / semantics is how various parts in a language come together to convey a particular meaning. Context includes long-term references, world knowledge, and common sense along with the literal meaning of words and phrases.

56
Blocks of Language | Applications
Context / Semantics (meaning): Summarization, Topic modeling, Sentiment analysis
Syntax (phrases and sentences): Parsing, Entity extraction, Relation extraction
Morphemes and lexemes (words): Tokenization, Word embeddings, Part-of-speech tags
Phonemes (speech and sounds): Speech to text, Speaker identification, Text to speech


57
Key NLP Tasks: Syntax
- Lemmatization: reducing the various inflected forms of a word into a single form for easy analysis.
- Morphological segmentation: dividing words into individual units called morphemes. undivided -> un + divided
- Word segmentation: dividing a large piece of continuous text into distinct units.
- Part-of-speech tagging: identifying the part of speech for every word.
- Parsing: grammatical analysis of the provided sentence.
- Sentence breaking: placing sentence boundaries on a large piece of text.
- Stemming: cutting the inflected words down to their root form.

58
Key NLP Tasks: Semantics
- Named entity recognition (NER): determining the parts of a text that can be identified and categorized into preset groups. Examples of such groups include names of people and names of places.
- Word sense disambiguation: giving meaning to a word based on the context.
- Natural language generation: using databases to derive semantic intentions and convert them into human language.

59
Why is NLP Hard?
- Complexity
- Ambiguity
  - "I made her duck"
- Common knowledge is required for understanding
- Fuzzy and probabilistic
- Creativity
- Diversity
  - "Living" / evolving languages
  - neologisms, etc.

60
AI vs ML vs NLP

[Diagram: a Venn diagram in which NLP overlaps AI, ML sits inside AI, and DL sits inside ML.]

AI - Artificial Intelligence, ML - Machine Learning, DL - Deep Learning, NLP - Natural Language Processing

61
NLP as a System

62
Basic NLP Spoken Language Pipeline
Understanding: Speech analysis → Morphological and lexical analysis → Parsing → Contextual reasoning → Information retrieval → Application reasoning and execution
Generation (e.g., in question answering systems): Utterance planning → Syntactic realization → Morphological realization → Speech synthesis

Knowledge sources: Pronunciation model (Phonology) | Morphological rules (Morphology) | Lexicon and grammar (Syntax) | Discourse context (Semantics) | Domain knowledge (Reasoning)

63
NLP vs. Adjacent Fields

64
Basic NLP Spoken Language Pipeline

Natural Language Understanding (NLU): Speech analysis → Morphological and lexical analysis → Parsing → Contextual reasoning → Information retrieval → Application reasoning and execution
Natural Language Generation (NLG): Utterance planning → Syntactic realization → Morphological realization → Speech synthesis (e.g., in question answering systems)

Together, NLU and NLG form Natural Language Processing (NLP).

65
Common Lexical Categories
Lexical category: Definition* (Example)

Adjective: A word or phrase naming an attribute, added to or grammatically related to a noun to modify or describe it. (The quick red fox jumped over the lazy brown dogs.)
Adverb: A word or phrase that modifies or qualifies an adjective, verb, or other adverb, or a word group, expressing a relation of place, time, circumstance, manner, cause, degree, etc. (The dogs lazily ran down the field after the fox.)
Conjunction: A word that joins two words, phrases, or clauses. (The quick red fox and the silver coyote jumped over the lazy brown dogs.)
Determiner: A modifying word that determines the kind of reference a noun or noun group has, for example a, the, very. (The quick red fox jumped over the lazy brown dogs.)
Noun: A word used to identify any of the class of people, places, or things, or to name a particular one of these. (The quick red fox jumped over the lazy brown dogs.)
Preposition: A word governing, and usually preceding, a noun or pronoun and expressing a relation to another word or element in the clause. (The quick red fox jumped over the lazy brown dogs.)
Verb: A word used to describe an action, state, or occurrence, and forming the main part of the predicate of a sentence, such as hear, become, and happen. (The quick red fox jumped over the lazy brown dogs.)

* all definitions are taken from the New Oxford American Dictionary, 2nd Edition

66
Morphology
- Morphology is the study of the internal structure of words
- Words consist of:
  - lexeme (root form)
  - affixes (suffix, prefix)
- Morphology has two categories:
  - inflectional - does not create new lexemes (happier)
  - derivational - creates new lexemes (unhappy)
- Inflectional morphemes carry grammatical meaning (plural -s), but they do not change the meaning of the word

Derivational suffixes (verb -> noun):
-ation: nomination (nominate), -ee: appointee (appoint), -ure: closure (close), -al: refusal (refuse), -er: runner (run)

Derivational suffixes (adjective -> noun):
-dom: freedom (free), -hood: likelihood (likely), -ist: realist (real), -th: warmth (warm), -ness: happiness (happy)

Inflectional suffixes:
(base form): look, -ing: looking (gerund), -s: looks (third person singular), -ed: looked (past tense), -en: taken (past participle)

67
Common Phrasal Categories
Adjective phrase: The unusually red fox jumped over the exceptionally lazy dogs. (The adverbs unusually and exceptionally modify the adjectives red and lazy, respectively, to create adjectival phrases.)
Adverb phrase: The dogs almost always ran down the field after the fox. (The adverb almost modifies the adverb always to create an adverbial phrase.)
Conjunction phrase: The quick red fox as well as the silver coyote jumped over the lazy brown dogs. (Though this is somewhat of an exceptional case, the phrase as well as performs the same function as a conjunction such as and.)
Noun phrase: The quick red fox jumped over the lazy brown dogs. (The noun fox and its modifiers the, quick, and red create a noun phrase, as does the noun dogs and its modifiers the, lazy, and brown.)
Prepositional phrase: The quick red fox jumped over the lazy brown dogs. (The preposition over and the noun phrase the lazy brown dogs form a prepositional phrase that modifies the verb jumped.)
Verb phrase: The quick red fox jumped over the lazy brown dogs. (The verb jumped and its modifier the prepositional phrase over the lazy brown dogs form a verb phrase.)

68
Parsing
The task of determining the parts of speech,
phrases, clauses, and their relationship to one
another is called parsing.

69
Basic NLP Spoken Language Pipeline
Understanding: Speech analysis → Morphological and lexical analysis → Parsing → Contextual reasoning → Information retrieval → Application reasoning and execution
Generation (e.g., in question answering systems): Utterance planning → Syntactic realization → Morphological realization → Speech synthesis

Knowledge sources: Pronunciation model (Phonology) | Morphological rules (Morphology) | Lexicon and grammar (Syntax) | Discourse context (Semantics) | Domain knowledge (Reasoning)

70
Knowledge Levels / Forms for NLP
Phonetic and phonological knowledge: concerned with how words are related to the sounds that realize them. Such knowledge is crucial for speech-based systems.
Morphological knowledge: concerned with how words are constructed from the basic meaning units called morphemes.
Syntactic knowledge: concerned with how words can be put together to form correct sentences; determines what structural role each word plays in the sentence and what phrases are subparts of what other phrases.
Semantic knowledge: concerned with what the words mean and how these meanings combine in sentences to form sentence meanings. This is the study of context-independent meaning - the meaning a sentence has regardless of the context in which it is used.
Pragmatic knowledge: concerned with how sentences are used in different situations and how use affects the interpretation of the sentence.
Discourse knowledge: concerned with how the immediately preceding sentences affect the interpretation of the next sentence. This information is especially important for interpreting pronouns and for interpreting the temporal aspects of the information.
World knowledge: includes the general knowledge about the structure of the world that language users must have in order to, for example, maintain a conversation. It includes what each language user must know about the other user's beliefs and goals.

71
(English) Syntax
The structure of words and phrases within a sentence:
- Different formalisms, coming from the American (phrase structure) and European (dependency grammar) structuralist traditions

Applications:
- Part-of-speech tagging
- Entity extraction
- Syntactic parsing (Context-Free Grammar)
- Syntactic parsing (dependencies)

72
Semantics
The representation of meaning in language:
- at different levels: lexical, sentential, textual
- logical formalisms: reference and truth conditions

Example: ∀x [bird(x) → fly(x)]

Applications:
- Word embedding / encoding
- Lexical resources
- Semantic role labeling

73
Pragmatics
How language is used to achieve specific intentions:
- conversational implicatures: how I interpret what you say because of what I assume you are trying to do
- speech acts

Examples:
"I ate most of your cookies" → I did not eat all of your cookies
"Where does your brother live?" → I do not know where your brother lives

Applications:
- Speech act labeling
- Discourse structure parsing
- Dialogue systems

74
Structure / Rank Levels for NLP
[Diagram: rank levels illustrated with a parse of "The lecturer's teaching all his courses with great class"]
- Discourse / text level: "So, what do you think?" - "I disagree..."
- Clause / sentence level: S
- Group / phrase level: NP, VP, PP, AJP
- Word level: DT, N1, VB, VVG, DT, DPS, N2, PRP, AV, AJ
- Morpheme level: lectur + er + 's, teach + ing, course + s

75
NLU: Flow of Information

Raw text input (words) → Parsing → Syntactic structure and logical form → Contextual interpretation / Reasoning → final meaning → Application

76
Automated Text Processing
The task of automatic processing of text is to extract a
numerical representation of the meaning of that text. This
is the natural language understanding (NLU) part of NLP.
The numerical representation of the meaning of natural
language usually takes the form of a vector called an
embedding.

Input: Text (natural language) → Natural Language Understanding (rules, patterns or encoder) → Output: Vector (numbers, embedding)

77
Automated Text Processing
The task of automatic processing of text is to extract a
numerical representation of the meaning of that text. This
is the natural language understanding (NLU) part of NLP.
The numerical representation of the meaning of natural
language usually takes the form of a vector called an
embedding.

Input: Text (natural language) → Pre-processing → Natural Language Understanding (rules, patterns or encoder) → Output: Vector (numbers, embedding)

78
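As a toy illustration of the text-to-vector step, here is a bag-of-words count vector. This is a sketch only: real encoders learn dense embeddings, and the fixed `vocabulary` below is a hypothetical choice.

```python
def bag_of_words(text, vocabulary):
    """Represent text as a vector of word counts over a fixed vocabulary."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]

vocab = ["fox", "dogs", "jumped", "lazy"]
vector = bag_of_words("The quick red fox jumped over the lazy brown dogs", vocab)
# vector is [1, 1, 1, 1]: each vocabulary word occurs once in the sentence
```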
Text Pre-Processing

79
Character Encoding: ASCII

Source: https://www.sciencebuddies.org/science-fair-projects/references/ascii-table

80
Character Encoding: ISO 8859-1 Latin 1

Source: https://visual-integrity.com/iso-8859/

81
Character Encoding: Unicode (Sample)

Source: https://www.vertex42.com/ExcelTips/unicode-symbols.html

82
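The difference between these encodings is easy to see in Python: the same string occupies different byte sequences depending on the encoding.

```python
text = "café"

# ISO 8859-1 (Latin-1): é is a single byte, 0xE9
latin1 = text.encode("iso-8859-1")

# UTF-8 (Unicode): é takes two bytes, 0xC3 0xA9
utf8 = text.encode("utf-8")

# Plain 7-bit ASCII cannot represent é at all
try:
    text.encode("ascii")
except UnicodeEncodeError:
    pass  # é is outside the ASCII range
```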
Basic Pre-Processing: Normalization
Document(s) / text level:
Text → Sentence tokenization / segmentation → Sentences

Tokenized sentence level:
- Lowercasing / case folding
- Removal of punctuation and stop words
- Stemming
- Lemmatization

Note: depending on the nature of data, additional pre-processing steps may be required / important.

83
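A minimal sketch of the steps above using only the standard library. Real pipelines typically use NLP libraries such as NLTK or spaCy; the tiny `STOP_WORDS` set here is illustrative only.

```python
import re

STOP_WORDS = {"a", "an", "the", "for", "is", "of", "and"}  # illustrative only

def sentence_tokenize(text):
    """Crude sentence segmentation: split after ., ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def normalize(sentence):
    """Lowercase, strip punctuation, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, normalize("The quick red fox jumped over the lazy brown dogs.") drops "The"/"the" and the final period.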
Exercise: Tokenization
http://text-processing.com/demo/tokenize/

84
Pre-processing: Lowercasing
Some applications (e.g. Information Retrieval, search) reduce all letters to lower case:
- users tend to use lower case
- possible exception: upper case in mid-sentence?
  - General Motors
  - Fed vs. fed

For sentiment analysis and topic modeling:
- preserving case is important (US vs. us)

85
Stemming: Before and After
Before:
For example compressed and compression are both accepted as equivalent to compress.

After:
For exampl compress and compress ar both accept as equival to compress.

86
Stemming vs. Lemmatization
Stemming:                  Lemmatization:
adjustable -> adjust       was -> (to) be
meeting -> meet            better -> good
studies -> studi           meeting -> meeting
studying -> study          studies -> study
                           studying -> study

87
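A toy suffix-stripping stemmer reproduces the table's stemming column. This is an illustrative sketch, not the Porter stemmer used in practice, and the suffix list is made up for these examples.

```python
def crude_stem(word):
    """Strip a common suffix; note studies -> studi, a non-word stem."""
    for suffix in ("able", "ies", "ing", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            if suffix == "ies":
                return word[:-3] + "i"       # studies -> studi
            return word[:-len(suffix)]
    return word

# Unlike lemmatization, stemming cannot map "was" to "be" or "better" to "good":
# those require dictionary knowledge, not suffix rules.
```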
Tokenization / Lemmatization Example
Sentence input:
Chaplin wrote, directed, and composed music for most of his films.

Tokenization:
Chaplin wrote , directed , and composed music for most of his films .

Lemmatization:
Chaplin write , direct , and compose music for most of he film .

88
Exercise: Stemming
http://text-processing.com/demo/stem/

89
Pre-processing: Stop Words
Very common words (articles, prepositions, pronouns, conjunctions, etc.) that do not add much information (but take up space) are called stop words and are frequently filtered out.
- Examples in English: an, the, a, for, is
- Filtering based on the stop (word) list
  - generated based on collection frequency
- Tools: RegEx + stop list; NLP libraries have their own stop lists
- Careful: sometimes it may lead to removing important information
90
Additional Pre-processing Steps
 Additional normalization
 in addition to stemming and lemmatization:
 standardizing abbreviations (e.g. expanding), hyphenations, digit-to-text conversions (9 to nine), etc.
 Language detection
 Code mixing
 embedding of linguistic units such as phrases, words, and morphemes of one language into an utterance of another language
 Transliteration
 converting between different writing systems

91
Relationships Between Words

92
Lexical Relationships
Lexical relationships are the connections established between one word and another:
 Synonymy is the idea that some words have the same meaning as others
 quick is similar to fast
 Antonymy is precisely the opposite of synonymy
 good is the opposite of bad
 Hyponymy is similar to the notion of embeddedness
 Human → Female (Female is a more specific concept than Human)
 Holonymy and Meronymy describe relationships between an object and its parts:
 tree is a holonym of bark (tree has bark)
 bark is a meronym of tree (bark is a part of tree)

93
Language Models and Word
Prediction

94
95
(Statistical) Language Model
 A (statistical) language model is a probability
distribution over words or word sequences.
 In practice, a language model gives the
probability of a certain word sequence being
“valid”.
 Validity in this context does not need to mean
grammatical validity at all.

Use lexical resources (corpora) to build LM.


96
Word Prediction
Words do not randomly appear in text.
The probability of a word appearing in a text is to a large degree related to the words that have appeared before it.
 e.g. I’d like to make a collect. . .
 call is the most likely next word, but other words such as telephone, international. . . are also possible.
 other (very common) words are unlikely (e.g. dog, house).

97
Words: Frequency and Rank
 Frequency: the number of occurrences of a word in the given document or corpus.
 Rank: position occupied by a word within a given document or a corpus. A word with the highest frequency will have the highest rank.
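Frequency and rank can be computed directly with Python's `collections.Counter` (a sketch on a toy corpus; rank 1 = most frequent word):

```python
from collections import Counter

# Count word frequencies in a toy corpus, then assign ranks.
tokens = "the cat sat on the mat the cat ran".split()
freq = Counter(tokens)

# most_common() sorts by descending count (ties keep insertion order)
rank = {w: i + 1 for i, (w, _) in enumerate(freq.most_common())}
print(freq["the"], rank["the"], rank["cat"])   # 3 1 2
```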

98
Part of Speech Tagging

99
Parts of Speech
 Idea:
 classify words according to their grammatical
categories
 Categories = part of speech, word classes, POS,
POS tags
 Basic categories / tags:
 noun, verb, pronoun, preposition, adverb,
conjunction, participle, article

100
Parts of Speech: Closed vs. Open

101
Parts of Speech Tagging
 Assigning a part-of-speech (POS) to each
word in a text.
 Words often have more than one POS.
 example: book
 VERB: Book that flight
 NOUN: Hand me that book

102
Sample Tagged Sentence
There/PRO were/VERB 70/NUM children/NOUN
there/ADV ./PUNC
Preliminary/ADJ findings/NOUN were/AUX
reported/VERB in/ADP today/NOUN ’s/PART
New/PROPN England/PROPN Journal/PROPN
of/ADP Medicine/PROPN

103
Parts of Speech: Tagset Example
Parts of Speech in the Universal Dependencies tagset

104
Parts of Speech Tagging: Motivation
 Can be useful for other NLP tasks
 Parsing: POS tagging can improve syntactic parsing
 MT: reordering of adjectives and nouns (say from Spanish to
English)
 Sentiment or affective tasks: may want to distinguish
adjectives or other POS
 Text-to-speech (how do we pronounce “lead” or "object"?)
 Or linguistic or language-analytic computational tasks
 Need to control for POS when studying linguistic change like
creation of new words, or meaning shift
 Or control for POS in measuring meaning similarity or
difference
105
Hidden Markov Model
Bigram transition probability estimates:
P(ARTICLE|<s>) = 0.71
P(NOUN|<s>) = 0.29
P(NOUN|ARTICLE) = 1.00
P(VERB|NOUN) = 0.43
P(NOUN|NOUN) = 0.13
P(PREPOSITION|NOUN) = 0.44
P(NOUN|VERB) = 0.35
P(ARTICLE|VERB) = 0.65
P(ARTICLE|PREPOSITION) = 0.74
P(NOUN|PREPOSITION) = 0.26

Probability of occurrence of a sequence of categories (tags):
P(<s>, ARTICLE, NOUN, VERB, NOUN) = P(ART|<s>) * P(N|ART) * P(V|N) * P(N|V) = 0.71 * 1.00 * 0.43 * 0.35 ≈ 0.107
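The transition table translates directly into code; this sketch (tag names abbreviated to ART, N, V, P) recomputes the sequence probability from the slide:

```python
# Bigram (transition) probabilities from the table above,
# keyed by (previous tag, current tag).
P = {
    ("<s>", "ART"): 0.71, ("<s>", "N"): 0.29,
    ("ART", "N"): 1.00,
    ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
    ("V", "N"): 0.35, ("V", "ART"): 0.65,
    ("P", "ART"): 0.74, ("P", "N"): 0.26,
}

def sequence_probability(tags):
    # multiply the transition probability of each adjacent tag pair
    prob = 1.0
    for prev, cur in zip(tags, tags[1:]):
        prob *= P[(prev, cur)]
    return prob

print(round(sequence_probability(["<s>", "ART", "N", "V", "N"]), 3))   # 0.107
```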

106
Example: All Possible Sequences
Each word in "flies like a flower" could take any of the tags (V, N, P, ART), giving a lattice of candidate tag sequences:
flies/V like/V a/V flower/V
flies/N like/N a/N flower/N
flies/P like/P a/P flower/P
flies/ART like/ART a/ART flower/ART
Every sequence can be assigned a probability.

107
Spelling Correction

108
Spelling: Real-world Problems
 Non-word error detection
 graffe instead of giraffe
 Isolated-word error correction
 Context-dependent error detection and
correction
 typos
 three instead of there
 homophone or near-homophones
 dessert instead of desert or piece for peace

109
How Similar are Two Strings?
 The user typed “graffe”. Which string is
closest?
 graf
 graft
 grail
 giraffe

 Why? Spell checking

110
How Similar are Two Strings?
 Why? Computational Biology:
 Align two sequences of nucleotides:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

 Resulting alignment:
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

111
How Similar are Two Strings?
 The user typed “graffe”. Which string is closest?
 graf: deleted “i”, deleted “fe”
 graft: deleted “i” and “e”, substituted “f”
 grail: deletion and substitution
 giraffe: correct form (we need to insert “i”)

112
Edits with Costs: Edit Distance
Each edit operation can have its cost:
 cost(d) = cost(i) = cost(s) = 1

Edit distance = 5
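With unit costs, minimum edit distance is computed by the standard dynamic program; a sketch, using the classic intention → execution example (which also has distance 5 under these costs):

```python
def edit_distance(a, b):
    # Levenshtein distance: cost(delete) = cost(insert) = cost(substitute) = 1
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of a's prefix
    for j in range(n + 1):
        d[0][j] = j                      # insert all of b's prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[m][n]

print(edit_distance("intention", "execution"))   # 5
print(edit_distance("graffe", "giraffe"))        # 1
```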

113
Edit Path
One of the edit paths (we want minimum # of edits):

114
Parsing

115
Parse Tree: Example
Parse tree for: I prefer a morning flight
(S (NP (Pro I)) (VP (Verb prefer) (NP (Det a) (Nom (Nom (Noun morning)) (Noun flight)))))

Grammar:
S → NP VP
NP → Pronoun | Proper-Noun | Det Nominal
Nominal → Nominal Noun | Noun
VP → Verb | Verb NP | Verb NP PP | Verb PP
PP → Preposition NP
116
Syntactic Parsing: Sentence → Tree
Syntactic parsing maps the sentence
I prefer a morning flight
to its parse tree:
(S (NP (Pro I)) (VP (Verb prefer) (NP (Det a) (Nom (Nom (Noun morning)) (Noun flight)))))

121
Parsing
The task of determining the parts of speech,
phrases, clauses, and their relationship to one
another is called parsing.

122
Ambiguous Grammar
The string 1 + 2 - 3 has two parse trees under this grammar: one groups the left operator first, ((1 + 2) - 3); the other groups the right operator first, (1 + (2 - 3)).

A grammar is said to be ambiguous if it can generate the same string (here: 1 + 2 - 3) through multiple derivations.

123
Ambiguous Grammar
(The same two parse trees for 1 + 2 - 3.)
Derivations (generated by pre-order tree traversal):


Left: (S(S(S(Num(Digit 1)))(Op +)(S(Num(Digit 2))))(Op -)(S(Num(Digit 3))))
Right: (S(S(Num(Digit 1)))(Op +)(S(Num(Digit 2))(Op -)(S(Num(Digit 3)))))
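The ambiguity can be demonstrated by enumerating every fully parenthesized reading of the flat token string (a self-contained sketch, not a general CFG parser):

```python
# Enumerate all groupings of a flat expression whose tokens alternate
# number, operator, number, ... Two results = two parse trees = ambiguity.
def parses(tokens):
    if len(tokens) == 1:
        return [tokens[0]]
    out = []
    for i in range(1, len(tokens), 2):       # choose the top-level operator
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                out.append(f"({left}{tokens[i]}{right})")
    return out

print(parses(["1", "+", "2", "-", "3"]))
# → ['(1+(2-3))', '((1+2)-3)']
```

The two strings returned correspond to the right-grouped and left-grouped derivations above.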

124
Text Classification

125
What is Classification?
Definition:
Classification is the process of categorizing data into distinct classes. In practice it means developing a model that maps input data to a discrete set of labels / targets. Classification can be:
 binary - there are only two classes: yes / no, true / false, spam / not spam
 multi-class - there are multiple classes available, only one is assigned
 multi-label - multiple classes can be assigned
126
Main Machine Learning Categories
Supervised learning: one of the most common techniques in machine learning. It is based on known relationship(s) and patterns within data (for example: the relationship between inputs and outputs). Frequently used types: regression and classification.
Unsupervised learning: involves finding underlying patterns within data. Typically used in clustering data points (similar customers, etc.).
Reinforcement learning: inspired by behavioral psychology. It is based on rewarding / punishing an algorithm. Rewards and punishments are based on the algorithm’s actions within its environment.

127
Reality versus Model
Reality (unknown function): y = f(x)
Learning produces a model that approximates it:
Model (approximated function): y = h(x)

128
Supervised Learning with ML
Training: input + label → feature extractor → features → machine learning algorithm
Prediction: input → feature extractor → features → classifier model → output label

129
Text Classification: the Idea
General idea: raw text → classifier model → output label
In practice: raw text → pre-processor → feature extractor → features → classifier model → output label

130
Text Classification: Applications
 Sentiment / opinion analysis
 Spam detection
 Gender identification
 Authorship identification
 Language identification
 Assigning subject categories, topics, or genres
…

131
Text Classification: Applications
 Spam classification: raw text → classifier model → spam / ham
 Sentiment and opinion analysis: raw text → classifier model → positive / negative
 Sentiment and opinion analysis (with pre-processor and feature extractor): raw text → ... → positive / neutral / negative
 Gender identification / attribution: raw text → classifier model → female / male

132
Text Classification: Rule-Based
 Rules based on combinations of words or other
features
 spam: black-list-address OR (“dollars” AND “you have
been selected”)
 Accuracy can be high
 If rules carefully refined by expert
 But building and maintaining these rules is
expensive

133
Decision Tree: Spam Filter

134
Text Training Set (Auto) Labeling
 Look for positive (happy) and negative (sad, angry) emoticons to decide the label (Twitter, Facebook, etc.)
 Starred reviews: ★★★★ or more stars → positive | ★★★ or less → negative (Amazon, IMDb, etc.)

135
Text Classification: Supervised ML
 Various Machine Learning supervised learning
classifier approaches can be employed:
 Naïve Bayes
 Logistic regression
 Neural networks
 k-Nearest Neighbors
 etc.

136
Text Classification: Feature Extraction
General idea: raw text → classifier model → output label
In practice: input → pre-processor → feature extractor → features → classifier model → output label
Text is messy. How do we extract features?

137
Bag of Words: the Idea
A document is mapped to a feature vector of FIXED size.

138
Bag of Words: the Idea
Some document:
I love this movie! It's sweet, but
with satirical humor. The dialogue
is great and the adventure scenes
are fun... It manages to be
whimsical and romantic while
laughing at the conventions of the
fairy tale genre. I would
recommend it to just about
anyone. I've seen it several times,
and I'm always happy to see it
again whenever I have a friend
who hasn't seen it yet!

Bag of words assumption: word/token position does not matter.


139
Bag of Words: the Idea
Some document:
I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun... It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet!

Word frequencies:
it 6 | I 5 | the 4 | to 3 | and 3 | seen 2 | yet 1 | whimsical 1 | times 1 | ...

Bag of words assumption: word/token position does not matter.


140
Bag of Words: Document Vector
The same document and word-frequency table as before:
it 6 | I 5 | the 4 | to 3 | and 3 | seen 2 | yet 1 | whimsical 1 | times 1 | ...
Reading the frequency column as a column of numbers gives the document vector.
141
Bag of Words: Document Vector
Pre-defined Vocabulary: she | want | to | walk | drive | fly | there | or

“She wants to walk there today” → binary document vector: 1 1 1 1 0 0 1 0
“She wants to drive there today” → binary document vector: 1 1 1 0 1 0 1 0
“She wants to fly or drive there today” → binary document vector: 1 1 1 0 1 1 1 1

Note: sentences lemmatized and lowercased.
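The binary document vectors above can be produced mechanically once a vocabulary is fixed (sketch; tokens are assumed already lemmatized and lowercased):

```python
# Fixed vocabulary from the slide; position i of the vector answers
# "does vocabulary word i occur in the document?"
VOCAB = ["she", "want", "to", "walk", "drive", "fly", "there", "or"]

def binary_vector(lemmatized_tokens):
    present = set(lemmatized_tokens)
    return [1 if word in present else 0 for word in VOCAB]

# "She wants to walk there today", lemmatized and lowercased by hand:
print(binary_vector(["she", "want", "to", "walk", "there", "today"]))
# → [1, 1, 1, 1, 0, 0, 1, 0]
```

Words outside the vocabulary ("today") are simply dropped — a consequence of the fixed vector size.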


142
Similar Documents
=
Similar Structure

143
Document Vectors in Vector Space

Note: vector space can be N-dimensional (N - feature vector length).


144
How similar are two documents?
=
How similar are their structures?
=
How close (in a vector space) are
points defined by their document
vectors
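"How close" is commonly measured with cosine similarity between document vectors; a sketch using two of the binary vectors from the earlier slide:

```python
import math

# Cosine similarity: 1.0 = same direction (very similar documents),
# 0.0 = no shared vocabulary at all.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

walk = [1, 1, 1, 1, 0, 0, 1, 0]    # "she want to walk there"
drive = [1, 1, 1, 0, 1, 0, 1, 0]   # "she want to drive there"
print(round(cosine(walk, drive), 2))   # 0.8
```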

145
Classification with k-Nearest
Neighbors

146
kNN: Text Classification

147
kNN: Text Classification
Class A

Class B

148
kNN: Text Classification

149
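A minimal k-nearest-neighbors classifier over document vectors (an illustrative sketch — the vectors and the Class A / Class B labels below are made up for the example):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_predict(query, examples, k=3):
    # examples: list of (vector, label); majority vote among k closest points
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

examples = [
    ([1, 1, 0, 0], "A"), ([1, 0, 1, 0], "A"), ([1, 1, 1, 0], "A"),
    ([0, 0, 1, 1], "B"), ([0, 1, 1, 1], "B"),
]
print(knn_predict([1, 1, 0, 1], examples))   # A
```

No training phase is needed: classification happens entirely at query time by comparing against stored documents.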
Classical NLP vs Deep Learning NLP

Source: https://www.oreilly.com/library/view/python-natural-language/9781787121423/6f015f49-58e9-4dd1-8045-b11e7f8bf2c8.xhtml

150
Sentiment Analysis

151
Spam Detection: Learning
Training set (feature vectors over the vocabulary V = {word1, rolex, word3, replica, word5, word6, word7}):
x1 = (0 0 1 0 1 1 1), y1 = HAM
x2 = (1 0 1 1 0 1 1), y2 = HAM
x3 = (0 1 0 1 0 1 1), y3 = SPAM
. . .
xN-2 = (1 1 1 1 0 1 1), yN-2 = HAM
xN-1 = (1 1 0 1 0 0 1), yN-1 = SPAM
xN = (1 0 0 1 0 0 1), yN = HAM

Naive Bayes classifier:
yMAP = argmax_{y ∈ Y} P(y) * Π_{i=1..N} P(xi | y)

Probability estimates (Maximum Likelihood estimation):
P(yj) = N_{yj} / N
P(xi | yj) = count(xi, yj) / Σ_{x ∈ V} count(x, yj)

x1, x2, x3, ..., xN - feature vectors | y1, y2, y3, ..., yN - labels

152
Spam Detection: Learning
Training set (feature vectors over the vocabulary V = {word1, rolex, word3, replica, word5, word6, word7}):
x1 = (0 0 1 0 1 1 1), y1 = HAM
x2 = (1 0 1 1 0 1 1), y2 = HAM
x3 = (0 1 0 1 0 1 1), y3 = SPAM
x4 = (1 1 1 1 0 0 0), y4 = HAM
x5 = (1 1 1 1 0 1 1), y5 = HAM
x6 = (1 1 0 1 0 0 1), y6 = SPAM
x7 = (1 0 0 1 0 0 1), y7 = HAM

Naive Bayes classifier:
yMAP = argmax_{y ∈ Y} P(y) * Π_{i=1..N} P(xi | y)

Probability estimates (Maximum Likelihood estimation):
P(y = HAM) = N_HAM / N = 5/7
P(y = SPAM) = N_SPAM / N = 2/7
P(x = rolex | y = SPAM) = count(x = rolex, y = SPAM) / Σ_{x ∈ V} count(x, y = SPAM) = 2/8
and so on...
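The Maximum Likelihood estimates above can be checked in code (sketch; the 7-row training set and vocabulary are taken from the slide):

```python
from fractions import Fraction

VOCAB = ["word1", "rolex", "word3", "replica", "word5", "word6", "word7"]
TRAIN = [
    ([0, 0, 1, 0, 1, 1, 1], "HAM"),
    ([1, 0, 1, 1, 0, 1, 1], "HAM"),
    ([0, 1, 0, 1, 0, 1, 1], "SPAM"),
    ([1, 1, 1, 1, 0, 0, 0], "HAM"),
    ([1, 1, 1, 1, 0, 1, 1], "HAM"),
    ([1, 1, 0, 1, 0, 0, 1], "SPAM"),
    ([1, 0, 0, 1, 0, 0, 1], "HAM"),
]

def prior(label):
    # P(y) = N_y / N
    return Fraction(sum(1 for _, y in TRAIN if y == label), len(TRAIN))

def likelihood(word, label):
    # P(x | y) = count(x, y) / sum over the vocabulary of count(x', y)
    i = VOCAB.index(word)
    word_count = sum(x[i] for x, y in TRAIN if y == label)
    total = sum(sum(x) for x, y in TRAIN if y == label)
    return Fraction(word_count, total)

print(prior("HAM"), prior("SPAM"))   # 5/7 2/7
print(likelihood("rolex", "SPAM"))   # 1/4  (= 2/8)
```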

153
Spam Detection: Learning
Training set: as before (x1 ... xN feature vectors with HAM/SPAM labels over vocabulary V).

Naive Bayes classifier:
yMAP = argmax_{y ∈ Y} P(y) * Π_{i=1..N} P(xi | y)

Probability estimates for the prior P(yj):
 Maximum Likelihood: P(yj) = N_{yj} / N
 or equiprobable (all classes have equal probability): P(y = HAM) = P(y = SPAM) = 0.5
 or determined by experts in the area

154
Sentiment Analysis: Motivation
 Movie: is this review positive or negative?
 Products: what do people think about the new
iPhone?
 Public sentiment: what is consumer confidence?
 Politics: what do people think about this
candidate or issue?
 Prediction: predict election outcomes or market
trends from sentiment

155
Sentiment Analysis: Twitter Mood

source: https://arxiv.org/pdf/1010.3003.pdf

156
Sentiment Analysis: Tweets

source: https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/

157
Sentiment Analysis: Text and Polls

source: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1536/1842

158
Scherer Typology of Affective States
 Emotion: brief organically synchronized … evaluation of a major event
 angry, sad, joyful, fearful, ashamed, proud, elated
 Mood: diffuse non-caused low-intensity long-duration change in subjective
feeling
 cheerful, gloomy, irritable, listless, depressed, buoyant
 Interpersonal stances: affective stance toward another person in a specific
interaction
 friendly, flirtatious, distant, cold, warm, supportive, contemptuous
 Attitudes: enduring, affectively colored beliefs, dispositions towards objects or
persons
 liking, loving, hating, valuing, desiring
 Personality traits: stable personality dispositions and typical behavior
tendencies
 nervous, anxious, reckless, morose, hostile, jealous

159
Point in Space Based on Distribution
 Each word = a vector
 not just "good" or "word45"
 Similar words: “nearby in semantic space"
 We build this space automatically by seeing
which words are nearby in text

160
Vector Semantics: Words as Vectors

Source: Signorelli, Camilo & Arsiwalla, Xerxes. (2019). Moral Dilemmas for Artificial Intelligence: a position paper on an application of
Compositional Quantum Cognition
161
Information Extraction

162
Information Extraction
Information extraction (IE) is the task of
automatically extracting structured information
from unstructured and/or semi-structured
machine-readable documents and other
electronically represented sources.
In most cases this activity concerns processing human language texts by means of natural language processing (NLP).

Related: processing video, etc.


163
Information Extraction
Unstructured data (document):
The fourth Wells account moving to another agency is the packaged paper-products division
of Georgia-Pacific Corp., which arrived at Wells only last fall. Like Hertz and the History
Channel, it is also leaving for an Omnicom-owned agency, the BBDO South unit of BBDO
Worldwide. BBDO South in Atlanta, which handles corporate advertising for Georgia-Pacific,
will assume additional duties for brands like Angel Soft toilet tissue and Sparkle paper towels,
said Ken Haldin, a spokesman for Georgia-Pacific in Atlanta.

Structured data:
Organization Location
BBDO South Atlanta
Georgia-Pacific Atlanta

164
Information Extraction Architecture
Raw text → Sentence Segmentation → sentences
→ Tokenization → tokenized sentences
→ Part of Speech Tagging → POS-tagged sentences
→ (Named) Entity Recognition → chunked sentences
→ Relation Extraction → relations (entity, relation, entity)

165
Sample POS-Tagged Sentence
There/PRO were/VERB 70/NUM children/NOUN
there/ADV ./PUNC
Preliminary/ADJ findings/NOUN were/AUX
reported/VERB in/ADP today/NOUN ’s/PART
New/PROPN England/PROPN Journal/PROPN
of/ADP Medicine/PROPN

166
Named Entity Recognition
Named-entity recognition (NER):
 also known as: entity identification, entity
chunking, and entity extraction
 a subtask of NLP that seeks to locate and classify
named entities mentioned in unstructured text
into pre-defined categories such as:
 person names, organizations, locations, medical
codes, time expressions, quantities, monetary values,
percentages, etc.

167
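As a toy illustration of entity spotting (an assumption made for teaching purposes — real NER uses trained sequence models, not this rule): consecutive capitalized words, skipping the sentence-initial token, can be proposed as entity candidates:

```python
import re

# Naive rule-based entity spotter: runs of capitalized words (excluding the
# first token of the sentence) become candidate entity spans; trailing
# punctuation closes a span.
def candidate_entities(sentence):
    tokens = sentence.split()
    spans, current = [], []
    for i, tok in enumerate(tokens):
        word = tok.strip(".,")
        if i > 0 and re.fullmatch(r"[A-Z][A-Za-z-]*", word):
            current.append(word)
            if word != tok:              # trailing punctuation ends the span
                spans.append(" ".join(current))
                current = []
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

print(candidate_entities("A spokesman for United Airlines, Tim Wagner, spoke in Chicago."))
# → ['United Airlines', 'Tim Wagner', 'Chicago']
```

The rule finds candidates but cannot assign types (PER vs. ORG vs. LOC) — that is exactly the ambiguity problem discussed on the following slides.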
Named Entities
Named entity, in its core usage, means anything that can be referred to with a proper name.

Four most common entity tags:
 PER (Person): “Marie Curie”
 LOC (Location): “New York City”
 ORG (Organization): “Stanford University”
 GPE (Geo-Political Entity): “Boulder, Colorado”

Often multi-word phrases. The term is also extended to things that are not entities: dates, times, prices.

168
Sample NER-Tagged Text
Citing high fuel prices, [United Airlines]ORG said [Friday]TIME it has
increased fares by [$6]MONEY per round trip on flights to some cities
also served by lower-cost carriers. [American Airlines]ORG, a unit of
[AMR Corp.]ORG, immediately matched the move, spokesman [Tim
Wagner]PER said. [United]ORG, a unit of [UAL Corp.]ORG, said the
increase took effect [Thursday]TIME and applies to most routes where it
competes against discount carriers, such as [Chicago]LOC to [Dallas]LOC
and [Denver]LOC to [San Francisco]LOC.

169
Named Entity Recognition
Unstructured data (document):
The fourth Wells account moving to another agency is the packaged paper-products division
of Georgia-Pacific Corp., which arrived at Wells only last fall. Like Hertz and the History
Channel, it is also leaving for an Omnicom-owned agency, the BBDO South unit of BBDO
Worldwide. BBDO South in Atlanta, which handles corporate advertising for Georgia-Pacific,
will assume additional duties for brands like Angel Soft toilet tissue and Sparkle paper towels,
said Ken Haldin, a spokesman for Georgia-Pacific in Atlanta.

Unstructured data (document) AFTER applying the Named Entity Recognition Process:
The fourth [Wells]ORG account moving to another agency is the packaged paper-products division of [Georgia-Pacific Corp.]ORG, which arrived at [Wells]ORG only last fall. Like [Hertz]ORG and the History Channel, it is also leaving for an Omnicom-owned agency, the [BBDO South]ORG unit of [BBDO Worldwide]ORG. [BBDO South]ORG in [Atlanta]LOC, which handles corporate advertising for Georgia-Pacific, will assume additional duties for brands like Angel Soft toilet tissue and Sparkle paper towels, said Ken Haldin, a spokesman for [Georgia-Pacific]ORG in [Atlanta]LOC.

170
Entity Tagging: Challenge
Consider the following four sentences:
Washington was born into slavery on the farm of James Burroughs.

Washington went up 2 games to 1 in the four-game series.

Blair arrived in Washington for what may well be his last state visit.

In June, Washington passed a primary seatbelt law.

Type ambiguity!

171
Entity Tagging: Challenge
Consider the following four sentences:
[Washington]PER was born into slavery on the farm of James Burroughs.

[Washington]ORG went up 2 games to 1 in the four-game series.

Blair arrived in [Washington]LOC for what may well be his last state visit.

In June, [Washington]GPE passed a primary seatbelt law.

Do you see a challenge?

172
Relation Extraction
Relation Extraction is the task of predicting
attributes and relations for entities in a
sentence.
For example, given a sentence

Ronald Reagan was born in Tampico, Illinois.

a relation classifier aims at predicting the


relation of “bornInCity”.
173
Relation Extraction
Unstructured data (document) AFTER applying the Named Entity Recognition Process:
The fourth [Wells]ORG account moving to another agency is the packaged paper-products division of [Georgia-Pacific Corp.]ORG, which arrived at [Wells]ORG only last fall. Like [Hertz]ORG and the History Channel, it is also leaving for an Omnicom-owned agency, the [BBDO South]ORG unit of [BBDO Worldwide]ORG. [BBDO South]ORG in [Atlanta]LOC, which handles corporate advertising for Georgia-Pacific, will assume additional duties for brands like Angel Soft toilet tissue and Sparkle paper towels, said Ken Haldin, a spokesman for [Georgia-Pacific]ORG in [Atlanta]LOC.

[ENTITY] relation [ENTITY]


[ORG: BBDO South] in [LOC: Atlanta]
[ORG: Georgia-Pacific] in [LOC: Atlanta]
Structured data:
Organization Location
BBDO South Atlanta
Georgia-Pacific Atlanta

174
Coreference Resolution
Coreference resolution is the task of finding
all expressions that refer to the same entity
in a text.
It is an important step for a lot of higher
level NLP tasks that involve natural language
understanding such as document
summarization, question answering, and
information extraction.

175
Coreference Resolution

Source: https://huggingface.co/coref/

176
Putting it All Together

177
Embeddings as Input Features

Assumption:
“3-word sentences”

178
Embeddings as Input Features
Embeddings are the input features (features learned from data); the network has an input layer, a hidden layer, and an output layer, each with its own weights, and produces a BINARY answer as output.

179
Neural Language Model

180
Voice Assistant: Alexa
Your home → Amazon Alexa Cloud Service:
 Automated Speech Recognition (ASR) [Deep NN]
 Transcription: “Play Two Steps Behind by Def Leppard”
 Natural Language Understanding (NLU) [Deep NN]
 Extracted meaning: Intent: PlayMusic | Artist: Def Leppard | Song: Two Steps Behind
Dialog Manager → Actions:
 Action: PlayMusic | Artist: Def Leppard | Song: Two Steps Behind → Amazon Music
Dialog Manager → Your home:
 Response: “Playing Two Steps Behind by Def Leppard” via Neural Text-To-Speech (NTTS)
181
Language Models: Application

we want to predict the “rest” of the query

182
Language Modeling Example

Source: https://transformer.huggingface.co/doc/distil-gpt2

183
Language Models and Transformers

Source: https://deepai.org/machine-learning-model/text-generator

184
GPT

185
Generative Pre-trained Transformer 3
What is it?
Generative Pre-trained Transformer 3 (GPT-3) is an
autoregressive language model that uses deep learning to
produce human-like text. It is the third-generation
language prediction model in the GPT-n series (and the
successor to GPT-2) created by OpenAI, a San Francisco-
based artificial intelligence research laboratory. GPT-3's full
version has a capacity of 175 billion machine learning
parameters.

186
Exercise: Sentiment Analysis
https://monkeylearn.com/sentiment-analysis-
online/
https://text2data.com/Demo
http://text-processing.com/demo/sentiment/

187
Exercise: Speech-to-text
https://cloud.google.com/speech-to-text

188
Text Classification Example

Source: https://text2data.com/Demo

189
Information Extraction Example

Source: https://explosion.ai/demos/displacy-ent

190
Information Retrieval/Search Example

Source: https://www.google.com/

191
Text Summarization Example

Source: https://deepai.org/machine-learning-model/summarization

192
Machine Translation Example

193
