The document provides an overview of constituency parsing in natural language processing, covering key concepts such as context-free grammars (CFGs), probabilistic context-free grammars (PCFGs), and the CKY algorithm. It discusses the importance of syntactic structure for understanding language, the challenges of ambiguity in parsing, and the evaluation metrics for parsing accuracy. Additionally, it highlights the limitations of PCFGs and introduces lexicalized PCFGs as a solution to improve parsing performance.

COS 484: Natural Language Processing

Constituency Parsing

Fall 2019

(Some slides adapted from Chris Manning, Mike Collins)


Overview

• Constituency structure vs dependency structure


• Context-free grammar (CFG)
• Probabilistic context-free grammar (PCFG)
• The CKY algorithm
• Evaluation
• Lexicalized PCFGs
Syntactic structure: constituency and dependency

Two views of linguistic structure


• Constituency
• = phrase structure grammar
• = context-free grammars (CFGs)
• Dependency
Constituency structure
• Phrase structure organizes words into nested constituents

• Starting units: words are given a category: part-of-speech tags

the, cuddly, cat, by, the, door


Det, Adj, N, P, Det, N

• Words combine into phrases with categories


the cuddly cat, by the door
NP → Det Adj N,  PP → P NP

• Phrases can combine into bigger phrases recursively


the cuddly cat by the door
NP → NP PP
Dependency structure
• Dependency structure shows which words depend on (modify or
are arguments of) which other words.

Example: "Satellites spot whales from space" (the figure shows dependency arcs over the sentence, labeled nsubj, dobj, nmod, and case)



Why do we need sentence structure?

• We need to understand sentence structure in order to interpret language correctly

• Humans communicate complex ideas by composing words together into bigger units

• We need to know what is connected to what


Syntactic parsing
• Syntactic parsing is the task of recognizing a sentence and
assigning a structure to it.

Input: Boeing is located in Seattle.
Output: the corresponding parse tree (shown in the figure)


Syntactic parsing
• Used as intermediate representation for downstream applications
English word order: subject — verb — object
Japanese word order: subject — object — verb

Image credit: http://vas3k.com/blog/machine_translation/


Syntactic parsing
• Used as intermediate representation for downstream applications

Image credit: (Zhang et al, 2018)


Context-free grammars

• The most widely used formal system for modeling


constituency structure in English and other natural languages

• A context-free grammar G = (N, Σ, R, S) where


• N is a set of non-terminal symbols
• Σ is a set of terminal symbols
• R is a set of rules of the form X → Y1Y2…Yn for n ≥ 1,
X ∈ N, Yi ∈ (N ∪ Σ)
• S ∈ N is a distinguished start symbol
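
To make the 4-tuple concrete, here is a minimal sketch of how such a grammar might be represented as plain Python data; the symbols and rules are a tiny illustrative fragment, not the full grammar on the following slide.

```python
# A toy CFG G = (N, Sigma, R, S); symbols and rules are illustrative only.
N = {"S", "NP", "VP", "DT", "NN", "Vi"}      # non-terminal symbols
SIGMA = {"the", "man", "sleeps"}             # terminal symbols (words)
R = [                                        # rules X -> Y1 ... Yn
    ("S",  ("NP", "VP")),
    ("NP", ("DT", "NN")),
    ("VP", ("Vi",)),
    ("DT", ("the",)),
    ("NN", ("man",)),
    ("Vi", ("sleeps",)),
]
S = "S"                                      # distinguished start symbol
```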
A Context-Free Grammar for English

Grammar Lexicon

S: sentence, VP: verb phrase, NP: noun phrase, PP: prepositional phrase,
DT: determiner, Vi: intransitive verb, Vt: transitive verb, NN: noun, IN: preposition
(Left-most) Derivations

• Given a CFG G, a left-most derivation is a sequence of strings


s1, s2, …, sn, where

• s1 = S
• sn ∈ Σ*: all possible strings made up of words from Σ
• Each si for i = 2,…, n is derived from si−1 by picking the left-most
non-terminal X in si−1 and replacing it by some β where X → β ∈ R

• sn: yield of the derivation


(Left-most) Derivations
• s1 = S
• s2 = NP VP
• s3 = DT NN VP
• s4 = the NN VP
• s5 = the man VP
• s6 = the man Vi
• s7 = the man sleeps
A derivation can be represented as a parse tree!

• A string s ∈ Σ* is in the language defined by the CFG if


there is at least one derivation whose yield is s

• The set of possible derivations may be finite or infinite


Ambiguity
• Some strings may have more than one derivation (i.e., more than one parse tree!).
“Classical” NLP Parsing
• In fact, sentences can have a very large number of possible parses

The board approved [its acquisition] [by Royal Trustco Ltd.] [of
Toronto] [for $27 a share] [at its monthly meeting].

((ab)c)d (a(bc))d (ab)(cd) a((bc)d) a(b(cd))


Catalan number: Cn = (1 / (n + 1)) · (2n choose n)
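
As a quick sanity check of this formula (a small sketch; the helper name is mine), the number of binary bracketings of n + 1 items is the nth Catalan number:

```python
from math import comb

def catalan(n):
    # C_n = (1 / (n + 1)) * (2n choose n)
    return comb(2 * n, n) // (n + 1)

# Four items (a b c d) have C_3 = 5 binary bracketings,
# matching the five bracketings listed above.
print([catalan(n) for n in range(1, 6)])   # [1, 2, 5, 14, 42]
```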

• It is also difficult to construct a grammar with enough coverage


• A less constrained grammar can parse more sentences but produces more parses even for simple sentences
• There is no way to choose the right parse!
Statistical parsing
• Learning from data: treebanks

• Adding probabilities to the rules: probabilistic CFGs (PCFGs)

Treebanks: a collection of sentences paired with their parse trees

The Penn Treebank Project (Marcus et al, 1993)


Treebanks
• Standard setup (WSJ portion of Penn Treebank):
• 40,000 sentences for training
• 1,700 for development
• 2,400 for testing

• Why building a treebank instead of a grammar?

• Broad coverage
• Frequencies and distributional information

• A way to evaluate systems


Probabilistic context-free grammars (PCFGs)

• A probabilistic context-free grammar (PCFG) consists of:

• A context-free grammar: G = (N, Σ, R, S)

• For each rule α → β ∈ R, there is a parameter q(α → β) ≥ 0.


• For any X ∈ N, the probabilities of all rules with left-hand side X sum to one:

Σ_{α → β : α = X} q(α → β) = 1
Probabilistic context-free grammars (PCFGs)
For any derivation (parse tree) containing rules α1 → β1, α2 → β2, …, αl → βl, the probability of the parse is:

P(t) = ∏_{i=1}^{l} q(αi → βi)

P(t) = q(S → NP VP) × q(NP → DT NN) × q(DT → the)


× q(NN → man) × q(VP → Vi) × q(Vi → sleeps)

= 1.0 × 0.3 × 1.0 × 0.7 × 0.4 × 1.0 = 0.084
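
A minimal sketch of this computation: the rule probabilities below are the illustrative values from the worked example above, stored in a Python dict.

```python
from functools import reduce

# Illustrative rule probabilities q(alpha -> beta) for the tree over "the man sleeps".
q = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.3,
    ("DT", ("the",)):     1.0,
    ("NN", ("man",)):     0.7,
    ("VP", ("Vi",)):      0.4,
    ("Vi", ("sleeps",)):  1.0,
}

def tree_prob(rules_used):
    # P(t) = product of q(rule) over all rules used in the derivation
    return reduce(lambda p, r: p * q[r], rules_used, 1.0)

print(tree_prob(list(q)))   # 0.084 (up to floating-point rounding)
```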


Why do we want Σ_{α → β : α = X} q(α → β) = 1?
Deriving a PCFG from a treebank
• Training data: a set of parse trees t1, t2, …, tm

• A PCFG (N, Σ, S, R, q):


• N is the set of all non-terminals seen in the trees
• Σ is the set of all words seen in the trees
• S is taken to be S.
• R is taken to be the set of all rules α → β seen in the trees
• The maximum-likelihood parameter estimates are:
qML(α → β) = Count(α → β) / Count(α)

If we have seen the rule VP → Vt NP 105 times, and the non-terminal
VP 1000 times, then qML(VP → Vt NP) = 105 / 1000 = 0.105
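
A minimal sketch of these maximum-likelihood estimates, assuming each treebank tree is given as the list of rules it uses (the toy trees below are hypothetical):

```python
from collections import Counter

# Each tree is the list of (lhs, rhs) rules it contains; toy data for illustration.
trees = [
    [("S", ("NP", "VP")), ("NP", ("DT", "NN")), ("DT", ("the",)),
     ("NN", ("man",)), ("VP", ("Vi",)), ("Vi", ("sleeps",))],
    [("S", ("NP", "VP")), ("NP", ("DT", "NN")), ("DT", ("the",)),
     ("NN", ("dog",)), ("VP", ("Vi",)), ("Vi", ("barks",))],
]

rule_count = Counter(rule for tree in trees for rule in tree)
lhs_count = Counter(lhs for tree in trees for lhs, _ in tree)

# q_ML(alpha -> beta) = Count(alpha -> beta) / Count(alpha)
q_ml = {(lhs, rhs): c / lhs_count[lhs] for (lhs, rhs), c in rule_count.items()}

print(q_ml[("NN", ("man",))])   # 0.5: NN rewrites as "man" in 1 of 2 NN occurrences
```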
Parsing with PCFGs
• Given a sentence s and a PCFG, how to find the highest scoring
parse tree for s?
argmax_{t ∈ 𝒯(s)} P(t)

• The CKY algorithm: applies to a PCFG in Chomsky normal


form (CNF)

• Chomsky Normal Form (CNF): all the rules take one


of the two following forms:

• X → Y1Y2 where X ∈ N, Y1 ∈ N, Y2 ∈ N
• X → Y where X ∈ N, Y ∈ Σ
• It is possible to convert any PCFG into an equivalent grammar in CNF!
• However, the trees will look different; it is possible to apply a "reverse
transformation" to recover the original trees
Converting PCFGs into a CNF grammar
• n-ary rules (n > 2): NP → DT NNP VBG NN

• Unary rules: VP → Vi, Vi → sleeps

• Eliminate all the unary rules recursively by adding VP → sleeps

• We will come back to this later!


The CKY algorithm

• Dynamic programming

• Given a sentence x1, x2, …, xn, denote π(i, j, X) as the highest score
for any parse tree that dominates words xi, …, xj and has non-
terminal X ∈ N as its root.

• Output: π(1,n, S)

• Initially, for i = 1, 2, …, n:

π(i, i, X) = q(X → xi) if X → xi ∈ R, and 0 otherwise
The CKY algorithm
• For all (i, j) such that 1 ≤ i < j ≤ n and all X ∈ N,

π(i, j, X) = max_{X → Y Z ∈ R, i ≤ k < j} q(X → Y Z) × π(i, k, Y) × π(k + 1, j, Z)
We also store backpointers, which allow us to recover the parse tree


The CKY algorithm

Running time?
O(n³ · |R|)
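
Putting the initialization and recurrence together, here is a compact sketch of CKY for a CNF PCFG; the grammar is passed in as dictionaries of lexical and binary rule probabilities, and all names are illustrative (spans are 0-indexed here rather than the 1-indexed π above).

```python
# CKY for a PCFG in Chomsky normal form; a sketch, not an optimized parser.
# lexical[(X, word)] = q(X -> word); binary[(X, (Y, Z))] = q(X -> Y Z);
# all symbols appearing in rules are assumed to be in `nonterminals`.
def cky(words, lexical, binary, nonterminals, start="S"):
    n = len(words)
    pi = {}   # pi[(i, j, X)] = best score of a tree rooted at X spanning words i..j
    bp = {}   # backpointers, used to recover the best parse tree

    for i, w in enumerate(words):                 # spans of length 1
        for X in nonterminals:
            pi[(i, i, X)] = lexical.get((X, w), 0.0)

    for length in range(1, n):                    # longer spans, bottom-up
        for i in range(n - length):
            j = i + length
            for X in nonterminals:
                best, arg = 0.0, None
                for (A, (Y, Z)), q in binary.items():
                    if A != X:
                        continue
                    for k in range(i, j):         # split point
                        score = q * pi[(i, k, Y)] * pi[(k + 1, j, Z)]
                        if score > best:
                            best, arg = score, (k, Y, Z)
                pi[(i, j, X)] = best
                bp[(i, j, X)] = arg
    return pi[(0, n - 1, start)], bp
```

The nested loops over spans, rules, and split points give the O(n³ · |R|) running time noted above.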
CKY with unary rules
• In practice, we also allow unary rules:

X → Y where X, Y ∈ N
(keeping unary rules makes conversion to/from the normal form easier)

How does this change CKY?


π(i, j, X) = max_{X → Y ∈ R} q(X → Y) × π(i, j, Y)

• Compute unary closure: if there is a rule chain X → Y1, Y1 → Y2, …, Yk → Y, add
q(X → Y) = q(X → Y1) × ⋯ × q(Yk → Y)

• Apply the unary rules once, after the binary rules, for each span

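A small sketch of the closure computation, repeatedly combining pairs of unary rules until no chain improves on the best probability already found (the rule probabilities below are illustrative):

```python
def unary_closure(unary):
    # unary[(X, Y)] = q(X -> Y) for single unary rules; returns, for every pair
    # (X, Y) reachable by a chain of unary rules, the best product of rule probabilities.
    best = dict(unary)
    changed = True
    while changed:                                # relax until nothing improves
        changed = False
        for (x, y), p in list(best.items()):
            for (y2, z), q in list(best.items()):
                if y == y2 and p * q > best.get((x, z), 0.0):
                    best[(x, z)] = p * q
                    changed = True
    return best

# Illustrative probabilities: the chain S -> VP -> Vi gets score 0.1 * 0.4 = 0.04
print(unary_closure({("S", "VP"): 0.1, ("VP", "Vi"): 0.4}))
```

In a full parser, the reflexive case q(X → X) = 1 is also included so that applying no unary rule is always an option.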

Evaluating constituency parsing

• Recall: (# correct constituents in candidate) / (# constituents in gold tree)
• Precision: (# correct constituents in candidate) / (# constituents in
candidate)
• Labeled precision/recall require getting the non-terminal label
correct
• F1 = (2 * precision * recall) / (precision + recall)
Evaluating constituency parsing

• Precision: 3/7 = 42.9%


• Recall: 3/8 = 37.5%
• F1 = 40.0%
• Tagging accuracy: 100%
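
A minimal sketch of computing these metrics over sets of labeled spans (label, start, end); the spans below are hypothetical and chosen only so that the counts reproduce the 3/7 and 3/8 figures above.

```python
def prf1(candidate, gold):
    # Labeled precision, recall, and F1 over constituent spans.
    candidate, gold = set(candidate), set(gold)
    correct = len(candidate & gold)
    precision = correct / len(candidate)
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1

gold = {("S", 0, 7), ("NP", 0, 2), ("VP", 2, 7), ("NP", 3, 7),
        ("PP", 4, 7), ("NP", 5, 7), ("NP", 3, 4), ("ADVP", 6, 7)}   # 8 constituents
cand = {("S", 0, 7), ("NP", 0, 2), ("VP", 2, 7), ("NP", 3, 5),
        ("PP", 5, 7), ("NP", 6, 7), ("VP", 2, 4)}                   # 7 constituents
print(prf1(cand, gold))   # approx (0.429, 0.375, 0.4): 3/7 precision, 3/8 recall, 40% F1
```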
Weaknesses of PCFGs
• Lack of sensitivity to lexical information (words)

The only difference between these two parses:


q(VP → VP PP) vs q(NP → NP PP)
… without looking at the words!
Weaknesses of PCFGs
• Lack of sensitivity to lexical information (words)

Exactly the same set of context-free rules!


Lexicalized PCFGs
• Key idea: add headwords to trees

• Each context-free rule has one special child that is the


head of the rule (a core idea in syntax)
Lexicalized PCFGs

• Further reading: Michael Collins. 2003. Head-Driven


Statistical Models for Natural Language Parsing.

• Results for a PCFG: 70.6% recall, 74.8% precision

• Results for a lexicalized PCFG: 88.1% recall, 88.3% precision

http://nlpprogress.com/english/constituency_parsing.html
