
Natural Language Processing

Unit 3: Grammars for Natural Language
Mithun B N
Asst. Prof
Auxiliary verbs and verb phrases
• I can see the house
• I will have seen the house
• I was watching the movie
• I should have been watching the movie

• Auxiliary verbs have subcategorization features that restrict their verb phrase complements.
• A clear distinction is made between auxiliary and main verbs.
Handling questions in context-free grammar
• Rule to handle yes/no questions.
• S[+inv] → (AUX AGR ?a SUBCAT ?v) (NP AGR ?a) (VP VFORM ?v)
• This enforces subject verb agreement between the AUX and the
subject NP, and ensures that the VP has the right VFORM to follow the
AUX.
• This rule is sufficient to handle yes/no questions.
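• A minimal Python sketch (not from the original slides) of the feature checks this rule performs; the tiny AUX/NP/VP lexicons and their feature values are illustrative assumptions only.

# Minimal sketch of the feature checks behind
#   S[+inv] -> (AUX AGR ?a SUBCAT ?v) (NP AGR ?a) (VP VFORM ?v)
# The tiny lexicons below are illustrative assumptions, not from the slides.

AUX_LEXICON = {
    "can":  {"AGR": {"1s", "2s", "3s", "3p"}, "SUBCAT": "base"},
    "has":  {"AGR": {"3s"},                   "SUBCAT": "pastprt"},
    "have": {"AGR": {"1s", "2s", "3p"},       "SUBCAT": "pastprt"},
}
NP_LEXICON = {"he": "3s", "the dogs": "3p", "I": "1s"}
VP_LEXICON = {"see the house": "base", "seen the house": "pastprt"}

def licenses_yes_no_question(aux, np, vp):
    """True only if AUX and the subject NP agree (shared AGR ?a) and the VP
    has the VFORM (?v) demanded by the AUX's SUBCAT feature."""
    entry = AUX_LEXICON[aux]
    agr_ok = NP_LEXICON[np] in entry["AGR"]       # AUX AGR ?a ... NP AGR ?a
    vform_ok = VP_LEXICON[vp] == entry["SUBCAT"]  # SUBCAT ?v ... VP VFORM ?v
    return agr_ok and vform_ok

print(licenses_yes_no_question("can", "he", "see the house"))        # True
print(licenses_yes_no_question("has", "he", "see the house"))        # False (wrong VFORM)
print(licenses_yes_no_question("has", "the dogs", "seen the house")) # False (AGR clash)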
Handling questions in context-free grammar
• A special feature GAP is introduced to handle wh-questions. This feature is passed from mother to subconstituent until the appropriate place for the gap is found in the sentence.
• The rule:
• (NP GAP ((CAT NP) (AGR ?a)) AGR ?a) → ε
• It builds an NP from no input if the NP sought has a GAP feature whose value is an NP. Furthermore, the AGR feature of this empty NP is set to the AGR feature of the constituent that is the value of the GAP feature.
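• A rough Python sketch of how this empty-NP (gap) rule could be realised; the dictionary representation of constituents and the example question are illustrative assumptions, not the parser behind these slides.

# Rough sketch of the empty-NP rule
#   (NP GAP ((CAT NP) (AGR ?a)) AGR ?a) -> empty string
# Constituents are plain dicts here; purely illustrative.

def empty_np(gap):
    """Build an NP from no input, but only when the GAP being passed down is
    itself an NP; the new NP copies the gap's AGR value (?a)."""
    if gap is not None and gap.get("CAT") == "NP":
        return {"CAT": "NP", "AGR": gap["AGR"], "WORDS": []}  # covers no words
    return None  # no gap, or a non-NP gap: the rule does not apply

# In "Which dogs did he see?", the wh-phrase sets GAP to an NP with AGR 3p;
# the GAP is passed from mother to subconstituent until it is filled after "see".
print(empty_np({"CAT": "NP", "AGR": "3p"}))  # {'CAT': 'NP', 'AGR': '3p', 'WORDS': []}
print(empty_np(None))                        # None: this position needs a real NP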
Part-of-Speech Tagging – The Penn Treebank Tagset
Sl No  Tag   Description                        Sl No  Tag   Description
1      CC    Coordinating conjunction           19     PRP$  Possessive pronoun
2      CD    Cardinal number                    20     RB    Adverb
3      DT    Determiner                         21     RBR   Comparative adverb
4      EX    Existential there                  22     RBS   Superlative adverb
5      FW    Foreign word                       23     RP    Particle
6      IN    Preposition/subordinating conj.    24     SYM   Symbol (math or scientific)
7      JJ    Adjective                          25     TO    to
8      JJR   Comparative adjective              26     UH    Interjection
9      JJS   Superlative adjective              27     VB    Verb, base form
10     LS    List item marker                   28     VBD   Verb, past tense
11     MD    Modal                              29     VBG   Verb, gerund/present participle
12     NN    Noun, singular or mass             30     VBN   Verb, past participle
13     NNS   Noun, plural                       31     VBP   Verb, non-3s present
14     NNP   Proper noun, singular              32     VBZ   Verb, 3s present
15     NNPS  Proper noun, plural                33     WDT   Wh-determiner
16     PDT   Predeterminer                      34     WP    Wh-pronoun
17     POS   Possessive ending                  35     WP$   Possessive wh-pronoun
18     PRP   Personal pronoun                   36     WRB   Wh-adverb
Part-of-speech tagging
• Consider the problem in its full generality. Let w1, w2, … wT be a
sequence of words. We want to find the sequence of lexical
categories C1, C2, . . . CT that maximises
• PROB(C1, C2, … CT | w1, w2, … wT). Unfortunately, it would take far too much data to generate reasonable estimates for such sequences, so direct methods cannot be applied.
• By Bayes' rule, this conditional probability equals
• (PROB(C1, C2, … CT) * PROB(w1, w2, … wT | C1, C2, … CT)) / PROB(w1, w2, … wT)
• Since we are interested in finding the C1, C2, … CT that gives the maximum value, the common denominator is the same in all these cases and will not affect the answer.
Part-of-speech tagging (Contd.,)
• The problem reduces to finding the sequence C1, C2, … CT that maximizes the formula PROB(C1, C2, … CT) * PROB(w1, w2, … wT | C1, C2, … CT).
• There are no effective methods for calculating the probability of these long
sequences accurately, as it would require far too much data.
• The BIGRAM model looks at pairs of categories (or words) and uses the
conditional probability that a category Ci will follow a category Ci-1, written as
PROB (Ci | Ci-1).
• The TRIGRAM model uses the conditional probability of one category (or word) given the two preceding categories (or words), PROB(Ci | Ci-2 Ci-1).
• These models are called n-gram models, in which n represents the number of items (categories or words) used in the pattern.
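• As an illustration, the sketch below collects bigram and trigram category counts from a tiny hand-tagged corpus; the three tagged sentences are made up for this example and are not the corpus behind the counts used later in these slides.

# Sketch: collecting bigram and trigram category counts from a tiny
# hand-tagged corpus (the three tagged sentences are made up for illustration).
from collections import Counter

tagged_corpus = [
    ["ART", "N", "V", "N"],
    ["N", "V", "ART", "N"],
    ["ART", "N", "V", "P", "ART", "N"],
]

unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
for tags in tagged_corpus:
    padded = ["phi"] + tags                       # pseudo category at position 0
    unigrams.update(padded)
    for i in range(1, len(padded)):
        bigrams[(padded[i - 1], padded[i])] += 1  # data for PROB(Ci | Ci-1)
    for i in range(2, len(padded)):
        trigrams[(padded[i - 2], padded[i - 1], padded[i])] += 1  # PROB(Ci | Ci-2 Ci-1)

print(bigrams[("N", "V")] / unigrams["N"])                  # bigram estimate of PROB(V | N)
print(trigrams[("ART", "N", "V")] / bigrams[("ART", "N")])  # trigram estimate of PROB(V | ART N)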
Part-of-speech tagging (Contd.,)
• Using BIGRAM the following approximation can be used:
• PROB(C1, C2, … CT) approximately = Πi=1,T PROB(Ci|Ci-1)
• To account for the beginning of a sentence, we posit a pseudo
category φ at position 0 as the value of C0.
• The first bigram for a sentence beginning with an ART would be PROB
(ART|φ).
• Approximation of the probability of the sequence ART N V N using bigrams would be
• PROB(ART N V N) approximately = PROB(ART|φ) * PROB(N|ART) * PROB(V|N) * PROB(N|V)
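• A small Python sketch of this chain of bigram probabilities; the values are the bigram estimates from the table later in this deck (φ written as "phi"), and the dictionary layout is only an assumption for illustration.

# Sketch: PROB(ART N V N) approximately = PROB(ART|phi) * PROB(N|ART)
#                                          * PROB(V|N) * PROB(N|V)
# The bigram estimates below are the ones from the table later in this deck.

BIGRAM = {
    ("phi", "ART"): 0.71, ("phi", "N"): 0.29, ("ART", "N"): 1.0,
    ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
    ("V", "N"): 0.35, ("V", "ART"): 0.65,
    ("P", "ART"): 0.74, ("P", "N"): 0.26,
}

def sequence_prob(tags):
    """Bigram approximation of PROB(C1 ... CT), with the pseudo category
    'phi' posited at position 0 as the value of C0."""
    prob, prev = 1.0, "phi"
    for tag in tags:
        prob *= BIGRAM.get((prev, tag), 0.0)
        prev = tag
    return prob

print(sequence_prob(["ART", "N", "V", "N"]))  # 0.71 * 1.0 * 0.43 * 0.35 ≈ 0.107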
Part-of-speech tagging (Contd.,)
• The second probability in the formula above, PROB(w1, w2, … wT | C1, C2, … CT), can be approximated by assuming that a word appears in a category independently of the words in the preceding or succeeding categories.
• It is approximated by the product of the probability that each word occurs in the indicated part of speech, i.e.,
• PROB(w1, … wT | C1, … CT) approximately = Πi=1,T PROB(wi|Ci)
• With these approximations, the problem has changed into finding the sequence C1, … CT that maximizes the value of Πi=1,T PROB(Ci|Ci-1) * PROB(wi|Ci)
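• One standard way to find this maximizing sequence without enumerating every possible tag sequence is dynamic programming (the Viterbi algorithm). The sketch below is a generic version, not taken from these slides; the lexical probabilities PROB(word|category) are illustrative placeholders, and the bigram table repeats the estimates used earlier.

# Sketch: find the C1..CT maximizing  product over i of PROB(Ci|Ci-1) * PROB(wi|Ci)
# with dynamic programming (Viterbi), instead of enumerating every sequence.
# The lexical probabilities PROB(word|category) below are illustrative placeholders.

BIGRAM = {("phi", "ART"): 0.71, ("phi", "N"): 0.29, ("ART", "N"): 1.0,
          ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
          ("V", "N"): 0.35, ("V", "ART"): 0.65, ("P", "ART"): 0.74, ("P", "N"): 0.26}
LEXICAL = {("flies", "N"): 0.025, ("flies", "V"): 0.076, ("like", "V"): 0.1,
           ("like", "P"): 0.068, ("a", "ART"): 0.36, ("flower", "N"): 0.063}
TAGS = ["ART", "N", "V", "P"]

def viterbi(words):
    # best[i][c] = (probability, backpointer): best tag sequence for
    # words[:i+1] that ends in category c
    best = [{} for _ in words]
    for c in TAGS:
        best[0][c] = (BIGRAM.get(("phi", c), 0.0) * LEXICAL.get((words[0], c), 0.0), None)
    for i in range(1, len(words)):
        for c in TAGS:
            p, back = max(((best[i - 1][prev][0] * BIGRAM.get((prev, c), 0.0)
                            * LEXICAL.get((words[i], c), 0.0), prev) for prev in TAGS),
                          key=lambda x: x[0])
            best[i][c] = (p, back)
    last = max(TAGS, key=lambda c: best[-1][c][0])   # most probable final category
    tags = [last]
    for i in range(len(words) - 1, 0, -1):           # follow backpointers
        last = best[i][last][1]
        tags.append(last)
    return list(reversed(tags))

print(viterbi(["flies", "like", "a", "flower"]))  # ['N', 'V', 'ART', 'N']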
Part-of-speech tagging (Contd.,)
• With this new formula, the probabilities involved can be readily estimated from a corpus of text labelled with parts of speech.
• In a corpus of tagged text, the bigram probabilities can be estimated simply by counting the number of times each pair of categories occurs and dividing by the number of times the first category occurs (see the sketch after the table below).
• The probability that a V follows an N would be estimated as follows:
Part-of-speech tagging (Contd.,)
Category  Count at i  Pair     Count at i, i+1  Bigram        Estimate
φ         300         φ, ART   213              PROB(ART|φ)   .71
φ         300         φ, N     87               PROB(N|φ)     .29
ART       558         ART, N   558              PROB(N|ART)   1
N         833         N, V     358              PROB(V|N)     .43
N         833         N, N     108              PROB(N|N)     .13
N         833         N, P     366              PROB(P|N)     .44
V         300         V, N     75               PROB(N|V)     .35
V         300         V, ART   194              PROB(ART|V)   .65
P         307         P, ART   226              PROB(ART|P)   .74
P         307         P, N     81               PROB(N|P)     .26
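• A short sketch, assuming the tagged corpus has already been reduced to the counts in the table above, showing that each bigram estimate is just the pair count divided by the count of the first category (φ written as "phi").

# Sketch: each bigram estimate is Count(Ci-1, Ci) / Count(Ci-1),
# using the category and pair counts from the table above.

category_counts = {"phi": 300, "ART": 558, "N": 833, "V": 300, "P": 307}
pair_counts = {("phi", "ART"): 213, ("phi", "N"): 87, ("ART", "N"): 558,
               ("N", "V"): 358, ("N", "N"): 108, ("N", "P"): 366,
               ("V", "N"): 75, ("V", "ART"): 194, ("P", "ART"): 226, ("P", "N"): 81}

def bigram_estimate(prev, cur):
    """PROB(cur | prev) estimated from raw counts."""
    return pair_counts[(prev, cur)] / category_counts[prev]

print(round(bigram_estimate("N", "V"), 2))     # 358 / 833 ≈ 0.43
print(round(bigram_estimate("phi", "ART"), 2)) # 213 / 300 ≈ 0.71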
