
Natural Language Processing

Unit 3: Grammars for Natural Language
Mithun B N
Asst. Prof
Auxiliary verbs and verb phrases
• I can see the house
• I will have seen the house
• I was watching the movie
• I should have been watching the movie

• Auxiliary verbs have subcategorization features that restrict their verb phrase complements.
• A clear distinction is made between auxiliary and main verbs.
Handling questions in context-free grammar
• Rule to handle yes/no questions.
• S[+inv] → (AUX AGR ?a SUBCAT ?v) (NP AGR ?a) (VP VFORM ?v)
• This enforces subject verb agreement between the AUX and the
subject NP, and ensures that the VP has the right VFORM to follow the
AUX.
• This rule is sufficient to handle yes/no questions.
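• A minimal Python sketch (not from the original slides) of the feature checks this rule performs; the tiny AUX/NP/VP lexicons and their feature values are illustrative assumptions only.

# Minimal sketch of the feature checks behind
#   S[+inv] -> (AUX AGR ?a SUBCAT ?v) (NP AGR ?a) (VP VFORM ?v)
# The tiny lexicons below are illustrative assumptions, not from the slides.

AUX_LEXICON = {
    "can":  {"AGR": {"1s", "2s", "3s", "3p"}, "SUBCAT": "base"},
    "has":  {"AGR": {"3s"},                   "SUBCAT": "pastprt"},
    "have": {"AGR": {"1s", "2s", "3p"},       "SUBCAT": "pastprt"},
}
NP_LEXICON = {"he": "3s", "the dogs": "3p", "I": "1s"}
VP_LEXICON = {"see the house": "base", "seen the house": "pastprt"}

def licenses_yes_no_question(aux, np, vp):
    """True only if AUX and the subject NP agree (shared AGR ?a) and the VP
    has the VFORM (?v) demanded by the AUX's SUBCAT feature."""
    entry = AUX_LEXICON[aux]
    agr_ok = NP_LEXICON[np] in entry["AGR"]       # AUX AGR ?a ... NP AGR ?a
    vform_ok = VP_LEXICON[vp] == entry["SUBCAT"]  # SUBCAT ?v ... VP VFORM ?v
    return agr_ok and vform_ok

print(licenses_yes_no_question("can", "he", "see the house"))        # True
print(licenses_yes_no_question("has", "he", "see the house"))        # False (wrong VFORM)
print(licenses_yes_no_question("has", "the dogs", "seen the house")) # False (AGR clash)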
Handling questions in context-free grammar
• A special feature GAP is introduced to handle wh-questions. This feature is passed from mother to subconstituent until the appropriate place for the gap is found in the sentence.
• The rule:
• (NP GAP ((CAT NP) (AGR ?a)) AGR ?a) → ε
• It builds an NP from no input if the NP sought has a GAP feature whose value is an NP. Furthermore, the AGR feature of this empty NP is set to the AGR feature of the constituent that is the value of the GAP feature.
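• A rough Python sketch of how this empty-NP (gap) rule could be realised; the dictionary representation of constituents and the example question are illustrative assumptions, not the parser behind these slides.

# Rough sketch of the empty-NP rule
#   (NP GAP ((CAT NP) (AGR ?a)) AGR ?a) -> empty string
# Constituents are plain dicts here; purely illustrative.

def empty_np(gap):
    """Build an NP from no input, but only when the GAP being passed down is
    itself an NP; the new NP copies the gap's AGR value (?a)."""
    if gap is not None and gap.get("CAT") == "NP":
        return {"CAT": "NP", "AGR": gap["AGR"], "WORDS": []}  # covers no words
    return None  # no gap, or a non-NP gap: the rule does not apply

# In "Which dogs did he see?", the wh-phrase sets GAP to an NP with AGR 3p;
# the GAP is passed from mother to subconstituent until it is filled after "see".
print(empty_np({"CAT": "NP", "AGR": "3p"}))  # {'CAT': 'NP', 'AGR': '3p', 'WORDS': []}
print(empty_np(None))                        # None: this position needs a real NP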
Part-of-Speech Tagging – The Penn Treebank Tagset
Sl No  Tag   Description                        Sl No  Tag   Description
1      CC    Coordinating conjunction           19     PRP$  Possessive pronoun
2      CD    Cardinal number                    20     RB    Adverb
3      DT    Determiner                         21     RBR   Comparative adverb
4      EX    Existential there                  22     RBS   Superlative adverb
5      FW    Foreign word                       23     RP    Particle
6      IN    Preposition/subordinating conj.    24     SYM   Symbol (math or scientific)
7      JJ    Adjective                          25     TO    to
8      JJR   Comparative adjective              26     UH    Interjection
9      JJS   Superlative adjective              27     VB    Verb, base form
10     LS    List item marker                   28     VBD   Verb, past tense
11     MD    Modal                              29     VBG   Verb, gerund/present participle
12     NN    Noun, singular or mass             30     VBN   Verb, past participle
13     NNS   Noun, plural                       31     VBP   Verb, non-3s present
14     NNP   Proper noun, singular              32     VBZ   Verb, 3s present
15     NNPS  Proper noun, plural                33     WDT   Wh-determiner
16     PDT   Predeterminer                      34     WP    Wh-pronoun
17     POS   Possessive ending                  35     WP$   Possessive wh-pronoun
18     PRP   Personal pronoun                   36     WRB   Wh-adverb
Part-of-speech tagging
• Consider the problem in its full generality. Let w1, w2, … wT be a
sequence of words. We want to find the sequence of lexical
categories C1, C2, . . . CT that maximises
• PROB(C1, C2, … CT | w1, w2, … wT). Unfortunately, it would take far too much data to generate reasonable estimates for such sequences, so direct methods cannot be applied.
• By Bayes' rule, this conditional probability equals
• (PROB(C1, C2, … CT) * PROB(w1, w2, … wT | C1, C2, … CT)) / PROB(w1, w2, … wT)
• Since we are interested in finding the C1, C2, … CT that gives the maximum value, the common denominator is the same in all these cases and will not affect the answer.
Part-of-speech tagging (Contd.,)
• The problem reduces to finding the sequence C1, C2, … CT that maximizes the formula PROB(C1, C2, … CT) * PROB(w1, w2, … wT | C1, C2, … CT).
• There are no effective methods for calculating the probability of these long
sequences accurately, as it would require far too much data.
• The BIGRAM model looks at pairs of categories (or words) and uses the
conditional probability that a category Ci will follow a category Ci-1, written as
PROB (Ci | Ci-1).
• The TRIGRAM model uses the conditional probability of one category (or word) given the two preceding categories (or words), PROB(Ci | Ci-2 Ci-1).
• These models are called n-gram models, in which n represents the number of items (categories or words) used in the pattern.
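• As an illustration, the sketch below collects bigram and trigram category counts from a tiny hand-tagged corpus; the three tagged sentences are made up for this example and are not the corpus behind the counts used later in these slides.

# Sketch: collecting bigram and trigram category counts from a tiny
# hand-tagged corpus (the three tagged sentences are made up for illustration).
from collections import Counter

tagged_corpus = [
    ["ART", "N", "V", "N"],
    ["N", "V", "ART", "N"],
    ["ART", "N", "V", "P", "ART", "N"],
]

unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
for tags in tagged_corpus:
    padded = ["phi"] + tags                       # pseudo category at position 0
    unigrams.update(padded)
    for i in range(1, len(padded)):
        bigrams[(padded[i - 1], padded[i])] += 1  # data for PROB(Ci | Ci-1)
    for i in range(2, len(padded)):
        trigrams[(padded[i - 2], padded[i - 1], padded[i])] += 1  # PROB(Ci | Ci-2 Ci-1)

print(bigrams[("N", "V")] / unigrams["N"])                  # bigram estimate of PROB(V | N)
print(trigrams[("ART", "N", "V")] / bigrams[("ART", "N")])  # trigram estimate of PROB(V | ART N)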
Part-of-speech tagging (Contd.,)
• Using BIGRAM the following approximation can be used:
• PROB(C1, C2, … CT) approximately = Πi=1,T PROB(Ci|Ci-1)
• To account for the beginning of a sentence, we posit a pseudo
category φ at position 0 as the value of C0.
• The first bigram for a sentence beginning with an ART would be PROB
(ART|φ).
• Approximation of the probability of the sequence ART N V N using bigrams would be
• PROB(ART N V N) approximately = PROB(ART|φ) * PROB(N|ART) * PROB(V|N) * PROB(N|V)
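• A small Python sketch of this chain of bigram probabilities; the values are the bigram estimates from the table later in this deck (φ written as "phi"), and the dictionary layout is only an assumption for illustration.

# Sketch: PROB(ART N V N) approximately = PROB(ART|phi) * PROB(N|ART)
#                                          * PROB(V|N) * PROB(N|V)
# The bigram estimates below are the ones from the table later in this deck.

BIGRAM = {
    ("phi", "ART"): 0.71, ("phi", "N"): 0.29, ("ART", "N"): 1.0,
    ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
    ("V", "N"): 0.35, ("V", "ART"): 0.65,
    ("P", "ART"): 0.74, ("P", "N"): 0.26,
}

def sequence_prob(tags):
    """Bigram approximation of PROB(C1 ... CT), with the pseudo category
    'phi' posited at position 0 as the value of C0."""
    prob, prev = 1.0, "phi"
    for tag in tags:
        prob *= BIGRAM.get((prev, tag), 0.0)
        prev = tag
    return prob

print(sequence_prob(["ART", "N", "V", "N"]))  # 0.71 * 1.0 * 0.43 * 0.35 ≈ 0.107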
Part-of-speech tagging (Contd.,)
• The second probability in the formula above, PROB(w1, w2, … wT | C1, C2, … CT), can be approximated by assuming that a word appears in a category independently of the words in the preceding or succeeding categories.
• It is approximated by the product of the probability that each word occurs in the indicated part of speech, i.e.,
• PROB(w1, … wT | C1, … CT) approximately = Πi=1,T PROB(wi|Ci)
• With these approximations, the problem has changed into finding the sequence C1, … CT that maximizes the value of Πi=1,T PROB(Ci|Ci-1) * PROB(wi|Ci)
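• One standard way to find this maximizing sequence without enumerating every possible tag sequence is dynamic programming (the Viterbi algorithm). The sketch below is a generic version, not taken from these slides; the lexical probabilities PROB(word|category) are illustrative placeholders, and the bigram table repeats the estimates used earlier.

# Sketch: find the C1..CT maximizing  product over i of PROB(Ci|Ci-1) * PROB(wi|Ci)
# with dynamic programming (Viterbi), instead of enumerating every sequence.
# The lexical probabilities PROB(word|category) below are illustrative placeholders.

BIGRAM = {("phi", "ART"): 0.71, ("phi", "N"): 0.29, ("ART", "N"): 1.0,
          ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
          ("V", "N"): 0.35, ("V", "ART"): 0.65, ("P", "ART"): 0.74, ("P", "N"): 0.26}
LEXICAL = {("flies", "N"): 0.025, ("flies", "V"): 0.076, ("like", "V"): 0.1,
           ("like", "P"): 0.068, ("a", "ART"): 0.36, ("flower", "N"): 0.063}
TAGS = ["ART", "N", "V", "P"]

def viterbi(words):
    # best[i][c] = (probability, backpointer): best tag sequence for
    # words[:i+1] that ends in category c
    best = [{} for _ in words]
    for c in TAGS:
        best[0][c] = (BIGRAM.get(("phi", c), 0.0) * LEXICAL.get((words[0], c), 0.0), None)
    for i in range(1, len(words)):
        for c in TAGS:
            p, back = max(((best[i - 1][prev][0] * BIGRAM.get((prev, c), 0.0)
                            * LEXICAL.get((words[i], c), 0.0), prev) for prev in TAGS),
                          key=lambda x: x[0])
            best[i][c] = (p, back)
    last = max(TAGS, key=lambda c: best[-1][c][0])   # most probable final category
    tags = [last]
    for i in range(len(words) - 1, 0, -1):           # follow backpointers
        last = best[i][last][1]
        tags.append(last)
    return list(reversed(tags))

print(viterbi(["flies", "like", "a", "flower"]))  # ['N', 'V', 'ART', 'N']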
Part-of-speech tagging (Contd.,)
• With this new formula, the probabilities involved can be readily estimated from a corpus of text labelled with parts of speech.
• In a corpus of tagged text, the bigram probabilities can be estimated simply by counting the number of times each pair of categories occurs and dividing by the number of times the first category occurs (see the sketch after the table below).
• The probability that a V follows an N would be estimated as follows:
Part-of-speech tagging (Contd.,)
Category  Count at i  Pair     Count at i, i+1  Bigram        Estimate
φ         300         φ, ART   213              PROB(ART|φ)   .71
φ         300         φ, N     87               PROB(N|φ)     .29
ART       558         ART, N   558              PROB(N|ART)   1
N         833         N, V     358              PROB(V|N)     .43
N         833         N, N     108              PROB(N|N)     .13
N         833         N, P     366              PROB(P|N)     .44
V         300         V, N     75               PROB(N|V)     .35
V         300         V, ART   194              PROB(ART|V)   .65
P         307         P, ART   226              PROB(ART|P)   .74
P         307         P, N     81               PROB(N|P)     .26
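• A short sketch, assuming the tagged corpus has already been reduced to the counts in the table above, showing that each bigram estimate is just the pair count divided by the count of the first category (φ written as "phi").

# Sketch: each bigram estimate is Count(Ci-1, Ci) / Count(Ci-1),
# using the category and pair counts from the table above.

category_counts = {"phi": 300, "ART": 558, "N": 833, "V": 300, "P": 307}
pair_counts = {("phi", "ART"): 213, ("phi", "N"): 87, ("ART", "N"): 558,
               ("N", "V"): 358, ("N", "N"): 108, ("N", "P"): 366,
               ("V", "N"): 75, ("V", "ART"): 194, ("P", "ART"): 226, ("P", "N"): 81}

def bigram_estimate(prev, cur):
    """PROB(cur | prev) estimated from raw counts."""
    return pair_counts[(prev, cur)] / category_counts[prev]

print(round(bigram_estimate("N", "V"), 2))     # 358 / 833 ≈ 0.43
print(round(bigram_estimate("phi", "ART"), 2)) # 213 / 300 ≈ 0.71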
