
Catching Up

CS 4705

Porter Stemmer (1980)

• Used for tasks in which you only care about the stem
  – IR, modeling the given/new distinction, topic detection,
    document similarity
• Lexicon-free morphological analysis
• Cascades rewrite rules (e.g. misunderstanding -->
  misunderstand --> understand --> …)
• Easily implemented as an FST with rules, e.g.
  – ATIONAL --> ATE
  – ING --> ε
• Not perfect…
  – Doing --> doe
  – Policy --> police
• Does stemming help?
  – For IR, a little
  – For topic detection, more
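
A toy sketch of the rule-cascade idea in Python, reduced to the two rules above (illustrative only, not the full 1980 algorithm):

```python
# A toy cascade of suffix-rewrite rules in the spirit of the Porter
# stemmer (illustrative sketch only; real Porter has many more rules
# and conditions on the stem).
RULES = [
    ("ational", "ate"),   # ATIONAL --> ATE  (relational -> relate)
    ("ing", ""),          # ING --> ε        (misunderstanding -> misunderstand)
]

def toy_stem(word: str) -> str:
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            word = word[: -len(suffix)] + replacement
    return word

print(toy_stem("relational"))        # relate
print(toy_stem("misunderstanding"))  # misunderstand
```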

Statistical POS Tagging

• Goal: choose the best sequence of tags T for a
  sequence of words W in a sentence
  – T' = argmax_T P(T | W)
  – By Bayes' Rule:
      P(T | W) = P(T) P(W | T) / P(W)
  – Since P(W) is the same for every candidate tag sequence, we can
    ignore it:
      T' = argmax_T P(T) P(W | T)
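
A brute-force sketch of the argmax (all probabilities invented for illustration; P(T) is approximated by tag-bigram transitions and P(W|T) by per-word emissions, HMM-style; real taggers use Viterbi rather than enumeration):

```python
# Brute-force illustration of T' = argmax_T P(T) P(W|T).
# All probabilities below are invented for illustration.
from itertools import product

words = ["the", "race"]
tagset = ["DT", "NN", "VB"]

trans = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("DT", "VB"): 0.1}
emit = {("the", "DT"): 0.9, ("race", "NN"): 0.6, ("race", "VB"): 0.3}

def score(tags):
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= trans.get((prev, t), 0.0) * emit.get((w, t), 0.0)
        prev = t
    return p

best = max(product(tagset, repeat=len(words)), key=score)
print(best, score(best))  # -> ('DT', 'NN') with score ≈ 0.2268
```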

Brill Tagging: TBL

• Start with simple (less accurate) rules… learn
  better ones from a tagged corpus
  – Tag each word initially with its most likely POS
  – Examine a set of transformations to see which most improves
    tagging decisions compared to the tagged corpus
  – Re-tag the corpus
  – Repeat until, e.g., performance no longer improves
  – Result: a tagging procedure which can be applied to new,
    untagged text
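
A simplified sketch of the learning loop, assuming a single transformation template, "change tag a to b when the previous tag is z" (toy data; not Brill's full template set or scoring):

```python
# Simplified transformation-based learning loop (illustrative sketch).
from itertools import product

def apply_rule(tags, rule):
    a, b, z = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == a and out[i - 1] == z:
            out[i] = b
    return out

def n_errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))

def learn(tags, gold, tagset):
    tags, rules = list(tags), []
    while True:
        best_err, best_rule = min(
            (n_errors(apply_rule(tags, r), gold), r)
            for r in product(tagset, repeat=3))
        if best_err >= n_errors(tags, gold):
            break  # stop: no transformation improves the score
        rules.append(best_rule)
        tags = apply_rule(tags, best_rule)  # re-tag the corpus
    return rules

# Toy corpus "to race the race": initial most-likely tags vs. gold.
initial = ["TO", "NN", "DT", "NN"]
gold    = ["TO", "VB", "DT", "NN"]
print(learn(initial, gold, ["TO", "NN", "VB", "DT"]))
# -> [('NN', 'VB', 'TO')], i.e. "change NN to VB after TO"
```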

An Example

The horse raced past the barn fell.

Gold standard:
The/DT horse/NN raced/VBN past/IN the/DT barn/NN fell/VBD ./.

1) Tag every word with its most likely tag and score:
   The/DT horse/NN raced/VBD past/NN the/DT barn/NN fell/VBD ./.
2) For each template, try every instantiation (e.g. change VBD to VBN
   when the preceding word is tagged NN), add the rule to the ruleset,
   re-tag the corpus, and score (see the sketch after this list)
3) Stop when no transformation improves the score
4) Result: a set of transformation rules which can be applied to new,
   untagged data (after initializing with the most common tag)
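
A quick check of the transformation from step 2 (a self-contained sketch; tags from the slide):

```python
# Applying "change VBD to VBN when the previous tag is NN" to the
# example sentence above.
def apply_rule(tags, a, b, z):
    return [b if t == a and i > 0 and tags[i - 1] == z else t
            for i, t in enumerate(tags)]

tags = ["DT", "NN", "VBD", "IN", "DT", "NN", "VBD", "."]
print(apply_rule(tags, "VBD", "VBN", "NN"))
# -> ['DT', 'NN', 'VBN', 'IN', 'DT', 'NN', 'VBN', '.']
# It fixes raced/VBD -> VBN, but also wrongly changes fell/VBD -> VBN,
# since "barn" is likewise tagged NN -- one answer to the question below.
```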
…What problems will this process run into?

Methodology: Evaluation

• For any NLP problem, we need to know how to
  evaluate our solutions
• Possible Gold Standards -- ceiling:
  – Annotated naturally occurring corpus
  – Human task performance (96-97%)
    • How well do humans agree?
    • Kappa statistic: average pairwise agreement
      corrected for chance agreement
  – Can be hard to obtain for some tasks:
    sometimes humans don’t agree
• Baseline: how well does a simple method do?
  – For tagging, the most common tag for each word (91%)
  – How much improvement do we get over the baseline?
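
A minimal sketch of the kappa computation for the two-annotator case (Cohen's κ = (P_o − P_e) / (1 − P_e); annotation data invented):

```python
# Cohen's kappa for two annotators over the same items (minimal sketch).
# P_o is observed agreement; P_e is the agreement expected by chance,
# estimated from each annotator's label distribution.
from collections import Counter

def kappa(a, b):
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[t] * cb[t] for t in ca) / (n * n)    # chance agreement
    return (p_o - p_e) / (1 - p_e)

ann1 = ["NN", "VB", "NN", "DT", "NN", "VB"]
ann2 = ["NN", "VB", "NN", "DT", "VB", "VB"]
print(round(kappa(ann1, ann2), 3))  # -> 0.739
```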

Methodology: Error Analysis

• Confusion matrix:
  – E.g., which tags did we most often confuse with
    which other tags?
  – How much of the overall error does each
    confusion account for?
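
A minimal sketch of tallying tag confusions from gold vs. predicted tags (toy data):

```python
# Building a tag confusion tally (minimal sketch): counts of
# (gold tag, predicted tag) pairs for the mis-tagged tokens.
from collections import Counter

gold = ["DT", "NN", "VBN", "IN", "DT", "NN", "VBD"]
pred = ["DT", "NN", "VBD", "NN", "DT", "NN", "VBD"]

confusions = Counter((g, p) for g, p in zip(gold, pred) if g != p)
total_errors = sum(confusions.values())
for (g, p), c in confusions.most_common():
    print(f"gold {g} tagged as {p}: {c} ({c / total_errors:.0%} of errors)")
```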

More Complex Issues

• Tag indeterminacy: when ‘truth’ isn’t clear
  – Caribbean cooking, child seat
• Tagging multipart words
  – wouldn’t --> would/MD n’t/RB
• Unknown words
  – Assume all tags are equally likely
  – Assume the same tag distribution as all other singletons in the
    corpus
  – Use morphology, word length, …
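
A hedged sketch of the morphology option: guessing an unknown word's tag from surface cues (the suffix list and priorities here are invented for illustration, not from a trained model):

```python
# Guessing tags for unknown words from surface cues (illustrative
# sketch only; real systems learn such features from data).
def guess_tag(word: str) -> str:
    if word[0].isupper():
        return "NNP"          # capitalized -> likely proper noun
    if word.endswith("ing"):
        return "VBG"
    if word.endswith("ed"):
        return "VBD"
    if word.endswith("ly"):
        return "RB"
    if word.endswith("s"):
        return "NNS"
    return "NN"               # default: common noun

for w in ["blorficating", "glorped", "Zendavia", "frabjously"]:
    print(w, "->", guess_tag(w))
```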
