Unit-3
Semantic Parsing
1. Introduction
Semantic parsing is the process of converting natural language into a machine-
readable semantic representation. This involves interpreting and representing the
meaning of a sentence or text in a structured form, such as logical forms, Abstract
Meaning Representation (AMR), or dependency graphs. Semantic parsing enables
machines to perform reasoning, answer questions, and carry out tasks based on natural
language input.
Key Goals:
1. Derive precise, context-sensitive meaning from input text.
2. Handle ambiguities inherent in natural language.
3. Enable downstream applications like question answering, chatbots, and
machine translation.
2. Semantic Interpretation
Semantic interpretation involves analyzing and representing the meaning of linguistic
input. It bridges syntax (structure) and semantics (meaning). Key challenges include
dealing with ambiguities, varying word senses, and resolving entities and events.
a. Structural Ambiguity
Occurs when multiple interpretations arise from the same syntactic structure.
Example: "She saw the man with a telescope."
o Interpretation 1: She used a telescope to see the man.
o Interpretation 2: The man has the telescope.
Resolution Approaches:
1. Syntactic Parsing: Derive possible parse trees using tools like Stanford
Parser.
Example parse trees:
o Case 1: [She saw [the man [with a telescope]]].
o Case 2: [She saw [the man]] [with a telescope].
2. Semantic Constraints: Use context or domain-specific rules.
If context suggests she’s an astronomer, the first interpretation is likely.
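The semantic-constraint approach above can be sketched in a few lines of Python. The cue-word list and the `disambiguate` helper are illustrative assumptions, not a standard library API:

```python
# Minimal sketch of semantic-constraint-based PP-attachment disambiguation.
# Cue words suggesting the instrument reading ("she used the telescope")
# are hand-picked for illustration.
INSTRUMENT_CUES = {"astronomer", "observe", "watch", "stargazing"}

def disambiguate(context_words):
    """Return which attachment the context favors for
    'She saw the man with a telescope.'"""
    if INSTRUMENT_CUES & {w.lower() for w in context_words}:
        return "instrument"   # [She saw [the man]] [with a telescope]
    return "modifier"         # [She saw [the man [with a telescope]]]

print(disambiguate(["The", "astronomer", "went", "outside"]))  # instrument
print(disambiguate(["The", "man", "carried", "equipment"]))    # modifier
```

Real systems derive such preferences from parse probabilities or selectional constraints rather than a fixed keyword list.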
b. Word Sense
Word sense refers to a word's meaning in context. Word sense disambiguation, the
process of identifying the correct sense of a word based on its context, is crucial
for accurate semantic interpretation.
Example:
"I deposited money in the bank."
o Word Sense: Bank → Financial institution.
"The boat reached the river bank."
o Word Sense: Bank → River edge.
Techniques for Word Sense Disambiguation (WSD):
Lesk Algorithm: Matches sentence context with glosses in a lexical resource
like WordNet.
o Bank (financial institution) → A financial establishment for depositing
money.
o Bank (river edge) → The land alongside a river.
Supervised Learning: Train models on annotated datasets where each word is
labeled with its sense.
Transformer Models: Contextual embeddings (e.g., BERT) understand word
meaning by analyzing the entire sentence.
c. Entity and Event Resolution
This involves identifying and linking entities and events within text.
Entity Resolution:
"Alice went to Paris. She loved the city."
Resolve She to Alice and the city to Paris.
Techniques:
1. Coreference Resolution: Use algorithms like SpaCy's neuralcoref to link
mentions.
2. Named Entity Recognition (NER): Extract and classify entities using models
like BiLSTM-CRF.
Event Resolution:
"John started running at 6 AM. He reached the park by 7 AM."
o Link running and reached the park as events related to John.
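A toy version of entity resolution can be sketched as a "most recent compatible mention" heuristic. The pronoun-to-type table and `resolve` function are illustrative assumptions; production systems use trained coreference models such as spaCy's neuralcoref:

```python
# Toy coreference heuristic: link a pronoun to the most recent preceding
# entity of a compatible type. Illustrative only.
PRONOUN_TYPES = {"she": "PERSON", "he": "PERSON", "the city": "PLACE"}

def resolve(mentions, pronoun):
    """mentions: list of (text, entity_type) in order of appearance."""
    wanted = PRONOUN_TYPES[pronoun.lower()]
    for text, etype in reversed(mentions):
        if etype == wanted:
            return text
    return None

mentions = [("Alice", "PERSON"), ("Paris", "PLACE")]
print(resolve(mentions, "She"))        # Alice
print(resolve(mentions, "the city"))   # Paris
```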
d. Predicate-Argument Structure
Describes actions (predicates) and participants (arguments) in a sentence.
Example:
Sentence: "The boy kicked the ball."
o Predicate: kicked
o Arguments:
Agent: The boy (Who performed the action?)
Patient: The ball (What was acted upon?)
Frameworks:
1. FrameNet: Groups sentences into semantic frames like Giving or Buying.
o "Mary gave John a book." → Giving frame.
2. PropBank: Annotates sentences with verb-specific roles.
o kick.01 (PropBank verb sense) → Roles: Arg0 (kicker), Arg1 (thing
kicked).
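For simple subject-verb-object sentences, predicate-argument extraction can be sketched positionally. The verb list and PropBank-style Arg0/Arg1 labels below are illustrative assumptions, not a full SRL system:

```python
# Minimal predicate-argument extraction for simple SVO sentences.
VERBS = {"kicked", "gave", "saw"}

def predicate_arguments(tokens):
    """Split a subject-verb-object token list around the first known verb."""
    for i, tok in enumerate(tokens):
        if tok.lower() in VERBS:
            return {
                "predicate": tok,
                "Arg0": " ".join(tokens[:i]),      # agent
                "Arg1": " ".join(tokens[i + 1:]),  # patient
            }
    return None

result = predicate_arguments(["The", "boy", "kicked", "the", "ball"])
print(result)
# {'predicate': 'kicked', 'Arg0': 'The boy', 'Arg1': 'the ball'}
```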
e. Meaning Representation
Captures semantic meaning in a formal structure.
Examples:
First-Order Logic:
Sentence: "All cats are animals."
Representation:
∀x (cat(x) → animal(x))
Abstract Meaning Representation (AMR):
Sentence: "John wants to eat pizza."
Graph representation:
want-01
├── :arg0 (John)
└── :arg1 (eat-01)
├── :arg0 (John)
└── :arg1 (pizza)
Semantic Role Labeling (SRL): Assigns roles to entities in the sentence.
1. Predicate: eat
2. Arguments:
1. Arg0: John (eater)
2. Arg1: pizza (thing eaten)
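The AMR graph above can be encoded as nested Python dicts. The key names mirror the :arg0/:arg1 roles in the example; this encoding is an illustrative sketch, not an official AMR serialization format:

```python
# The "John wants to eat pizza" AMR graph as nested dicts.
amr = {
    "concept": "want-01",
    "arg0": {"concept": "John"},
    "arg1": {
        "concept": "eat-01",
        "arg0": {"concept": "John"},   # reentrancy: the wanter is the eater
        "arg1": {"concept": "pizza"},
    },
}

# The same entity fills both arg0 roles.
print(amr["arg0"]["concept"] == amr["arg1"]["arg0"]["concept"])  # True
```

Note the reentrancy: in real AMR the two John nodes would be a single shared variable, which a plain dict can only approximate by repetition.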
3. System Paradigms in Semantic Parsing
Semantic parsing systems can be categorized based on their underlying
methodologies. Each paradigm has strengths and weaknesses, making them suitable
for different use cases.
1. Rule-Based Systems
Rule-based systems rely on handcrafted linguistic rules to parse sentences into
semantic representations. These rules are explicitly defined by experts, often using
grammar formalisms like context-free grammar (CFG) or dependency grammar.
How They Work:
1. Syntactic rules define possible sentence structures.
2. Semantic rules map these structures to their corresponding meaning.
Example: For the sentence "Mary gave John a book.":
Rule: If the verb is "give," the first noun is the giver, the second noun is
the recipient, and the third is the object.
Output: (Giver: Mary, Recipient: John, Object: Book)
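The "give" rule above can be written directly as code. This positional version is purely illustrative; real rule-based parsers operate over parse trees, not raw token positions:

```python
# Sketch of the rule: for "X gave Y a Z", the first noun is the giver,
# the second the recipient, the last the object.
def apply_give_rule(tokens):
    """Expects tokens of the form: <Giver> gave <Recipient> a/the <Object>."""
    if "gave" not in tokens:
        return None
    i = tokens.index("gave")
    return {
        "Giver": " ".join(tokens[:i]),
        "Recipient": tokens[i + 1],
        "Object": tokens[-1],
    }

print(apply_give_rule(["Mary", "gave", "John", "a", "book"]))
# {'Giver': 'Mary', 'Recipient': 'John', 'Object': 'book'}
```

The brittleness is visible immediately: any deviation from the expected word order ("Mary gave a book to John") breaks the rule, which is exactly the coverage limitation noted below.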
Advantages:
High Precision: Well-defined rules ensure accurate parsing in specific
domains.
Transparency: The system's logic is interpretable and explainable.
Limitations:
Limited Coverage: Rules are domain-specific and require extensive
effort to generalize.
Scalability Issues: Difficult to maintain and expand for large, complex
datasets.
Ambiguity Handling: Cannot efficiently resolve structural or word sense
ambiguities without additional resources.
Applications:
Early chatbots like ELIZA.
Knowledge-based systems in restricted domains (e.g., legal or medical
text processing).
2. Statistical Systems
Statistical systems use probabilistic models to predict semantic structures based on
annotated training data. These systems apply probabilistic methods to resolve
ambiguities and improve robustness.
How They Work:
1. Learn syntactic and semantic rules from labeled data (e.g., Penn
Treebank).
2. Compute probabilities for possible interpretations and select the most
likely one.
Key Techniques:
Probabilistic Context-Free Grammar (PCFG): Extends CFG with
probabilities assigned to grammar rules.
o Example: If a rule VP → V NP has a 70% probability, the system
prefers it over alternatives.
Hidden Markov Models (HMMs): Useful for sequence labeling tasks
like part-of-speech tagging and semantic role labeling.
Example: For the sentence "He saw the bank":
Probability for bank (financial institution) = 0.7.
Probability for bank (river edge) = 0.3.
Output: bank → Financial institution.
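The selection step is just an argmax over sense probabilities. The numbers below are the illustrative values from the example; in practice they would be estimated from an annotated corpus:

```python
# Pick the most probable sense of "bank" in "He saw the bank".
sense_probs = {
    "bank/financial_institution": 0.7,
    "bank/river_edge": 0.3,
}

best_sense = max(sense_probs, key=sense_probs.get)
print(best_sense)  # bank/financial_institution
```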
Advantages:
Handles Ambiguity: Probabilities guide the system to choose the most
likely interpretation.
Adaptability: Learns patterns directly from data, reducing manual effort.
Limitations:
Data Dependency: Requires large annotated corpora.
Lack of Interpretability: Predictions are based on statistical patterns,
making them harder to explain.
Applications:
Early machine translation systems.
Information extraction from structured domains.
3. Neural Systems
Neural systems leverage deep learning to perform semantic parsing. These systems
often use sequence-to-sequence (Seq2Seq) models or transformer architectures.
How They Work:
1. Input sentences are tokenized and converted into embeddings.
2. Neural models learn to map these embeddings to semantic
representations.
3. Output is either structured text (logical forms) or graphs (AMR,
dependency graphs).
Key Techniques:
Seq2Seq Models: Maps input sentences to semantic outputs.
o Example: Google’s Neural Machine Translation (GNMT) uses
Seq2Seq for language translation.
Transformers: Pre-trained models like BERT, T5, and GPT fine-tuned
for semantic parsing tasks.
Example: Sentence: "Book a flight from New York to London."
Neural output:
{
  "intent": "book_flight",
  "departure": "New York",
  "destination": "London"
}
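The embedding-based pipeline can be sketched end to end with toy numbers: tokens map to vectors, vectors are averaged into a sentence embedding, and the nearest intent prototype wins. The hand-set 2-D "embeddings" and intent names are illustrative assumptions; real systems use learned, high-dimensional representations:

```python
# Toy embedding-based intent classifier.
import math

WORD_VECS = {
    "book": (1.0, 0.1), "flight": (0.9, 0.2),
    "cancel": (-0.8, 0.3), "reservation": (-0.7, 0.4),
}
INTENT_PROTOTYPES = {
    "book_flight": (1.0, 0.15),
    "cancel_booking": (-0.75, 0.35),
}

def embed(tokens):
    """Average the vectors of known tokens into a sentence embedding."""
    vecs = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def classify(sentence):
    v = embed(sentence.lower().split())
    return max(INTENT_PROTOTYPES, key=lambda i: cosine(v, INTENT_PROTOTYPES[i]))

print(classify("Book a flight from New York to London"))  # book_flight
```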
Advantages:
High Accuracy: Learns complex patterns and semantic relationships.
Generalizability: Adapts to various domains with fine-tuning.
Handles Ambiguity: Captures contextual meaning using large pre-trained
embeddings.
Limitations:
Data and Compute Intensive: Requires massive labeled datasets and
high computational power.
Black-Box Nature: Hard to interpret or debug.
Applications:
Virtual assistants like Alexa, Siri, and Google Assistant.
Question answering systems (e.g., BERT-based).
4. Hybrid Systems
Hybrid systems combine rule-based and neural/statistical methods to leverage the
advantages of both.
How They Work:
1. Rules handle deterministic, domain-specific tasks (e.g., entity recognition
in medical texts).
2. Neural/statistical models address more complex, ambiguous cases.
Example Workflow:
Rule: Identify dates, locations, and organizations using regular
expressions.
Neural Model: Use BERT to classify intent or resolve ambiguities.
Advantages:
Flexibility: Rules provide precision, while neural/statistical models add
generality.
Cost-Effective: Reduces the need for extensive annotation in rule-
covered domains.
Limitations:
Integration Complexity: Requires careful design to combine different
paradigms.
Maintenance Overhead: Hybrid models can become complex to update.
Applications:
Intelligent search engines (e.g., combining keyword matching and intent
detection).
Industry-specific virtual assistants.
Comparison of Paradigms
Paradigm      Advantages                        Disadvantages                     Example Applications
Rule-Based    High precision, interpretable     Limited scalability,              ELIZA, expert systems
                                                labor-intensive
Statistical   Handles ambiguity, learns         Data-dependent, less              Early MT systems, NER
              from data                         interpretable
Neural        High accuracy, generalizable      Data/compute-intensive,           Virtual assistants, QA
                                                black-box                         systems
Hybrid        Combines strengths of other       Complex to                        Industry-specific AI
              paradigms                         integrate/maintain                systems
4. Word Sense in NLP
What is Word Sense?
Word sense refers to the specific meaning of a word in a given context. Many words
are polysemous, meaning they have multiple senses or meanings. Accurately
determining the intended meaning of a word is crucial for tasks like machine
translation, semantic parsing, and question answering.
Key Components of Word Sense
1. Polysemy
A single word having multiple meanings.
Example:
o Bank:
1. Financial institution (e.g., I deposited money in the bank.)
2. River edge (e.g., The boat reached the river bank.)
2. Word Sense Disambiguation (WSD)
The process of identifying the correct sense of a word based on its context.
Techniques for Word Sense Disambiguation
a. Lesk Algorithm (Knowledge-Based)
The Lesk algorithm disambiguates a word by finding overlaps between the
context of the word and dictionary definitions (glosses) of its senses.
Example:
Word: Bank
Sentence: The fisherman sat on the bank of the river.
Glosses:
1. Bank (financial institution): "A place for receiving deposits."
2. Bank (river edge): "The land alongside a river."
Overlap: River matches with the second gloss. Thus, the sense is river
edge.
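The gloss-overlap idea can be sketched as a simplified Lesk in a few lines. The glosses are hardcoded from the example above; a real implementation would pull them from WordNet (e.g., via NLTK):

```python
# Simplified Lesk: pick the sense whose gloss shares the most content
# words with the sentence context.
GLOSSES = {
    "financial_institution": "a place for receiving deposits of money",
    "river_edge": "the land alongside a river",
}

STOPWORDS = {"a", "the", "of", "on", "in", "for"}

def simplified_lesk(sentence):
    context = {w.strip(".,").lower() for w in sentence.split()} - STOPWORDS
    def overlap(sense):
        return len(context & set(GLOSSES[sense].split()))
    return max(GLOSSES, key=overlap)

print(simplified_lesk("The fisherman sat on the bank of the river"))
# river_edge
```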
b. Supervised Learning
Uses labeled datasets where words are annotated with their senses. Machine
learning models are trained to predict the sense of a word given its context.
Steps:
1. Feature extraction: Contextual words, part-of-speech tags, and syntactic
dependencies.
2. Training: Use datasets like SemCor to train models such as:
Decision Trees
Support Vector Machines (SVMs)
Neural Networks
Example:
Input: "I went to the bank to withdraw cash."
Features: Context words (withdraw, cash).
Output: Sense → Financial institution.
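A tiny supervised sketch: count which context words co-occur with each labeled sense, then score a new sentence by those counts. The miniature training set below is invented for illustration; real systems train on corpora like SemCor:

```python
# Count-based supervised WSD toy for "bank".
from collections import Counter, defaultdict

TRAIN = [
    ("withdraw cash from the bank", "financial"),
    ("the bank approved my loan", "financial"),
    ("fishing on the river bank", "river"),
    ("the muddy bank of the stream", "river"),
]

counts = defaultdict(Counter)
for sentence, sense in TRAIN:
    counts[sense].update(sentence.lower().split())

def predict(sentence):
    """Score each sense by how often its training contexts
    contained the words of the new sentence."""
    words = sentence.lower().split()
    return max(counts, key=lambda s: sum(counts[s][w] for w in words))

print(predict("I went to the bank to withdraw cash"))  # financial
```

A real supervised system would add richer features (part-of-speech tags, syntactic dependencies) and a proper classifier such as an SVM or neural network, as listed above.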
c. Unsupervised Learning
Clusters words into different senses based on context without labeled data.
Common techniques include:
Word Embeddings: Group similar contexts in vector space (e.g.,
Word2Vec, GloVe).
Clustering: Use K-means or hierarchical clustering to group senses.
d. Neural Approaches (Deep Learning)
Modern WSD systems use contextual word embeddings from pre-trained
models like BERT, ELMo, or GPT to determine word sense dynamically.
How It Works:
The model generates context-aware embeddings for words.
Similar contexts produce similar embeddings, enabling accurate sense
identification.
Example: For the word bat:
Sentence: "The bat flew across the room."
Embedding aligns with the sense flying mammal.
Sentence: "He hit the ball with a bat."
Embedding aligns with the sense sports equipment.
a. Resources
Key resources for WSD include:
1. WordNet
A lexical database where words are grouped into synsets (sets of cognitive
synonyms), each representing a distinct sense.
Relationships: Hypernym (is-a), Hyponym (kind-of), Meronym (part-of).
Example for Bank:
Sense 1: "A financial institution."
Sense 2: "Land alongside a river."
2. BabelNet
A multilingual resource integrating WordNet and Wikipedia. Useful for cross-
lingual WSD tasks.
3. FrameNet
Focuses on semantic frames. Groups words into concepts based on their roles
in a scenario. Example: Giving frame includes words like give, transfer, donate.
b. Systems
WSD systems can be categorized as:
1. Lesk-Based Systems
Simple to implement.
Requires access to a lexical database (e.g., WordNet).
2. Supervised Systems
Train on labeled corpora (e.g., Senseval, SemCor).
Examples:
o Decision Trees: Learn rules for predicting senses.
o Neural Models: Transformer-based models fine-tuned for WSD.
3. Unsupervised Systems
Rely on clustering or word embeddings.
Effective when labeled data is scarce.
4. Deep Learning-Based Systems
Use pre-trained models like BERT, GPT, or T5 for dynamic sense
disambiguation.
Examples:
o Fine-tuned BERT for WSD: Uses contextual embeddings to assign
senses.
o Transformer-based pipelines: End-to-end systems for tasks like
machine translation and semantic parsing.
c. Software
1. NLTK (Python): Implements WSD using WordNet.
Example: nltk.wsd.lesk() function.
2. SpaCy: Provides out-of-the-box tools for entity recognition and WSD.
3. HuggingFace Transformers: Pre-trained models like BERT for semantic
tasks.
Challenges in Word Sense Disambiguation
Ambiguity in Context: Short contexts often lack sufficient clues for
disambiguation. Example: "He went to the bank."
Lack of Labeled Data: Creating annotated datasets for all possible word senses is
challenging.
Rare Senses: Models may struggle with less frequent meanings.
Domain-Specific Senses: Word meanings vary across domains (e.g., mouse in
biology vs. technology).