Unit-3
Semantic Parsing
1. Introduction
Semantic parsing is the process of converting natural language into a machine-
readable semantic representation. This involves interpreting and representing the
meaning of a sentence or text in a structured form, such as logical forms, Abstract
Meaning Representation (AMR), or dependency graphs. Semantic parsing enables
machines to perform reasoning, answer questions, and carry out tasks based on natural
language input.
Key Goals:
1. Derive precise, context-sensitive meaning from input text.
2. Handle ambiguities inherent in natural language.
3. Enable downstream applications like question answering, chatbots, and
machine translation.
2. Semantic Interpretation
Semantic interpretation involves analyzing and representing the meaning of linguistic
input. It bridges syntax (structure) and semantics (meaning). Key challenges include
dealing with ambiguities, varying word senses, and resolving entities and events.
a. Structural Ambiguity
Occurs when multiple interpretations arise from the same syntactic structure.
Example: "She saw the man with a telescope."
o Interpretation 1: She used a telescope to see the man.
o Interpretation 2: The man has the telescope.
Resolution Approaches:
1. Syntactic Parsing: Derive possible parse trees using tools like Stanford
Parser.
Example parse trees:
o Case 1: [She saw [the man [with a telescope]]].
o Case 2: [She saw [the man]] [with a telescope].
2. Semantic Constraints: Use context or domain-specific rules.
If context suggests she’s an astronomer, the first interpretation is likely.
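The semantic-constraint approach above can be sketched in a few lines of Python. The cue-word list and the `disambiguate` helper are illustrative assumptions, not a standard library API:

```python
# Minimal sketch of semantic-constraint-based PP-attachment disambiguation.
# Cue words suggesting the instrument reading ("she used the telescope")
# are hand-picked for illustration.
INSTRUMENT_CUES = {"astronomer", "observe", "watch", "stargazing"}

def disambiguate(context_words):
    """Return which attachment the context favors for
    'She saw the man with a telescope.'"""
    if INSTRUMENT_CUES & {w.lower() for w in context_words}:
        return "instrument"   # [She saw [the man]] [with a telescope]
    return "modifier"         # [She saw [the man [with a telescope]]]

print(disambiguate(["The", "astronomer", "went", "outside"]))  # instrument
print(disambiguate(["The", "man", "carried", "equipment"]))    # modifier
```

Real systems derive such preferences from parse probabilities or selectional constraints rather than a fixed keyword list.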
b. Word Sense
Word sense refers to a word's meaning in context. Word sense disambiguation, the
process of identifying the correct sense of a word based on its context, is crucial
for accurate semantic interpretation.
Example:
"I deposited money in the bank."
o Word Sense: Bank → Financial institution.
"The boat reached the river bank."
o Word Sense: Bank → River edge.
Techniques for Word Sense Disambiguation (WSD):
Lesk Algorithm: Matches sentence context with glosses in a lexical resource
like WordNet.
o Bank (financial institution) → A financial establishment for depositing
money.
o Bank (river edge) → The land alongside a river.
Supervised Learning: Train models on annotated datasets where each word is
labeled with its sense.
Transformer Models: Contextual embeddings (e.g., BERT) understand word
meaning by analyzing the entire sentence.
c. Entity and Event Resolution
This involves identifying and linking entities and events within text.
Entity Resolution:
"Alice went to Paris. She loved the city."
Resolve She to Alice and the city to Paris.
Techniques:
1. Coreference Resolution: Use algorithms like SpaCy's neuralcoref to link
mentions.
2. Named Entity Recognition (NER): Extract and classify entities using models
like BiLSTM-CRF.
Event Resolution:
"John started running at 6 AM. He reached the park by 7 AM."
o Link running and reached the park as events related to John.
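A toy version of entity resolution can be sketched as a "most recent compatible mention" heuristic. The pronoun-to-type table and `resolve` function are illustrative assumptions; production systems use trained coreference models such as spaCy's neuralcoref:

```python
# Toy coreference heuristic: link a pronoun to the most recent preceding
# entity of a compatible type. Illustrative only.
PRONOUN_TYPES = {"she": "PERSON", "he": "PERSON", "the city": "PLACE"}

def resolve(mentions, pronoun):
    """mentions: list of (text, entity_type) in order of appearance."""
    wanted = PRONOUN_TYPES[pronoun.lower()]
    for text, etype in reversed(mentions):
        if etype == wanted:
            return text
    return None

mentions = [("Alice", "PERSON"), ("Paris", "PLACE")]
print(resolve(mentions, "She"))        # Alice
print(resolve(mentions, "the city"))   # Paris
```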
d. Predicate-Argument Structure
Describes actions (predicates) and participants (arguments) in a sentence.
Example:
Sentence: "The boy kicked the ball."
o Predicate: kicked
o Arguments:
Agent: The boy (Who performed the action?)
Patient: The ball (What was acted upon?)
Frameworks:
1. FrameNet: Groups sentences into semantic frames like Giving or Buying.
o "Mary gave John a book." → Giving frame.
2. PropBank: Annotates sentences with verb-specific roles.
o kick.01 (PropBank verb sense) → Roles: Arg0 (kicker), Arg1 (thing
kicked).
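For simple subject-verb-object sentences, predicate-argument extraction can be sketched positionally. The verb list and PropBank-style Arg0/Arg1 labels below are illustrative assumptions, not a full SRL system:

```python
# Minimal predicate-argument extraction for simple SVO sentences.
VERBS = {"kicked", "gave", "saw"}

def predicate_arguments(tokens):
    """Split a subject-verb-object token list around the first known verb."""
    for i, tok in enumerate(tokens):
        if tok.lower() in VERBS:
            return {
                "predicate": tok,
                "Arg0": " ".join(tokens[:i]),      # agent
                "Arg1": " ".join(tokens[i + 1:]),  # patient
            }
    return None

result = predicate_arguments(["The", "boy", "kicked", "the", "ball"])
print(result)
# {'predicate': 'kicked', 'Arg0': 'The boy', 'Arg1': 'the ball'}
```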
e. Meaning Representation
Captures semantic meaning in a formal structure.
Examples:
First-Order Logic:
Sentence: "All cats are animals."
Representation:
∀x (cat(x) → animal(x))
Abstract Meaning Representation (AMR):
Sentence: "John wants to eat pizza."
Graph representation:
want-01
├── :arg0 (John)
└── :arg1 (eat-01)
├── :arg0 (John)
└── :arg1 (pizza)
Semantic Role Labeling (SRL): Assigns roles to entities in the sentence.
1. Predicate: eat
2. Arguments:
1. Arg0: John (eater)
2. Arg1: pizza (thing eaten)
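The AMR graph above can be encoded as nested Python dicts. The key names mirror the :arg0/:arg1 roles in the example; this encoding is an illustrative sketch, not an official AMR serialization format:

```python
# The "John wants to eat pizza" AMR graph as nested dicts.
amr = {
    "concept": "want-01",
    "arg0": {"concept": "John"},
    "arg1": {
        "concept": "eat-01",
        "arg0": {"concept": "John"},   # reentrancy: the wanter is the eater
        "arg1": {"concept": "pizza"},
    },
}

# The same entity fills both arg0 roles.
print(amr["arg0"]["concept"] == amr["arg1"]["arg0"]["concept"])  # True
```

Note the reentrancy: in real AMR the two John nodes would be a single shared variable, which a plain dict can only approximate by repetition.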
3. System Paradigms in Semantic Parsing
Semantic parsing systems can be categorized based on their underlying
methodologies. Each paradigm has strengths and weaknesses, making them suitable
for different use cases.
1. Rule-Based Systems
Rule-based systems rely on handcrafted linguistic rules to parse sentences into
semantic representations. These rules are explicitly defined by experts, often using
grammar formalisms like context-free grammar (CFG) or dependency grammar.
How They Work:
1. Syntactic rules define possible sentence structures.
2. Semantic rules map these structures to their corresponding meaning.
Example: For the sentence "Mary gave John a book.":
Rule: If the verb is "give," the first noun is the giver, the second noun is
the recipient, and the third is the object.
Output: (Giver: Mary, Recipient: John, Object: Book)
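The "give" rule above can be written directly as code. This positional version is purely illustrative; real rule-based parsers operate over parse trees, not raw token positions:

```python
# Sketch of the rule: for "X gave Y a Z", the first noun is the giver,
# the second the recipient, the last the object.
def apply_give_rule(tokens):
    """Expects tokens of the form: <Giver> gave <Recipient> a/the <Object>."""
    if "gave" not in tokens:
        return None
    i = tokens.index("gave")
    return {
        "Giver": " ".join(tokens[:i]),
        "Recipient": tokens[i + 1],
        "Object": tokens[-1],
    }

print(apply_give_rule(["Mary", "gave", "John", "a", "book"]))
# {'Giver': 'Mary', 'Recipient': 'John', 'Object': 'book'}
```

The brittleness is visible immediately: any deviation from the expected word order ("Mary gave a book to John") breaks the rule, which is exactly the coverage limitation noted below.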
Advantages:
High Precision: Well-defined rules ensure accurate parsing in specific
domains.
Transparency: The system's logic is interpretable and explainable.
Limitations:
Limited Coverage: Rules are domain-specific and require extensive
effort to generalize.
Scalability Issues: Difficult to maintain and expand for large, complex
datasets.
Ambiguity Handling: Cannot efficiently resolve structural or word sense
ambiguities without additional resources.
Applications:
Early chatbots like ELIZA.
Knowledge-based systems in restricted domains (e.g., legal or medical
text processing).
2. Statistical Systems
Statistical systems use probabilistic models to predict semantic structures based on
annotated training data. These systems apply probabilistic methods to resolve
ambiguities and improve robustness.
How They Work:
1. Learn syntactic and semantic rules from labeled data (e.g., Penn
Treebank).
2. Compute probabilities for possible interpretations and select the most
likely one.
Key Techniques:
Probabilistic Context-Free Grammar (PCFG): Extends CFG with
probabilities assigned to grammar rules.
o Example: If a rule VP → V NP has a 70% probability, the system
prefers it over alternatives.
Hidden Markov Models (HMMs): Useful for sequence labeling tasks
like part-of-speech tagging and semantic role labeling.
Example: For the sentence "He saw the bank":
Probability for bank (financial institution) = 0.7.
Probability for bank (river edge) = 0.3.
Output: bank → Financial institution.
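The selection step is just an argmax over sense probabilities. The numbers below are the illustrative values from the example; in practice they would be estimated from an annotated corpus:

```python
# Pick the most probable sense of "bank" in "He saw the bank".
sense_probs = {
    "bank/financial_institution": 0.7,
    "bank/river_edge": 0.3,
}

best_sense = max(sense_probs, key=sense_probs.get)
print(best_sense)  # bank/financial_institution
```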
Advantages:
Handles Ambiguity: Probabilities guide the system to choose the most
likely interpretation.
Adaptability: Learns patterns directly from data, reducing manual effort.
Limitations:
Data Dependency: Requires large annotated corpora.
Lack of Interpretability: Predictions are based on statistical patterns,
making them harder to explain.
Applications:
Early machine translation systems.
Information extraction from structured domains.
3. Neural Systems
Neural systems leverage deep learning to perform semantic parsing. These systems
often use sequence-to-sequence (Seq2Seq) models or transformer architectures.
How They Work:
1. Input sentences are tokenized and converted into embeddings.
2. Neural models learn to map these embeddings to semantic
representations.
3. Output is either structured text (logical forms) or graphs (AMR,
dependency graphs).
Key Techniques:
Seq2Seq Models: Maps input sentences to semantic outputs.
o Example: Google’s Neural Machine Translation (GNMT) uses
Seq2Seq for language translation.
Transformers: Pre-trained models like BERT, T5, and GPT fine-tuned
for semantic parsing tasks.
Example: Sentence: "Book a flight from New York to London."
Neural output:
{
  "intent": "book_flight",
  "departure": "New York",
  "destination": "London"
}
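The embedding-based pipeline can be sketched end to end with toy numbers: tokens map to vectors, vectors are averaged into a sentence embedding, and the nearest intent prototype wins. The hand-set 2-D "embeddings" and intent names are illustrative assumptions; real systems use learned, high-dimensional representations:

```python
# Toy embedding-based intent classifier.
import math

WORD_VECS = {
    "book": (1.0, 0.1), "flight": (0.9, 0.2),
    "cancel": (-0.8, 0.3), "reservation": (-0.7, 0.4),
}
INTENT_PROTOTYPES = {
    "book_flight": (1.0, 0.15),
    "cancel_booking": (-0.75, 0.35),
}

def embed(tokens):
    """Average the vectors of known tokens into a sentence embedding."""
    vecs = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def classify(sentence):
    v = embed(sentence.lower().split())
    return max(INTENT_PROTOTYPES, key=lambda i: cosine(v, INTENT_PROTOTYPES[i]))

print(classify("Book a flight from New York to London"))  # book_flight
```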
Advantages:
High Accuracy: Learns complex patterns and semantic relationships.
Generalizability: Adapts to various domains with fine-tuning.
Handles Ambiguity: Captures contextual meaning using large pre-trained
embeddings.
Limitations:
Data and Compute Intensive: Requires massive labeled datasets and
high computational power.
Black-Box Nature: Hard to interpret or debug.
Applications:
Virtual assistants like Alexa, Siri, and Google Assistant.
Question answering systems (e.g., BERT-based).
4. Hybrid Systems
Hybrid systems combine rule-based and neural/statistical methods to leverage the
advantages of both.
How They Work:
1. Rules handle deterministic, domain-specific tasks (e.g., entity recognition
in medical texts).
2. Neural/statistical models address more complex, ambiguous cases.
Example Workflow:
Rule: Identify dates, locations, and organizations using regular
expressions.
Neural Model: Use BERT to classify intent or resolve ambiguities.
Advantages:
Flexibility: Rules provide precision, while neural/statistical models add
generality.
Cost-Effective: Reduces the need for extensive annotation in rule-
covered domains.
Limitations:
Integration Complexity: Requires careful design to combine different
paradigms.
Maintenance Overhead: Hybrid models can become complex to update.
Applications:
Intelligent search engines (e.g., combining keyword matching and intent
detection).
Industry-specific virtual assistants.
Comparison of Paradigms
Paradigm      Advantages                        Disadvantages                     Example Applications
Rule-Based    High precision, interpretable     Limited scalability,              ELIZA, expert systems
                                                labor-intensive
Statistical   Handles ambiguity, learns         Data-dependent, less              Early MT systems, NER
              from data                         interpretable
Neural        High accuracy, generalizable      Data/compute-intensive,           Virtual assistants, QA
                                                black-box                         systems
Hybrid        Combines strengths of other       Complex to                        Industry-specific AI
              paradigms                         integrate/maintain                systems
4. Word Sense in NLP
What is Word Sense?
Word sense refers to the specific meaning of a word in a given context. Many words
are polysemous, meaning they have multiple senses or meanings. Accurately
determining the intended meaning of a word is crucial for tasks like machine
translation, semantic parsing, and question answering.
Key Components of Word Sense
1. Polysemy
A single word having multiple meanings.
Example:
o Bank:
1. Financial institution (e.g., I deposited money in the bank.)
2. River edge (e.g., The boat reached the river bank.)
2. Word Sense Disambiguation (WSD)
The process of identifying the correct sense of a word based on its context.
Techniques for Word Sense Disambiguation
a. Lesk Algorithm (Knowledge-Based)
The Lesk algorithm disambiguates a word by finding overlaps between the
context of the word and dictionary definitions (glosses) of its senses.
Example:
Word: Bank
Sentence: The fisherman sat on the bank of the river.
Glosses:
1. Bank (financial institution): "A place for receiving deposits."
2. Bank (river edge): "The land alongside a river."
Overlap: River matches with the second gloss. Thus, the sense is river
edge.
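The gloss-overlap idea can be sketched as a simplified Lesk in a few lines. The glosses are hardcoded from the example above; a real implementation would pull them from WordNet (e.g., via NLTK):

```python
# Simplified Lesk: pick the sense whose gloss shares the most content
# words with the sentence context.
GLOSSES = {
    "financial_institution": "a place for receiving deposits of money",
    "river_edge": "the land alongside a river",
}

STOPWORDS = {"a", "the", "of", "on", "in", "for"}

def simplified_lesk(sentence):
    context = {w.strip(".,").lower() for w in sentence.split()} - STOPWORDS
    def overlap(sense):
        return len(context & set(GLOSSES[sense].split()))
    return max(GLOSSES, key=overlap)

print(simplified_lesk("The fisherman sat on the bank of the river"))
# river_edge
```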
b. Supervised Learning
Uses labeled datasets where words are annotated with their senses. Machine
learning models are trained to predict the sense of a word given its context.
Steps:
1. Feature extraction: Contextual words, part-of-speech tags, and syntactic
dependencies.
2. Training: Use datasets like SemCor to train models such as:
Decision Trees
Support Vector Machines (SVMs)
Neural Networks
Example:
Input: "I went to the bank to withdraw cash."
Features: Context words (withdraw, cash).
Output: Sense → Financial institution.
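A tiny supervised sketch: count which context words co-occur with each labeled sense, then score a new sentence by those counts. The miniature training set below is invented for illustration; real systems train on corpora like SemCor:

```python
# Count-based supervised WSD toy for "bank".
from collections import Counter, defaultdict

TRAIN = [
    ("withdraw cash from the bank", "financial"),
    ("the bank approved my loan", "financial"),
    ("fishing on the river bank", "river"),
    ("the muddy bank of the stream", "river"),
]

counts = defaultdict(Counter)
for sentence, sense in TRAIN:
    counts[sense].update(sentence.lower().split())

def predict(sentence):
    """Score each sense by how often its training contexts
    contained the words of the new sentence."""
    words = sentence.lower().split()
    return max(counts, key=lambda s: sum(counts[s][w] for w in words))

print(predict("I went to the bank to withdraw cash"))  # financial
```

A real supervised system would add richer features (part-of-speech tags, syntactic dependencies) and a proper classifier such as an SVM or neural network, as listed above.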
c. Unsupervised Learning
Clusters words into different senses based on context without labeled data.
Common techniques include:
Word Embeddings: Group similar contexts in vector space (e.g.,
Word2Vec, GloVe).
Clustering: Use K-means or hierarchical clustering to group senses.
d. Neural Approaches (Deep Learning)
Modern WSD systems use contextual word embeddings from pre-trained
models like BERT, ELMo, or GPT to determine word sense dynamically.
How It Works:
The model generates context-aware embeddings for words.
Similar contexts produce similar embeddings, enabling accurate sense
identification.
Example: For the word bat:
Sentence: "The bat flew across the room."
Embedding aligns with the sense flying mammal.
Sentence: "He hit the ball with a bat."
Embedding aligns with the sense sports equipment.
a. Resources
Key resources for WSD include:
1. WordNet
A lexical database where words are grouped into synsets (sets of cognitive
synonyms), each representing a distinct sense.
Relationships: Hypernym (is-a), Hyponym (kind-of), Meronym (part-of).
Example for Bank:
Sense 1: "A financial institution."
Sense 2: "Land alongside a river."
2. BabelNet
A multilingual resource integrating WordNet and Wikipedia. Useful for cross-
lingual WSD tasks.
3. FrameNet
Focuses on semantic frames. Groups words into concepts based on their roles
in a scenario. Example: Giving frame includes words like give, transfer, donate.
b. Systems
WSD systems can be categorized as:
1. Lesk-Based Systems
Simple to implement.
Requires access to a lexical database (e.g., WordNet).
2. Supervised Systems
Train on labeled corpora (e.g., Senseval, SemCor).
Examples:
o Decision Trees: Learn rules for predicting senses.
o Neural Models: Transformer-based models fine-tuned for WSD.
3. Unsupervised Systems
Rely on clustering or word embeddings.
Effective when labeled data is scarce.
4. Deep Learning-Based Systems
Use pre-trained models like BERT, GPT, or T5 for dynamic sense
disambiguation.
Examples:
o Fine-tuned BERT for WSD: Uses contextual embeddings to assign
senses.
o Transformer-based pipelines: End-to-end systems for tasks like
machine translation and semantic parsing.
c. Software
1. NLTK (Python): Implements WSD using WordNet.
Example: nltk.wsd.lesk() function.
2. SpaCy: Provides out-of-the-box tools for entity recognition and WSD.
3. HuggingFace Transformers: Pre-trained models like BERT for semantic
tasks.
Challenges in Word Sense Disambiguation
Ambiguity in Context: Short contexts often lack sufficient clues for
disambiguation. Example: "He went to the bank."
Lack of Labeled Data: Creating annotated datasets for all possible word senses is
challenging.
Rare Senses: Models may struggle with less frequent meanings.
Domain-Specific Senses: Word meanings vary across domains (e.g., mouse in
biology vs. technology).