Advances in AI
Module-1
Lexicalized vs Unlexicalized
"Lexicalized" and "Unlexicalized" refer to different approaches to
modeling language structure in tasks such as syntactic parsing,
dependency parsing, and language modeling.
Lexicalized Models
Lexicalized models incorporate specific words (lexical items) into their
structure. These models directly account for the identity of words when
making predictions or analyzing language. They rely heavily on the
actual words present in the training data to make decisions.
Characteristics:
• Word-Specific Information: Lexicalized models use the identity of words as part
of the model. For instance, in syntactic parsing, a lexicalized model might use
specific words as head words in a parse tree.
• Dependency on Vocabulary: Because lexicalized models are closely tied to
specific words, they are often more sensitive to the vocabulary seen during
training. They might perform less well on unseen or rare words.
• Example in Parsing: In lexicalized parsing, each node in a parse tree might be
associated with a specific word (the head word), which helps determine the
syntactic structure based on the relationships between these head words.
• Rich Feature Space: They often require more features, such as word pairs or
word-structure combinations, making the model more complex but also
potentially more accurate when sufficient data is available.
Example:
A lexicalized parser might distinguish between "bank" as a financial
institution and "bank" as the side of a river by considering the
surrounding words and their specific forms.
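As a rough illustration, here is a minimal Python sketch of the core idea: a lexicalized parser conditions decisions on the identity of head words. All scores and the `score` helper are invented for this example; real parsers estimate such statistics from treebanks. The sketch also shows why unseen vocabulary pushes the model back to a weak default.

```python
# A toy lexicalized scoring table: decisions depend on head-word pairs,
# not just on syntactic categories. All scores are invented.
attachment_scores = {
    ("deposited", "in"): 0.9,  # "deposited ... in the bank": attach PP to the verb
    ("sat", "on"): 0.8,        # "sat on the mat": attach PP to the verb
    ("bank", "of"): 0.7,       # "bank of the river": attach PP to the noun
}

def score(head, modifier_head, default=0.1):
    """Score an attachment by the identity of the two head words."""
    return attachment_scores.get((head, modifier_head), default)

print(score("deposited", "in"))  # 0.9: a seen head-word pair drives the choice
print(score("slept", "in"))      # 0.1: unseen vocabulary falls back to a default
```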
Unlexicalized Models
Unlexicalized models do not incorporate specific word identities into
their structures. Instead, they rely on abstract representations like part-
of-speech tags, word embeddings, or general syntactic categories that
are not tied to specific words.
Characteristics:
• Abstract Representation: Unlexicalized models work with more generalized
features, such as parts of speech or syntactic categories, without considering the
actual words.
• Robustness to Vocabulary: These models are often more robust to unseen words
or variations in vocabulary since they focus on more abstract patterns rather than
word-specific information.
• Example in Parsing: In an unlexicalized parser, the parsing process might be
guided purely by the grammatical structure, using categories like "noun phrase"
or "verb phrase" without considering the specific words within those phrases.
• Simpler Feature Space: Because they do not rely on specific words, unlexicalized
models tend to have a simpler feature space, making them less complex and
often faster, but potentially less accurate in capturing nuances of the language.
Example:
An unlexicalized parser would treat "the cat sat on the mat" similarly to
"a dog sleeps by the fire" since it focuses on the grammatical roles
rather than the specific words.
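A minimal sketch using NLTK (assuming the `nltk` package is installed) makes this concrete: one small grammar over categories alone parses both sentences, and the resulting trees have exactly the same shape.

```python
import nltk

# An unlexicalized CFG: rules mention only categories (Det, N, V, P),
# never the identity of the words filling them.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V PP
    PP -> P NP
    Det -> 'the' | 'a'
    N -> 'cat' | 'mat' | 'dog' | 'fire'
    V -> 'sat' | 'sleeps'
    P -> 'on' | 'by'
""")

parser = nltk.ChartParser(grammar)
for sentence in ("the cat sat on the mat", "a dog sleeps by the fire"):
    for tree in parser.parse(sentence.split()):
        print(tree)  # both parses share the same S -> NP VP(V PP) shape
```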
Lexicalized vs Unlexicalized
Aspect | Lexicalized | Unlexicalized
Definition | Incorporates specific words (lexical items) into the model | Relies on abstract representations without specific words
Word-Specific Information | Uses the identity of words directly | Focuses on general patterns like syntax or parts of speech
Vocabulary Dependence | Highly dependent on the specific vocabulary seen in training | More robust to unseen or rare words
Feature Space | Rich and complex, involving word pairs and word-structure combinations | Simpler, focusing on grammatical categories and structures
Example in Parsing | Uses head words in a parse tree to determine syntactic structure | Uses abstract grammatical categories (e.g., noun phrase) without specific words
Robustness | Less robust to unseen vocabulary | More robust to variations in vocabulary
Model Complexity | Typically more complex due to the inclusion of specific words | Generally simpler due to reliance on abstract features
Use Case Example | Early syntactic parsers, phrase-based machine translation | Modern neural language models, statistical parsers
Accuracy vs. Generalization | Potentially more accurate with sufficient data but less generalizable | Better generalization, especially with limited or diverse data
Training Data Requirement | Requires a large amount of data to capture word-specific patterns | Can perform well with less data, focusing on broader patterns
Types of Semantics
Semantics refers to the study and modeling of meaning in language.
Different types of semantics are used to capture and process various
aspects of meaning in texts, enabling machines to understand and
generate human language more effectively.
Lexical Semantics
Lexical semantics deals with the meaning of individual words and their
relationships within a language. It focuses on how words represent different types
of concepts and how these concepts are related.
Example: The word "bank" has different meanings:
• "He deposited money in the bank." (financial institution)
• "She sat on the bank of the river." (side of a river)
Lexical semantics focuses on understanding these different meanings of the word
"bank" and distinguishing between them based on context.
Compositional Semantics
Compositional semantics, also known as formal semantics, focuses on how the
meanings of individual words combine to form the meaning of larger linguistic units
like phrases, sentences, or paragraphs.
Example: The sentence "The cat sat on the mat."
• Word meanings: "cat" (a small domesticated animal), "sat" (to rest with the body
supported), "on" (in contact with), "mat" (a piece of material placed on the floor).
• Compositional meaning: The sentence describes a situation where a cat is resting
with its body supported on a mat.
Compositional semantics combines the meanings of individual words according to
grammatical rules to derive the overall meaning of the sentence.
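As a minimal sketch (the string representations stand in for real logical forms and are invented for this example), word meanings can be modeled as functions that combine in the order the grammar dictates:

```python
def on(location):
    # "on": relates a figure to a location it is in contact with
    return lambda figure: f"{figure} is in contact with {location}"

def sat(pp_meaning):
    # "sat": the subject rested, with the PP describing where
    return lambda subject: f"{subject} rested; {pp_meaning(subject)}"

# Combination mirrors the parse: [S [NP the cat] [VP sat [PP on [NP the mat]]]]
vp_meaning = sat(on("the mat"))
print(vp_meaning("the cat"))
# -> the cat rested; the cat is in contact with the mat
```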
Pragmatic Semantics
Pragmatic semantics focuses on how context influences the interpretation of
meaning in language. It goes beyond the literal meaning of words and sentences to
consider how meaning is constructed in specific situations.
Example: The phrase "Can you pass the salt?"
• Literal meaning: A question about someone's ability to pass the salt.
• Pragmatic meaning: A polite request for the salt to be passed during a meal.
Pragmatic semantics interprets the intended meaning of the phrase based on the
context, recognizing it as a request rather than a literal question.
Distributional Semantics
Distributional semantics is based on the idea that words that occur in similar
contexts tend to have similar meanings. This approach is data-driven and relies on
statistical analysis of large text corpora.
Example: The words "king" and "queen" appear in similar contexts, such as
"royalty," "crown," and "throne."
• Word embeddings: In a vector space model, "king" and "queen" would be
located close to each other, indicating that they have related meanings.
Distributional semantics captures the similarity in meaning between
"king" and "queen" based on their shared contextual usage in large text corpora.
Frame Semantics
Frame semantics involves understanding the meaning of words and sentences by
relating them to a broader conceptual structure or "frame." A frame is a structured
representation of a typical situation, encompassing participants, objects, and
events.
Example: The verb "buy" evokes the "commercial transaction" frame.
• Frame elements: Buyer, seller, goods, money.
• Sentence: "John bought a car from the dealership."
• Buyer: John
• Seller: Dealership
• Goods: Car
• Money: Implied transaction
Frame semantics maps the sentence onto the "commercial transaction" frame,
identifying the roles of the buyer, seller, and goods involved.
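A minimal sketch (the class and field names are chosen for this example) represents the frame as a simple data structure whose fields are the frame elements:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommercialTransaction:
    buyer: str
    seller: str
    goods: str
    money: Optional[str] = None  # often left implicit, as in the example

# "John bought a car from the dealership." mapped onto the frame:
event = CommercialTransaction(buyer="John", seller="the dealership", goods="a car")
print(event)
```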
Formal Semantics
Formal semantics is a mathematically grounded approach to meaning in language,
often using logic to represent and reason about meanings.
Example: The sentence "All dogs are mammals."
• Predicate logic: ∀x (Dog(x) → Mammal(x)), where ∀x means "for all x," and →
represents logical implication.
Formal semantics uses logical representations to express the meaning of the
sentence in a precise, unambiguous way, allowing for reasoning about the truth of
the statement.
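A minimal sketch checks the formula over a tiny invented domain of individuals, using the standard reading of material implication:

```python
# ∀x (Dog(x) → Mammal(x)) evaluated over a small made-up domain.
DOGS = {"rex", "fido"}
MAMMALS = {"rex", "fido", "whale"}
DOMAIN = {"rex", "fido", "whale", "tweety"}

# Dog(x) → Mammal(x) holds for x unless x is a dog that is not a mammal.
statement_holds = all(x not in DOGS or x in MAMMALS for x in DOMAIN)
print(statement_holds)  # True: every dog in the domain is a mammal
```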
Ontological Semantics
Ontological semantics involves mapping words and phrases to a structured
ontology—a formal representation of knowledge within a domain, typically
including entities, relationships, and categories.
Example: In a medical ontology:
• Concepts: "Diabetes" (a disease), "Insulin" (a treatment).
• Relationships: "Diabetes" requires "Insulin".
Ontological semantics uses structured representations like ontologies to define
relationships between concepts, enabling machines to understand and reason
about the domain of medicine.
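A minimal sketch stores the example as subject-relation-object triples; the relation names and the `treatments_for` helper are invented for illustration, not drawn from a real medical knowledge base:

```python
# A toy ontology as labelled triples (subject, relation, object).
ontology = [
    ("Diabetes", "is_a", "Disease"),
    ("Insulin", "is_a", "Treatment"),
    ("Diabetes", "requires", "Insulin"),
]

def treatments_for(condition):
    """Follow 'requires' edges from a condition to its treatments."""
    return [obj for subj, rel, obj in ontology
            if subj == condition and rel == "requires"]

print(treatments_for("Diabetes"))  # ['Insulin']
```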
Conceptual Semantics
Conceptual semantics focuses on the mental representation of meaning, exploring
how concepts are structured in the mind and how they are related.
Example: The concept of a "bird."
• Prototype: A "robin" might be considered a prototypical bird.
• Conceptual schema: Involves attributes like wings, feathers, the ability to fly, and
laying eggs.
Conceptual semantics examines how the concept of a bird is mentally represented
and how a "robin" fits this mental category more typically than a "penguin" might.
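As a minimal sketch (the feature sets and overlap measure are invented for illustration), prototype-style typicality can be approximated as feature overlap with the category prototype:

```python
# The "bird" prototype as a feature set; instances share more or fewer
# of its features, so a robin scores as more typical than a penguin.
BIRD_PROTOTYPE = {"wings", "feathers", "flies", "lays_eggs"}

robin = {"wings", "feathers", "flies", "lays_eggs"}
penguin = {"wings", "feathers", "swims", "lays_eggs"}

def typicality(instance):
    """Fraction of prototype features the instance shares."""
    return len(instance & BIRD_PROTOTYPE) / len(BIRD_PROTOTYPE)

print(typicality(robin))    # 1.0  -> prototypical bird
print(typicality(penguin))  # 0.75 -> still a bird, but less typical
```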