NLP Unit Class Notes

The document provides an overview of discourse processing, including key tasks such as coreference resolution, discourse segmentation, and text coherence. It also discusses cohesion, reference resolution, and various language models, including n-gram and neural network models, along with their applications in natural language processing (NLP). Additionally, it covers parameter estimation, language model adaptation, and evaluation methods to assess model performance.

1. Discourse Processing

Definition

 Discourse processing studies how meaning is conveyed across multiple sentences or paragraphs.
 It focuses on how sentences relate and contribute to the overall structure and meaning of a text.

Key Tasks in Discourse Processing

Coreference Resolution

 Definition: Identifying when different words refer to the same entity in a text.
 Example:
"John went to the store. He bought some bread."
→ "He" refers to "John".

Discourse Segmentation

 Definition: Dividing a text into meaningful discourse units (e.g., sentences or topics).
 Example:
Paragraphs in an article about climate change may be segmented into sections like causes,
effects, and solutions.

Text Coherence

 Definition: The logical flow and understandability of a text.
 Features:
o Clear topic progression.
o Use of discourse markers (e.g., however, therefore, in contrast).
 Example:
A coherent essay would connect ideas clearly from the introduction to the conclusion using logical transitions.

Text Classification

 Definition: Categorizing texts based on content or purpose.
 Applications:
o Sentiment Analysis: Classify text as positive, neutral, or negative.
o Spam Filtering: Identify emails as spam or not spam.
o Topic Modeling: Label articles as sports, politics, entertainment, etc.
 Example:
A tweet saying "The game last night was incredible!" may be classified as positive sentiment and sports-related (a minimal classifier sketch follows below).
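
As a concrete illustration of text classification, the following minimal sentiment-classifier sketch uses scikit-learn; the tiny training set, labels, and test sentence are purely illustrative assumptions, not a real corpus.

```python
# Minimal bag-of-words sentiment classifier (scikit-learn assumed installed).
# The toy training texts and labels below are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "The game last night was incredible!",
    "What a boring, disappointing match.",
    "I loved every minute of it.",
    "Terrible performance by the team.",
]
train_labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words features feed a Naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

print(clf.predict(["The match was incredible!"]))  # predicts 'positive' on this toy data
```
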
Applications of Discourse Processing

 Machine Translation: Maintains context across sentences.
 Text Summarization: Ensures coherence in condensed versions.
 Sentiment Analysis: Improves accuracy by considering the full context of a document.

2. Cohesion

Definition of Cohesion

 Cohesion refers to the linguistic devices used to connect parts of a text.
 It creates a sense of unity and flow in a text, helping readers follow the intended meaning.

Cohesion vs. Coherence

 Coherence = overall clarity and logical structure of the text.
 Cohesion = linguistic links (words and grammar) that tie text parts together.

Examples of Cohesive Devices

 Pronouns: he, she, it
 Conjunctions: and, but, or
 Adverbs: however, therefore
 Lexical repetition: Repeating the same or related words

Types of Cohesive Devices

Type | Description | Example
Reference | Refers back to something previously mentioned | "John saw a dog. It was brown."
Substitution | Replaces a word or phrase with a substitute (e.g., a synonym or pronoun) | "John saw a dog. The animal was brown."
Ellipsis | Omits words that are understood from context | "John ate pizza for dinner, and Mary pasta." (omits "ate")
Conjunction | Connects clauses or sentences using words like and, but, or | "John went to the store, and he bought some bread."
Lexical Cohesion | Links sentences through repeated or related vocabulary | "John drove his car. The vehicle was new."

Purpose of Cohesion

 Ensures the text reads smoothly.
 Helps readers understand relationships between ideas.
 Makes the text more engaging and readable.

3. Reference Resolution

Definition

 Reference resolution is the process of identifying which entity a word (typically a pronoun or noun phrase) refers to in a text.

Importance in NLP

 Crucial for understanding meaning and relationships between entities in sentences or paragraphs.
 Supports many NLP applications like:
o Machine Translation
o Text Summarization
o Question Answering

What is an Antecedent?

 An antecedent is the word or phrase to which a pronoun or noun phrase refers.
 Example:
"John saw a dog. It was brown."
→ "It" refers to "dog" → "dog" is the antecedent.

Types of Reference Resolution

Type | Description | Example
Anaphora Resolution | Refers backward to a previously mentioned noun (the antecedent comes first) | "John saw a dog. It was brown." ("it" → "dog")
Cataphora Resolution | Refers forward to a noun that appears later (the antecedent comes after the pronoun) | "When he saw the dog, John ran away." ("he" → "John")
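
A very rough baseline for anaphora resolution is to link a pronoun to the most recent preceding noun; the sketch below uses a tiny hand-made noun list (an illustrative assumption) rather than a real part-of-speech tagger.

```python
# Naive anaphora-resolution heuristic: resolve each pronoun to the most recent
# preceding noun. Real resolvers use syntax, agreement, and semantics; this is
# only a baseline illustration.
PRONOUNS = {"he", "she", "it", "they"}
NOUNS = {"john", "dog", "store", "bread"}   # toy lexicon assumed for this example

def resolve_pronouns(tokens):
    resolved = {}
    last_noun = None
    for i, tok in enumerate(tokens):
        word = tok.lower().strip(".")
        if word in PRONOUNS and last_noun is not None:
            resolved[i] = last_noun         # pronoun position -> guessed antecedent
        elif word in NOUNS:
            last_noun = tok.strip(".")
    return resolved

tokens = "John saw a dog . It was brown .".split()
print(resolve_pronouns(tokens))             # {5: 'dog'}  ("It" -> "dog")
```

Note that this same heuristic would wrongly resolve "He" to "store" in "John went to the store. He bought some bread.", which is exactly why reference resolution is treated as a hard, context-dependent problem.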

Challenges in Reference Resolution

 Requires contextual understanding.
 Pronouns can be ambiguous or refer to different entities in different situations.
 Complex in long texts or texts with multiple possible referents.

Applications

 Improves performance in:
o Machine Translation (e.g., ensuring pronouns are translated correctly)
o Text Summarization (e.g., keeping track of who did what)
o Question Answering (e.g., resolving "Who is he?" correctly)

4. Discourse Cohesion and Structure

Definition of Discourse Cohesion

 Refers to how parts of a text are linguistically connected.
 Uses cohesive devices like:
o Pronouns (e.g., he, she, it)
o Conjunctions (e.g., and, but, because)
o Lexical repetition (e.g., repeating or using related words)
o Cohesive markers (e.g., however, therefore)

Purpose of Discourse Cohesion

 Creates unity and flow within a text.
 Helps readers understand relationships between ideas.
 Supports overall text coherence.

Definition of Discourse Structure

 Refers to the organization and arrangement of ideas in a text.
 Includes structural elements like:
o Headings and subheadings
o Paragraphs
o Sections or thematic divisions

Purpose of Discourse Structure

 Guides the reader through the text.
 Makes content easier to navigate and comprehend.
 Enhances clarity and contributes to coherence.

Combined Importance

 Cohesion + Structure = Clear, coherent, and memorable communication.
 Aids both written and spoken language understanding.

Applications in NLP

 Crucial for:
o Text Summarization
o Question Answering
o Text Classification
 Helps machines understand how ideas are connected and how information is structured.

5. n-Gram Models

Definition

 n-gram models are statistical language models used to predict the next word in a
sequence based on the previous n−1 words.

What is an n-Gram?

 An n-gram is a sequence of n consecutive words or characters in a text.
 Examples:
o Unigram (1-gram): "dog"
o Bigram (2-gram): "the dog"
o Trigram (3-gram): "the dog barked"

Markov Assumption

 Assumes that the probability of a word depends only on the previous (n−1) words, not
the entire sentence.
 This simplifies computation but limits context understanding.
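
Written out, the Markov assumption for an n-gram model is the standard approximation below (with the bigram case shown as a special instance):

```latex
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})
\qquad \text{e.g. bigram } (n = 2):\quad P(w_i \mid w_{i-1})
```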

Training and Estimation

 Trained on large text corpora.
 Uses Maximum Likelihood Estimation (MLE) or other statistical methods to estimate word probabilities.
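
As a small worked example of MLE for a bigram model, the sketch below estimates P(next word | previous word) by simple counting; the toy corpus is an illustrative assumption.

```python
# Maximum Likelihood Estimation for bigrams:
# P(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1})
from collections import Counter

corpus = "the dog barked . the dog slept . the cat slept .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_mle(prev_word, word):
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_mle("the", "dog"))   # 2/3, since "the" is followed by "dog" twice and "cat" once
```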

Applications

 Widely used in various NLP tasks:
o Speech Recognition
o Machine Translation
o Text Classification
o Spelling Correction

Baseline Model

 n-gram models are often used as baseline models to compare against more complex
models (e.g., neural networks).

Limitations

 Short context window: Limited to (n−1) previous words.
 Fails to capture long-range dependencies in text.
 Data sparsity: Large n-grams may be rare in the training data.

Alternatives to n-Gram Models

 Advanced models that handle longer context and semantics:
o Recurrent Neural Networks (RNNs)
o Long Short-Term Memory (LSTM)
o Transformer-based models (e.g., BERT, GPT)

n-Gram Type | Example | Context Used
Unigram (1-gram) | "dog" | none
Bigram (2-gram) | "the dog" | 1 previous word
Trigram (3-gram) | "the dog barked" | 2 previous words

6. Language Model Evaluation

Purpose of Evaluation

 Measures how well a language model performs on specific language tasks or datasets.
 Evaluates the model's ability to:
o Predict the next word
o Generate coherent and relevant text
o Perform NLP tasks like translation or summarization

Evaluation Methods

Method | Description | Example Metrics
1. Perplexity | Measures how well a model predicts the next word; lower = better performance. | Perplexity score
2. Human Evaluation | Human judges rate output for fluency, coherence, and relevance. | Ratings or qualitative feedback
3. Task-Specific Evaluation | Evaluates model performance on tasks like translation, summarization, or sentiment analysis. | Accuracy, precision, recall, F1-score
4. Diversity & Novelty | Assesses how varied and original the generated text is. | Distinct-n, novelty scores

Key Metrics Explained

 Perplexity:
Lower perplexity means the model is better at predicting the next word (see the sketch below).
→ Example: A model with perplexity 25 is better than one with perplexity 60 on the same dataset.
 Accuracy / Precision / Recall:
Used in classification-based tasks like sentiment analysis.
 Human Ratings:
Useful for creative tasks like story generation or dialogue systems.
 Diversity Metrics:
Evaluate whether the output is not repetitive and shows novel patterns.
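
A minimal sketch of how perplexity is computed from per-token probabilities; the probability values below are made-up numbers used only to show the formula.

```python
# Perplexity = exp( -(1/N) * sum_i log p_i ), where p_i is the probability the
# model assigns to the i-th token of the test sequence. Lower is better.
import math

token_probs = [0.2, 0.1, 0.05, 0.3]   # assumed per-token probabilities from some model

def perplexity(probs):
    avg_neg_log_prob = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(avg_neg_log_prob)

print(perplexity(token_probs))        # ~7.6 for these made-up numbers
```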

Importance of Appropriate Evaluation

 No single metric works for all tasks.
 Choose evaluation methods that match the specific application, such as:
o BLEU / ROUGE for machine translation and summarization
o F1-score for classification tasks
o Perplexity for predictive language modeling

Ongoing Research

 Evaluation is an active research area as models become more complex.
 New metrics are being developed to better assess:
o Context understanding
o Fairness
o Bias
o Factual correctness

7. Parameter Estimation

Definition

 Parameter estimation is the process of determining the best values for model
parameters based on observed data.
 Essential in training NLP models like:
o Language Models
o Part-of-Speech Taggers
o Named Entity Recognition (NER) systems

Objective

 Find parameter values that maximize the likelihood of the observed data.
 This process typically relies on a training corpus (annotated text data).

Common Methods of Parameter Estimation

Method | Description
1. Maximum Likelihood Estimation (MLE) | Estimates parameters by maximizing the likelihood of the observed data.
2. Bayesian Estimation | Uses prior distributions and updates them using Bayes' theorem.
3. Empirical Bayes | Combines data-driven and prior-based approaches using hierarchical models.

Types of Parameter Estimation

1. Maximum-Likelihood Estimation and Smoothing
o Commonly used with n-gram models.
o Smoothing (e.g., Laplace, Good-Turing) helps address zero-probability issues (a minimal sketch follows this list).
2. Bayesian Parameter Estimation
o Incorporates uncertainty and prior knowledge.
o Useful when data is limited or noisy.
3. Large-Scale Language Models
o Use millions or billions of parameters.
o Require massive datasets and advanced optimization algorithms.
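
A minimal sketch of Laplace (add-one) smoothing for a bigram model, as mentioned above; the toy corpus is an illustrative assumption.

```python
# Add-one (Laplace) smoothing: every possible bigram receives a pseudo-count of 1,
# so word pairs never seen in training no longer get zero probability.
from collections import Counter

corpus = "the dog barked . the dog slept .".split()
vocab = set(corpus)

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def laplace_bigram(prev_word, word):
    return (bigram_counts[(prev_word, word)] + 1) / (unigram_counts[prev_word] + len(vocab))

print(laplace_bigram("the", "barked"))   # non-zero even though "the barked" never occurs
```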

Steps in Parameter Estimation

1. Preprocessing: Clean and format the input text data.
2. Model Selection: Choose an appropriate architecture (e.g., CRF, transformer, HMM).
3. Objective Function: Define a loss function (e.g., cross-entropy) that reflects prediction accuracy.
4. Optimization Algorithm: Use algorithms such as gradient descent to minimize the loss and estimate the parameters (a minimal sketch follows).
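
The last two steps can be made concrete with a tiny optimization sketch; PyTorch is assumed here, and the linear model and random data are placeholders standing in for a real NLP model and corpus.

```python
# Parameter estimation by minimizing a cross-entropy loss with gradient descent.
import torch
import torch.nn as nn

model = nn.Linear(10, 3)                                   # stand-in for a real model
loss_fn = nn.CrossEntropyLoss()                            # objective function (step 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # optimization algorithm (step 4)

features = torch.randn(8, 10)                              # toy input features
labels = torch.randint(0, 3, (8,))                         # toy class labels

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()                                        # gradients of loss w.r.t. parameters
    optimizer.step()                                       # parameter update
```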

8. Language Model Adaptation

Definition and Purpose

 Language model adaptation refers to fine-tuning a pre-trained language model on a specific domain or task using a small amount of task-specific data.
 Enhances model performance by capturing domain-specific vocabulary and linguistic patterns.

Approach

 Most common method: Transfer Learning.
o Start with pre-trained weights.
o Fine-tune on the specific task/domain.
 Typically:
o Final layers are updated (task-specific).
o Lower-level layers are kept fixed (general language understanding); see the sketch below.
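
A minimal sketch of this freeze-and-fine-tune pattern, assuming the Hugging Face transformers library and a BERT-style model; the model name, number of labels, and the choice of how many layers to freeze are illustrative assumptions.

```python
# Transfer learning for adaptation: start from pre-trained weights, keep the lower
# (general-purpose) layers fixed, and fine-tune only the upper layers plus the task head.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)          # e.g., positive vs. negative sentiment

# Freeze embeddings and the first 8 encoder layers (general language knowledge).
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False

# The remaining layers and the classification head stay trainable and are then
# fine-tuned on the small domain-specific dataset with a standard training loop.
```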

Advantages

1. Improved Task Performance
o Better understanding of domain-specific data.
2. Reduced Training Time & Resources
o Leverages existing models, requiring less new data and compute.
3. Better Handling of Rare/OOV Words
o Pre-trained models already cover a broad vocabulary.

Applications

 Sentiment Analysis
 Text Classification
 Named Entity Recognition (NER)
 Machine Translation

9. Types of Language Models

1. N-gram Models

 Predict the next word using the previous n-1 words.
 Common types:
o Bigram: uses 1 previous word.
o Trigram: uses 2 previous words.
 ✅ Simple and fast.
 ❌ Struggles with long-range dependencies.

2. Neural Network Models

 Use deep learning to model word sequences.
 Can learn complex relationships between words.
 Trained on large datasets.
 ✅ More accurate than n-grams.
 ❌ Require more data and computation.

3. Transformer-based Models

 Examples: GPT, BERT.
 Use self-attention to capture long-range dependencies.
 Achieve state-of-the-art results on many NLP tasks.
 ✅ Best performance on diverse NLP tasks.
 ❌ Very resource-intensive.

4. Probabilistic Graphical Models

 Represent word relationships as a graph of dependencies.
 Use statistical relationships to predict word sequences.
 ✅ Useful in structured prediction tasks.
 ❌ Less common today due to deep learning's rise.

5. Rule-based Models

 Use predefined linguistic rules.
 Effective in highly structured domains (e.g., legal, medical).
 ✅ Precise in narrow domains.
 ❌ Not generalizable or flexible.

10. Language Models

1 Class-Based Language Models
2 Variable-Length Language Models
3 Discriminative Language Models
4 Syntax-Based Language Models
5 MaxEnt Language Models
6 Factored Language Models
7 Other Tree-Based Language Models
8 Bayesian Topic-Based Language Models
9 Neural Network Language Models

 Class-Based Language Models

Definition

 Class-based language models are probabilistic models that group words into classes
based on their distributional similarity.
 They estimate the probability of a word given its class rather than the word itself.

Purpose

 Reduce sparsity in language modeling.
 Improve data efficiency and generalization, especially with limited data.

Steps in Building a Class-Based Language Model

1. Word Clustering
o Words are clustered using algorithms like k-means or hierarchical clustering.
2. Class Construction
o Assign class labels to each cluster.
o Number of classes may be predefined or adaptive.
3. Probability Estimation
o Estimate P(word | class) using maximum likelihood or Bayesian methods.
4. Language Modeling
o Build a model using these probabilities to predict sequences of words (see the sketch after this list).
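
The resulting model typically decomposes a bigram probability as P(w_i | w_{i-1}) ≈ P(class(w_i) | class(w_{i-1})) · P(w_i | class(w_i)). A minimal sketch of this decomposition is shown below; the word-to-class mapping and the counts are illustrative assumptions, as if gathered from a training corpus.

```python
# Class-based bigram probability: predict the class of the next word from the
# previous word's class, then the word given its class.
from collections import Counter

word2class = {"dog": "NOUN", "cat": "NOUN", "barked": "VERB", "meowed": "VERB"}

class_bigram_counts = Counter({("NOUN", "VERB"): 4, ("VERB", "NOUN"): 1})  # toy counts
class_counts = Counter({"NOUN": 5, "VERB": 5})
word_counts = Counter({"dog": 3, "cat": 2, "barked": 3, "meowed": 2})

def class_bigram_prob(prev_word, word):
    c_prev, c_cur = word2class[prev_word], word2class[word]
    p_class = class_bigram_counts[(c_prev, c_cur)] / class_counts[c_prev]
    p_word_given_class = word_counts[word] / class_counts[c_cur]
    return p_class * p_word_given_class

print(class_bigram_prob("cat", "meowed"))   # 0.32, even if "cat meowed" was never seen as a pair
```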

Advantages

1. Reduced Sparsity
o Fewer parameters to estimate → improved model accuracy.
2. Improved Data Efficiency
o Less training data needed compared to word-level models.
3. Better Handling of OOV (Out-of-Vocabulary) Words
o Unseen words can be mapped to existing classes based on similarity.

 Class-based models are useful when:

 Data is limited.
 Generalization and efficiency are key.

 Less commonly used today due to the rise of deep learning models, but still relevant in low-
resource settings.

 Variable-Length Language Models

Definition

 Variable-length language models handle input sequences of varying lengths, unlike traditional fixed-length models (e.g., n-gram models).

Useful for tasks where input/output length varies:

 Machine Translation
 Text Summarization
 Speech Recognition

Modeling Approaches

1. Recurrent Neural Networks (RNNs)
o Use a hidden state updated at each time step (see the sketch after this list).
o Can model sequences regardless of length.
o Capture word dependencies across time.
2. Transformer-Based Models
o Use self-attention instead of recurrence.
o Better at modeling long-range dependencies.
o Also support variable-length input and output.
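
A minimal sketch of the recurrent update that lets an RNN consume a sequence of any length; PyTorch is assumed, and the weights and token embeddings below are random toy values.

```python
# One step of a vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b).
# The same step is applied repeatedly, so sequence length is not fixed in advance.
import torch

hidden_size, embed_size = 4, 3
W_x = torch.randn(hidden_size, embed_size)    # input-to-hidden weights (toy values)
W_h = torch.randn(hidden_size, hidden_size)   # hidden-to-hidden weights
b = torch.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    return torch.tanh(W_x @ x_t + W_h @ h_prev + b)

h = torch.zeros(hidden_size)
for x_t in torch.randn(6, embed_size):        # a toy sequence of 6 word embeddings
    h = rnn_step(x_t, h)                      # hidden state carries context forward
print(h)                                      # final state summarizes the whole sequence
```
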
Evaluation Metrics

 Perplexity:
o Measures how well a model predicts the next word.
o Lower perplexity = better model performance.
 BLEU Score:
o Common in machine translation.
o Compares generated output with reference translations.

 Discriminative Language Models

Definition

 Focus on modeling the conditional probability P(output | input), unlike generative models that model the joint probability P(input, output).
 Aim to learn a direct mapping from input to output.

Common Tasks

 Text Classification
 Sequence Labeling (e.g., Named Entity Recognition, POS tagging)
 Machine Translation

 Conditional Random Fields (CRFs)

 Probabilistic graphical model for sequence labeling.
 Model conditional probability of output sequence given input.
 Capture dependencies between neighboring output labels.

 Neural Networks

 Feedforward Neural Networks
 Convolutional Neural Networks (CNNs)
 Recurrent Neural Networks (RNNs)
 Suitable for a broad range of NLP tasks.

Evaluation Metrics

 Accuracy
 F1 Score
 Area Under ROC Curve (AUCROC)
 Metric choice depends on task and data characteristics.

 Syntax-Based Language Models

Definition

 Language models that incorporate syntactic information (sentence structure) in addition to word sequences.
 Model probabilities of syntactic structures (e.g., noun phrases, verb phrases) rather than just word sequences.

Traditional vs. Syntax-Based Models

 Traditional models (n-gram, neural) focus on word sequences.
 Syntax-based models focus on sentence structure.

 Context-Free Grammars (CFGs)

 Represent syntactic structure with production rules.
 Assign probabilities to rules based on training data.
 Generate sentences by recursively applying these rules.
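
A tiny probabilistic context-free grammar (PCFG) makes this concrete; NLTK is assumed here, and the rules and their probabilities are toy assumptions rather than values learned from a corpus.

```python
# A toy PCFG and the most probable parse of one sentence (NLTK assumed installed).
import nltk

grammar = nltk.PCFG.fromstring("""
    S  -> NP VP    [1.0]
    NP -> 'John'   [0.5]
    NP -> Det N    [0.5]
    VP -> V NP     [1.0]
    Det -> 'the'   [1.0]
    N  -> 'dog'    [1.0]
    V  -> 'saw'    [1.0]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("John saw the dog".split()):
    print(tree)   # the highest-probability parse tree, with its probability attached
```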

 Dependency Trees

 Model relationships between words (e.g., subject-verb).
 Assign probabilities to entire trees based on training data.
 Use these trees to generate sentences.

Applications

 Text Generation
 Machine Translation
 Question Answering

 Tree-Based Language Models

Definition

 Use tree structures to represent syntactic and/or semantic relationships between words
in a sentence.
 Capture hierarchical and relational information beyond just sequences.

Types of Tree-Based Language Models

1. Semantic Role Labeling (SRL) Models
o Identify semantic roles: subject, object, verb, etc.
o Build trees showing relationships between words and their roles.
o Useful for understanding meaning in sentences.
2. Discourse Parsing Models
o Analyze the structure of discourse (relations between sentences/paragraphs).
o Use trees to represent discourse organization.
o Applied in summarization, information extraction.
3. Dependency Parsing Models
o Identify grammatical relationships (e.g., subject-verb, object-verb).
o Use trees to show dependencies between words.
o Useful for machine translation, sentiment analysis.
4. Constituent Parsing Models
o Identify constituent structures like phrases and clauses.
o Use hierarchical trees representing sentence structure.
o Applied in text generation, summarization.

 Neural Network Language Models
