1. Discourse Processing
Definition
Discourse processing studies how meaning is conveyed across multiple sentences or
paragraphs.
It focuses on how sentences relate and contribute to the overall structure and meaning of
a text.
Key Tasks in Discourse Processing
Coreference Resolution
Definition: Identifying when different words refer to the same entity in a text.
Example:
"John went to the store. He bought some bread."
→ "He" refers to "John".
Discourse Segmentation
Definition: Dividing a text into meaningful discourse units (e.g., sentences or topics).
Example:
Paragraphs in an article about climate change may be segmented into sections like causes,
effects, and solutions.
Text Coherence
Definition: The logical flow and understandability of a text.
Features:
o Clear topic progression.
o Use of discourse markers (e.g., however, therefore, in contrast).
Example:
A coherent essay would connect ideas clearly from the introduction to the conclusion
using logical transitions.
Text Classification
Definition: Categorizing texts based on content or purpose.
Applications:
o Sentiment Analysis: Classify text as positive, neutral, or negative.
o Spam Filtering: Identify emails as spam or not spam.
o Topic Modeling: Label articles as sports, politics, entertainment, etc.
Example:
A tweet saying "The game last night was incredible!" may be classified as positive
sentiment and sports-related.
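A minimal sketch of such a classifier, assuming scikit-learn is available; the three training sentences and their labels below are made up purely to illustrate the pipeline:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data (real systems need far more labeled text)
texts = [
    "The game last night was incredible!",
    "Terrible service, very disappointed.",
    "The match was fine, nothing special.",
]
labels = ["positive", "negative", "neutral"]

# Bag-of-words features fed into a Naive Bayes classifier
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["What an incredible game!"]))
In practice the model would be trained on thousands of labeled examples; three sentences are only enough to show the steps.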
Applications of Discourse Processing
Machine Translation: Maintains context across sentences.
Text Summarization: Ensures coherence in condensed versions.
Sentiment Analysis: Improves accuracy by considering the full context of a document.
2. Cohesion
Definition of Cohesion
Cohesion refers to the linguistic devices used to connect parts of a text.
It creates a sense of unity and flow in a text, helping readers follow the intended
meaning.
Cohesion vs. Coherence
Coherence = overall clarity and logical structure of the text.
Cohesion = linguistic links (words and grammar) that tie text parts together.
Examples of Cohesive Devices
Pronouns: he, she, it
Conjunctions: and, but, or
Adverbs: however, therefore
Lexical repetition: Repeating the same or related words
Types of Cohesive Devices
Reference: refers back to something previously mentioned. Example: "John saw a dog. It was brown."
Substitution: replaces a word or phrase with a substitute (e.g., a synonym or pronoun). Example: "John saw a dog. The animal was brown."
Ellipsis: omits words that are understood from context. Example: "John ate pizza for dinner, and Mary pasta." (omits "ate")
Conjunction: connects clauses or sentences using words like and, but, or. Example: "John went to the store, and he bought some bread."
Lexical Cohesion: links sentences through repeated or related vocabulary. Example: "John drove his car. The vehicle was new."
Purpose of Cohesion
Ensures the text reads smoothly.
Helps readers understand relationships between ideas.
Makes the text more engaging and readable.
3. Reference Resolution
Definition
Reference resolution is the process of identifying which entity a word (typically a
pronoun or noun phrase) refers to in a text.
Importance in NLP
Crucial for understanding meaning and relationships between entities in sentences or
paragraphs.
Supports many NLP applications like:
o Machine Translation
o Text Summarization
o Question Answering
What is an Antecedent?
An antecedent is the word or phrase to which a pronoun or noun phrase refers.
Example:
"John saw a dog. It was brown."
→ "It" refers to "dog" → "dog" is the antecedent.
Types of Reference Resolution
Anaphora Resolution: refers backward to a previously mentioned noun (the antecedent comes first). Example: "John saw a dog. It was brown." ("it" → "dog")
Cataphora Resolution: refers forward to a noun that appears later (the antecedent comes after the pronoun). Example: "When he saw the dog, John ran away." ("he" → "John")
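As a purely illustrative sketch, a naive anaphora resolver can link each pronoun to the most recently mentioned candidate noun; the pronoun and noun lists below are made up, and real resolvers use gender/number agreement, syntax, and learned models:
# Naive heuristic: a pronoun refers to the nearest preceding candidate noun.
PRONOUNS = {"he", "she", "it", "they"}
CANDIDATE_NOUNS = {"john", "dog", "store", "mary"}  # toy lexicon for this sketch

def resolve_pronouns(tokens):
    antecedents = {}
    last_noun = None
    for i, token in enumerate(tokens):
        word = token.lower().strip(".,!?")
        if word in PRONOUNS and last_noun is not None:
            antecedents[i] = last_noun      # pronoun position -> assumed antecedent
        elif word in CANDIDATE_NOUNS:
            last_noun = word
    return antecedents

tokens = "John saw a dog . It was brown".split()
print(resolve_pronouns(tokens))  # {5: 'dog'}: "It" is linked to "dog"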
Challenges in Reference Resolution
Requires contextual understanding.
Pronouns can be ambiguous or refer to different entities in different situations.
Complex in long texts or texts with multiple possible referents.
Applications
Improves performance in:
o Machine Translation (e.g., ensuring pronouns are translated correctly)
o Text Summarization (e.g., keeping track of who did what)
o Question Answering (e.g., resolving "Who is he?" correctly)
4. Discourse Cohesion and Structure
Definition of Discourse Cohesion
Refers to how parts of a text are linguistically connected.
Uses cohesive devices like:
o Pronouns (e.g., he, she, it)
o Conjunctions (e.g., and, but, because)
o Lexical repetition (e.g., repeating or using related words)
o Cohesive markers (e.g., however, therefore)
Purpose of Discourse Cohesion
Creates unity and flow within a text.
Helps readers understand relationships between ideas.
Supports overall text coherence.
Definition of Discourse Structure
Refers to the organization and arrangement of ideas in a text.
Includes structural elements like:
o Headings and subheadings
o Paragraphs
o Sections or thematic divisions
Purpose of Discourse Structure
Guides the reader through the text.
Makes content easier to navigate and comprehend.
Enhances clarity and contributes to coherence.
Combined Importance
Cohesion + Structure = Clear, coherent, and memorable communication.
Aids both written and spoken language understanding.
Applications in NLP
Crucial for:
o Text Summarization
o Question Answering
o Text Classification
Helps machines understand how ideas are connected and how information is
structured.
5. n-Gram Models
Definition
n-gram models are statistical language models used to predict the next word in a
sequence based on the previous n−1 words.
What is an n-Gram?
An n-gram is a sequence of n consecutive words or characters in a text.
Examples:
o Unigram (1-gram): "dog"
o Bigram (2-gram): "the dog"
o Trigram (3-gram): "the dog barked"
Markov Assumption
Assumes that the probability of a word depends only on the previous (n−1) words, not
the entire sentence.
This simplifies computation but limits context understanding.
Training and Estimation
Trained on large text corpora.
Uses Maximum Likelihood Estimation (MLE) or other statistical methods to estimate
word probabilities.
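A minimal sketch of MLE for a bigram model on a made-up toy corpus; the conditional probability is simply a ratio of observed counts:
from collections import Counter

corpus = "the dog barked and the dog ran".split()   # toy training corpus

unigrams = Counter(corpus)                  # count(w)
bigrams = Counter(zip(corpus, corpus[1:]))  # count(w_prev, w)

def bigram_mle(prev_word, word):
    """MLE estimate: P(word | prev_word) = count(prev_word, word) / count(prev_word)."""
    return bigrams[(prev_word, word)] / unigrams[prev_word]

print(bigram_mle("the", "dog"))  # 1.0 -- "the" is always followed by "dog" here
print(bigram_mle("dog", "ran"))  # 0.5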
Applications
Widely used in various NLP tasks:
o Speech Recognition
o Machine Translation
o Text Classification
o Spelling Correction
Baseline Model
n-gram models are often used as baseline models to compare against more complex
models (e.g., neural networks).
Limitations
Short context window: Limited to (n−1) previous words.
Fails to capture long-range dependencies in text.
Data sparsity: Large n-grams may be rare in the training data.
Alternatives to n-Gram Models
Advanced models that handle longer context and semantics:
o Recurrent Neural Networks (RNNs)
o Long Short-Term Memory (LSTM)
o Transformer-based models (e.g., BERT, GPT)
n-gram type, example, and context used:
Unigram (1-gram): example "dog"; context used: none
Bigram (2-gram): example "the dog"; context used: 1 previous word
Trigram (3-gram): example "the dog barked"; context used: 2 previous words
6. Language Model Evaluation
Purpose of Evaluation
Measures how well a language model performs on specific language tasks or datasets.
Evaluates the model’s ability to:
o Predict the next word
o Generate coherent and relevant text
o Perform NLP tasks like translation or summarization
Evaluation Methods
1. Perplexity: measures how well a model predicts the next word; lower = better performance. Example metric: perplexity score.
2. Human Evaluation: human judges rate output for fluency, coherence, and relevance. Example metrics: ratings or qualitative feedback.
3. Task-Specific Evaluation: evaluates model performance on tasks like translation, summarization, or sentiment analysis. Example metrics: accuracy, precision, recall, F1-score.
4. Diversity & Novelty: assesses how varied and original the generated text is. Example metrics: distinct-n, novelty scores.
Key Metrics Explained
Perplexity:
Lower perplexity = model better predicts next word.
→ Example: A model with perplexity 25 is better than one with 60 on the same dataset (see the sketch after this list).
Accuracy / Precision / Recall:
Used in classification-based tasks like sentiment analysis.
Human Ratings:
Useful for creative tasks like story generation or dialogue systems.
Diversity Metrics:
Evaluate whether the output is not repetitive and shows novel patterns.
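A small sketch of computing perplexity from the probabilities a model assigns to each word of a held-out sequence (the probability values below are made up):
import math

def perplexity(word_probs):
    """Perplexity = exp of the average negative log-likelihood per word."""
    avg_neg_log_likelihood = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log_likelihood)

# A model that assigns higher probabilities to the observed words scores lower (better).
print(perplexity([0.20, 0.10, 0.25]))  # roughly 5.9
print(perplexity([0.05, 0.02, 0.04]))  # roughly 29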
Importance of Appropriate Evaluation
No single metric works for all tasks.
Choose evaluation methods that match the specific application, such as:
o BLEU / ROUGE for machine translation and summarization
o F1-score for classification tasks
o Perplexity for predictive language modeling
Ongoing Research
Evaluation is an active research area as models become more complex.
New metrics are being developed to better assess:
o Context understanding
o Fairness
o Bias
o Factual correctness
7. Parameter Estimation
Definition
Parameter estimation is the process of determining the best values for model
parameters based on observed data.
Essential in training NLP models like:
o Language Models
o Part-of-Speech Taggers
o Named Entity Recognition (NER) systems
Objective
Find parameter values that maximize the likelihood of the observed data.
This process typically relies on a training corpus (annotated text data).
Common Methods of Parameter Estimation
1. Maximum Likelihood Estimation (MLE): estimates parameters by maximizing the likelihood of the observed data.
2. Bayesian Estimation: uses prior distributions and updates them using Bayes' theorem.
3. Empirical Bayes: combines data-driven and prior-based approaches using hierarchical models.
Types of Parameter Estimation
1. Maximum-Likelihood Estimation and Smoothing
o Commonly used with n-gram models.
o Smoothing (e.g., Laplace, Good-Turing) helps address zero-probability issues (see the sketch after this list).
2. Bayesian Parameter Estimation
o Incorporates uncertainty and prior knowledge.
o Useful when data is limited or noisy.
3. Large-Scale Language Models
o Use millions or billions of parameters.
o Require massive datasets and advanced optimization algorithms.
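The sketch referenced in item 1: add-one (Laplace) smoothing on top of bigram counts, so word pairs never seen in the toy corpus still receive a small non-zero probability:
from collections import Counter

corpus = "the dog barked and the dog ran".split()   # toy training corpus
vocab = set(corpus)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def laplace_bigram(prev_word, word, k=1):
    """Add-k smoothed estimate of P(word | prev_word)."""
    return (bigrams[(prev_word, word)] + k) / (unigrams[prev_word] + k * len(vocab))

print(laplace_bigram("the", "dog"))     # seen bigram: probability shrinks a little
print(laplace_bigram("the", "barked"))  # unseen bigram: no longer zero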
Steps in Parameter Estimation
1. Preprocessing: Clean and format the input text data.
2. Model Selection: Choose an appropriate architecture (e.g., CRF, transformer, HMM).
3. Objective Function: Define a loss function (e.g., cross-entropy) that reflects prediction
accuracy.
4. Optimization Algorithm: Use algorithms to minimize loss and estimate parameters.
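As a concrete example of step 3, the cross-entropy loss for a single prediction is just the negative log of the probability the model assigned to the correct word or class (the probability vectors below are made up):
import math

def cross_entropy(predicted_probs, true_index):
    """Loss for one prediction: -log P(correct item)."""
    return -math.log(predicted_probs[true_index])

# The closer the model's probability for the true item is to 1, the lower the loss.
print(cross_entropy([0.7, 0.2, 0.1], true_index=0))  # about 0.36
print(cross_entropy([0.1, 0.2, 0.7], true_index=0))  # about 2.30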
8. Language Model Adaptation
Definition and Purpose
Language model adaptation refers to fine-tuning a pre-trained language model on a
specific domain or task using a small amount of task-specific data.
Enhances model performance by capturing domain-specific vocabulary and linguistic
patterns.
Approach
Most common method: Transfer Learning.
o Start with pre-trained weights.
o Fine-tune on the specific task/domain.
Typically:
o Final layers are updated (task-specific).
o Lower-level layers are kept fixed (general language understanding).
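A hedged sketch of this setup using the Hugging Face transformers library, assuming a BERT-style checkpoint ("bert-base-uncased") and a two-class task; how many encoder layers to freeze is an illustrative choice, not a fixed rule:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Start from a general-purpose pre-trained checkpoint (assumed name)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # used to prepare task data
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Keep the lower, general-language layers fixed ...
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:8]:  # freeze first 8 of 12 layers (illustrative)
    for param in layer.parameters():
        param.requires_grad = False

# ... so only the upper layers and the new classification head are fine-tuned.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{trainable} parameters will be updated during fine-tuning")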
Advantages
1. Improved Task Performance
o Better understanding of domain-specific data.
2. Reduced Training Time & Resources
o Leverages existing models, requiring less new data and compute.
3. Better Handling of Rare/OOV Words
o Pre-trained models already cover a broad vocabulary.
Applications
Sentiment Analysis
Text Classification
Named Entity Recognition (NER)
Machine Translation
9. Types of Language Models
1. N-gram Models
Predict the next word using the previous n-1 words.
Common types:
o Bigram: uses 1 previous word.
o Trigram: uses 2 previous words.
✅ Simple and fast;
❌ Struggles with long-range dependencies.
2. Neural Network Models
Use deep learning to model word sequences.
Can learn complex relationships between words.
Trained on large datasets.
✅ More accurate than n-grams;
❌ Requires more data and computation.
3. Transformer-based Models
Example: GPT, BERT.
Use self-attention to capture long-range dependencies.
Achieve state-of-the-art results on many NLP tasks.
✅ Best performance on diverse NLP tasks;
❌ Very resource-intensive.
4. Probabilistic Graphical Models
Represent word relationships as a graph of dependencies.
Use statistical relationships to predict word sequences.
✅ Useful in structured prediction tasks;
❌ Less common today due to deep learning's rise.
5. Rule-based Models
Use predefined linguistic rules.
Effective in highly structured domains (e.g., legal, medical).
✅ Precise in narrow domains;
❌ Not generalizable or flexible.
10. Language Models
1 Class-Based Language Models
2 Variable-Length Language Models
3 Discriminative Language Models
4 Syntax-Based Language Models
5 MaxEnt Language Models
6 Factored Language Models
7 Other Tree-Based Language Models
8 Bayesian Topic-Based Language Models
9 Neural Network Language Models
Class-Based Language Models
Definition
Class-based language models are probabilistic models that group words into classes
based on their distributional similarity.
They estimate the probability of a word given its class rather than the word itself.
Purpose
Reduce sparsity in language modeling.
Improve data efficiency and generalization, especially with limited data.
Steps in Building a Class-Based Language Model
1. Word Clustering
o Words are clustered using algorithms like k-means or hierarchical clustering.
2. Class Construction
o Assign class labels to each cluster.
o Number of classes may be predefined or adaptive.
3. Probability Estimation
o Estimate P(word | class) using maximum likelihood or Bayesian methods.
4. Language Modeling
o Build a model using these probabilities to predict sequences of words.
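A minimal sketch of the class-based factorization P(w_i | w_(i-1)) ≈ P(class(w_i) | class(w_(i-1))) × P(w_i | class(w_i)), with a hand-made word-to-class mapping standing in for automatic clustering and made-up probability tables:
# Toy word-to-class mapping (in practice produced by a clustering algorithm)
word2class = {"dog": "ANIMAL", "cat": "ANIMAL", "ran": "MOTION", "walked": "MOTION"}

# Toy probability tables (in practice estimated from a corpus)
p_class_given_prev_class = {("ANIMAL", "MOTION"): 0.6}
p_word_given_class = {("ran", "MOTION"): 0.5, ("walked", "MOTION"): 0.5}

def class_bigram_prob(prev_word, word):
    prev_c, c = word2class[prev_word], word2class[word]
    return (p_class_given_prev_class.get((prev_c, c), 0.0)
            * p_word_given_class.get((word, c), 0.0))

print(class_bigram_prob("dog", "ran"))     # 0.3
print(class_bigram_prob("cat", "walked"))  # 0.3 -- generalizes to a pair never seen together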
Advantages
1. Reduced Sparsity
o Fewer parameters to estimate → improved model accuracy.
2. Improved Data Efficiency
o Less training data needed compared to word-level models.
3. Better Handling of OOV (Out-of-Vocabulary) Words
o Unseen words can be mapped to existing classes based on similarity.
Class-based models are useful when:
Data is limited.
Generalization and efficiency are key.
Less commonly used today due to the rise of deep learning models, but still relevant in low-
resource settings.
Variable-Length Language Models
Definition
Variable-length language models handle input sequences of varying lengths, unlike
traditional fixed-length models (e.g., n-gram models).
Useful for tasks where input/output length varies:
Machine Translation
Text Summarization
Speech Recognition
Modeling Approaches
1. Recurrent Neural Networks (RNNs)
o Use a hidden state updated at each time step.
o Can model sequences regardless of length.
o Capture word dependencies across time.
2. Transformer-Based Models
o Use self-attention instead of recurrence.
o Better at modeling long-range dependencies.
o Also support variable-length input and output.
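A compact PyTorch sketch of an RNN language model that accepts input of any length; the vocabulary size and layer dimensions below are placeholders:
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); seq_len can differ from call to call
        hidden, _ = self.rnn(self.embed(token_ids))
        return self.out(hidden)  # next-word logits at every position

model = RNNLanguageModel(vocab_size=1000)
short = torch.randint(0, 1000, (1, 4))    # a 4-token sequence
long_ = torch.randint(0, 1000, (1, 50))   # a 50-token sequence
print(model(short).shape, model(long_).shape)  # the same model handles both lengths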
Evaluation Metrics
Perplexity:
o Measures how well a model predicts the next word.
o Lower perplexity = better model performance.
BLEU Score:
o Common in machine translation.
o Compares generated output with reference translations.
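A small sketch of sentence-level BLEU with NLTK; the reference and hypothesis tokens are made up, and smoothing is used so short outputs with missing n-gram orders do not score exactly zero:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "dog", "barked", "loudly"]]   # list of reference translations
hypothesis = ["the", "dog", "barked"]

score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))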
Discriminative Language Models
Definition
Focus on modeling the conditional probability P(output | input), unlike generative
models that model the joint probability P(input, output).
Aim to learn a direct mapping from input to output.
Common Tasks
Text Classification
Sequence Labeling (e.g., Named Entity Recognition, POS tagging)
Machine Translation
Conditional Random Fields (CRFs)
Probabilistic graphical model for sequence labeling.
Model conditional probability of output sequence given input.
Capture dependencies between neighboring output labels.
Neural Networks
Feedforward Neural Networks
Convolutional Neural Networks (CNNs)
Recurrent Neural Networks (RNNs)
Suitable for a broad range of NLP tasks.
Evaluation Metrics
Accuracy
F1 Score
Area Under ROC Curve (AUCROC)
Metric choice depends on task and data characteristics.
Syntax-Based Language Models
Definition
Language models that incorporate syntactic information (sentence structure) in addition
to word sequences.
Model probabilities of syntactic structures (e.g., noun phrases, verb phrases) rather than
just word sequences.
Traditional vs Syntax-Based Models
Traditional models (n-gram, neural) focus on word sequences.
Syntax-based models focus on sentence structure.
Context-Free Grammars (CFGs)
Represent syntactic structure with production rules.
Assign probabilities to rules based on training data.
Generate sentences by recursively applying these rules.
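A toy probabilistic CFG with NLTK; the rules and probabilities below are invented for illustration, whereas in practice they would be estimated from a treebank:
import nltk

grammar = nltk.PCFG.fromstring("""
    S   -> NP VP    [1.0]
    NP  -> Det N    [0.7]
    NP  -> 'John'   [0.3]
    VP  -> V NP     [1.0]
    Det -> 'the'    [1.0]
    N   -> 'dog'    [1.0]
    V   -> 'saw'    [1.0]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("John saw the dog".split()):
    print(tree)          # the most probable parse tree
    print(tree.prob())   # its probability (0.3 * 0.7 = 0.21 here)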
Dependency Trees
Model relationships between words (e.g., subject-verb).
Assign probabilities to entire trees based on training data.
Use these trees to generate sentences.
Applications
Text Generation
Machine Translation
Question Answering
Tree-Based Language Models
Definition
Use tree structures to represent syntactic and/or semantic relationships between words
in a sentence.
Capture hierarchical and relational information beyond just sequences.
Types of Tree-Based Language Models
1. Semantic Role Labeling (SRL) Models
o Identify semantic roles: subject, object, verb, etc.
o Build trees showing relationships between words and their roles.
o Useful for understanding meaning in sentences.
2. Discourse Parsing Models
o Analyze the structure of discourse (relations between sentences/paragraphs).
o Use trees to represent discourse organization.
o Applied in summarization, information extraction.
3. Dependency Parsing Models
o Identify grammatical relationships (e.g., subject-verb, object-verb).
o Use trees to show dependencies between words.
o Useful for machine translation, sentiment analysis.
4. Constituent Parsing Models
o Identify constituent structures like phrases and clauses.
o Use hierarchical trees representing sentence structure.
o Applied in text generation, summarization.
Neural Network Language Models