
Discourse Segmentation

Discourse segmentation is the process of dividing text into coherent units that enhance understanding and communication, crucial for applications like text summarization, question answering, and dialogue systems. Techniques for discourse segmentation include the TextTiling algorithm and machine learning approaches, which utilize various features such as lexical and syntactic cues. The importance of coherent sequences in discourse analysis lies in their ability to maintain clarity, logical order, and topic unity, facilitating better comprehension and information retrieval.

Uploaded by ujjualrajesh


1. Introduction to Discourse and Discourse Structure

Discourse refers to coherent sequences of sentences or utterances that form meaningful
communication beyond individual sentences. In computational linguistics and NLP, discourse
analysis focuses on the structure and organization of text or speech in extended communication
such as articles, conversations, and narratives.

A discourse structure organizes text into meaningful units or segments, such as paragraphs,
sections, or dialogue turns, each contributing to the overall communicative goal. These
segments are often related through coherence relations (e.g., cause-effect, contrast,
elaboration).
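
Coherence relations are often signalled by discourse markers. As a rough illustration (a heuristic assumption, not a standard algorithm), the sketch below guesses the relation a segment bears to the one before it from the earliest cue phrase it contains. The cue table `CUE_RELATIONS` and the function `guess_relation` are hypothetical names, and the marker-to-relation mapping is deliberately simplified.

```python
import re

# Hypothetical cue-phrase table: each marker is mapped to the coherence
# relation it most often signals. Real markers are ambiguous ("since" can be
# temporal or causal), so this is only a heuristic.
CUE_RELATIONS = {
    "because": "cause-effect",
    "therefore": "cause-effect",
    "as a result": "cause-effect",
    "however": "contrast",
    "but": "contrast",
    "on the other hand": "contrast",
    "for example": "elaboration",
    "in particular": "elaboration",
    "furthermore": "elaboration",
}


def guess_relation(segment: str) -> str:
    """Return the relation signalled by the earliest cue phrase in the
    segment, or 'unknown' if no cue phrase occurs."""
    text = segment.lower()
    best = None  # (position of cue, relation)
    for cue, relation in CUE_RELATIONS.items():
        match = re.search(r"\b" + re.escape(cue) + r"\b", text)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), relation)
    return best[1] if best else "unknown"


segments = [
    "The candidate leads in the latest polls.",
    "However, many voters remain undecided.",
    "For example, a third of suburban voters have not chosen yet.",
]
for segment in segments[1:]:
    print(guess_relation(segment))  # contrast, then elaboration
```

Cue phrases are only a partial signal: many relations are left implicit in text, so a lookup table like this misses them entirely.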

2. Importance of Segmenting Text into Coherent Units

Discourse segmentation is the process of dividing text into coherent segments, each typically
focusing on a single topic, subtopic, or communicative intention. It's critical for:

● Text Summarization: Extracting the most relevant content from each coherent unit.

● Question Answering (QA): Locating specific segments related to a question.

● Dialogue Systems: Understanding topic shifts or turns in a conversation.

● Information Retrieval: Improving relevance by segment-aware indexing.

● Sentiment Analysis: Detecting sentiment changes across different segments.

Example: In a news article discussing an election, segmentation can separate parts covering
candidate background, polling data, public opinion, and campaign events—enabling better
content understanding and retrieval.

3. Techniques and Algorithms for Discourse Segmentation

A. TextTiling Algorithm

Developed by Marti Hearst (1997), TextTiling is a pioneering unsupervised method for
segmenting expository text into topically coherent blocks.

Working Principle:

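● The text is tokenized and divided into fixed-length token sequences (pseudo-sentences).

● At each gap between pseudo-sentences, the cosine similarity between the word-frequency
vectors of the adjacent blocks (a few pseudo-sentences on either side) is computed.

● Valleys (drops) in the similarity scores indicate potential segment boundaries. Each valley
receives a depth score (how far it dips below the nearest peaks on either side), and gaps
whose depth exceeds a cutoff based on the mean and standard deviation of the depth scores
are marked as boundaries.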
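
The working principle above can be sketched in plain Python. This is a simplified version under stated assumptions: it omits Hearst's smoothing of the similarity scores and the step that moves boundaries to the nearest paragraph break, treats only local minima as candidate valleys, and uses a cutoff of mean depth minus half a standard deviation. The function names `cosine`, `depth_scores` and `texttiling`, and the default values of `w` (pseudo-sentence length) and `k` (block size), are illustrative choices.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def depth_scores(scores):
    """Depth of each gap: how far its score lies below the peaks reached by
    climbing left and right while the scores keep rising."""
    depths = []
    for i, s in enumerate(scores):
        lpeak, j = s, i - 1
        while j >= 0 and scores[j] >= lpeak:
            lpeak, j = scores[j], j - 1
        rpeak, j = s, i + 1
        while j < len(scores) and scores[j] >= rpeak:
            rpeak, j = scores[j], j + 1
        depths.append((lpeak - s) + (rpeak - s))
    return depths


def texttiling(text, w=20, k=6, stopwords=frozenset()):
    """Return indices of the pseudo-sentences that start a new segment."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]
    # 1. Split the token stream into pseudo-sentences of w tokens.
    pseudo = [tokens[i:i + w] for i in range(0, len(tokens), w)]
    if len(pseudo) < 2:
        return []
    # 2. Gap g lies between pseudo[g-1] and pseudo[g]; compare up to k
    #    pseudo-sentences on each side.
    scores = []
    for g in range(1, len(pseudo)):
        left = Counter(t for ps in pseudo[max(0, g - k):g] for t in ps)
        right = Counter(t for ps in pseudo[g:g + k] for t in ps)
        scores.append(cosine(left, right))
    # 3. Keep valleys whose depth exceeds mean - std/2 of all depth scores.
    depths = depth_scores(scores)
    cutoff = mean(depths) - pstdev(depths) / 2
    boundaries = []
    for i, (s, d) in enumerate(zip(scores, depths)):
        left_ok = i == 0 or s <= scores[i - 1]
        right_ok = i == len(scores) - 1 or s <= scores[i + 1]
        if d > cutoff and left_ok and right_ok:
            boundaries.append(i + 1)  # score i belongs to gap i + 1
    return boundaries


text = "cat dog pet fur paw " * 4 + "stock bond market trade price " * 4
print(texttiling(text, w=5, k=2))  # [4]
```

On this toy text (20 pet-related tokens followed by 20 finance-related tokens) the only boundary falls before the fifth pseudo-sentence. NLTK ships a fuller implementation as `nltk.tokenize.TextTilingTokenizer`.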

Advantages:

● Language-independent and unsupervised.

● Works well for structured texts like essays or reports.

Real-World Examples:

● Document summarizers can use it to identify thematic units before extracting summary sentences.

● Educational software can use it to segment chapters or lessons for adaptive learning.

B. Machine Learning Approaches

With the availability of annotated corpora, supervised machine learning techniques have
become popular for discourse segmentation.

Features Used:

● Lexical cues: Discourse markers like “however”, “on the other hand”, “furthermore”.

● Syntactic features: Part-of-speech tags, sentence lengths, punctuation.

● Semantic cues: Word embeddings or BERT-like contextual embeddings.

● Topic modeling: LDA or clustering to detect topic shifts.
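
The cues above can be turned into a feature vector for each gap between adjacent sentences, which a classifier then labels as boundary or non-boundary. This is a minimal sketch under stated assumptions: the marker list beyond the three examples in the text, the feature names, and the word-overlap feature are illustrative choices, and part-of-speech, embedding, and topic-model features are omitted so the example needs only the standard library.

```python
import re

# Assumed marker list: the first three come from the text, the rest are
# illustrative additions.
DISCOURSE_MARKERS = ("however", "on the other hand", "furthermore", "meanwhile", "in contrast")


def words(sentence: str) -> list:
    return re.findall(r"[a-z']+", sentence.lower())


def boundary_features(prev_sent: str, next_sent: str) -> dict:
    """Features for the gap between two adjacent sentences; a classifier
    would use them to decide whether the gap is a segment boundary."""
    prev_words, next_words = set(words(prev_sent)), set(words(next_sent))
    union = prev_words | next_words
    opening = next_sent.lower().lstrip()
    return {
        # Lexical cue: the next sentence opens with a discourse marker.
        "starts_with_marker": any(opening.startswith(m) for m in DISCOURSE_MARKERS),
        # Surface cues: sentence lengths (in words) and final punctuation.
        "prev_len": len(words(prev_sent)),
        "next_len": len(words(next_sent)),
        "prev_is_question": prev_sent.rstrip().endswith("?"),
        # Lexical cohesion: Jaccard word overlap; low overlap hints at a topic shift.
        "overlap": len(prev_words & next_words) / len(union) if union else 0.0,
    }


print(boundary_features("Polls show a close race.", "However, turnout was low."))
```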

Algorithms:

● SVMs, Decision Trees: Classical ML methods on structured features.

● CRFs (Conditional Random Fields): Useful for sequential segmentation tasks.

● Neural models (BiLSTM, BERT): Contextual deep learning models fine-tuned on
discourse data.
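
To show how a classical classifier learns from gap features, the sketch below trains a simple perceptron, swapped in here for the SVMs and decision trees named above because it fits in a few lines of plain Python. The two-feature toy data and its labels are invented for illustration. Unlike a CRF, this classifier labels each gap independently and ignores the labels of neighbouring gaps.

```python
def predict(weights, bias, x):
    """1 = boundary, 0 = no boundary."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron updates on numeric gap-feature vectors."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # converged on the training data
            break
    return weights, bias


# Toy gaps described by [starts_with_marker, word_overlap]; labels are invented.
samples = [[1.0, 0.0], [0.0, 0.6], [1.0, 0.1], [0.0, 0.5]]
labels = [1, 0, 1, 0]
weights, bias = train_perceptron(samples, labels)
print(predict(weights, bias, [1.0, 0.05]))  # 1: marker present, little overlap
```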

Real-World Examples:

● In customer service chatbots, ML-based discourse segmentation helps detect new
intents or issues when a user shifts topics mid-conversation.

● In legal document analysis, ML models segment contracts into clauses (e.g., payment
terms, liability, termination).

4. Applications of Discourse Segmentation

A. Text Summarization

● Identifies relevant segments that represent key points across the document.

● Prevents inclusion of disjoint or off-topic sentences in summaries.

Example: News summarizers segment reports to isolate important sections like quotes, events,
and statistics before generating a summary.
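
Once a document has been segmented, one simple way to keep a summary on-topic is to take one representative sentence per segment. The sketch below scores each sentence by the average in-segment frequency of its content words; this frequency heuristic, the tiny stopword list, and the names `content_words` and `summarize_segments` are assumptions, not a specific published summarizer.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"the", "a", "an", "of", "in", "to", "and", "is", "was", "on", "for", "by"})


def content_words(sentence: str) -> list:
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize_segments(segments):
    """Pick one representative sentence per segment: the sentence whose
    content words are, on average, most frequent within that segment."""
    summary = []
    for segment in segments:
        freq = Counter(w for s in segment for w in content_words(s))

        def score(sentence, freq=freq):
            words = content_words(sentence)
            return sum(freq[w] for w in words) / len(words) if words else 0.0

        summary.append(max(segment, key=score))  # ties go to the earlier sentence
    return summary


segments = [
    ["The candidate grew up in Ohio.", "The candidate studied law in Ohio.", "She likes jazz."],
    ["Polls show the candidate leading.", "New polls show a narrow lead.", "Weather was mild."],
]
print(summarize_segments(segments))
```

Because every segment contributes a sentence, an off-topic sentence such as "She likes jazz." is unlikely to be picked, and each thematic unit is represented once.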

B. Question Answering (QA)

● Narrows the search space by directing the QA system to specific discourse segments.

● Increases the precision of answer retrieval.

Example: In open-domain QA systems like Google’s passage-based search, discourse
segmentation can help locate the passages most likely to contain the answer.
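
A sketch of how segmentation narrows the QA search space: score each segment by the number of content words it shares with the question and pass only the top-ranked segments on to answer extraction. Plain word overlap is an assumed stand-in for the learned retrievers real QA systems use, and `terms` and `rank_segments` are hypothetical names.

```python
import re

STOPWORDS = frozenset({"the", "a", "an", "of", "in", "to", "and", "is", "was", "by",
                       "what", "who", "where", "when", "do", "does", "did"})


def terms(text: str) -> set:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def rank_segments(question: str, segments: list, top_n: int = 1) -> list:
    """Return indices of the top_n segments sharing the most content words
    with the question; only these would be searched for an answer."""
    q = terms(question)
    overlaps = [len(q & terms(seg)) for seg in segments]
    order = sorted(range(len(segments)), key=lambda i: -overlaps[i])  # stable on ties
    return order[:top_n]


segments = [
    "The candidate grew up in Ohio and studied law.",
    "Recent polls show the candidate leading by five points.",
    "The final rally took place in Chicago.",
]
print(rank_segments("What do the latest polls show?", segments))  # [1]
```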

C. Dialogue Systems

● Detects when speakers switch topics or intentions.

● Helps maintain coherence in multi-turn conversations.

Example: Virtual assistants (e.g., Siri, Alexa) segment user input into topics (e.g., “weather,”
“calendar”) to understand and respond appropriately across interactions.
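
One simple way to flag topic shifts in a conversation is to compare each turn's vocabulary with that of the preceding turns. The sketch below uses the Jaccard overlap of content words with a fixed threshold; the threshold, window size, stopword list, and the name `topic_shifts` are illustrative assumptions, and production assistants are more likely to rely on trained intent classifiers.

```python
import re

STOPWORDS = frozenset({"the", "a", "an", "is", "in", "on", "at", "to", "my", "what", "will", "for"})


def content_words(turn: str) -> set:
    return {w for w in re.findall(r"[a-z]+", turn.lower()) if w not in STOPWORDS}


def topic_shifts(turns, threshold=0.1, window=2):
    """Return indices of turns that seem to open a new topic: their
    content-word Jaccard overlap with the previous `window` turns is below
    `threshold`."""
    shifts = []
    for i in range(1, len(turns)):
        current = content_words(turns[i])
        context = set().union(*(content_words(t) for t in turns[max(0, i - window):i]))
        union = current | context
        similarity = len(current & context) / len(union) if union else 1.0
        if similarity < threshold:
            shifts.append(i)
    return shifts


turns = [
    "What is the weather in Paris tomorrow?",
    "Tomorrow in Paris expect light rain.",
    "Will the rain continue on Sunday?",
    "Add a dentist appointment to my calendar.",
    "Move the dentist appointment to Friday.",
]
print(topic_shifts(turns))  # [3]
```

The shift from "weather" to "calendar" is detected at the fourth turn, while the follow-up about the dentist appointment stays in the same segment.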

5. Coherent Sequences of Sentences

Coherent sequences of sentences are groups of sentences that are logically connected
and flow smoothly together to express a unified idea, topic, or theme. In a coherent sequence,
each sentence relates meaningfully to the others, maintaining clarity and continuity throughout
the passage.

Key Characteristics of Coherent Sequences:

1. Logical Order: Ideas are presented in a logical progression (e.g., cause-effect,
chronological order).

2. Consistency: Maintains consistent subject, tense, and point of view.

3. Reference and Linkage: Uses devices like pronouns, conjunctions, and transition
words to link sentences (e.g., "however," "because," "this," "such as").

4. Topic Unity: All sentences focus on a single theme or central idea.

5. Smooth Transitions: There are no abrupt shifts in topic or structure.

Example of a Coherent Sequence:

"Air pollution is a growing concern in urban areas. Vehicles and industrial emissions
are the primary sources. To combat this, cities are investing in cleaner
transportation systems and stricter emission regulations. These efforts aim to
improve air quality and public health."

● Each sentence builds on the previous one.

● There is clear topic unity (air pollution).

● Pronouns like "this" and connectors like "to combat this" create cohesion.

In NLP and Discourse Analysis:

In computational linguistics, recognizing coherent sequences helps machines:

● Understand text structure

● Segment discourse meaningfully

● Summarize or answer questions accurately
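
Surface cohesion, at least, can be checked mechanically. The sketch below reports, for each adjacent sentence pair, the content words shared with the previous sentence and any linking words from a small hand-picked list; the word lists, the crude plural stripping, and the names `normalize` and `cohesion_report` are assumptions.

```python
import re

# Assumed, deliberately small word lists.
LINKING_WORDS = frozenset({"this", "these", "that", "those", "however", "because", "therefore", "thus"})
STOPWORDS = frozenset({"a", "an", "the", "is", "are", "and", "in", "to", "of", "for"})


def normalize(word: str) -> str:
    """Crude plural stripping so that 'emissions' matches 'emission'."""
    return word[:-1] if len(word) > 4 and word.endswith("s") else word


def cohesion_report(sentences):
    """For each adjacent pair, list content words shared with the previous
    sentence and linking words that appear in the current one."""
    report = []
    for prev, cur in zip(sentences, sentences[1:]):
        prev_words = {normalize(w) for w in re.findall(r"[a-z]+", prev.lower())}
        cur_words = {normalize(w) for w in re.findall(r"[a-z]+", cur.lower())}
        shared = sorted((prev_words & cur_words) - STOPWORDS - LINKING_WORDS)
        links = sorted(cur_words & LINKING_WORDS)
        report.append({"shared": shared, "links": links})
    return report


passage = [
    "Air pollution is a growing concern in urban areas.",
    "Vehicles and industrial emissions are the primary sources.",
    "To combat this, cities are investing in cleaner transportation systems and stricter emission regulations.",
    "These efforts aim to improve air quality and public health.",
]
for pair, entry in enumerate(cohesion_report(passage), start=1):
    print(pair, entry)
```

Applied to the air-pollution example, it finds "this", "these" and the shared word "emission", but no surface link between the first two sentences, whose connection (vehicles and emissions as sources of pollution) has to be inferred. This is one reason coherence is harder to detect automatically than cohesion.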


