Identifying Sarcastic and Non-Sarcastic
News Headlines
Mohd. Shaquib Khan
B.Tech 5th Sem.
Dept. of CSE
[email protected]
Ashish Tiwari
Assistant Professor
Dept. of CSE
[email protected]
Abstract: Sarcasm has long been a difficult concept for people to grasp. Because of its distinctive linguistic features, the
study of sarcasm detection has become popular among researchers in Natural Language Processing (NLP) in
recent years. However, predicting sarcasm in text is still a challenging task for machines, and we have limited
knowledge about what makes a sentence sarcastic. Previous research on sarcasm detection has mostly used large
datasets collected with tag-based supervision or small datasets that were manually annotated. The first type of
dataset is often noisy in both labels and language, while the second type, despite its high-quality labels, lacks enough
examples to train deep learning models effectively. To overcome these limitations,
we present an extensive and high-quality dataset composed of headlines sourced from both a satirical news outlet
and a traditional news platform. We detail the distinct characteristics of this dataset and conduct a comparative
analysis with other well-known datasets used in sarcasm detection research. Furthermore, using a hybrid neural network
model, we explore the linguistic features that typically indicate sarcasm. Since its initial release in 2019,
our work has been widely referenced in the NLP community, contributing significantly to progress in sarcasm
detection. To encourage further study, we have made both the dataset and the implementation framework publicly
available.
Key Terms: News headlines dataset; Deep Learning; Sarcasm detection.