Abstract
Purpose
Social networks have become a popular place for users to
communicate with friends and to share opinions, photos and videos
that reflect their moods, feelings and sentiments. This creates an
opportunity to analyze social network data for users’ feelings and
sentiments, and so to investigate their moods and attitudes when
they communicate through these online tools.
Methods
Although the diagnosis of depression from social network data has
gained an established position globally, several dimensions remain
unexplored. In this study, we perform depression analysis on
Facebook data collected from an online public source. To investigate
the effectiveness of depression detection, we propose machine
learning techniques as an efficient and scalable method.
Results
We report an implementation of the proposed method and evaluate its
efficiency using a set of psycholinguistic features. We show that
the proposed method can significantly improve accuracy and reduce
the classification error rate. In addition, the results show that,
across different experiments, Decision Tree (DT) gives higher
accuracy than the other ML approaches for detecting depression.
Conclusions
Machine learning techniques can provide high-quality solutions for
identifying mental health problems among Facebook users.
Keywords: Social network, Emotions, Depression, Sentiment analysis
Introduction
The proliferation of internet and communication technologies,
especially online social networks, has transformed how people
interact and communicate with each other electronically.
Applications such as Facebook, Twitter, Instagram and the like not
only host written and multimedia content but also allow their users
to express their feelings, emotions and sentiments about a topic,
subject or issue online. On the one hand, this is great for users of
social networking sites, who can openly and freely contribute and
respond to any topic online; on the other hand, it creates
opportunities for people working in the health sector to gain
insight into the mental state of someone who reacted to a topic in a
specific manner. To provide such insight, machine learning
techniques could potentially help examine the patterns hidden in
online communication and process them to reveal the mental state
(such as ‘happiness’, ‘sadness’, ‘anger’, ‘anxiety’ or depression)
of social network users. Moreover, there is a growing body of
literature addressing the role of social networks in the structure
of social relationships, covering topics such as relationship
breakups, mental illness (‘depression’, ‘anxiety’, ‘bipolar’ etc.),
smoking and drinking relapse, sexual harassment and suicidal
ideation [1, 2].
In this study, we aim to analyze Facebook data to detect any factors
that may reflect depression in the relevant Facebook users. Various
machine learning techniques are employed for this purpose.
Considering the key objective of this study, the following research
challenges are addressed in this paper.
What is depression, and what are the common factors contributing to
it?
What are the factors to look for when detecting depression in
Facebook comments?
How can these factors be extracted from Facebook comments?
What is the relationship between these factors and attitudes toward
depression?
When is the most influential time for depression-indicative Facebook
users to communicate?
What are the most influential machine learning techniques for
detecting depression in Facebook comments?
In the context of the above-mentioned challenges, we analyse
depression from Facebook users’ data [3, 4]. Users express their
feelings as posts or comments on the Facebook platform, and
sometimes these posts and comments refer to an emotional state such
as ‘joy’, ‘sadness’, ‘fear’, ‘anger’ or ‘surprise’ [5, 6]. We
collect these comments, analyze their features using machine
learning classification techniques, and make overall judgements
about their various parts. In this study, we used publicly available
Facebook data (from bipolar, depression and anxiety Facebook pages)
containing users’ comments. Once we had accessed the data, it was
cleaned of any inconsistencies and then analyzed with a software
application called LIWC [7, 8].
In this study, we examine various linguistic cues that help to
detect emotion cause events, namely the position of the cause event
and the experiencer relative to the emotion keyword. The cues fall
into three groups. Emotional processes include positive emotion
(e.g. ‘happy’, ‘love’, ‘nice’), negative emotion (e.g. ‘worthless’,
‘loser’, ‘hurt’, ‘ugly’, ‘nasty’), sadness (e.g. ‘worry’, ‘crying’,
‘grief’, ‘sad’), anger (e.g. ‘stop’, ‘shit’, ‘hate’, ‘kill’,
‘annoyed’) and anxiety (e.g. ‘worried’, ‘fearful’). Temporal
processes include present focus (e.g. ‘today’, ‘is’, ‘now’), past
focus (e.g. ‘ago’, ‘did’, ‘talked’) and future focus (e.g. ‘shall’,
‘may’, ‘will’, ‘soon’). Linguistic words include articles (e.g. ‘a’,
‘an’, ‘the’), prepositions (e.g. ‘for’, ‘in’, ‘of’, ‘to’, ‘with’,
‘above’), auxiliary verbs (e.g. ‘do’, ‘have’, ‘am’, ‘will’),
conjunctions (e.g. ‘and’, ‘but’, ‘whereas’), personal pronouns (e.g.
‘I’, ‘them’, ‘her’, ‘him’), impersonal pronouns (e.g. ‘it’, ‘it’s’,
‘those’), verbs (e.g. ‘go’, ‘good’) and negation (e.g. ‘deny’,
‘dishonest’, ‘no’, ‘not’, ‘never’).
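The word categories above lend themselves to a simple dictionary lookup. The following is a minimal sketch, not the LIWC software itself: the mini-lexicon contains only the example words quoted in the text, the category names are illustrative labels, and the scoring rule (percentage of tokens per category) is an assumption about how such a dictionary-based tool can be applied.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon built only from the example words quoted above.
# The real LIWC dictionary is far larger; these category names are illustrative.
LEXICON = {
    "positive_emotion": {"happy", "love", "nice"},
    "negative_emotion": {"worthless", "loser", "hurt", "ugly", "nasty"},
    "sadness": {"worry", "crying", "grief", "sad"},
    "anger": {"stop", "shit", "hate", "kill", "annoyed"},
    "anxiety": {"worried", "fearful"},
    "present_focus": {"today", "is", "now"},
    "past_focus": {"ago", "did", "talked"},
    "future_focus": {"shall", "may", "will", "soon"},
    "negation": {"deny", "dishonest", "no", "not", "never"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def category_percentages(text, lexicon=LEXICON):
    """Return, for each category, the percentage of tokens that belong to it."""
    tokens = tokenize(text)
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in lexicon
    }


# Made-up comment used purely for illustration.
scores = category_percentages(
    "I feel sad and worthless today, nothing will change soon"
)
```

For the made-up comment, `scores` maps each category to the share of its ten tokens that fall in that category, so a comment can be compared with others regardless of its length.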
The main contributions of this paper are listed as follows:
First, we synthesized the literature on various emotion detection
techniques to detect depression.
Second, we designated four feature types for our specific research
problem and elaborate on the lessons learned from using each type.
Third, our experiments are carried out on datasets of Facebook user
comments.
Fourth, we suggest machine learning techniques that utilize all
factors and maintain robustness. We also find that a Decision Tree
classifier outperforms the other classifiers (SVM, KNN and Ensemble)
on our dataset. Finally, our work also shows the importance of
depression detection for mental disorder detection.
The remainder of the paper is organized as follows: “Related work”
presents related work on detecting depression from social network
data. The methodology is explained in the third section. The
experimental analysis is presented in the fourth section, and its
discussion in the fifth section. Finally, the conclusion and future
work are provided in the last section.
Methodology
In this study, we first focused on four feature sets (emotional
process, temporal process, linguistic style, and all three combined)
for the detection and processing of depressive data received as
Facebook posts. We then applied supervised machine learning
approaches to study each feature set independently. The
classification techniques ‘decision tree’, ‘k-Nearest Neighbor’,
‘Support Vector Machine’ and ‘ensemble’ were deemed suitable for
each set (refer to Fig. 1).
Fig. 1
A methodological overview of Facebook data analysis for depression analysis
Data set exploration
We worked on Facebook users’ comments for depressive behavioral
exploration and detection. We collected data from the social network
[26]. Preparing social network data, in particular Facebook users’
comments, is one of the primary challenges, because the data must
carry information on whether or not a comment contains
depression-bearing content. To tackle this issue we used NCapture to
collect data from Facebook [27, 28]. NCapture is a powerful tool for
qualitative data analysis. It is intended to help users organize,
analyze and discover knowledge in unstructured data such as
open-ended survey responses, social media, interviews, articles and
web content. Furthermore, it provides a place to organize and manage
material so that knowledge can be discovered more efficiently [29].
Feature extraction
To distinguish between depressive and non-depressive posts, we
extract different features based on psycholinguistic measurements
from the users’ posts. They are described briefly as follows:
Psycholinguistic features LIWC is a psycholinguistic vocabulary
package created by psychologists to recognize the various affective,
cognitive and linguistic components present in users’ verbal or
written communication. It returns more than 70 different variables
grouped under higher-level psycholinguistic categories, for example:
Psychological process: affective process, social process, cognitive
process, perceptual process, biological process, drives, time
orientations, relativity, personal concerns
Linguistic process: word count, words per sentence, pronouns,
personal pronouns, articles, prepositions, auxiliary verbs, adverbs,
conjunctions, negations
Other grammar: verbs, adjectives, comparisons, interrogatives,
numbers, quantifiers.
These higher-level categories are also divided into subcategories
such as
Biological processes: sexual, body, ingestion and health
Affective processes: anxiety, anger, sadness, positive emotion,
negative emotion
Time orientations: present, past, future
Social processes: family, friends, male, female
Perceptual processes: see, hear, feel.
Measuring depressive behavior
We present a set of attributes (emotional process, temporal process
and linguistic style) that can be used to characterize the
depressive behavior of users. Our dataset consists of five emotional
variables (positive, negative, sad, anger, anxiety), three temporal
categories (present focus, past focus and future focus), and nine
standard linguistic dimensions (articles, prepositions, auxiliary
verbs, adverbs, conjunctions, personal pronouns, impersonal
pronouns, verbs and negations) [30–36]. We calculate their values
using the standard LIWC2015 scales. A complete list of the standard
LIWC2015 scales, including examples from our dataset, is included in
Table 4.
Emotional processes An emotional process is a complex experience of
consciousness, bodily sensation and behaviour that reflects the
personal significance of a thing, an event or a state of affairs.
The analysis of emotional comments in social network data can be
leveraged to produce reliable predictions in a variety of
circumstances [25]. We use psycholinguistic dimensions to consider
five features of the emotional state manifested in the comments:
positive affect (PA), negative affect (NA), sadness affect (SA),
anger affect (AA) and anxiety affect (AnA) [37–41].
Temporal process
Generally, temporal process words provide information about the past
focus, present focus and future focus categories, indicating how
people are referencing each other and their degree of emotionality.
Linguistic process
The linguistic process is one of the largest parts of the LIWC
psycholinguistic vocabulary package. It was designed to quantify
word use in psychologically meaningful categories. It has also been
used effectively to identify relationships between people in social
interactions, including relative status, deception and the quality
of close relationships. In our study we use nine specific linguistic
features (articles, prepositions, auxiliary verbs, adverbs,
conjunctions, personal pronouns, impersonal pronouns, verbs and
negations) to characterize user comments for our experimental
analysis.
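The three attribute groups and their combination can be expressed as four feature sets that are fed to the classifiers separately. The sketch below assumes each comment has already been scored into a dictionary of values; the feature names are illustrative labels rather than the exact LIWC2015 column names, and `feature_vector` is a hypothetical helper.

```python
# Illustrative feature names (assumed labels, not exact LIWC2015 columns).
EMOTIONAL = ["positive", "negative", "sadness", "anger", "anxiety"]
TEMPORAL = ["present_focus", "past_focus", "future_focus"]
LINGUISTIC = [
    "articles", "prepositions", "auxiliary_verbs", "adverbs", "conjunctions",
    "personal_pronouns", "impersonal_pronouns", "verbs", "negations",
]

# The four feature sets studied independently: each group alone, then all.
FEATURE_SETS = {
    "emotional": EMOTIONAL,
    "temporal": TEMPORAL,
    "linguistic": LINGUISTIC,
    "all": EMOTIONAL + TEMPORAL + LINGUISTIC,
}


def feature_vector(scores, feature_set):
    """Project one comment's score dict onto a feature set (missing -> 0.0)."""
    return [float(scores.get(name, 0.0)) for name in FEATURE_SETS[feature_set]]


# Made-up scores for a single comment, for illustration only.
example_scores = {"sadness": 4.5, "past_focus": 2.0, "verbs": 12.0}
emotional_vector = feature_vector(example_scores, "emotional")
all_vector = feature_vector(example_scores, "all")
```

Keeping the feature order fixed in one place means every classifier sees columns in the same order, whichever feature set is selected.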
Classification model
This stage constructs a prediction model for recognizing depressive
posts/comments, taking the psycholinguistic features as input.
Consider our training corpus $B = \{p_1, p_2, \ldots, p_n\}$ of $n$
posts/comments, such that each post/comment $p_i$ is labeled as
either depressive or non-depressive, that is, with a label from
$L = \{l_1, l_2\}$. The task of a classifier $f$ is to find the
corresponding label for each post/comment:
$$f : B \rightarrow L, \qquad f(p) = l$$
In this work, we employ four popular classifiers: Support Vector
Machine (SVM), Decision Tree, Ensemble, and k-Nearest Neighbor
(kNN).
Support Vector Machines (SVM) Support Vector Machines are also known
as support vector networks. An SVM is a non-probabilistic binary
linear classifier that analyzes data for classification or anomaly
detection. It maps the data into a high-dimensional feature space
and finds the hyperplane that separates the data into two classes
with the largest distance (margin) to the closest training data
point of either class.
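For reference, the largest-separation idea described above is usually written as the following optimisation problem. This is the standard textbook hard-margin formulation, not something taken from this study; the symbols $x_i$ (feature vector of comment $p_i$), $y_i \in \{-1, +1\}$ (its class), $w$ and $b$ are introduced here only for illustration.

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,\bigl(w^{\top} x_i + b\bigr) \ge 1, \qquad i = 1, \dots, n
```

Under these constraints the width of the margin is $2 / \lVert w \rVert$, so minimising $\lVert w \rVert$ maximises the separation between the two classes.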
Decision Tree (DT) A decision tree is a simple and widely used
classification approach that builds a hierarchical tree from the
training dataset. The aim of a decision tree is to divide the data
hierarchically into groups that have different characteristics. For
instance, in text document classification, the conditions at the
root and internal nodes are commonly defined on terms, and a node
may be split into its children according to the presence or absence
of a term in the document.
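The hierarchical division described above is built one split at a time. As a minimal sketch of a single split, the code below searches for the feature and threshold that best separate two classes. The Gini impurity criterion and the toy data are assumptions made for illustration; the text does not state which splitting criterion was used.

```python
def gini(labels):
    """Gini impurity of binary labels (1 = depressive, 0 = non-depressive)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (impurity, feature index, threshold) of the best single split.

    A row goes to the left child when row[feature] <= threshold. Returns
    None if no split produces two non-empty children.
    """
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            # Weighted average impurity of the two children.
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold)
    return best


# Toy data (made up): columns are [sadness score, positive-emotion score].
rows = [[8.0, 1.0], [6.5, 0.5], [7.0, 2.0], [1.0, 6.0], [0.5, 7.5], [2.0, 5.0]]
labels = [1, 1, 1, 0, 0, 0]
impurity, feature, threshold = best_split(rows, labels)
```

A full tree repeats this search recursively on each child until the nodes are pure or a depth limit is reached.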
Ensemble Ensemble methods combine multiple learning algorithms, such
as decision trees, to obtain better predictive performance.
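One common way to combine multiple learners is bagging: each learner is trained on a bootstrap resample of the data and the final label is chosen by majority vote. The sketch below illustrates that idea only. The base learner `train_stump` is a hypothetical one-level rule, the data are made up, and nothing here reproduces the specific ensemble variants used in the experiments.

```python
import random
from collections import Counter


def train_stump(rows, labels):
    """Hypothetical base learner: threshold feature 0 midway between class means."""
    pos = [row[0] for row, y in zip(rows, labels) if y == 1]
    neg = [row[0] for row, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        # The resample held a single class, so always predict that class.
        constant = 1 if pos else 0
        return lambda row: constant
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)
    midpoint = (pos_mean + neg_mean) / 2.0
    if pos_mean >= neg_mean:
        return lambda row: 1 if row[0] > midpoint else 0
    return lambda row: 1 if row[0] < midpoint else 0


def bagged_ensemble(rows, labels, n_learners=25, seed=0):
    """Train each base learner on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(len(rows)) for _ in rows]
        learners.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx])
        )
    return learners


def predict(learners, row):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(learner(row) for learner in learners)
    return votes.most_common(1)[0][0]


# Toy data (made up): one sadness score per comment, 1 = depressive.
rows = [[8.0], [6.5], [7.0], [1.0], [0.5], [2.0]]
labels = [1, 1, 1, 0, 0, 0]
ensemble = bagged_ensemble(rows, labels)
```

Because each learner sees a slightly different resample, individual errors tend to differ and the vote smooths them out.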
K-Nearest Neighbor (KNN) K-Nearest Neighbor (KNN) is a
non-parametric approach that computes the distances from a point of
interest to the points in the training set and assigns the label
that is most common among the k nearest ones.
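The distance-based rule above fits in a few lines. This is a minimal sketch assuming Euclidean distance and k = 3; both are illustrative choices, and the toy data are made up.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (made up): columns are [sadness score, positive-emotion score].
train_rows = [[8.0, 1.0], [6.5, 0.5], [7.0, 2.0], [1.0, 6.0], [0.5, 7.5], [2.0, 5.0]]
train_labels = ["depressive"] * 3 + ["non-depressive"] * 3

label_a = knn_predict(train_rows, train_labels, [7.2, 1.5])
label_b = knn_predict(train_rows, train_labels, [1.0, 6.5])
```

Changing `k` or swapping the distance function changes how local the decision is, which is the kind of variation that distinguishes different KNN configurations.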
Experimental analysis
In this study, we examine the performance of various classifiers for
detecting depression in a short time.
Data analysis
The analysis was conducted using MATLAB 2016b. We applied four major
classifiers: Support Vector Machine (SVM), K-Nearest Neighbors
(KNN), Decision Tree (DT) and Ensemble. Each classifier has
sub-classifiers: Decision Tree (Simple DT, Medium DT and Complex
DT); SVM (Linear, Quadratic, Cubic, Fine Gaussian, Medium Gaussian
and Coarse Gaussian); KNN (Fine, Medium, Coarse, Cosine, Cubic and
Weighted); and Ensemble (Boosted Tree, Bagged Tree, Subspace
Discriminant, Subspace KNN and RUSBoosted Tree) [42–44].
Using the above classification techniques, we examined detection
performance on Facebook user comments. To understand the
significance of the different feature types, we applied the four
classifiers to each feature set: emotional process, linguistic
style, temporal process and all features. The results of the
analysis are reported in Tables 5 and 6, which suggest that Decision
Tree is the best-performing model. Although KNN gives high
precision, Decision Tree gives the highest recall and F-measure for
the class of depression-indicative comments of Facebook users.
Similarly, for linguistic style, Decision Tree gives the highest
precision, recall and F-measure.
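Precision, recall and F-measure for the depression-indicative class follow directly from the counts of true positives, false positives and false negatives. The sketch below uses the standard definitions with made-up labels; it does not reproduce any value from Tables 5 and 6.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-measure (F1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels: 1 = depression-indicative comment, 0 = other.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

A classifier can score high on precision while missing many depressive comments, which is why recall and F-measure are reported alongside it.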
For a better understanding of the general intuition behind
depression, in this paper we applied Decision Tree, KNN, SVM and
Ensemble classifiers to detect depression from emotional terms. We
showed that all of these classification techniques, based on
linguistic style, emotional process, temporal process and all
(linguistic, emotional and temporal) features, are able to extract
the depressive emotional result successfully. Tables 5 and 6 present
the results of the various classifications across the four feature
sets. It can be observed that Decision Tree gives the best outcome.
We believe that the current study has laid the ground for future
research on inference and discovery of additional information based
on the cause-event relation, such as detection of implicit emotion
or cause, as well as prediction of public opinion based on cause
events. Moreover, in this paper we applied a total of 21 LIWC
attributes for detecting depression, although more than 54
attributes could be applied. Although we achieved accuracy between
60 and 80%, there is still room for improvement. It is important to
note that this study does not identify who the sufferers are; it
only assesses Facebook comments for depression detection.
Conclusion and future work
In this paper we have demonstrated the potential of using Facebook
as a tool for measuring and detecting major depression among its
users. To give a clear understanding of our work, a number of
research challenges were stated at the start of this paper. The
analytics performed on the selected dataset provide some insight
into these research challenges. Below is a summary of our findings:
What is depression, and what are the common factors contributing to
it?
While we all feel moody, sad or low from time to time, some people
experience these feelings intensely, for long periods of time
(weeks, months or even years) and in some cases with no apparent
reason. Depression is more than a low mood; it is a serious
condition that affects a person’s physical and emotional state.
Depression can affect any of us at any time. However, some phases or
events make us more vulnerable to depression. Physical and emotional
changes associated with growing up, losing a loved one, starting a
family or retirement may trigger emotional upheaval that could lead
to depression for some people.
What are the factors to look for depression detection in Facebook
comments?
It is important to remember that depressive emotions have several
signs and symptoms spread across various categories as reported in
Table 8.