Batch-11 DC
DETECTION OF EMPLOYEE STRESS USING
MACHINE LEARNING
A Project report submitted in partial fulfillment of the requirements for the award of the
Degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
Assistant Professor
Eluru-534007
A.Y 2023-2024
SIR C R REDDY COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
Computer Science during the academic year 2023-2024, in partial fulfillment of the
requirements for the award of the degree of BACHELOR OF TECHNOLOGY in
COMPUTER SCIENCE AND ENGINEERING.
External Examiner
DECLARATION
I hereby declare that this project work is my original work and that it has not formed the
basis for the award of any degree.
ACKNOWLEDGEMENT
I would like to take this opportunity to thank our management and beloved principal
Dr. K. Venkateswara Rao M.Tech., Ph.D. for providing all the necessary facilities and great
support to us in completing the project work.
The present project work is the outcome of several days of study of the various aspects of
project development. During this effort, I have received a great amount of help from our
Head of the Department Dr. A. YESUBABU M.Tech, Ph.D., whom I wish to acknowledge
and thank from the depth of my heart.
I am deeply indebted to my project guide and our project coordinator Dr. G. Nirmala M.Tech,
Ph.D. for providing this opportunity and for the constant encouragement given during this
course. I am grateful for the valuable guidance and suggestions during my project work.
My parents have always put me ahead of themselves. Because of their hard work and
dedication, I have had opportunities beyond my wildest dreams. Finally, I express my thanks
to all the other faculty members, classmates, friends and neighbours who helped me complete
my project, and without whose infinite love and patience this would never have been possible.
ABSTRACT
Stress disorders are very common among employees working in corporate sectors. With
changing work patterns and lifestyles, we can see an increase in stress among working
employees. Even though many corporate organizations provide a variety of schemes related
to mental health and try to reduce stress disorders in the working environment, the disorder
is far from being stopped. In our project, we make use of two machine learning techniques
to determine the amount of stress a corporate employee is experiencing and try to narrow
down the issues that indicate those stress levels.
TABLE OF CONTENTS
1 INTRODUCTION
2 LITERATURE SURVEY
3 EXISTING SYSTEM
3.1 DISADVANTAGES
4 PROPOSED SYSTEM
4.1 SCOPE
4.2 OBJECTIVE
5 REQUIREMENT ANALYSIS
5.1 FUNCTIONAL REQUIREMENTS
5.2 NON-FUNCTIONAL REQUIREMENTS
5.3 SOFTWARE REQUIREMENTS
5.4 HARDWARE REQUIREMENTS
6 DESIGN AND METHODOLOGY
6.1 METHODOLOGY
6.2 SYSTEM DESIGN
7 IMPLEMENTATION
8 TESTING
8.1 TYPES OF TESTING
9 RESULTS AND DISCUSSION
10 CONCLUSION
11 REFERENCES
CHAPTER-1
INTRODUCTION
Stress-related mental health disorders are not rare among employees working in corporate
sectors, and earlier analyses have raised concern about this. Based on work done by the
industry body Assocham, we know that above 42% of professional employees in the corporate
private sector of India suffer from stress or common anxiety disorders because of late night
working hours and also due to fixed timings. This share is growing, as mentioned in a 2018
Economic Times article based on a survey managed by Optum. The survey considers the
replies of nearly eight lakh working employees from more than seventy large companies,
with each company employing more than 4,500 working professionals. A workplace that is
free from stress must be given utmost importance for higher productivity and happy living
for the working employees. There are many steps we can take to help employees cope with
stress disorders and support mental well-being, such as counselling assistance, career
guidance, stress management sessions, and health awareness; identifying the employees who
need such help will definitely improve the success rates of these measures. We try to make
this happen by using machine learning techniques to come up with a model that predicts the
stress level. This approach will not only help company HR managers know their working
professionals better, it will also help in taking proper precautions to reduce the chances of
stress in their employees.
CHAPTER-2
LITERATURE SURVEY
scales used in psychiatry (Freifeld and Madoff 2009; Chew and Eysenbach 2010; Lamb,
Paul, and Dredze 2013) and modeling (Sadilek, Kautz, and Silenzio 2012), rely on explicit
mentions of illness or health issues; if people are sick, they say so. In contrast, mental health
conditions largely display implicit changes in language and behavior, such as a switch in the
types of topics, a shift in word usage, or a shift in frequency of posts. While De Choudhury
et al. (2013) find some examples of explicit depression mentions, the focus is on more subtle
changes in language (e.g., pronoun use). Second, obtaining labeled data for a mental health
condition is challenging since we are examining implicit features of language. De
Choudhury et al. (2013) rely on (crowdsourced) volunteers to take depression surveys and
offer their Twitter feed for research. While this yields reliable data, it is time-consuming
and challenging to build large data sets for a diverse set of mental health conditions.
Furthermore, the necessary mental health evaluations, such as the DSM (Diagnostic and
Statistical Manual of Mental Disorders), are difficult to perform as these evaluations
require a trained diagnostician and have been criticized as unscientific and subjective (Insel
2013). Thus, relying on data from crowdsourced volunteers to build datasets of users with
diverse mental health conditions is difficult, and perhaps untenable. We provide an alternate
method for gathering samples that partially ameliorates these problems, ideally to be used
in concert with existing methods. In this paper, we study PTSD in Twitter data, one of the
first studies to consider social media for a mental health condition beyond depression (De
Choudhury, Counts, and Horvitz 2013; De Choudhury et al. 2013; Rosenquist, Fowler, and
Christakis 2010). Rather than rely on traditional PTSD diagnostic tools (Foa 1995) for
finding data, we demonstrate that some PTSD users can be easily and automatically
identified by scanning for tweets expressing explicit diagnoses. While it is natural to be
suspicious of self-identified reporting, we find that self-identifying PTSD users have
demonstrably different language usage patterns from the random users, according to the
Linguistic Inquiry Word Count (LIWC), a psychometrically validated analysis tool
(Pennebaker et al. 2007). We demonstrate elsewhere (Coppersmith, Dredze, and Harman
2014) that data obtained in this way replicates analyses performed via LIWC on the
crowdsourced survey respondents of De Choudhury et al. (2013). We also demonstrate that
users who self-identify are measurably different from random users by learning a classifier
to discriminate between self-identified and random users. We further show how this data can
be used to train a classifier that detects elevated incidences of PTSD in tweets from U.S.
military bases as compared to the general U.S. population, with a further increase around
bases that deployed combat troops overseas. We intend for this initial finding (which is small,
but statistically significant) to be a demonstration of the types of analysis Twitter data
enables for public health. Given the small effect size, replication and further study are called
for. Data We used an automated analysis to find potential PTSD users, and then refined the
list manually. First, we had access to a large multi-year historical collection from the Twitter
keyword streaming API, where keywords were selected to focus on health topics. We used
a regular expression4 to search for statements where the user self-identifies as being
diagnosed with PTSD. The 477 matching tweets were manually reviewed to determine if
they indicated a genuine statement of a diagnosis for PTSD. Table 1 shows examples from
the 260 tweets that indicated a PTSD diagnosis. Next, we selected the username that
authored each of these tweets and retrieved up to the 3200 most recent tweets from that user
via the Twitter API. We then filtered out users with fewer than 25 tweets and those whose
tweets were not at least 75% in English (measured using an automated language ID system).
This filtering left us with 244 users as positive examples. We repeated this process for a
group of randomly selected users. We randomly selected 10,000 usernames from a list of
users who posted to our historical collection within a selected two-week window. We then
downloaded all tweets from these users. After filtering (as above) 5728 random users
remain, whose tweets were used as negative examples. Methods We use our positive and
negative PTSD data to train three classifiers: one unigram language model (ULM) examining
individual whole words, one character n-gram language model (CLM), and one from the
LIWC categories above. The LMs have been shown effective for Twitter classification tasks
(Bergsma et al. 2012) and LIWC has been previously used for analysis of
4
mental health in Twitter (De Choudhury et al. 2013). The language models measure the
probability that a word (ULM) or a string of characters (CLM) was generated by the same
underlying process as the training data. Here, one of each language model (clm+ and ulm+)
is trained from the tweets of PTSD users, and a second (clm− and ulm−) from the tweets
from random users. Each test tweet t is scored by comparing probabilities from each LM:
s = lm+(t) / lm−(t)    (1)
(The case-insensitive regular expression mentioned earlier was
\Wptsd\W|\Wp\.t\.s\.d\.\W|post[- ]traumatic[- ]stress[- ]disorder[- ].)
A threshold of 1 for s divides scores into positive and negative
classes. In a multi-class setting, the algorithm minimizes the cross entropy, selecting the
classes. In a multi-class setting, the algorithm minimizes the cross entropy, selecting the
model with the highest probability. For each user, we calculate the proportion of tweets
scored positively by each LIWC category. These proportions are used as a feature vector in
a loglinear regression model (Pedregosa et al. 2011). Prior to training, we preprocessed the text
of each tweet: we replaced all usernames with a single token (USER), lowercased all text,
and removed extraneous whitespace. We also excluded any tweet that contained a URL, as
these often pertain to events external to the user (e.g., national news stories). In total, we
used 463k PTSD tweets and sampled 463k non-PTSD tweets to create a balanced data set.
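The scoring scheme of Equation (1) can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the example tweets, the add-one smoothing, and the shared vocabulary are all assumptions made only to keep the sketch self-contained.

```python
from collections import Counter
import math

def train_ulm(tweets):
    # Unigram counts and total word count for one class of tweets.
    counts = Counter(w for t in tweets for w in t.lower().split())
    return counts, sum(counts.values())

def log_prob(tweet, model, vocab_size):
    # Log-probability of a tweet under a unigram LM with add-one smoothing.
    counts, total = model
    return sum(math.log((counts[w] + 1) / (total + vocab_size))
               for w in tweet.lower().split())

# Hypothetical toy data standing in for PTSD-positive and random tweets.
pos = ["cant sleep nightmares again", "flashbacks all night no sleep"]
neg = ["great game tonight", "coffee with friends this morning"]
vocab = {w for t in pos + neg for w in t.split()}

ulm_pos = train_ulm(pos)
ulm_neg = train_ulm(neg)

def score(tweet):
    # Equation (1): s = lm+(t) / lm-(t), computed in log space for stability.
    return math.exp(log_prob(tweet, ulm_pos, len(vocab)) -
                    log_prob(tweet, ulm_neg, len(vocab)))

# s > 1 assigns the tweet to the positive class.
print(score("nightmares and no sleep") > 1)  # True
print(score("coffee this morning") > 1)      # False
```

A tweet whose words are more probable under the positive-class model than the negative-class model gets s above the threshold of 1, exactly as the text describes.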
New noncontact methods have also been developed recently to measure stress states. Among
them are the hyperspectral imaging technique [28], human voice [29, 30], pupil diameter [31],
a visible spectrum camera [32], and stereo thermal and visible sensors [33].
5
However, observing several markers for identifying stress requires an increasing number of
input sensors which in turn increases the overall price and lowers applicability. Prices for heart
rate meters range from $70 to $500 USD; GSR devices range from $100 to $500 USD, while
EMG devices have price ranges from $450 USD up to $1750 USD. Systems combining multiple
sensors are priced much higher. For such systems prices fall between $550 USD and $5700
USD, which already can be considered excessive for a mass telemedical lifestyle counseling
application. Therefore, in an ambient assisted living (AAL) system, the number of input sensors
should be kept minimal. In the rest of the paper, we focus on the simplest and most researched
sensor input, that is, the electrical activity of the heart.
As for the reliability of HRV sensors, there are still surprisingly few reviews reported in the
literature to date on the validation of the information content of low cost sensors compared to a
clinically accepted “gold standard” device. Some devices that were tested for validity are the
Sense Wear HR Armband [34], the Smart Health Watch [35], the Actiheart [36, 37], the
Equivital LifeMonitor [38], and the PulseOn [39]; and also the Bioharness multivariable
monitoring device from Zephyr has been tested for validity [40, 41] and reliability [41, 42]. In
all cases, a gold standard device was used simultaneously with the device under test as a method
for validating data. However, the validated devices above are high-end devices with a
considerable price which present an obstacle for the penetration of telemedicine. For example,
the Bioharness device has a price around $550 USD, whereas the price of low cost heart rate
meters varies from $70 USD to $100 USD. The lack of reliability tests of low cost devices was
our motivation for our device validation study. For automated stress detection, several
methods have been published which use only HRV. In 2008, Kim et al. collected HRV data
from sixty-eight subjects [43]. HRV data were collected during three different time periods. High stress
decreased HRV features. A maximum classification accuracy of 66.1% was achieved. Melillo
et al. in 2011 used nonlinear features of HRV for real-life stress detection [44]. HRV data were
collected two times, during university examinations and after holidays, from 42 students. Most
of the HRV features significantly decreased during the stress period. Stress detection with classification
accuracy of 90% was reported using two Poincaré plot features and Approximate Entropy. One
year later, using the same data, they designed a classification tree for automatic stress detection
based on LF and pNN50 HRV features with sensitivity of 83.33% [45]. In 2013, Karthikeyan
et al. created stress detection classifiers from ECG signal and HRV features [46]. Vanitha and
Suresh used a hierarchical classifier to classify stress into four levels with a classification
efficiency of 92% [47] in 2014.
CHAPTER-3
EXISTING SYSTEM
Traditional methods for detecting employee stress include surveys and self-reporting, which
can be subjective and time-consuming. Other methods include physiological measures such as
heart rate variability and cortisol levels, which can be invasive and require specialized
equipment. These methods also require significant expertise to interpret the data accurately.
3.1 DISADVANTAGES
• The disadvantages of the existing methods are that they can be time-consuming and subjective.
• Surveys and self-reporting methods rely on the employee's willingness and ability to accurately
report their stress levels, which can be influenced by factors such as social desirability bias or
lack of self-awareness.
• Physiological measures such as heart rate variability and cortisol levels can be invasive and
require specialized equipment and expertise to interpret the data accurately.
CHAPTER-4
PROPOSED SYSTEM
In this project we detect employee stress using machine learning algorithms such as SVM and
Random Forest. To detect stress we use a social media dataset of tweets, in which employees
share their views; by analyzing these views we can identify whether an employee is in a
relaxed or stressed mood. Analyzing these views manually would take a lot of human effort,
so we use machine learning algorithms, and experiments with these algorithms show stress
detection accuracy of more than 90%.
4.1 SCOPE
Since this project addresses a social problem in an enormously growing field, its scope is
quite broad: it helps society by identifying victims of stress, one of the most commonly
identified disorders among adolescents. The scope of detecting employee stress from a
Twitter dataset using the Support Vector Machine and Random Forest algorithms is
significant. SVM excels at classifying data by finding the optimal
hyperplane that separates different classes, while Random Forest utilizes an ensemble of
decision trees to make predictions. However, the effectiveness ultimately depends on the quality
and relevance of the input data and the implementation of the algorithms.
4.2 OBJECTIVE
The objective of using Support Vector Machine (SVM) and Random Forest algorithms for the
detection of employee stress is to develop predictive models that can analyze various features
or factors associated with employees and classify them into stressed or non-stressed categories.
These algorithms aim to accurately predict and identify employees who may be experiencing
stress, which can help organizations take proactive measures to address employee well-being.
CHAPTER-5
REQUIREMENT ANALYSIS
1. Data Collection
2. Data Preprocessing
3. Modeling
4. Predicting
Initially, we collect a dataset for our stress prediction system. After collecting the dataset,
we split it into training data and testing data. The training dataset is used for learning the
prediction model and the testing data is used for evaluating it. For this project, 90% of the
data is used for training and 10% for testing.
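The 90/10 split described above can be sketched with scikit-learn's train_test_split; the data here is a hypothetical stand-in for the tweet dataset.

```python
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the labeled tweet dataset.
texts = [f"tweet {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]   # 1 = stressed, 0 = not stressed

# 90% training / 10% testing, as stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.10, random_state=42)

print(len(X_train), len(X_test))  # 90 10
```

Fixing random_state makes the split reproducible across runs, which matters when comparing the two algorithms on the same held-out data.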
Data pre-processing is an important step in the creation of a machine learning model.
Initially, data may not be clean or in the format required by the model, which can cause
misleading outcomes. In pre-processing, we transform the data into the required format and
deal with noise, duplicates, and missing values in the dataset. Data pre-processing includes
activities like importing datasets, splitting datasets, attribute scaling, etc. Pre-processing of
data is required for improving the accuracy of the model.
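A minimal sketch of the kind of tweet cleaning used in this project, assuming a tiny hand-picked stop-word list (the implementation itself uses NLTK's full English list):

```python
# Hypothetical small stop-word subset; the project loads the full NLTK list.
STOP_WORDS = {"the", "is", "at", "and", "a", "to", "of"}

def clean_tweet(tweet: str) -> str:
    # Lowercase, then keep words longer than 2 characters that are not
    # stop words, mirroring the filtering in the implementation chapter.
    words = tweet.lower().split()
    kept = [w.strip() for w in words
            if len(w.strip()) > 2 and w.strip() not in STOP_WORDS]
    return " ".join(kept)

print(clean_tweet("I am SO stressed at work and the deadlines"))
# stressed work deadlines
```

The cleaned strings are what the tokenizer later turns into integer sequences for the classifiers.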
5.1.4 Modelling
Machine learning models are created by training algorithms with either labeled or unlabeled data, or a
mix of both. As a result, there are three primary ways to train and produce a machine learning algorithm:
Supervised learning: Supervised learning occurs when an algorithm is trained using “labelled
data”, or data that is tagged with a label so that an algorithm can successfully learn from it. Training
an algorithm with labelled data helps the eventual machine learning model know how to classify
data in the manner that the researcher desires.
Unsupervised learning: Unsupervised learning uses unlabeled data to train an algorithm. In this
process, the algorithm finds patterns in the data itself and creates its own data clusters.
Unsupervised learning is helpful for researchers who are looking to find patterns in data that are
currently unknown to them.
5.1.5 Predicting
The trained model makes a prediction when it is given new data. When new input or test
data is given to the trained model, it predicts the stress level of the user based on that input.
The trained model predicts well since more than 60% of the data is used for training. The
chosen algorithm also plays a major role in making the model predict well.
The functional requirements or the overall description documents include the product
perspective and features, operating system and operating environment, graphics
requirements, design constraints and user documentation.
The appropriation of requirements and implementation constraints gives the general
overview of the project in regards to what the areas of strength and deficit are and how to
tackle them.
Jupyter Notebook (or)
Google Colab
Minimum hardware requirements are very dependent on the particular software being
developed by a given Enthought Python / Canopy / VS Code user. Applications that need to
store large arrays/objects in memory will require more RAM, whereas applications that
need to perform numerous calculations or tasks more quickly will require a faster
processor.
Operating system: Windows, Linux
RAM: minimum 4 GB
CHAPTER-6
DESIGN AND METHODOLOGY
6.1 METHODOLOGY
RANDOM FOREST
As the name suggests, "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average to improve the predictive
accuracy of that dataset." Instead of relying on one decision tree, the random forest takes
the prediction from each tree and, based on the majority vote of those predictions, produces
the final output. A greater number of trees in the forest leads to higher accuracy and helps
prevent the problem of overfitting.
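A short random forest sketch on hypothetical data with scikit-learn; the 20-tree setting mirrors the implementation chapter, everything else is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical two-class data standing in for the tweet feature vectors.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 20 trees, as in the implementation chapter; the final label is the
# majority vote across the trees' individual predictions.
rf = RandomForestClassifier(n_estimators=20, random_state=0)
rf.fit(X_train, y_train)
print(accuracy_score(y_test, rf.predict(X_test)))
```

Increasing n_estimators typically improves stability at the cost of training time, in line with the claim above about more trees.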
SUPPORT VECTOR MACHINE
Support Vector Machine, or SVM, is one of the most popular supervised learning
algorithms, used for classification as well as regression problems. Primarily, however, it is
used for classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes, so that we can easily put a new data point in
the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector
Machine. Consider the below diagram, in which two different categories are classified
using a decision boundary or hyperplane:
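A minimal SVM sketch on hypothetical, well-separated data; the C=2.0 RBF-kernel settings mirror the implementation chapter:

```python
from sklearn import svm
from sklearn.datasets import make_blobs

# Two hypothetical, well-separated clusters of 2-D points.
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

# RBF-kernel SVC with the same hyperparameters as the implementation chapter.
clf = svm.SVC(C=2.0, kernel="rbf", gamma="scale")
clf.fit(X, y)

# The support vectors are the boundary points that define the hyperplane.
print(clf.support_vectors_.shape[1])  # 2 features per support vector
print(clf.score(X, y))
```

Only the support vectors influence the decision boundary; points far from the margin could be removed without changing the fitted hyperplane.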
6.2 SYSTEM DESIGN
UML DIAGRAMS
6.2.2.1 SEQUENCE DIAGRAM
6.2.3.1 CONTROL FLOW DIAGRAM
CHAPTER-7
IMPLEMENTATION
# Imports required by the code below (not listed in the report).
import tkinter
from tkinter import filedialog, Label, Button, Text, END
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences  # in newer Keras: keras.utils.pad_sequences
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

stop_words = set(stopwords.words('english'))

main = tkinter.Tk()
main.title("Detection of Employee Stress Using Machine Learning")
main.geometry("1300x1200")

# Globals shared between the button callbacks below.
global model, filename, tokenizer, X, Y, X_train, X_test, Y_train, Y_test, XX
word_count = 0
global svm_acc, rf_acc

def upload():
    # Let the user pick the tweets CSV file and show its path.
    global filename
    filename = filedialog.askopenfilename(initialdir="Tweets")
    pathlabel.config(text=filename)
    textarea.delete('1.0', END)
    textarea.insert(END, 'tweets dataset loaded\n')

def preprocess():
    # Clean every tweet: lowercase, drop stop words and words of <= 2 characters.
    global X, Y, word_count
    X = []
    Y = []
    textarea.delete('1.0', END)
    train = pd.read_csv(filename, encoding='iso-8859-1')
    word_count = 0
    words = []
    for i in range(len(train)):
        # DataFrame.get_value() was removed from pandas; use iloc instead.
        label = train.iloc[i, 2]
        tweet = train.iloc[i, 1]
        tweet = tweet.lower()
        arr = tweet.split(" ")
        msg = ''
        for k in range(len(arr)):
            word = arr[k].strip()
            if len(word) > 2 and word not in stop_words:
                msg += word + " "
                if word not in words:
                    words.append(word)
        text = msg.strip()
        X.append(text)
        Y.append(int(label))
    X = np.asarray(X)
    Y = np.asarray(Y)
    word_count = len(words)
    textarea.insert(END, 'Total tweets found in dataset : ' + str(len(X)) + "\n")
    textarea.insert(END, 'Total words found in all tweets : ' + str(len(words)) + "\n\n")
    featureExtraction()

def featureExtraction():
    # Convert cleaned tweets to padded integer sequences, shuffle, and split 87/13.
    global X, Y, XX, tokenizer, X_train, X_test, Y_train, Y_test
    max_features = word_count
    tokenizer = Tokenizer(num_words=max_features, split=' ')
    tokenizer.fit_on_texts(X)
    XX = tokenizer.texts_to_sequences(X)
    XX = pad_sequences(XX)
    indices = np.arange(XX.shape[0])
    np.random.shuffle(indices)
    XX = XX[indices]
    Y = Y[indices]
    X_train, X_test, Y_train, Y_test = train_test_split(XX, Y, test_size=0.13, random_state=42)
    textarea.insert(END, 'Total features extracted from tweets are : ' + str(X_train.shape[1]) + "\n")
    textarea.insert(END, 'Total splitted records used for training : ' + str(len(X_train)) + "\n")
    textarea.insert(END, 'Total splitted records used for testing : ' + str(len(X_test)) + "\n")

def SVM():
    # Train an RBF-kernel SVM and report its accuracy on the test split.
    textarea.delete('1.0', END)
    global svm_acc
    svc = svm.SVC(C=2.0, gamma='scale', kernel='rbf', random_state=2)
    svc.fit(X_train, Y_train)
    textarea.insert(END, "SVM Prediction Results\n")
    prediction_data = svc.predict(X_test)
    svm_acc = accuracy_score(Y_test, prediction_data) * 100
    textarea.insert(END, "SVM Accuracy : " + str(svm_acc) + "\n\n")

def RandomForest():
    # Train a 20-tree random forest and keep it as the prediction model.
    global rf_acc, model
    rfc = RandomForestClassifier(n_estimators=20, random_state=0)
    rfc.fit(X_train, Y_train)
    textarea.insert(END, "Random Forest Prediction Results\n")
    prediction_data = rfc.predict(X_test)
    rf_acc = accuracy_score(Y_test, prediction_data) * 100
    textarea.insert(END, "Random Forest Accuracy : " + str(rf_acc) + "\n")
    model = rfc

def predict():
    # Clean each tweet in the chosen test file and label it Stressed / Not Stressed.
    textarea.delete('1.0', END)
    testfile = filedialog.askopenfilename(initialdir="Tweets")
    test = pd.read_csv(testfile, encoding='iso-8859-1')
    for i in range(len(test)):
        tweet = test.iloc[i, 0]
        arr = tweet.split(" ")
        msg = ''
        for j in range(len(arr)):
            word = arr[j].strip()
            if len(word) > 2 and word not in stop_words:
                msg += word + " "
        text = msg.strip()
        mytext = [text]
        twts = tokenizer.texts_to_sequences(mytext)
        twts = pad_sequences(twts, maxlen=83, dtype='int32', value=0)
        stress = model.predict(twts)
        print(stress)
        if stress == 0:
            textarea.insert(END, text + ' === Prediction Result : Not Stressed\n\n')
        if stress == 1:
            textarea.insert(END, text + ' === Prediction Result : Stressed\n\n')

def graph():
    # Bar chart comparing the two accuracies.
    height = [svm_acc, rf_acc]
    bars = ('SVM ACC', 'Random Forest ACC')
    y_pos = np.arange(len(bars))
    plt.bar(y_pos, height)
    plt.xticks(y_pos, bars)
    plt.show()

font1 = ('times', 13, 'bold')          # font definition (omitted in the report)
textarea = Text(main, height=25, width=80)  # output area (omitted in the report)
textarea.place(x=10, y=100)

pathlabel = Label(main)
pathlabel.config(bg='royal blue', fg='rosy brown')
pathlabel.config(font=font1)
pathlabel.place(x=780, y=150)

preprocessButton = Button(main, text="Data Preprocessing & Features Extraction", command=preprocess)
preprocessButton.place(x=780, y=200)
preprocessButton.config(font=font1)

main.config(bg='cadet blue')
main.mainloop()
CHAPTER-8
TESTING
Testing is the process of executing a program with the aim of finding errors. To make our
software perform well it should be error free. If testing is done successfully, it will remove
all the errors from the software.
8.1 TYPES OF TESTING
Unit testing
Integration Testing
Alpha Testing
Beta Testing
model, the algorithm used to create the model, etc. The challenge, however, is to verify the
test outcome against the expected values that are known beforehand.
Input feature vector | Expected | Predicted
[16,6,324,0,0,0,22,0,0,0,0,0,0] | 0 | 0
[16,7,263,7,0,2,700,9,10,1153,832,9,2] | 1 | 1
The above figure represents the black box testing procedure for machine learning algorithms.
The model gives the correct output when the different inputs mentioned in the table are given.
Therefore, the program is said to execute as expected.
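The black box check above can be sketched as follows; the stand-in model (a simple threshold) is hypothetical and serves only to make the check runnable against the expected labels:

```python
# Test cases from the table above: (feature vector, expected label).
test_cases = [
    ([16, 6, 324, 0, 0, 0, 22, 0, 0, 0, 0, 0, 0], 0),
    ([16, 7, 263, 7, 0, 2, 700, 9, 10, 1153, 832, 9, 2], 1),
]

def model_predict(features):
    # Hypothetical stand-in for the trained classifier: a toy threshold on
    # the summed feature values, chosen only so the sketch is runnable.
    return 1 if sum(features) > 1000 else 0

# Black box testing: compare each model output against its expected value.
for features, expected in test_cases:
    assert model_predict(features) == expected
print("all black box cases passed")
```

In the real project the same loop would call the trained random forest in place of the toy function, keeping the expected labels fixed in advance.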
CHAPTER-9
RESULTS AND DISCUSSION
In the above screen, select the 'stress_tweets.csv' dataset and then click on the 'Open' button
to load the dataset and get the screen below.
Upload Tweets Dataset
Data Processing and features extraction
Run Support Vector Machine
Run Random Forest
Predict Stress
Accuracy Graph
In the above screen, click on the 'Data Preprocessing & Features Extraction' button to read
the dataset, clean it, and extract features such as words; the application then reports the total
records in the dataset, the total words, and how many records are used for training and testing.
In the above screen, the dataset contains a total of 10314 tweets; all the tweets together
contain 30790 words, of which 83 are unique; and the application uses 8973 records for
training and 1341 for testing. Now both the train and test data are ready; click on the
'Run SVM Algorithm' button to train the SVM.
In the above screen, SVM achieved 89.85% prediction accuracy on the test data; now click
on the 'Run Random Forest' button. In the next screen, random forest achieved 97.31%
prediction accuracy; now click on the 'Predict Stress' button and upload the test file
containing tweets, and by analyzing those tweets the machine learning algorithm will predict
whether the tweets contain any stress data or not. In the above screen, upload the 'test' file
and click on the 'Open' button to predict stress.
In the above screen, beside each tweet we can see the predicted result as Stressed or Not
Stressed; from this screen we can see the application detecting stress successfully from the
messages. In the graph, the x-axis represents the algorithm name and the y-axis represents
the accuracy of that algorithm; from the graph we can say random forest performs better
than Support Vector Machine.
CHAPTER-10
CONCLUSION
Gender, a family history of illness, and whether an employer provides mental health benefits
for its employees were more significant than the other factors in determining whether an
employee may develop mental health related issues. From our study, we found that people
working in tech companies are at greater risk of stress, even when their job role is not
technical. These insights could be used by companies to design more desirable HR strategies
for their employees. An accuracy of 75% shows that applying the two machine learning
techniques (i.e., SVM and Random Forest) to predict stress and mental health conditions
provides worthy results that could be explored further, and thus the aim of this project is met.
CHAPTER-11
REFERENCES
[1] Sairam Balani and Munmun De Choudhury. 2015. Detecting and Characterizing Mental
Health Related Self-Disclosure in Social Media. In Proceedings of the 33rd Annual ACM
Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA '15),
pages 1373-1378.
[2] Glen Coppersmith, Mark Dredze, and Craig Harman. 2014. Measuring Post Traumatic
Stress Disorder in Twitter.
[3] Munmun De Choudhury. 2013. Role of Social Media in Tackling Challenges in Mental
Health.
[4] Bhattacharyya, R., & Basu, S. (2018). India Inc looks to deal with rising stress in
employees. Retrieved from The Economic Times.
[5] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., &
Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine
Learning Research, 12(Oct), 2825-2830.
[6] OSMI Mental Health in Tech Survey Dataset, 2017, from Kaggle.
[7] Van den Broeck, J., Cunningham, S. A., Eeckels, R., & Herbst, K. (2005). Data cleaning:
detecting, diagnosing, and editing data abnormalities. PLoS Medicine, 2(10), e267.
[8] Takashi Shimizu and Shoji Nagata. 2007. Relationship between Job Stress and Self-Rated
Health among Japanese Full-Time Occupational Physicians. Academic Papers in Japanese,
2007.
[9] Tomar, D., & Agarwal, S. (2013). A survey on data mining approaches for healthcare.
International Journal of Bio-Science and Bio-Technology, 5(5), 241-266.
[10] Gender and Stress. (n.d.). Retrieved from APA press release, 2010.
[11] Julie Aitken Harris, Robert Saltstone and Maryann Fraboni. (2000). An Evaluation of
the Job Stress Questionnaire with a Sample of Entrepreneurs. 2000, JSQ scale, Entrepreneurs.
[12] Jeremy D. Davey, Patricia L. Obst, and Mary C. Sheehan. 2015. Demographic and
Workplace Characteristics which add to the Prediction of Stress and Job Satisfaction within
the Police Workplace. 2015 IEEE 14th International Conference on Cognitive Informatics &
Cognitive Computing (ICCICC).
[13] Mario Salai, Istvan Vassanyi, and Istvan Kosa. 2016. Stress Detection using Low-Cost
Heart Rate Sensors. Journal of Healthcare Engineering, pp. 1-13, Hindawi Publishing
Corporation.
[14] Shwetha, S., Sahil, A., & Anant Kumar, J. (2017). Predictive analysis using classification
techniques in healthcare domain. International Journal of Linguistics & Computing Research,
ISSN: 2456-8848, Vol. I, Issue I, June 2017.
[15] O. M. Mozos et al. 2017. Stress detection using wearable physiological and sociometric
sensors. International Journal of Neural Systems, vol. 27, issue 2, 2017.