
DETECTION OF EMPLOYEE STRESS USING

MACHINE LEARNING
A Project report submitted in partial fulfillment of the requirements for the award of the
Degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted by

N. Chaitanya Venkata Sai 20B81A05B7


N. Sai Kumar 20B81A05B8
N. Swetha Kumari 20B81A05B9
N. Sasi Priya 20B81A05C0
N. Satya Harika 20B81A05C1

Under the Esteemed Guidance of


Mrs. K. Lakshmi Prasuna

Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SIR C R REDDY COLLEGE OF ENGINEERING
Approved by AICTE, Accredited by NBA
(Affiliated to Jawaharlal Nehru Technological University, Kakinada)

Eluru-534007
A.Y 2023-2024
SIR C R REDDY COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the project work entitled DETECTION OF EMPLOYEE

STRESS USING MACHINE LEARNING has been successfully submitted by

N. Chaitanya Venkata Sai 20B81A05B7


N. Sai Kumar 20B81A05B8
N. Swetha Kumari 20B81A05B9
N. Sasi Priya 20B81A05C0
N. Satya Harika 20B81A05C1

in partial fulfillment for the award of the Degree of Bachelor of Technology in

Computer Science and Engineering during the academic year 2023-2024, as fulfilment of

the requirements for the completion of “PROJECT WORK” in COMPUTER SCIENCE AND

ENGINEERING.

Mrs. K. Lakshmi Prasuna Dr. A. Yesu Babu, M.Tech., Ph.D.

Assistant Professor Head of the Department

Department of CSE Department of CSE

External Examiner
DECLARATION

We hereby declare that the project entitled DETECTION OF EMPLOYEE STRESS

USING MACHINE LEARNING submitted for the B.Tech Degree is our original

work and the project has not formed the basis for the award of any degree,

associateship, fellowship or any other similar titles.

Place: Eluru N. Chaitanya Venkata Sai 20B81A05B7


Date: N. Sai Kumar 20B81A05B8
N. Swetha Kumari 20B81A05B9
N. Sasi Priya 20B81A05C0
N. Satya Harika 20B81A05C1

ACKNOWLEDGEMENT

We would like to take this opportunity to thank our management and beloved principal
Dr. K. Venkateswara Rao, M.Tech., Ph.D., for providing all the necessary facilities and great
support to us in completing the project work.
The present project work is the result of many days of study of the various aspects of project
development. During this effort, we received a great amount of help from our Head of the
Department, Dr. A. Yesubabu, M.Tech., Ph.D., whom we wish to acknowledge and thank from
the depth of our hearts.
We are deeply indebted to our project guide and our project coordinator, Dr. G. Nirmala, M.Tech.,
Ph.D., for providing this opportunity and for the constant encouragement given during this course.
We are grateful for the valuable guidance and suggestions during our project work.
Our parents have put us ahead of themselves. Because of their hard work and dedication, we had
opportunities beyond our wildest dreams. Finally, we express our thanks to all other faculty
members, classmates, friends and neighbours who helped us with the completion of this project;
without their infinite love and patience this would never have been possible.

N. Chaitanya Venkata Sai 20B81A05B7


N. Sai Kumar 20B81A05B8
N. Swetha Kumari 20B81A05B9
N. Sasi Priya 20B81A05C0
N. Satya Harika 20B81A05C1

ABSTRACT

Stress disorders are very common among employees working in the corporate sector. With
the changing nature of work and lifestyles, we can see a steady increase of stress in working
employees. Even though many corporate organisations provide a variety of schemes related
to mental health and try to reduce stress disorders in the working environment, the disorder
is far from being eliminated. In this project, we apply two machine learning techniques to
determine the amount of stress an employee working in the corporate sector is experiencing,
and we try to narrow down the issues that indicate the stress levels.

Keywords: Random Forests, Support Vector Machine

TABLE OF CONTENTS

SNO TITLE PAGE NO

1 INTRODUCTION 1
2 LITERATURE SURVEY 2
3 EXISTING SYSTEM 7
3.1 DISADVANTAGES
4 PROPOSED SYSTEM 8
4.1 SCOPE
4.2 OBJECTIVES
5 REQUIREMENT ANALYSIS 9
5.1 FUNCTIONAL REQUIREMENTS
5.2 NON-FUNCTIONAL REQUIREMENTS
5.3 SOFTWARE REQUIREMENTS
5.4 HARDWARE REQUIREMENTS
6 DESIGN AND METHODOLOGY 12
6.1 METHODOLOGY
6.2 SYSTEM DESIGN
7 IMPLEMENTATION 17
8 TESTING 23
8.1 TYPES OF TESTING
9 RESULTS AND DISCUSSION 26
10 CONCLUSION 31
11 REFERENCES 32

CHAPTER-1
INTRODUCTION

Stress disorders related to mental health are not rare among employees working in the
corporate sector, and earlier analyses have raised concern on this very issue. Based on work
done by the Associated Chambers of Commerce and Industry of India (Assocham), more
than 42% of professional employees working in India's private corporate sector suffer from
stress or common anxiety disorders because of late-night working hours and rigid timings.
This share is growing, as mentioned in a 2018 Economic Times article based on a survey
managed by Optum. That survey considered the replies of nearly eight lakh working
employees from more than seventy large companies, with each single company employing
more than 4,500 working professionals. A workplace that is free from stress must be given
utmost importance for higher productivity and happy living of the working employees.
There are many steps that can be taken to help employees cope with stress disorders and
support mental well-being, such as counselling assistance, career guidance, sessions on
stress management, and health-awareness programmes. Identifying the working employees
who need such help will definitely improve the success rate of such measures. We try to
make this happen by using machine learning techniques to build a model that predicts the
stress level of an employee. This approach will not only help company HR managers
understand their working professionals better, it will also help in taking proper precautions
to reduce the chances of stress in their working employees.

CHAPTER-2
LITERATURE SURVEY

2.1 Measuring Post Traumatic Stress Disorder in Twitter. Glen
Coppersmith, Mark Dredze, and Craig Harman. 2014.
Traditional mental health studies rely on information primarily collected through personal
contact with a health care professional. Recent work has shown the utility of social media
data for studying depression, but there have been limited evaluations of other mental health
conditions. We consider PTSD, a serious condition that affects millions worldwide, with
especially high rates in military veterans. We also present a novel method to obtain a PTSD
classifier for social media using simple searches of available Twitter data, a significant
reduction in training data cost compared to previous work. We demonstrate its utility by
examining differences in language use between PTSD and random individuals, building
classifiers to separate these two groups, and by detecting elevated rates of PTSD at and
around U.S. military bases using our classifiers.

Introduction. Mental health conditions affect a significant percentage of the U.S. adult
population each year, including depression (6.7%), eating disorders like anorexia and
bulimia (1.6%), bipolar disorder (2.6%) and post-traumatic stress disorder (PTSD) (3.5%).
PTSD and other mental illnesses are difficult to diagnose, with competing standards for
diagnosis based on self-reports and testimony from friends and relatives. In recent years,
several studies have turned to social media data to study mental health, since it provides an
unbiased collection of a person's language and behavior, which has been shown to be useful
in diagnosing conditions (De Choudhury 2013). Additionally, from a public health
standpoint, social media data and Web data in general have enabled large scale analyses of
a population's health status beyond what has previously been possible with traditional
methods (Ayers et al. 2013). While social media provides ample data for many types of
public health analysis (Paul and Dredze 2011), mental health studies still face serious
challenges. First, other health work in social media, such as disease surveillance
(Brownstein, Freifeld, and Madoff 2009; Chew and Eysenbach 2010; Lamb,
Paul, and Dredze 2013) and modeling (Sadilek, Kautz, and Silenzio 2012), rely on explicit
mentions of illness or health issues; if people are sick, they say so. In contrast, mental health
conditions largely display implicit changes in language and behavior, such as a switch in the
types of topics, a shift in word usage or a shift in frequency of posts. While De Choudhury
et al. (2013) find some examples of explicit depression mentions, the focus is on more subtle
changes in language (e.g., pronoun use). Second, obtaining labeled data for a mental health
condition is challenging since we are examining implicit features of language. De
Choudhury et al. (2013) rely on (crowdsourced) volunteers to take depression surveys and
offer their Twitter feed for research. While this yields reliable data, it is time-consuming
and challenging to build large data sets for a diverse set of mental health conditions.
Furthermore, the necessary mental health evaluations such as the DSM (Diagnostic and
Statistical Manual of Mental Disorders)3 , are difficult to perform as these evaluations
require a trained diagnostician and have been criticized as unscientific and subjective (Insel
2013). Thus, relying on data from crowdsourced volunteers to build datasets of users with
diverse mental health conditions is difficult, and perhaps untenable. We provide an alternate
method for gathering samples that partially ameliorates these problems – ideally to be used
in concert with existing methods. In this paper, we study PTSD in Twitter data, one of the
first studies to consider social media for a mental health condition beyond depression (De
Choudhury, Counts, and Horvitz 2013; De Choudhury et al. 2013; Rosenquist, Fowler, and
Christakis 2010). Rather than rely on traditional PTSD diagnostic tools (Foa 1995) for
finding data, we demonstrate that some PTSD users can be easily and automatically
identified by scanning for tweets expressing explicit diagnoses. While it is natural to be
suspicious of self-identified reporting, we find that self-identifying PTSD users have
demonstrably different language usage patterns from the random users, according to the
Linguistic Inquiry Word Count (LIWC), a psychometrically validated analysis tool
(Pennebaker et al. 2007). We demonstrate elsewhere (Coppersmith, Dredze, and Harman
2014) that data obtained in this way replicates analyses performed via LIWC on the
crowdsourced survey respondents of De Choudhury et al. (2013). We also demonstrate that
users who self-identify are measurably different from random users by learning a classifier
to discriminate between self-identified and random users. We further show how this data can
be used to train a classifier that detects elevated incidences of PTSD in tweets from U.S.
military bases as compared to the general U.S. population, with a further increase around
bases that deployed combat troops overseas. We intend for this initial finding (which is small,
but statistically significant) to be a demonstration of the types of analysis Twitter data
enables for public health. Given the small effect size, replication and further study are called for.

Data. We used an automated analysis to find potential PTSD users, and then refined the
list manually. First, we had access to a large multi-year historical collection from the Twitter
keyword streaming API, where keywords were selected to focus on health topics. We used
a case-insensitive regular expression (\Wptsd\W|\Wp\.t\.s\.d\.\W|post[- ]traumatic[- ]stress[- ]disorder) to search for statements where the user self-identifies as being
diagnosed with PTSD. The 477 matching tweets were manually reviewed to determine if
they indicated a genuine statement of a diagnosis for PTSD. Table 1 shows examples from
the 260 tweets that indicated a PTSD diagnosis. Next, we selected the username that
authored each of these tweets and retrieved up to the 3200 most recent tweets from that user
via the Twitter API. We then filtered out users with fewer than 25 tweets and those whose
tweets were not at least 75% in English (measured using an automated language ID system).
This filtering left us with 244 users as positive examples. We repeated this process for a
group of randomly selected users. We randomly selected 10,000 usernames from a list of
users who posted to our historical collection within a selected two-week window. We then
downloaded all tweets from these users. After filtering (as above) 5728 random users
remain, whose tweets were used as negative examples.

Methods. We use our positive and
negative PTSD data to train three classifiers: one unigram language model (ULM) examining
individual whole words, one character n-gram language model (CLM), and one from the
LIWC categories above. The LMs have been shown effective for Twitter classification tasks
(Bergsma et al. 2012) and LIWC has been previously used for analysis of

mental health in Twitter (De Choudhury et al. 2013). The language models measure the
probability that a word (ULM) or a string of characters (CLM) was generated by the same
underlying process as the training data. Here, one of each language model (clm+ and ulm+)
is trained from the tweets of PTSD users, and a second (clm− and ulm−) from the tweets
from random users. Each test tweet t is scored by comparing probabilities from each LM:

s = lm+(t) / lm−(t)   (1)

A threshold of 1 for s divides scores into positive and negative
classes. In a multi-class setting, the algorithm minimizes the cross entropy, selecting the
model with the highest probability. For each user, we calculate the proportion of tweets
scored positively by each LIWC category. These proportions are used as a feature vector in
a log-linear regression model (Pedregosa et al. 2011). Prior to training, we preprocess the text
of each tweet: we replaced all usernames with a single token (USER), lowercased all text,
and removed extraneous whitespace. We also excluded any tweet that contained a URL, as
these often pertain to events external to the user (e.g., national news stories). In total, we
used 463k PTSD tweets and sampled 463k non-PTSD tweets to create a balanced data set.
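
As an illustration of the preprocessing and the scoring rule in Equation (1), the following is a
minimal sketch, not the authors' actual implementation: it assumes the two unigram models are
given as word-probability dictionaries (lm_pos and lm_neg are hypothetical names) and uses a
small floor probability for unseen words.

import re

def preprocess(tweet):
    # Replace usernames with a single token, lowercase, strip punctuation
    tweet = re.sub(r'@\w+', 'USER', tweet)
    tweet = re.sub(r'[^\w\s]', ' ', tweet.lower())
    return tweet.split()

def score(tweet, lm_pos, lm_neg, floor=1e-6):
    # s = lm+(t) / lm-(t); s > 1 assigns the tweet to the positive class
    s = 1.0
    for w in preprocess(tweet):
        s *= lm_pos.get(w, floor) / lm_neg.get(w, floor)
    return s

# Toy probabilities, purely for illustration
lm_pos = {'nightmares': 0.010, 'anxious': 0.020}
lm_neg = {'nightmares': 0.001, 'anxious': 0.005}
print(score("@vet feeling anxious, nightmares again", lm_pos, lm_neg) > 1)  # True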

2.2 Stress Detection Using Low-Cost Heart Rate Sensors


Stress can also be detected using other, less common markers like accelerometer [15], key stroke
dynamics [16], or blinking [17]. It is also common to use a combination of several markers at
the expense of an increased system cost and user involvement. Fernandes et al. used GSR and
blood pressure (BP) markers [18] for determining stress. Sun et al. describe mental stress
detection using combined data from ECG, GSR, and accelerometer [19]. De Santos Sierra et al.
in [20] used GSR and HR. Rigas et al. used ECG, GSR, and respiration for detecting stress
while driving [21]. Wijsman et al. used ECG, respiration, GSR, and EMG of trapezius muscles
for mental stress detection [22]. Riera et al. combined EEG and EMG markers [23]. Singh and
Queyam used GSR, EMG, respiration, and HR [24] for detecting stress during driving. Pupil
diameter, ECG, and photoplethysmogram were used as markers by Mokhayeri et al [25]. Baltaci
and Gokcay used pupil diameter and temperature features in stress detection [26], while Choi
used HRV, respiration, GSR, EMG, acceleration, and geographical location [27].

New noncontact methods have also been developed recently to measure stress states. Some of
them are hyperspectral imaging technique [28], human voice [29, 30], pupil diameter [31],
visible spectrum camera [32], or using stereo thermal and visible sensors [33].

However, observing several markers for identifying stress requires an increasing number of
input sensors which in turn increases the overall price and lowers applicability. Prices for heart
rate meters range from $70 to $500 USD; GSR devices range from $100 to $500 USD, while
EMG devices have price ranges from $450 USD up to $1750 USD. Systems combining multiple
sensors are priced much higher. For such systems prices fall between $550 USD and $5700
USD, which already can be considered excessive for a mass telemedical lifestyle counseling
application. Therefore, in an ambient assisted living (AAL) system, the number of input sensors
should be kept minimal. In the rest of the paper, we focus on the simplest and most researched
sensor input, that is, the electrical activity of the heart.

As for the reliability of HRV sensors, there are still surprisingly few reviews reported in the
literature to date on the validation of the information content of low cost sensors compared to a
clinically accepted “gold standard” device. Some devices that were tested for validity are the
Sense Wear HR Armband [34], the Smart Health Watch [35], the Actiheart [36, 37], the
Equivital LifeMonitor [38], and the PulseOn [39]; and also the Bioharness multivariable
monitoring device from Zephyr has been tested for validity [40, 41] and reliability [41, 42]. In
all cases, a gold standard device was used simultaneously with the device under test as a method
for validating data. However, the validated devices above are high-end devices with a
considerable price which present an obstacle for the penetration of telemedicine. For example,
the Bioharness device has a price of around $550 USD, whereas the price of low-cost heart rate
meters varies from $70 USD to $100 USD. The lack of reliability tests of low cost devices was
our motivation for our device validation study.

For automated stress detection, several methods have been published which use only HRV. In
2008, Kim et al. collected HRV data from sixty-eight subjects [43]. HRV data were collected
during three different time periods. High stress
decreased HRV features. A maximum classification accuracy of 66.1% was achieved. Melillo
et al. in 2011 used nonlinear features of HRV for real-life stress detection [44]. HRV data were
collected two times, during university examination and after holidays, on 42 students. Most of
HRV features significantly decreased during stress period. Stress detection with classification
accuracy of 90% was reported using two Poincaré plot features and Approximate Entropy. One
year later, using the same data, they designed a classification tree for automatic stress detection
based on LF and pNN50 HRV features with sensitivity of 83.33% [45]. In 2013, Karthikeyan
et al. created stress detection classifiers from ECG signal and HRV features [46]. Vanitha and
Suresh used a hierarchical classifier to classify stress into four levels with a classification
efficiency of 92% [47] in 2014.
CHAPTER-3
EXISTING SYSTEM

Traditional methods for detecting employee stress include surveys and self-reporting, which
can be subjective and time-consuming. Other methods include physiological measures such as
heart rate variability and cortisol levels, which can be invasive and require specialized
equipment. These methods also require significant expertise to interpret the data accurately.
3.1 DISADVANTAGES
• The disadvantages of existing methods are that they can be time-consuming and subjective.
• Surveys and self-reporting methods rely on the employee's willingness and ability to accurately
report their stress levels, which can be influenced by factors such as social desirability bias or
lack of self-awareness.
• Physiological measures such as heart rate variability and cortisol levels can be invasive and
require specialized equipment and expertise to interpret the data accurately.

CHAPTER-4
PROPOSED SYSTEM

In this project, we detect employee stress using machine learning algorithms, namely SVM
and Random Forest. To detect stress, we use a social media dataset of tweets in which
employees share their views; by analysing these views, we can identify whether an employee
is in a relaxed or stressed mood. Analysing these views manually would take a great deal of
human effort, so we apply machine learning algorithms, and experiments with these
algorithms show stress detection accuracy of more than 90%.

4.1 SCOPE

Since this project addresses a social problem in an enormously growing field, its scope is
considerable: it can help society by identifying victims of stress, one of the most commonly
identified disorders among adolescents and working adults. The scope of detecting employee
stress from a Twitter dataset using the Support Vector Machine and Random Forest
algorithms is significant. SVM excels at classifying data by finding the optimal hyperplane
that separates different classes, while Random Forest utilizes an ensemble of decision trees
to make predictions. However, the effectiveness ultimately depends on the quality and
relevance of the input data and the implementation of the algorithms.

4.2 OBJECTIVE

The objective of using Support Vector Machine (SVM) and Random Forest algorithms for the
detection of employee stress is to develop predictive models that can analyse various features
or factors associated with employees and classify them into stressed or non-stressed categories.
These algorithms aim to accurately predict and identify employees who may be experiencing
stress, which can help organizations take proactive measures to address employee well-being.

CHAPTER-5

REQUIREMENT ANALYSIS

5.1 FUNCTIONAL REQUIREMENTS

1. Data Collection

2. Data Preprocessing

3. Training and Testing

4. Modeling

5. Predicting

5.1.1 Data Collection

Initially, we collect a dataset for our stress detection system. After the collection of the dataset,
we split the dataset into training data and testing data. The training dataset is used for prediction-model
learning and the testing data is used for evaluating the prediction model. For this project, 90% of the
data is used for training and 10% of the data is used for testing.
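
A minimal sketch of this split with scikit-learn (assuming the dataset has already been loaded into
a feature array X and a label array Y):

from sklearn.model_selection import train_test_split

# Hold out 10% of the records for testing; a fixed random_state keeps
# the split reproducible between runs
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.10, random_state=42)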

5.1.2 Data Preprocessing

Data pre-processing is an important step in the creation of a machine learning model. Initially, data may
not be clean or in the required format for the model, which can cause misleading outcomes. In
pre-processing, we transform the data into our required format. It is used to deal with noise, duplicates,
and missing values in the dataset. Data pre-processing includes activities like importing datasets, splitting
datasets, attribute scaling, etc. Pre-processing of data is required for improving the accuracy of the model.
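
A minimal sketch of these pre-processing steps on the tweet text (the column names 'tweet' and
'label' are hypothetical; the cleaning mirrors the steps used in Chapter 7):

import pandas as pd
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

df = pd.read_csv('stress_tweets.csv', encoding='iso-8859-1')
df = df.drop_duplicates().dropna()   # deal with duplicates and missing values

def clean(tweet):
    # Lowercase the tweet and drop stop words and very short tokens
    return " ".join(w for w in str(tweet).lower().split()
                    if len(w) > 2 and w not in stop_words)

df['tweet'] = df['tweet'].apply(clean)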

5.1.3 Training and Testing


Training a machine learning (ML) model is a process in which a machine learning algorithm is fed with
training data from which it can learn. Model training is the primary step in machine learning, resulting
in a working model that can then be validated, tested and deployed. Both the quality of the training data
and the choice of the algorithm are central to the model training phase. In most cases, training data is
split into two sets for training and then validation and testing. The type of training data that we provide
to the model is highly responsible for the model's accuracy and prediction ability. It means that the better
the quality of the training data, the better the performance of the model will be. Our training data is equal
to 90% of the total data.
Once we train the model with the training dataset, it's time to test the model with the test dataset.
This dataset evaluates the performance of the model and ensures that the model can generalize
well with the new or unseen dataset. Test data is a well-organized dataset that contains data for
each type of scenario for a given problem that the model would be facing when used in the real
world. The test dataset is 10% of the total original data for this project.

5.1.4 Modelling
Machine learning models are created by training algorithms with either labeled or unlabeled data, or a
mix of both. The two primary ways used here to train and produce a machine learning model are:

 Supervised learning: Supervised learning occurs when an algorithm is trained using “labelled
data”, or data that is tagged with a label so that an algorithm can successfully learn from it. Training
an algorithm with labelled data helps the eventual machine learning model know how to classify
data in the manner that the researcher desires.
 Unsupervised learning: Unsupervised learning uses unlabeled data to train an algorithm. In this
process, the algorithm finds patterns in the data itself and creates its own data clusters.
Unsupervised learning is helpful for researchers who are looking to find patterns in data that are
currently unknown to them.

5.1.5 Predicting
The trained model makes predictions when given new data. When new input or test
data is given to the trained model, it predicts the stress level of the employee based on
the given input data. The trained model predicts well since a large share of the data
(90%) is used for training. The choice of algorithm also plays a major role in making
the model predict well.
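
A minimal sketch of this prediction step (assuming the fitted tokenizer and trained model
produced in Chapter 7, where sequences are padded to length 83):

from keras.preprocessing.sequence import pad_sequences

new_tweet = "deadline pressure again, could not sleep"
seq = tokenizer.texts_to_sequences([new_tweet])
seq = pad_sequences(seq, maxlen=83, dtype='int32', value=0)  # same length as training
print("Stressed" if model.predict(seq)[0] == 1 else "Not Stressed")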

5.2 NON-FUNCTIONAL REQUIREMENTS


A NON-FUNCTIONAL REQUIREMENT (NFR) specifies a quality attribute of a software
system. NFRs judge the software system based on responsiveness, usability, security,
portability and other non-functional standards that are critical to the success of the software
system. An example of a non-functional requirement is “how fast does the website load?”
Failing to meet non-functional requirements can result in systems that fail to satisfy user
needs. Non-functional requirements allow you to impose constraints or restrictions on the
design of the system across the various agile backlogs, for example, “the site should load
within 3 seconds when the number of simultaneous users is greater than 10,000”. The
description of non-functional requirements is just as critical as that of functional requirements.

EXAMPLES OF NON-FUNCTIONAL REQUIREMENTS

 Users must upload the dataset.

 Privacy of information, the export of restricted technologies, intellectual property

rights, etc. should be audited.

5.3 SOFTWARE REQUIREMENTS

The functional requirements or the overall description documents include the product
perspective and features, operating system and operating environment, graphics
requirements, design constraints and user documentation.
The appropriation of requirements and implementation constraints gives the general
overview of the project in regards to what the areas of strength and deficit are and how to
tackle them.

 Python IDLE 3.7 (or)

 Anaconda 3.7 (or)

 Jupyter (or)

 Google Colab

5.4 HARDWARE REQUIREMENTS

Minimum hardware requirements are very dependent on the particular software being
developed by a given Enthought Python / Canopy / VS Code user. Applications that need to
store large arrays/objects in memory will require more RAM, whereas applications that
need to perform numerous calculations or tasks more quickly will require a faster
processor.
 Operating system: Windows, Linux

 Processor: minimum Intel i3

 RAM: minimum 4 GB

 Hard disk: minimum 250 GB

CHAPTER-6
DESIGN AND METHODOLOGY

6.1 METHODOLOGY

RANDOM FOREST ALGORITHM


Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in ML. It
is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average to improve the predictive
accuracy of that dataset." Instead of relying on one decision tree, the random forest takes
the prediction from each tree and, based on the majority vote of those predictions, outputs
the final class.

A greater number of trees in the forest leads to higher accuracy and helps prevent
the problem of overfitting.
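
A minimal sketch of this majority-vote behaviour with scikit-learn, on toy data
(n_estimators=20 matches the value used in Chapter 7):

from sklearn.ensemble import RandomForestClassifier

# Toy feature vectors and labels (1 = stressed, 0 = not stressed)
X = [[0, 1], [1, 1], [0, 0], [1, 0]]
y = [1, 1, 0, 0]

# Each tree is trained on a random bootstrap sample of the data; the forest
# predicts the class that receives the majority of the tree votes
rf = RandomForestClassifier(n_estimators=20, random_state=0)
rf.fit(X, y)
print(rf.predict([[1, 1]]))  # -> [1]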

SUPPORT VECTOR MACHINE

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is
used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point in
the correct category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed the Support Vector
Machine. Consider a case in which there are two different categories that are
classified using a decision boundary or hyperplane.
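
A minimal sketch with scikit-learn, on toy two-dimensional data (the RBF kernel and C=2.0
match the settings used in Chapter 7):

from sklearn import svm

# Two linearly separable toy classes
X = [[0, 0], [0, 1], [2, 2], [2, 3]]
y = [0, 0, 1, 1]

clf = svm.SVC(C=2.0, kernel='rbf', gamma='scale')
clf.fit(X, y)
print(clf.support_vectors_)     # the extreme points that define the margin
print(clf.predict([[2, 2.5]]))  # -> [1]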

6.2 SYSTEM DESIGN

UML DIAGRAMS

The System Design Document describes the system requirements, operating environment,


system and subsystem architecture, files and database design, input formats, output layouts,
human-machine interfaces, detailed design, processing logic, and external interfaces.

6.2.1 CLASS DIAGRAM


In software engineering, a class diagram in the Unified Modeling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes. It
explains which class contains information.

6.2.2 INTERACTION DIAGRAMS

This interactive behavior is represented in UML by two diagrams known as Sequence


diagram and Collaboration diagram. The basic purpose of both diagrams is similar. The
purpose of interaction diagrams is to visualize the interactive behavior of the system.
Visualizing the interaction is a difficult task. Hence, the solution is to use different types of
models to capture the different aspects of the interaction.

6.2.2.1 SEQUENCE DIAGRAM

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram


that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event
scenarios, and timing diagrams.

6.2.3 USE CASE DIAGRAM


A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases. The main purpose of a use
case diagram is to show what system functions are performed for which actor. Roles of the
actors in the system can be depicted.

6.2.3.1 CONTROL FLOW DIAGRAM

A control-flow diagram (CFD) is a diagram to describe the control flow of a business


process, process or review. Control diagrams are graphical notations specially designed to
represent event and control flows. Data flow is represented by a solid arrow, whereas control
flow is represented by a dashed or shaded arrow.

CHAPTER-7
IMPLEMENTATION

from tkinter import messagebox
from tkinter import *
from tkinter.filedialog import askopenfilename
from tkinter import simpledialog
import tkinter
import numpy as np
from tkinter import filedialog
import pandas as pd
import os
from sklearn.feature_extraction.text import CountVectorizer
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import re
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
import matplotlib.pyplot as plt

# English stop words used to clean the tweets
stop_words = set(stopwords.words('english'))

# Main application window
main = tkinter.Tk()
main.title("Detection of Employee Stress Using Machine Learning")
main.geometry("1300x1200")

# Globals shared between the GUI callbacks
global model
global filename
global tokenizer
global X
global Y
global X_train, X_test, Y_train, Y_test
global XX
word_count = 0
global svm_acc, rf_acc

def upload():
    # Ask the user for the tweets CSV file and show the chosen path
    global filename
    filename = filedialog.askopenfilename(initialdir="Tweets")
    pathlabel.config(text=filename)
    textarea.delete('1.0', END)
    textarea.insert(END, 'tweets dataset loaded\n')

def preprocess():
    # Clean every tweet: lowercase, drop stop words and very short tokens
    global X
    global Y
    global word_count
    X = []
    Y = []
    textarea.delete('1.0', END)
    train = pd.read_csv(filename, encoding='iso-8859-1')
    word_count = 0
    words = []
    for i in range(len(train)):
        label = train.iloc[i, 2]   # iloc replaces the deprecated get_value()
        tweet = train.iloc[i, 1]
        tweet = tweet.lower()
        arr = tweet.split(" ")
        msg = ''
        for k in range(len(arr)):
            word = arr[k].strip()
            if len(word) > 2 and word not in stop_words:
                msg += word + " "
                if word not in words:
                    words.append(word)
        text = msg.strip()
        X.append(text)
        Y.append(int(label))
    X = np.asarray(X)
    Y = np.asarray(Y)
    word_count = len(words)
    textarea.insert(END, 'Total tweets found in dataset : ' + str(len(X)) + "\n")
    textarea.insert(END, 'Total words found in all tweets : ' + str(len(words)) + "\n\n")
    featureExtraction()

def featureExtraction():
    # Convert the cleaned tweets to padded integer sequences and split them
    global X
    global Y
    global XX
    global tokenizer
    global X_train, X_test, Y_train, Y_test
    max_features = word_count
    tokenizer = Tokenizer(num_words=max_features, split=' ')
    tokenizer.fit_on_texts(X)
    XX = tokenizer.texts_to_sequences(X)
    XX = pad_sequences(XX)
    # Shuffle records before splitting into train and test sets
    indices = np.arange(XX.shape[0])
    np.random.shuffle(indices)
    XX = XX[indices]
    Y = Y[indices]
    X_train, X_test, Y_train, Y_test = train_test_split(XX, Y, test_size=0.13, random_state=42)
    textarea.insert(END, 'Total features extracted from tweets are : ' + str(X_train.shape[1]) + "\n")
    textarea.insert(END, 'Total splitted records used for training : ' + str(len(X_train)) + "\n")
    textarea.insert(END, 'Total splitted records used for testing : ' + str(len(X_test)) + "\n")

def SVM():
    # Train an RBF-kernel SVM and report its accuracy on the test set
    textarea.delete('1.0', END)
    global svm_acc
    clf = svm.SVC(C=2.0, gamma='scale', kernel='rbf', random_state=2)
    clf.fit(X_train, Y_train)
    textarea.insert(END, "SVM Prediction Results\n")
    prediction_data = clf.predict(X_test)
    svm_acc = accuracy_score(Y_test, prediction_data) * 100
    textarea.insert(END, "SVM Accuracy : " + str(svm_acc) + "\n\n")

def RandomForest():
    # Train a Random Forest and keep it as the model used for prediction
    global rf_acc
    global model
    rfc = RandomForestClassifier(n_estimators=20, random_state=0)
    rfc.fit(X_train, Y_train)
    textarea.insert(END, "Random Forest Prediction Results\n")
    prediction_data = rfc.predict(X_test)
    rf_acc = accuracy_score(Y_test, prediction_data) * 100
    textarea.insert(END, "Random Forest Accuracy : " + str(rf_acc) + "\n")
    model = rfc
def predict():
    # Classify each tweet in a user-chosen test file as Stressed / Not Stressed
    textarea.delete('1.0', END)
    testfile = filedialog.askopenfilename(initialdir="Tweets")
    test = pd.read_csv(testfile, encoding='iso-8859-1')
    for i in range(len(test)):
        tweet = test.iloc[i, 0]
        arr = tweet.split(" ")
        msg = ''
        for j in range(len(arr)):
            word = arr[j].strip()
            if len(word) > 2 and word not in stop_words:
                msg += word + " "
        text = msg.strip()
        mytext = [text]
        # Encode the tweet exactly as the training data was encoded
        twts = tokenizer.texts_to_sequences(mytext)
        twts = pad_sequences(twts, maxlen=83, dtype='int32', value=0)
        stress = model.predict(twts)[0]
        if stress == 0:
            textarea.insert(END, text + ' === Prediction Result : Not Stressed\n\n')
        if stress == 1:
            textarea.insert(END, text + ' === Prediction Result : Stressed\n\n')

def graph():
    # Bar chart comparing SVM and Random Forest accuracy
    height = [svm_acc, rf_acc]
    bars = ('SVM ACC', 'Random Forest ACC')
    y_pos = np.arange(len(bars))
    plt.bar(y_pos, height)
    plt.xticks(y_pos, bars)
    plt.show()

font = ('times', 16, 'bold')

title = Label(main, text='Detection of Employee Stress Using Machine Learning')
title.config(bg='yellow green', fg='saddle brown')
title.config(font=font)
title.config(height=3, width=120)
title.place(x=0, y=5)

font1 = ('times', 14, 'bold')

# Renamed from 'upload' so the button does not shadow the upload() function
uploadButton = Button(main, text="Upload Tweets Dataset", command=upload)
uploadButton.place(x=780, y=100)
uploadButton.config(font=font1)

pathlabel = Label(main)
pathlabel.config(bg='royal blue', fg='rosy brown')
pathlabel.config(font=font1)
pathlabel.place(x=780, y=150)

preprocessButton = Button(main, text="Data Preprocessing & Features Extraction", command=preprocess)
preprocessButton.place(x=780, y=200)
preprocessButton.config(font=font1)

svmButton = Button(main, text="Run SVM Algorithm", command=SVM)
svmButton.place(x=780, y=250)
svmButton.config(font=font1)

rfButton = Button(main, text="Run Random Forest Algorithm", command=RandomForest)
rfButton.place(x=780, y=300)
rfButton.config(font=font1)

classifyButton = Button(main, text="Predict Stress", command=predict)
classifyButton.place(x=780, y=350)
classifyButton.config(font=font1)

modelButton = Button(main, text="Accuracy Graph", command=graph)
modelButton.place(x=780, y=400)
modelButton.config(font=font1)

font1 = ('times', 12, 'bold')

textarea = Text(main, height=30, width=90)
scroll = Scrollbar(textarea)
textarea.configure(yscrollcommand=scroll.set)
textarea.place(x=10, y=100)
textarea.config(font=font1)

main.config(bg='cadet blue')
main.mainloop()

CHAPTER-8
TESTING

Testing is the process of executing a program with the aim of finding errors. To make our
software perform well it should be error-free. Successful testing uncovers most of the errors
in the software.

8.1 TYPES OF TESTING

 White Box Testing

 Black Box Testing

 Unit testing

 Integration Testing

 Alpha Testing

 Beta Testing

 Performance Testing and so on

8.1.1 White Box Testing


A testing technique based on knowledge of the internal logic of an application's code; it
includes tests like coverage of code statements, branches, paths, and conditions. It is
performed by software developers.

8.1.2 Performance Testing


Functional testing conducted to evaluate the compliance of a system or component with
specified performance requirements. It is usually conducted by the performance engineer.

8.1.3 Black Box Testing


Black box testing is testing the functionality of an application without knowing the details of
its implementation, including internal program structure, data structures, etc. Test cases for
black box testing are created based on the requirement specifications. Therefore, it is also
called specification-based testing. The figure below represents black box testing:

Fig.: Black Box Testing


When applied to machine learning models, black box testing means testing machine
learning models without knowing internal details such as the features of the machine learning
model, the algorithm used to create the model, etc. The challenge, however, is to verify the test
outcome against the expected values that are known beforehand.

Fig.: Black Box Testing for Machine Learning algorithms

Input                                      Actual Output    Predicted Output

[16,6,324,0,0,0,22,0,0,0,0,0,0]            0                0

[16,7,263,7,0,2,700,9,10,1153,832,9,2]     1                1
The above figure represents the black box testing procedure for machine learning algorithms.
The model gives the correct output when the different inputs mentioned in the table are given.
Therefore, the program is said to execute as expected.
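
In code, such a black-box check compares the model's output against the expected output for
each recorded input without inspecting the model's internals. A minimal sketch (assuming a
trained classifier named model that accepts the 13-feature vectors shown in the table):

# Black-box test cases: (input feature vector, expected output)
test_cases = [
    ([16, 6, 324, 0, 0, 0, 22, 0, 0, 0, 0, 0, 0], 0),
    ([16, 7, 263, 7, 0, 2, 700, 9, 10, 1153, 832, 9, 2], 1),
]

for features, expected in test_cases:
    predicted = model.predict([features])[0]
    assert predicted == expected, f"expected {expected}, got {predicted}"
print("All black-box test cases passed")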

CHAPTER-9
RESULTS AND DISCUSSION

In the above screen, click on the ‘Upload Tweets Dataset’ button to load the dataset.

In the above screen, select the ‘stress_tweets.csv’ dataset and then click on the ‘Open’ button
to load the dataset and reach the screen below. The application window provides the
following buttons:
 Upload Tweets Dataset
 Data Preprocessing and Features Extraction
 Run Support Vector Machine
 Run Random Forest
 Predict Stress
 Accuracy Graph

In the above screen, click on the ‘Data Preprocessing & Features Extraction’ button to read

the dataset, clean it and extract features such as words, and to find the total number of records

in the dataset, the total number of words, and how many records the application uses for training and testing.

In the above screen, the dataset contains 10,314 tweets in total, the tweets contain 30,790

words altogether, and each tweet is encoded as a vector of 83 features; the application uses

8,973 records for training and 1,341 for testing. Now both train and test data are ready, so

click on the ‘Run SVM Algorithm’ button to train on the data using the SVM machine

learning algorithm.

In the above screen, SVM achieved 89.85% prediction accuracy on the test data. Now click

on the ‘Run Random Forest Algorithm’ button to calculate its accuracy.

In the above screen, Random Forest achieved 97.31% prediction accuracy. Now click on the

‘Predict Stress’ button and upload a test file containing tweets; by analysing those tweets, the

machine learning algorithm will predict whether each tweet contains any stress indicators or

not. Below are the screenshots of the test tweets uploaded in the next screen.

In the above screen, upload the ‘test’ file and then click on the ‘Open’ button to predict stress.

In the above screen, beside each tweet we can see the predicted result as Stressed or Not

Stressed. From the above screen we can see that the application detects stress successfully

from the messages. Now click on the ‘Accuracy Graph’ button to get the comparison graph below.

In the above graph, the x-axis represents the algorithm name and the y-axis represents the accuracy
of those algorithms; from the graph we can say that Random Forest performs better than the
Support Vector Machine.

CHAPTER-10
CONCLUSION

Gender, a family history of mental illness, and whether an employer provides mental health
benefits to its employees were found to have more significance than the other factors in
determining whether an employee may develop mental health related issues. From our study, we
found that people working in tech companies are at greater risk of developing stress, even when
their job role is not itself technical. These insights could be used by business companies to make
more suitable HR strategies for their working employees. An accuracy of 75% shows that the
application of two machine learning techniques (i.e., SVM and Random Forest) for predicting
stress and mental health conditions provides worthy results and could be explored further, and
thus the aim of this project is met.

CHAPTER-11

REFERENCES

[1] Detecting and characterizing Mental Health Related Self-Disclosure in social media.
Sairam Balani and Munmun De Choudhury. 2015. In Proceedings of the 33rd Annual ACM
Conference Extended Abstracts on Human Factors in Computing Systems – CHI EA '15, pages
1373–1378.
[2] Measuring Post Traumatic Stress Disorder in Twitter. Glen Coppersmith, Mark Dredze, and
Craig Harman. 2014.
[3] Role of social media in Tackling Challenges in Mental Health. Munmun De Choudhury. 2013.
[4] Bhattacharyya, R., & Basu, S. (2018). India Inc looks to deal with rising stress in employees.
Retrieved from 'The Economic Times'.
[5] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., & Vanderplas,
J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct),
2825-2
[6] OSMI Mental Health in Tech Survey Dataset, 2017 from Kaggle.

[7] Van den Broeck, J., Cunningham, S. A., Eeckels, R., & Herbst, K. (2005). Data cleaning:
detecting, diagnosing, and editing data abnormalities. PLoS Medicine, 2(10), e267.
[8] Relationship between Job Stress and Self-Rated Health among Japanese Full Time Occupational
Physicians. Takashi Shimizu and Shoji Nagata. Academic Papers in Japanese, 2007.
[9] Tomar, D., & Agarwal, S. (2013). A survey on Data Mining approaches for healthcare.
International Journal of Bio-Science and Bio-Technology, 5(5), 241-266.
[10] Gender and Stress. (n.d.). Retrieved from APA press release 2010

[11] Julie Aitken Harris, Robert Slatestone and Maryann Fraboni (2000). “An Evaluation of the Job
Stress Questionnaire with a Sample of Entrepreneurs”.

[12] “Demographic and Workplace Characteristics which add to the Prediction of Stress and Job
Satisfaction within the Police Workplace”, Jeremy D. Davey, Patricia L. Obst, and Mary C.
Sheehan. 2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive
Computing (ICCICC), 2015.

[13] Mario Salai, István Vassányi, and István Kósa, “Stress Detection Using Low-Cost Heart
Rate Sensors”, Journal of Healthcare Engineering, pp. 1–13, Hindawi Publishing Corporation, 2016.
[14] Shwetha, S., Sahil, A., Anant Kumar J. (2017). Predictive analysis using classification
techniques in healthcare domain. International Journal of Linguistics & Computing Research,
ISSN: 2456-8848, Vol. I, Issue I, June 2017.
[15] O. M. Mozos et al., “Stress detection using wearable physiological and sociometric sensors”,
International Journal of Neural Systems, vol. 27, issue 2, 2017.
