A Deep Neural Network Based Human to Machine Conversation Model
Abstract— A conversational agent (chatbot) is computer software capable of communicating with humans using natural language processing. The crucial part of building any chatbot is the development of the conversation. Despite many advances in Natural Language Processing (NLP) and Artificial Intelligence (AI), creating a good chatbot model remains a significant challenge in this field even today. A conversational bot can be used for countless errands; in general, it needs to understand the user's intent and deliver appropriate replies. A chatbot is a software program with a conversational interface that allows a user to converse in the same manner one would address a human. Hence, chatbots are used in almost every customer communication platform, such as social networks. At present, there are two basic models used in developing a chatbot: generative-based models and retrieval-based models. Recent advancements in deep learning and artificial intelligence, such as end-to-end trainable neural networks, have rapidly replaced earlier methods based on hand-written instructions and patterns or on statistical methods. This paper proposes a new method of creating a chatbot using a deep neural learning method, in which a neural network with multiple layers is built to learn and process the data.

Keywords— machine learning, conversational agent, chatbot, machine learning classification technique, Neural Networks, Deep Learning, Natural Language Processing.

I. INTRODUCTION

A chatbot is a software program with a conversational interface that allows a user to converse in the same manner one would address a human. A virtual chatbot is a piece of software that is intelligent enough to mimic human interactions. Conversational bots are used in almost every customer interaction, such as instant messaging with clients. Since the development of the first chatbot, they have evolved in functionality and interface, and their significance to the technical world cannot be neglected. However, modelling conversations remains a significant challenge in this field even today. Even though they are a long way from perfect, conversational agents are now used in various applications [1]. To understand the capabilities and limitations of current chatbot techniques and architectures, a detailed survey was conducted in which related literature published over the past few years was studied, and a newly presented neural network model was trained with conversational data. Deep learning and NLP techniques are being used in advanced research and development projects, and AI and ML algorithms are implemented in the development of conversations; research and development in these areas is still in progress and under experimentation. Conversational agents are predominantly used by government administrations, businesses, and non-profit establishments. They are often deployed by financial institutions such as banks and insurance firms, startup companies, online stores, and social service sectors. These chatbots are implemented by large corporations as well as small startups. However, chatbots are not yet properly implemented in the medical field. A chatbot can help patients with medical-related tasks by assisting them via text messages, applications, or instant messaging. One can find many virtual bot development frameworks in the market, both interface-based and code-based. However, both models have limitations concerning flexibility and practicality in holding real conversations. Most of the well-known intelligent personal assistants, such as Alexa by Amazon, Google Assistant by Google, and Cortana by Microsoft, have some limitations in functionality. Retrieval-based agents are being introduced to hold conversations that resemble real human interactions. Most of the intelligent personal assistants currently available are structured on rule-based or retrieval-based techniques that generate decent results [2].

Recently, there has been a significant increase in interest in the usage of chatbots. Many companies are successfully using these dialogue generation systems to fulfill customer requirements. Since companies are adopting chatbot technology, there is a promising demand for the development of, and advanced research on, conversational agents.

II. RELATED WORK

Machine learning and deep learning are useful for various sets of problems. One application of these techniques is predicting a dependent variable from the values of independent variables. Machine learning is a part of artificial intelligence.
ML finds a solution to a problem by recognizing patterns in databases rather than depending on rules. Machine learning techniques such as linear regression and naive Bayes study the correlation between the features and the value of the output class. In other words, ML techniques help machines understand some information about the real world. "The calculations hang on the correlation between features and output value, however, some modular designed features are restricted in these models, this may lead to many difficulties, like representing each entity as a different set of features, to verify that, consider the problem of face detection as an example, the modeller cannot represent a face on a pixel-to-pixel basis, but can easily represent with a set of features such as having a particular shape and structure" [3].

In this approach, the main drawback is the extreme importance of the representation of the data. A crucial task in designing a chatbot is to make the discussion between the system and the user feel human-like and natural. To increase human perception, several models with conversational user interfaces (CUIs), such as virtual bots, are programmed to deliver delayed replies to mimic the time taken by humans to respond. However, a delayed response may negatively affect user satisfaction, mostly in situations where quick responses are expected, for example, in customer interactions.

A novel method to implement word segmentation was introduced by Mohammed Javed et al. [4]. The algorithm counts the character spaces, including all types of gaps: spaces between letters, punctuation marks, and words. The algorithm works on the number of gaps in the sentence; the average gap between the characters of the sentence is calculated, and any gap wider than this average is identified as a tokenization point. This leads to tokenization at the blank spaces between words, as sketched below.
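The following is an illustrative interpretation of the gap-based rule described above, not the reference implementation from [4]; the glyphs and gap widths are hypothetical stand-ins for measurements taken from a scanned text line.

```python
# Illustrative sketch of the gap-based segmentation rule described in [4]:
# gaps wider than the average inter-character gap are treated as word
# boundaries. The glyph and gap-width values below are hypothetical.

def segment_by_gaps(glyphs, gap_widths):
    """Group glyphs into words, splitting wherever a gap exceeds the mean gap."""
    mean_gap = sum(gap_widths) / len(gap_widths)
    words, current = [], [glyphs[0]]
    for glyph, gap in zip(glyphs[1:], gap_widths):
        if gap > mean_gap:          # wide gap -> tokenization point
            words.append("".join(current))
            current = [glyph]
        else:                       # narrow gap -> same word
            current.append(glyph)
    words.append("".join(current))
    return words

glyphs = list("goodmorningdoctor")
# one hypothetical gap width between each pair of consecutive glyphs
gap_widths = [1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 1]
print(segment_by_gaps(glyphs, gap_widths))   # ['good', 'morning', 'doctor']
```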
Word segmentation implemented using the Natural Language ToolKit (NLTK) was proposed by Naeun Lee et al. [5]. NLTK is a Python package with built-in tokenizers that provides services for natural language processing. "A wide range of tokenizers are included which are as follows standard, letter, word, classic, lowercase, N-gram, pattern, keyword, path, etc. The word-punkt tokenizer is basic and simple, the sentences at the blank spaces are split. The accuracy, speed, and efficiency of the NLTK tokenizers is commendable" [5]. The package executes the algorithm at the backend.
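A minimal usage sketch of NLTK's sentence and word tokenizers follows; the sample sentence is illustrative and not taken from the dataset.

```python
# Minimal NLTK tokenization sketch; the sample sentence is illustrative.
import nltk

nltk.download("punkt", quiet=True)   # one-time download of the Punkt models

from nltk.tokenize import sent_tokenize, word_tokenize

text = "Hello doctor. I have a mild headache and a slight fever."
print(sent_tokenize(text))   # ['Hello doctor.', 'I have a mild headache and a slight fever.']
print(word_tokenize(text))   # ['Hello', 'doctor', '.', 'I', 'have', ...]
```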
Though chatbots are primarily question-answering systems, if they are used for medical-related tasks, such as notifying patients about an appointment or a regular check-up, gathering advice, and checking up on their health complaints, healthcare workers can spend more time caring for patients. For some minor health issues, the chatbot can provide health advice and further guidance rapidly. These chatbots are particularly useful when one struggles to understand telephonic advice. Therefore, these conversational agents are seen as very useful, time-saving, and cost-effective in healthcare services.

III. PROPOSED WORK

Problem statement: A chatbot is computer software that communicates with humans using natural language. In the past few years, several corporations have been using chatbots to fulfil customers' needs. While chatbots are used in many tech-related tasks, they are not properly implemented for medical-related tasks. Chatbots entering the healthcare industry can help ease many of its difficulties.

The motivation for this research is to present a better way of modelling a chatbot that can be used for various medical-related tasks, such as notifying patients about a future appointment or a regular check-up, gathering advice, checking up on their health complaints, or promoting a healthcare business. In the meantime, healthcare workers can spend time caring for patients instead of going through needless routines. The main challenge in designing a conversational agent is to make the conversation between the person and the software feel natural and human-like. A simple graphical user interface (GUI) is provided to communicate with the chatbot.

Contribution: A chatbot typically takes sentences expressed by the user in natural language as input and delivers a response as output. Of the two main methods of generating responses, the old-style method uses hardcoded templates and instructions, whereas the advanced models that emerged with the growth of deep learning use enormous amounts of data. Deep neural network models are trained on large amounts of data to learn how to respond to input utterances with relevant and grammatically accurate responses. Some advanced models are developed to accommodate visual or spoken inputs.

This chatbot performs services such as offering support for adverse drug reactions, blood pressure tracking through patient ID, and hospital and pharmacy search. Indeed, creating a decent chatbot remains a tough challenge despite the advancements in artificial intelligence.

IV. DATASET

The dataset is taken from the open-sourced Kaggle healthcare services competition. It is a simple dataset with 14 different tags, and each tag has patterns and responses. These patterns and responses take a question-and-answer form. The dataset is a dictionary mapping of dictionaries: first, intents are mapped to tags, patterns, context, and responses, and then each of these is mapped to its queries and keywords [6]. An illustrative intent entry is sketched after Table I.

This is a small dataset, and training a deep neural network model on it can lead to overfitting, so appropriate precautions must be taken while building the model to find the optimal learning rate.

TABLE I. Dataset description

S. No | Attribute Name | Description
1. | Tags | Keywords assigned to the patterns and responses for training the chatbot, e.g., "greeting", "bye".
2. | Patterns | The types of queries asked by the user to the chatbot.
3. | Responses | The answers generated by the chatbot for the respective queries.
4. | Context | Context is given for queries that require a search-and-find operation.
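The sketch below illustrates the structure of one intent entry, following the attribute names in Table I. The tag names, patterns, and responses shown are examples, not values copied from the Kaggle file.

```python
# Illustrative structure of one intent entry, following the attributes in
# Table I. The tag names, patterns, and responses are examples only.
intents = {
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["Hi", "Hello", "Good morning"],
            "responses": ["Hello! How can I help you today?"],
            "context": [""],
        },
        {
            "tag": "pharmacy_search",
            "patterns": ["Find a pharmacy near me"],
            "responses": ["Please give me the area you are looking in."],
            "context": ["search_pharmacy_by_name"],
        },
    ]
}
```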
ŷ ∈ {0, 1} in (3) is a one-hot-encoded label, and the model's output is denoted by y. To train the model, "the gradients are computed by differentiating the cost-function with respect to the model parameters using a mini-batch of data sampled from the training data and backpropagated to prior layers using the backpropagation algorithm" [10].
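The referenced equation (3) is not reproduced in this excerpt; for a one-hot label ŷ and a softmax output y over C intent classes, a standard categorical cross-entropy consistent with the description above has the form:

```latex
% Standard categorical cross-entropy with one-hot labels (illustrative; the
% paper's own equation (3) is not reproduced in this excerpt).
E(\hat{y}, y) = -\sum_{c=1}^{C} \hat{y}_c \log y_c ,
\qquad
y_c = \frac{e^{z_c}}{\sum_{k=1}^{C} e^{z_k}}
```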
C. Optimizers

Optimizers are the algorithms used to adjust the weights and the learning rate of the hidden-layer neurons in order to minimize the losses of the network. The loss of the network is represented by an error function E(x), a mathematical function, also known as the objective function, that invariably depends on the model's internal learnable parameters. The same function is used for computing the target values (Y) from the set of predictors (X) used in the model. "For example— the Weights(W) and the Bias(b) values of the neural network as its internal learnable parameters which are used in computing the output values and are learned and updated in the direction of an optimal solution, i.e., minimizing the Loss by the network's training process and also play a major role in the training process of the Neural Network Model" [11].

These internal parameters are crucial in any model and play a vital role in training it efficiently and effectively in order to produce accurate results. "This is why various Optimization strategies and algorithms are used to update and calculate appropriate and optimum values of such model's parameters which influence our model's learning process and the output of a Model" [11].
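As a concrete instance of such an update, plain (stochastic) gradient descent, one of the optimizers compared later, moves the weights W and biases b against the gradient of E with a learning rate η. This is the textbook form, not a formula quoted from the paper:

```latex
% Textbook (stochastic) gradient-descent update with learning rate \eta.
% Adaptive optimizers (AdaGrad, RMSprop, AdaDelta, Adam) rescale these
% gradients per parameter.
W \leftarrow W - \eta \frac{\partial E}{\partial W},
\qquad
b \leftarrow b - \eta \frac{\partial E}{\partial b}
```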
D. Activation Function

In a DNN, the activation function of an artificial neuron takes a set of input values and produces an output according to the activation function used. Some of the efficient functions are the Gaussian, multiquadric, and inverse activation functions. The Cybenko theorem [12] states that it is possible to approximate any continuous function with a feed-forward neural network that has a single hidden layer of finitely many neurons and a sigmoidal activation function.
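A minimal sketch of the activation functions used later in this work (ReLU for the hidden layers, softmax for the output layer) together with the sigmoid referenced by the Cybenko theorem, written with NumPy for illustration:

```python
# Common activation functions, written with NumPy for illustration:
# sigmoid (as referenced by the Cybenko theorem), ReLU (hidden layers),
# and softmax (output layer over the intent classes).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

z = np.array([0.5, -1.2, 2.0])
print(sigmoid(z), relu(z), softmax(z))
```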
E. Weight Initialization

Weight initialization is a fundamental step in developing a neural network, and the initialization parameters must be chosen appropriately so that an optimum can be reached in the optimal time. Developing an efficient initialization of a model is a vital step. Often, training is performed by initializing the parameters, specifying an optimization algorithm, computing the cost function and its gradients using backpropagation, and then updating the initialized parameters; training finally ends with this update step. Initialization is therefore the foremost step towards better performance of the model. Sartaj Singh Sodhi et al. [13] considered assigning equal initial values to the weight vector, but it was discovered that the weights then change in groups during the weight update and maintain symmetry, which hinders systematic training of the network. To address this problem, random weight initialization was introduced. "The initial weight values are chosen uniformly from a range of [−δ, δ]. Xavier and He normal weight initialization methods have been extensively applied since they submitted it" [13].
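For reference, the commonly used formulations of the initializers named above, in terms of the fan-in and fan-out of a layer, are given below (standard forms, not quoted from [13]):

```latex
% Standard formulations of the initializers named above (not quoted from [13]).
W \sim \mathcal{U}\left[-\delta, \delta\right], \quad
\delta = \sqrt{\frac{6}{n_{\mathrm{in}} + n_{\mathrm{out}}}}
\quad \text{(Glorot/Xavier uniform)},
\qquad
W \sim \mathcal{N}\!\left(0, \frac{2}{n_{\mathrm{in}}}\right)
\quad \text{(He normal)}
```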
VI. EXPERIMENTAL RESULTS

In the deep neural network, the most crucial task is allocating weights to the hidden layers. As shown in fig. 2, different weight parameters are assigned to each neuron, with a different activation function value at each level. The activation functions and weight values are adjusted during the training of the model to provide the most optimal value at the output. A keen analysis of the output is performed to evaluate the proposed work on each parameter, especially the accuracy and the cross-entropy error.

The analysis is divided into three parts. In the first part, a general network consisting of one hidden layer, the most widely used activation function, and a standard optimizer algorithm is used. In the second part, different optimizer algorithms are analysed, and their performance is scaled on the basis of performance metrics. In the third part, different kinds of weight initializers are used, and their performance is measured. For proper communication with the chatbot, a user-friendly graphical user interface (GUI) is created.
A software GUI library for Python called Tkinter is used to create a simple user interface. Tkinter is one of the fastest ways to create a simple GUI application. It is cross-platform, so the same code works on different operating systems, and it is Python's de-facto standard GUI package [14], so it is easy to use compared to other GUI toolkits. A sample conversation with the chatbot is shown in fig. 3.

Fig. 3. GUI of the Chatbot
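A minimal sketch of such a Tkinter chat window follows. The widget layout is illustrative, and get_bot_response() is a hypothetical stand-in for the trained model's intent prediction and response lookup, not code from the paper.

```python
# Minimal Tkinter chat window sketch. The layout is illustrative, and
# get_bot_response() is a hypothetical stand-in for the trained model's
# intent prediction + response lookup.
import tkinter as tk

def get_bot_response(message):
    return "I'm sorry, I don't understand yet."   # placeholder reply

def send():
    user_msg = entry.get().strip()
    if not user_msg:
        return
    chat_log.insert(tk.END, "You: " + user_msg + "\n")
    chat_log.insert(tk.END, "Bot: " + get_bot_response(user_msg) + "\n\n")
    entry.delete(0, tk.END)

root = tk.Tk()
root.title("Healthcare Chatbot")
chat_log = tk.Text(root, width=60, height=20)
chat_log.pack(padx=8, pady=8)
entry = tk.Entry(root, width=50)
entry.pack(side=tk.LEFT, padx=8, pady=8)
tk.Button(root, text="Send", command=send).pack(side=tk.LEFT, pady=8)
root.mainloop()
```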
A. Accuracy of different Optimizers

The data are split into training and test sets with a 0.2 test proportion. In this analysis, a separate model is created for each optimizer algorithm, and the analysis is performed on each of them. Each model has two hidden layers with ReLU as the activation function. Five different optimizers were chosen for this purpose: Stochastic Gradient Descent (SGD), AdaGrad, AdaDelta, RMSprop, and Adam. The graph of training loss and validation loss vs. the number of epochs (fig. 4) is used to analyze the performance of the different optimizers from the minima generated over their learning span; a sketch of this comparison is given below.
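The following is a minimal Keras sketch of this comparison. It assumes the patterns have already been encoded as bag-of-words vectors with one-hot intent labels; since that preprocessing is not shown in this excerpt, random stand-in data are generated so the sketch runs end to end, and the layer sizes and dropout are illustrative.

```python
# Minimal Keras sketch of the optimizer comparison. train_x / train_y are
# random stand-ins for the encoded patterns and one-hot intent labels;
# the layer sizes and dropout rate are illustrative.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

rng = np.random.default_rng(0)
train_x = rng.integers(0, 2, size=(80, 50)).astype("float32")      # bag-of-words stand-in
train_y = np.eye(14, dtype="float32")[rng.integers(0, 14, size=80)]  # 14 one-hot tags

def build_model(optimizer, input_dim, num_classes):
    """Two hidden ReLU layers and a softmax output, as described in the text."""
    model = Sequential([
        Dense(128, activation="relu", input_shape=(input_dim,)),
        Dropout(0.5),
        Dense(64, activation="relu"),
        Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

for opt in ["sgd", "adagrad", "adadelta", "rmsprop", "adam"]:
    model = build_model(opt, input_dim=train_x.shape[1],
                        num_classes=train_y.shape[1])
    history = model.fit(train_x, train_y, epochs=20, batch_size=5,
                        validation_split=0.2, verbose=0)
    print(opt, "final validation accuracy:", history.history["val_accuracy"][-1])
```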
The SGD optimizer outperforms the other optimizer algorithms in this specific neural network model. The first minimum generated by the model has zero validation and training loss, which implies that the network adapts quickly using SGD. It works in the same way as regular gradient descent but updates the parameters for every training sample rather than once per full pass. SGD is prone to noise in the data, but here it performs better, which implies that there are far fewer outliers in the data than expected. The dataset is small, so a strong result from the SGD optimizer is expected, and it gives the best performance in terms of accuracy. It finds a minimum that is higher than those of the other optimizers, yet SGD gives the best results while solving the problem, converging at the second epoch. RMSprop and AdaDelta reach similar minima and give close accuracies. AdaDelta automatically makes the learning rate larger or smaller in inverse proportion to the gradient; in this way the network neither learns too slowly nor skips the minima by taking too large a step. RMSprop accumulates the gradients from the most recent iterations and uses exponential decay; it falls just behind SGD. Both RMSprop and AdaDelta converge at the eighth epoch. The graph of training loss and validation loss vs. the number of epochs for the various optimizers is shown below.

Fig. 4. Number of epochs vs. training loss for various optimizers and validation loss

In general, Adam performs better than most optimizer algorithms, but here it is not the best optimizer, possibly because of the small dataset; it converges at the fifteenth epoch. With AdaGrad, the learning rate decays more slowly, which implies that the dimensions have a gentler slope for learning. AdaGrad performs worst in this case: it does not converge within 20 epochs and approaches the global minimum only at a slower rate. The overall performance of the optimizer algorithms, in increasing order, is AdaGrad, Adam, RMSprop, AdaDelta, and SGD. The accuracy obtained from the different models is given in the table below.

TABLE II. Accuracy of different optimizer algorithms

Optimizer | SGD | AdaGrad | AdaDelta | RMSprop | Adam
Accuracy | 100% | 89.57% | 99.89% | 99.64% | 95.55%

B. Accuracy of different Weight Initializers

Weight initialization is used to keep the output of the layers from increasing or decreasing dramatically. The same neural network with the SGD optimizer is used, and the weights are initialized using different initialization techniques. Seven different kinds of initialization techniques are used to initialize the weights in this neural network: Glorot Normal, He Normal, Zeros, Ones, Random Normal, Identity, and Orthogonal. The observations are given in Table III.
TABLE III. Accuracy of different weight initializers

Weight Initializer | Accuracy
Glorot Normal | 100%
He Normal | 100%
Zeros | 88.12%
Random Normal | 98%
Identity | 100%
Orthogonal | 98%
Ones | 100%
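A minimal sketch of how these initializers can be swapped in the Keras model is given below, reusing the illustrative layer layout from the optimizer sketch above. The strings follow the list of seven initializers named in the text and are the standard Keras identifiers; data shapes are stand-ins.

```python
# Minimal sketch of the weight-initializer comparison, reusing the illustrative
# layer layout from the optimizer sketch above. Initializer names follow the
# seven techniques listed in the text, using standard Keras identifiers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

INITIALIZERS = ["glorot_normal", "he_normal", "zeros", "ones",
                "random_normal", "identity", "orthogonal"]

def build_model_with_init(init, input_dim, num_classes):
    model = Sequential([
        Dense(128, activation="relu", kernel_initializer=init,
              input_shape=(input_dim,)),
        Dense(64, activation="relu", kernel_initializer=init),
        Dense(num_classes, activation="softmax", kernel_initializer=init),
    ])
    model.compile(optimizer="sgd", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# With the stand-in train_x / train_y from the previous sketch:
# for init in INITIALIZERS:
#     model = build_model_with_init(init, input_dim=train_x.shape[1],
#                                   num_classes=train_y.shape[1])
#     model.fit(train_x, train_y, epochs=20, batch_size=5, verbose=0)
```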
Initialization with Ones and Identity begins with a lower training loss and converges to the global minimum by the second epoch. This implies that the network does not suffer from exploding or vanishing gradients with simple symmetrical weight initialization. Random Normal and He Normal look similar in the graph: they start with the same validation loss and converge similarly to the Ones and Identity initializers, both converging at the second epoch. The large validation-loss values at the start of training are due to the random initialization of the weights, which results in inconsistent variance.

Random Normal, He Normal, Ones, and Identity initialization of the weights result in 100% accuracy of the model. Orthogonal initialization is similar to Random Normal, but because of the orthogonality of the vectors this initialization begins with a higher validation loss; nonetheless, it also converges at the second epoch. Zero initialization is worst at the beginning because all weights are initialized to zero: it converges slowly and passes through many minima, and initializing with zeros may harm the learning gradient. The global minimum for Zero initialization is reached around the fifteenth epoch. From this, it is evident that Random Normal, He Normal, Identity, and Ones initialization of the weights work best for this network. In general, Random Normal initialization gives better results compared to other initializers across different networks. A graph of the training loss vs. the number of epochs is plotted in fig. 5.

VII. DISCUSSION

A DNN is used to evaluate the analysis of the dataset, and the Python-based machine learning toolbox Keras is used to evaluate the performance of the analysis. The main aim is to study neural networks and the different algorithms used in them. The research shows that SGD is the best optimizer here, producing 100% accuracy, and that He Normal weight initialization gives fast results in terms of the number of epochs. The activation function used for the input and hidden layers is ReLU, and the activation function used for the output is softmax. A sequential type of model building is used in this model. A graphical user interface (GUI) is created to communicate with the chatbot using Tkinter (Python).

VIII. CONCLUSION

Conversations between services and people are streamlined by chatbot applications, thereby improving the customer's experience. These chatbot applications also give companies innovative opportunities to maximize customer engagement and their own productivity by reducing the inevitable costs of customer service.

The research found that a chatbot's performance can be improved by using neural networks and different algorithms. It is important to acknowledge the limitations, such as the accuracy of the model, the lack of empathy, and privacy concerns over users' data; the chatbot could also be improved by providing a voice version, which would help visually impaired or illiterate people. While many tasks are done by chatbots, they can never replace humans until chatbots understand human emotions and human perception. This is even truer in the healthcare sector.

In future research, other enhanced methods can be investigated that would further raise the standards of chatbots. Furthermore, the recent advanced developments in neural networks, including RNNs, deep CNNs, LSTMs, deep belief networks based on restricted Boltzmann machines, and deep auto-encoders, should be used for further development of the conversational bot.
REFERENCES