International Journal of Computer Trends and Technology (IJCTT) – Volume 67 Issue 5- May 2019

Convolutional Neural Network for Detection of Sign Language

Dr. Mahesh Kaluti¹, Manoj Athreya C.S.², Manish M.G.³, M.P. Mahadeva Aradhya⁴, Raghavendra S.⁵
¹ Associate Professor, Department of Computer Science and Engineering, P.E.S. College of Engineering, Karnataka, India
²,³,⁴,⁵ Department of Computer Science and Engineering, P.E.S. College of Engineering, Karnataka, India

Abstract
Sign languages are languages that use manual communication to convey meaning. This can include simultaneously employing hand gestures, movement, orientation of the fingers, arms or body, and facial expressions to convey the speaker's idea. Sign language is an incredible advancement that has grown over the years, and it helps the deaf and dumb communities go about their daily lives. Unfortunately, some drawbacks have come along with this language: not everyone who converses with a deaf-mute person knows how to interpret or understand sign language. To solve this, we need a versatile and robust product that converts sign language into text or written format so that it is understood by everyone.

Keywords - Convolutional Neural Networks, Machine learning, Computer Vision, Sign language.

Figure 1: Sign language alphabets

I. INTRODUCTION
Conversation between the deaf and dumb community and the rest of the world has been in the shadow of misconception for ages. This paper is concerned with a product that can eliminate the barrier between the deaf and dumb community and the rest. The product should interpret 26 sign language alphabets and 10 sign language digits. The alternative of written communication is cumbersome, impersonal and even impractical when an emergency occurs. In order to overcome this obstacle and to enable mutual communication, we present a sign language recognition system that uses Convolutional Neural Networks (CNN) in real time to translate a video of a user's sign language gestures into text. This is done by taking user input, i.e., gestures, through a web cam, classifying each frame in the video to a letter and displaying the most likely word from the classification scores as output. However, there are many challenges to be considered, including lighting sensitivity and deciding when one sign ends and the next begins. Our system takes input from the video stream, extracts individual frames of the video and generates letter probabilities for each frame using a Convolutional Neural Network.
II. RELATED WORKS
In order to recognize and categorize sign gestures accurately, based on our knowledge in class, we first thought of using basic machine learning techniques such as Support Vector Machines and regular neural networks. The Microsoft Kinect was also a recent advancement that helped in taking 3-D depth images of signs, and MATLAB with PCA was also used in one of our reference papers. We evaluated the pros and cons presented by the papers discussed below by taking complexity, runtime, results, feasibility, flexibility and understandability into consideration. Selectively, we chose some models to implement on our dataset. Eventually, we settled on a Convolutional Neural Network for sign language recognition.
In our work, we build on the results of Roel Verschaeren [1]. He proposes a CNN model that recognizes a set of 50 different signs in the Flemish Sign Language with an error of 2.5%, using the Microsoft Kinect. Unfortunately, this work is limited in the sense that it considers only a single person in a fixed environment.
In [2], using a Support Vector Machine, the system is able to perform in dynamic and minimally cluttered backgrounds with satisfactory results, as it relies on skin color segmentation. For speech recognition, the Sphinx module is used, which maps the spoken alphabet to text with high accuracy. This text is then mapped to a picture if it is a static gesture, or to a video if it is a dynamic gesture. The system classifies a gesture as static or dynamic by measuring the distance moved by the hand in subsequent frames. For static gestures, the system uses Zernike moments, a well-known shape descriptor in image processing. HSV segmentation and fingertip detection showed satisfactory results in a constrained environment, i.e. proper lighting and a background with limited skin-colored objects. Static gesture recognition was carried out on a lexicon of 24 alphabets (a-y, excluding j) and it succeeded with approximately 93% accuracy.
An artificial neural network approach is used in [3]. In this paper, converting sign language to text with an automated sign language recognition system based on machine learning was proposed to satisfy this need. The backpropagation algorithm is used to train the network. The input layer was designed to contain 3072 neurons for the Raw Features Classifier and 512 neurons for the Histogram Features Classifier. The hidden layer was designed to contain 10 neurons for each classifier, and the output layer had 3 neurons for each classifier. The system gives accuracy rates of 70% and 85% for the Raw Features Classifier and the Histogram Features Classifier respectively. Compared with other studies, the obtained results are average; the recognition rate can be increased by improving the image processing step as future work.
In [4], the proposed system was able to accurately recognize single-handed gestures of bare human hands using a webcam through a MATLAB interface. The aim of that project is to recognize gestures with the highest accuracy and in the least possible time, and to translate the alphabets of Indian Sign Language into corresponding text and voice in a vision-based setup using MATLAB and the PCA algorithm. 260 images were included in the training and testing database, captured at a resolution of 380x420 pixels. The runtime images for the test phase are captured using a web camera, and the Otsu algorithm is used for segmentation. Feature extraction is a method of reducing data dimensionality by encoding related information in a compressed representation; sign recognition using PCA is a dimensionality reduction technique based on extracting the desired number of components. A MATLAB-based application performing hand gesture recognition for human-computer interaction using the PCA technique was successfully implemented, and the proposed method gave output in voice and text form.
III. METHOD
The method belongs to supervised machine learning: stochastic gradient descent was used for training, along with the SoftMax activation function at the output layer. Our goal was to train the neural network for proper classification of sign language gestures. The inputs were fixed-size, high-resolution images, 200 by 200 or 400 by 400 pixels, which were padded and resized to 200 by 200.
A. Data collection and preprocessing
For the data collection, we manually collected some data from the Indian Institute of Sign Language. Since the dataset was insufficient for relatively better performance of the model, we had to add some data manually. We captured training images for the gestures using the histogram of gradients approach with the computer vision library. A 10x10 histogram is implemented and only the image in that part of the ROI is extracted. The image is converted from BGR to HSV; the processing of the images starts from here. The histogram is then calculated for the hue and saturation values of the extracted HSV image and is then normalized. We use the back-projection operation on the histogram to recognize only the skin color and to avoid background noise [2]. The smoothness of the image is increased by applying Gaussian and median blur. The gesture is placed inside the histogram region as in Figure 2, and every gesture is captured; 1200 images were captured and saved in a directory. These 1200 images are then flipped along the vertical axis, which adds up to 2400 images per gesture. We also attempted padding the images with black pixels such that they preserved their aspect ratio upon resizing. This padding allows us to remove fewer relevant pixels when taking random crops.

(a) Input image (b) Processed image
Figure 2: Image processing
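The preprocessing pipeline above can be illustrated with a short OpenCV sketch. The 10x10 hue-saturation histogram follows the description above; the kernel sizes, threshold value and helper names are illustrative assumptions rather than the authors' exact settings.

```python
# A minimal sketch of the preprocessing described above, using OpenCV.
# Kernel sizes, the threshold and function names are illustrative assumptions.
import cv2
import numpy as np

def build_hand_histogram(bgr_roi):
    """Hue-saturation histogram of a hand region, normalized for back-projection."""
    hsv = cv2.cvtColor(bgr_roi, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [10, 10], [0, 180, 0, 256])  # 10x10 bins
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

def segment_skin(bgr_frame, hand_hist):
    """Back-project the histogram to keep skin-colored pixels, then smooth the mask."""
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0, 1], hand_hist, [0, 180, 0, 256], 1)
    back_proj = cv2.GaussianBlur(back_proj, (11, 11), 0)
    back_proj = cv2.medianBlur(back_proj, 15)
    _, mask = cv2.threshold(back_proj, 20, 255, cv2.THRESH_BINARY)
    return cv2.bitwise_and(bgr_frame, bgr_frame, mask=mask)

def pad_and_resize(img, size=200):
    """Pad with black pixels to a square so the aspect ratio survives resizing."""
    h, w = img.shape[:2]
    side = max(h, w)
    canvas = np.zeros((side, side, 3), dtype=img.dtype)
    canvas[(side - h) // 2:(side - h) // 2 + h, (side - w) // 2:(side - w) // 2 + w] = img
    return cv2.resize(canvas, (size, size))

def augment(img):
    """Mirror each captured gesture image, doubling the dataset as described."""
    return [img, cv2.flip(img, 1)]  # 1 = flip around the vertical axis
```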
B. Convolutional Neural Network
CNNs (based on [7]) are feature extraction models in deep learning that have recently proven to be very successful at image recognition [6], [3], [10], [8]. As of now, these models are in use by various industry leaders such as Google, Facebook and Amazon, and recently researchers at Google applied CNNs to video data [11]. CNNs are inspired by the visual cortex of the human brain. The artificial neurons in a CNN connect to a local region of the visual field, called a receptive field. This is accomplished by performing discrete convolutions on the image with filter values as trainable weights. Multiple filters are applied for each channel, and together with the activation functions of the neurons, they form feature maps. This is followed by a pooling scheme, where only the interesting information of the feature maps is pooled together. These techniques are performed in multiple layers, as shown in Figure 3.

Figure 3: CNN Model
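As a toy illustration of these two operations, the sketch below applies a single discrete convolution with a small filter, a ReLU activation, and 2x2 max pooling in NumPy; the array sizes and random values are purely illustrative.

```python
# Toy NumPy illustration of convolution followed by max pooling.
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the filter over the image and take dot products ('valid' convolution)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Keep only the strongest response in each size x size window."""
    h, w = feature_map.shape
    fm = feature_map[:h - h % size, :w - w % size]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.random.rand(8, 8)           # a small grayscale "image"
kernel = np.random.randn(3, 3)         # filter values; trainable weights in a real CNN
feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU activation
pooled = max_pool(feature_map)         # 6x6 feature map -> 3x3 after pooling
print(pooled.shape)                    # (3, 3)
```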


C. Architecture
Most implementations surrounding this task have attempted it via transfer learning, but our network was trained from scratch. Our general architecture is a fairly common CNN architecture, consisting of multiple convolutional and dense layers. The architecture includes two groups of two convolutional layers, each group followed by a max-pooling layer and a dropout layer, then two fully connected layers each followed by a dropout layer, and one final output layer. The activation function used in the convolution layers is the Rectified Linear Unit (ReLU), and the activation function for the output layer is the SoftMax function.
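This architecture can be sketched in Keras as follows. The filter counts, dense-layer widths, dropout rates and the three-channel input are assumptions for illustration; the paper states only the layer arrangement, the 200x200 input size and the use of ReLU, SoftMax and stochastic gradient descent.

```python
# Sketch of the described architecture: two groups of two conv layers with
# max-pooling and dropout, two dense layers with dropout, and a softmax output.
# Filter counts, dense widths and dropout rates are assumptions.
from tensorflow.keras import layers, models

NUM_CLASSES = 36  # 26 alphabets + 10 digits, as stated in the introduction

model = models.Sequential([
    layers.Input(shape=(200, 200, 3)),            # 200x200 inputs, as in Section III
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # SoftMax classification head
])
model.compile(optimizer="sgd",                        # stochastic gradient descent
              loss="categorical_crossentropy",        # the SoftMax loss of Eq. (1)
              metrics=["accuracy"])
```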

The loss function is given by

Loss = -\frac{1}{N} \sum_{i=1}^{N} \log \sigma(z^{(i)})_{y_i}        (1)

\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}        (2)

where z^{(i)} is the score vector and y_i the true class label of training example x_i. Equation (2) is the SoftMax function: it takes a feature vector z for a given training example and squashes its values to a vector of [0,1]-valued real numbers summing to 1. Equation (1) takes the mean of the per-example losses over the training examples x_i to produce the full SoftMax loss. Using a SoftMax-based classification head allows us to output values akin to probabilities for each gesture.
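A small NumPy sketch makes Equations (1) and (2) concrete; the score vectors and labels below are made-up values, not data from the experiments.

```python
# Softmax (Eq. 2) squashes each score vector into probabilities summing to 1;
# the loss (Eq. 1) is the mean negative log-probability of the true class.
import numpy as np

def softmax(z):
    """Equation (2): exponentiate and normalize so the outputs sum to 1."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def softmax_loss(scores, labels):
    """Equation (1): mean cross-entropy over the batch of training examples."""
    probs = softmax(scores)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

scores = np.array([[2.0, 0.5, -1.0],    # made-up score vectors z for two examples
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])               # true class index for each example
print(softmax(scores).sum(axis=1))      # [1. 1.]  -- probabilities sum to 1
print(softmax_loss(scores, labels))     # small, since both examples are classified correctly
```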


feature vector z for a given training example and
squashes its values to a vector of [0,1]- valued real Figure 5: Confusion Matrix
numbers summing to 1. Equation (1) takes the mean
loss for each training example, xi, to produce the full
SoftMax loss. Using a SoftMax-based classification Model accuracy and loss is increased and decreased
head allows us to output values akin to probabilities respectively as the number of epochs increase. We
for each gesture. The input for the model is given had to stop training near to 20th epoch in order to
through a webcam using the histogram of ordered avoid over fitting.
gradients as discussed above and the corresponding
output is displayed on the screen.
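A minimal sketch of this webcam-to-prediction loop is shown below, assuming a trained copy of the model has been saved as "sign_cnn.h5" (an illustrative file name) and that OpenCV provides the webcam capture; the label list and window handling are likewise illustrative, and in practice the same preprocessing as in Section III.A would be applied to each frame.

```python
# Minimal real-time loop: grab a frame, classify it, show the predicted label.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

LABELS = [chr(c) for c in range(ord('A'), ord('Z') + 1)] + [str(d) for d in range(10)]

model = load_model("sign_cnn.h5")          # illustrative path to a trained model
cap = cv2.VideoCapture(0)                  # default webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break
    roi = cv2.resize(frame, (200, 200))    # here only resized; preprocessing omitted for brevity
    x = roi.astype("float32")[np.newaxis] / 255.0
    probs = model.predict(x, verbose=0)[0] # letter/digit probabilities for this frame
    label = LABELS[int(np.argmax(probs))]
    cv2.putText(frame, label, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.imshow("Sign language recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```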

IV. EXPERIMENTAL RESULTS
The experimental results were highly accurate, and up to 99% accuracy is achieved under standard lighting conditions. Our model took significantly long hours to train on a standard processor. We can save the trained model, so training again is not necessary until the dataset is altered. The confusion matrix provides the necessary details regarding the labels and the accuracy of the trained model.

Figure 5: Confusion matrix

Model accuracy increased and loss decreased as the number of epochs increased. We had to stop training near the 20th epoch in order to avoid overfitting.

Figure 6: Model accuracy
Figure 7: Model loss
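The training and model-saving workflow described above can be sketched as follows, reusing the model from the architecture sketch; the placeholder data, batch size and early-stopping patience are assumptions rather than the authors' settings.

```python
# Train for up to 20 epochs, stop early to avoid overfitting, and save the model
# so it can be reloaded without retraining (as noted above).
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping

# Placeholder arrays standing in for the preprocessed, augmented gesture images.
x_train = np.random.rand(64, 200, 200, 3).astype("float32")
y_train = np.eye(36)[np.random.randint(0, 36, 64)]       # one-hot labels
x_val, y_val = x_train[:16], y_train[:16]

early_stop = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=20,                      # training was stopped near the 20th epoch
    batch_size=32,
    callbacks=[early_stop],
)

model.save("sign_cnn.h5")           # reuse later without retraining (see the webcam loop above)
```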


V. CONCLUSIONS
The earlier sign language translators were neither efficient nor economical. Sign language translator gloves and Microsoft Kinect devices are not economical, and other machine learning algorithms were not able to achieve the required accuracy and efficiency. We implemented and trained the model up to the mark. It can be made even better with an additional dataset, and under standard conditions it can achieve up to 100% accuracy. The training time can also be significantly reduced by using the Inception V3 model and a higher-configuration processor. Based on this, we propose a novel approach to ease the difficulty in communicating with those having speech and vocal disabilities. Since it follows an image-based approach, it can be launched as an application on any minimal system and hence has near zero cost.
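As a sketch of the Inception V3 idea mentioned above, a transfer-learning variant could reuse ImageNet-pretrained convolutional features and train only a small classification head, which typically shortens training; this is an illustration of the suggestion, not an experiment reported in the paper.

```python
# Illustrative Inception V3 transfer-learning head; head sizes are assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False, input_shape=(200, 200, 3))
base.trainable = False                          # freeze the pretrained feature extractor

transfer_model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(36, activation="softmax"),     # 26 alphabets + 10 digits
])
transfer_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```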
REFERENCES

[1] Verschaeren, R.: Automatische herkenning van gebaren met de Microsoft Kinect (2012).
[2] Anup Kumar, Karun Thankachan, Mevin M. Dominic, "Sign Language Recognition", 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), IEEE, 11 July 2016.
[3] R. Tülay Karayılan and Özkan Kılıç, "Sign Language Translation", 2017 International Conference on Computer Science and Engineering (UBMK), IEEE, 02 November 2017.
[4] Shreyashi Narayan Sawant and M. S. Kumbhar, "Sign Language Translation", 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies, 26 January 2015.
[5] Ciresan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 3642-3649. IEEE (2012).
[6] Jarrett, K., Kavukcuoglu, K.: What is the best multi-stage architecture for object recognition? In: Computer Vision, 2009 IEEE 12th International Conference on, pp. 2146-2153 (2009). http://ieeexplore.ieee.org/xpls/absall.jsp?arnumber=5459469
[7] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11) (1998).
[8] Goodfellow, I.J., Bulatov, Y., Ibarz, J., Arnoud, S., Shet, V.: Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv preprint arXiv:1312.6082 (2013).
[9] V. G. Adithya, P. R. Vinod, Usha Gopalakrishnan, "Artificial Neural Network Based Method for Indian Sign Language Recognition", IEEE Conference on Information and Communication Technologies (ICT), pp. 1080-1085, 2017.
[10] Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional neural networks. arXiv preprint arXiv:1311.2901 (2013).
[11] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large scale video classification with convolutional neural networks. In: CVPR (2014).
[12] Suruchi Bhatnagar, Suyash Agrawal, "Hand Gesture Recognition for Indian Sign Language: A Review", IJCTT, vol. 21, no. 3, pp. 121, March 2015.
[13] Akshata Dabade, Anish Apte, Aishwarya Kanetkar, Sayali Pisal, "Two Way Communication between Deaf & Dumb", IJCTT, vol. 40, no. 3, pp. 114, October 2016.
