Proceedings of the International Conference on Inventive Computing and Informatics (ICICI 2017)
IEEE Xplore Compliant - Part Number: CFP17L34-ART, ISBN: 978-1-5386-4031-9
Intelligent Tool For Malayalam Cursive
Handwritten Character Recognition Using Artificial
Neural Network And Hidden Markov Model
Thulasi Kishna N.P
Department Of Computer Science and Engineering
Master Of Technology
Jyothi Engineering College
Thrissur, Kerala
Seenia Francis
Assistant Professor
Department Of Computer Science and Engineering
Jyothi Engineering College
Thrissur, Kerala
[email protected] [email protected]
Abstract— This paper presents a method for converting an image of
handwritten text into editable text by Optical Character Recognition
(OCR). Offline handwritten character recognition is the ability of a
computer to receive and recognize handwritten input. Computers may find it
difficult to decipher handwritten characters written in different fonts
and styles. This paper mainly focuses on the recognition of handwritten
Malayalam (a South Indian language) characters. Cursive Malayalam
characters are recognized by a Hidden Markov Model (HMM), and the
classification is done with an Artificial Neural Network (ANN). The
proposed system includes handwritten character recognition with high
accuracy and an efficient method for recognizing cursive letters.
Keywords— Handwritten character recognition (HCR), Optical Character
Recognition (OCR), Cursive Malayalam characters, Hidden Markov Model
(HMM), Artificial Neural Network (ANN).
I. INTRODUCTION
Handwritten character recognition (HCR) has been one of the most
fascinating and challenging research areas in the field of image
processing and pattern recognition. Many efforts have been made to
recognize both online and offline handwritten characters automatically.
Many approaches have been proposed, but most of them focus on the English
language. In this work, two approaches are used to identify offline
handwritten characters: the Artificial Neural Network (ANN) and the Hidden
Markov Model (HMM). A gradient descent algorithm is used for the ANN, and
a statistically based recognition method is used to recognize handwritten
cursive characters[1].
OCR has become one of the most successful applications of technology in
the fields of image processing, pattern recognition, and artificial
intelligence[2]. A number of different technologies are being used and
tested for Malayalam handwritten character recognition. Because of the
variations in handwritten characters, recognition has still not reached
100% accuracy. Most character recognition methods follow the steps shown
in Figure 1. The input scanned image is first given for pre-processing,
which covers noise removal, binarization, smoothing, and normalization.
The pre-processed image is then given for segmentation, where it is
segmented into isolated characters by assigning a number to each character
using a labeling process. This labeling provides information about the
number of characters in the image. The segmented output is then given for
feature extraction. In the proposed system the feature extraction is based
on character geometry: it extracts the different line types that form a
particular character[1],[2].
Pre-Processing -> Segmentation -> Feature Extracting -> Classification & Recognition
Fig 1 : System Of Recognition
The extracted features are then given as the input to the final step,
Classification and Recognition. Classification is based on a neural
network and recognition is based on HMM.
II. EXPERIMENTAL DETAILS
A. Malayalam Characters
The Malayalam language is the mother tongue of the State of Kerala, in the
southernmost part of India. It contains a total of 51 letters, which
comprise 13 vowels and 37 consonants. A set of
978-1-5386-4031-9/17/$31.00 ©2017 IEEE 595
characters comprising vowels, consonants, and vowel signs forms the
complete character set of Malayalam. There are mainly 12 vowel signs in
this language. They are sometimes called dependent vowels, as they are
valid only in combination with a consonant or a conjunct. The Malayalam
character collection is shown in Figure 2, which contains the vowels and
consonants[1], and Figure 3 shows the vowel signs in Malayalam[1].
Fig.3 : Vowel Signs
B. Character Recognition
This method recognizes both isolated and combinational handwritten
characters in a noiseless environment. The main aim is to identify the
features of each isolated character and then to extend the same approach
to combinations of linked characters, so as to improve the accuracy of the
results with less complex algorithms. Figure 5 shows the flow chart of the
isolated and combinational HCR system[1].
Fig 4: Combinational Characters in Malayalam
i. Image Acquisition
The handwritten character to be recognized is captured using an optical
scanner[1].
ii. Image Preprocessing
The captured image is not directly suitable for the recognition process.
Hence, it must undergo a preprocessing step that converts it to a form
usable by the later stages of the identification process. The
preprocessing stage carries out skew correction and normalization. In
optical character recognition (OCR), the text lines in a document must be
examined properly. When a document is scanned through an optical scanner,
a few degrees of skew are unavoidable. Skew refers to an inclination in
the received image. Skew detection and correction are decisive
preprocessing steps in the character recognition procedure. The skew in an
image is estimated using a thinning algorithm together with the Hough
transform, and the estimated skew is corrected using a coordinate
transformation method.
Handwriting generally varies between people, not only in writing style
but also in geometric features such as slant. A normalization technique is
used to remove slant from the handwritten characters. Normalization also
refers to changing the range of pixel intensity values: it adapts the
pixel values to a standard range.
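As an illustration of the preprocessing ideas described here, the following is a minimal sketch and not the authors' MATLAB implementation. It assumes the thinned character strokes are already available as a list of (x, y) points, uses a simplified Hough-style vote over candidate angles to estimate the skew, removes the skew with a rotation (one possible coordinate transformation), and rescales intensities with min-max scaling. The function names, the angle range of ±15 degrees, and the 0 to 255 target range are assumptions made for the example; plain Python lists are used to keep it dependency-free.

```python
import math

def normalize_intensity(image, lo=0, hi=255):
    """Min-max scale the pixel values of a grayscale image (list of rows) to [lo, hi]."""
    flat = [v for row in image for v in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:  # flat image: there is no range to stretch
        return [[lo for _ in row] for row in image]
    return [[round((v - mn) * (hi - lo) / (mx - mn) + lo) for v in row]
            for row in image]

def estimate_skew(points, max_angle=15, step=1):
    """Return the candidate angle (degrees) whose Hough accumulator has the strongest bin.

    For each angle, every point votes for the offset rho of the line through
    it with that inclination; points on one text line share a single rho
    only at the true skew angle.
    """
    if not points:
        return 0
    best_angle, best_votes = 0, -1
    for angle in range(-max_angle, max_angle + 1, step):
        a = math.radians(angle)
        votes = {}
        for x, y in points:
            rho = round(y * math.cos(a) - x * math.sin(a))
            votes[rho] = votes.get(rho, 0) + 1
        peak = max(votes.values())
        if peak > best_votes:
            best_angle, best_votes = angle, peak
    return best_angle

def deskew(points, angle):
    """Rotate points by -angle degrees about the origin to remove the skew."""
    a = math.radians(angle)
    return [(x * math.cos(a) + y * math.sin(a),
             -x * math.sin(a) + y * math.cos(a)) for x, y in points]
```

For a synthetic text line inclined by 5 degrees, `estimate_skew` should report 5 and `deskew` should map the points onto a horizontal line. A real document would need the thinning step and a finer angle step, which are omitted here.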
Fig.2(a) : Vowels and Consonants
Fig.2(b) : Vowels and Consonants
iii. Segmentation
Segmentation is the process of separating the characters in a word, and it
is the most challenging part of cursive handwriting recognition. Here the
segmentation regions are identified from the peaks of the vertical
projection profile. The vertical projection of a binary image looks like a
collection of black hills on a white surface. After the segmentation
regions are obtained, the characters are segmented.
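The projection-profile idea can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's exact rule: ink pixels are assumed to be 1 and background 0, the profile counts ink pixels per column (the "hills"), and a column is treated as a cut point when its ink count is at most `max_ink`. The names `vertical_projection`, `segment_columns`, and `max_ink` are hypothetical.

```python
def vertical_projection(binary):
    """Count the ink pixels (value 1) in every column of a binary image."""
    return [sum(column) for column in zip(*binary)]

def segment_columns(binary, max_ink=0):
    """Return (start, end) column ranges of the hills in the projection profile.

    A column belongs to a character when it has more than max_ink ink pixels.
    Raising max_ink lets thin connecting strokes between cursive characters
    count as cut points.
    """
    profile = vertical_projection(binary)
    segments, start = [], None
    for x, count in enumerate(profile):
        if count > max_ink and start is None:
            start = x                      # a hill begins
        elif count <= max_ink and start is not None:
            segments.append((start, x))    # the hill ended at column x - 1
            start = None
    if start is not None:                  # hill running to the right edge
        segments.append((start, len(profile)))
    return segments
```

With `max_ink=0` only completely empty columns separate characters, which suits isolated characters. For connected cursive writing a small positive threshold is one simple way to cut through ligatures, at the risk of over-segmenting.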
iv. Feature Extraction
The feature extraction phase extracts the texture features from the
handwritten characters. For this purpose, a median filter is used. All the
segmented characters are scaled to a standard height using an image
resizing approach. Nonessential portions and noise in the segmented
characters are removed using the median filter.
v. Classification and Recognition
When an input is given to the HCR system, its features are derived and
given as input to the trained classifier. Here an artificial neural
network (ANN) handles the classification: the classifier correlates the
input features and finds the best matching class for the input. A Hidden
Markov Model (HMM) is used to recognize the segmented attributes. It is a
statistical model. After feature extraction, the extracted features are
recognized using an HMM training and ranking process that takes the
training and test images as input.
vi. Neural Network Architecture (Multi Layer Perceptron MLP)
Neural networks are based on properties of the brain and are used to build
computing systems that can solve the kinds of problems human beings know
how to solve. There are different models, and one of them is the
perceptron. An MLP is used in the classification phase. The number of
neurons used in the network is:
• Input layer contains five neurons (the number five corresponds to the
number of values in the extracted feature vector).
• Output layer contains thirty neurons (the number thirty corresponds to
the number of characters used in recognition).
• Hidden layer chooses a number of neurons based on three conditions:
- Equals the number of neurons in the input layer.
- Equals 74% of the number of neurons in the input layer.
- Equals the square root of the product of the numbers of neurons in
the output and input layers.
As per the above conditions, the number of neurons in the hidden layer is
varied between four and nine. The method used here is the gradient
back-propagation algorithm.
vii. Hidden Markov Model
The HMM is one of the most widely used models in pattern recognition. Its
main application is in speech recognition, but because of the resemblance
between speech recognition and cursive handwriting recognition, HMMs have
become very prominent in handwriting recognition. In systems with a small
vocabulary, it is feasible to build an individual HMM for each word. For
large vocabularies, however, this is not a good option because of the lack
of training data. Therefore, in this system, an HMM is built for each
character. The use of character models allows training data to be shared:
each instance of a letter in the training set has an impact on the
training and leads to better parameter estimation.
Fig. 5: Flow Chart of Proposed System
III. RESULTS AND DISCUSSION
The system is implemented using MATLAB 2014a and the accuracy of
recognition is 93.4%. The hybrid of HMM with ANN has an important role in
the system's performance. This section describes the performance of the
proposed handwritten character recognition model. Experiments show that
the proposed method gives more accurate results than the existing methods.
Figure 9 shows an input with a combination of Malayalam characters. The
output with accuracy is shown in Table 1.
Fig.6: Original Image
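The Multi Layer Perceptron described under Neural Network Architecture can be illustrated with a small sketch. It follows the layer sizes given in the text (five inputs, thirty outputs) and trains with gradient back-propagation. Everything else is an assumption made for the example: nine hidden neurons (the upper end of the stated range), sigmoid activations, a squared-error loss, the learning rate, and the weight initialization. It is not the authors' MATLAB code.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class MLP:
    """One-hidden-layer perceptron trained by gradient back-propagation."""

    def __init__(self, n_in=5, n_hidden=9, n_out=30, seed=0):
        rng = random.Random(seed)
        # One row of weights per neuron; the last entry of each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        """Return (hidden activations, output activations) for feature vector x."""
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return h, o

    def train_step(self, x, target, lr=0.5):
        """Run one gradient step; return the squared-error loss before the update."""
        h, o = self.forward(x)
        xb, hb = list(x) + [1.0], h + [1.0]
        # Error terms of the output layer, then propagated back to the hidden layer.
        d_out = [(ok - tk) * ok * (1 - ok) for ok, tk in zip(o, target)]
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(o)))
                 for j, hj in enumerate(h)]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]
        return 0.5 * sum((ok - tk) ** 2 for ok, tk in zip(o, target))
```

In use, each extracted five-value feature vector would be paired with a one-hot target of length thirty, and the predicted character is the index of the largest output. The hidden size can be changed through `n_hidden` to try the other values in the four-to-nine range.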
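The per-character HMM idea described under Hidden Markov Model can be sketched in a few lines: one model is kept per character, and an observation sequence is assigned to the character whose model gives it the highest likelihood, computed with the forward algorithm. This is a toy illustration under assumptions, not the paper's trained models. The discrete observation symbols, the two-state left-to-right topology, all probabilities, and the labels `char_a` and `char_b` are invented for the example.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][s]  : probability that state i emits symbol s
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                 for j in range(n)]
    return sum(alpha)

def recognize(obs, models):
    """Return the label of the character model that best explains obs."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))

# Hypothetical two-state left-to-right models over two stroke symbols (0 and 1).
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "char_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),  # 0s, then 1s
    "char_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),  # 1s, then 0s
}
```

For example, `recognize([0, 0, 1, 1], MODELS)` selects `char_a`. In a real system the model parameters would be estimated from training samples of each character, and long sequences would call for log-probabilities or scaling to avoid numerical underflow.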
Fig.7: After Preprocessing
Fig.8 : Image with detected segment region
Fig.9 : An Example - Input Image with Cursive Malayalam Characters
Each word in the same sentence is considered, so as to check it against
the three matching characters stored. Table 1 shows the matching of some
input characters.
TABLE 1: RESULTING OUTPUT WITH ACCURACY
SL No. | The Word | Cropped Segment | Matched Character                                              | Accuracy (%)
1      | [image]  | [image]         | Shah, Ra, Nna Nna, Ra (special characters considered directly) | 90
2      | [image]  | [image]         | Tha, Ta, Ya, Nna Nna                                           | 95
3      | [image]  | [image]         | Bha, Ga, Tha                                                   | 93
IV. CONCLUSION
This paper deals with the recognition of cursive handwritten Malayalam
characters using a Hidden Markov Model (HMM). The algorithm used here
helps to avoid errors caused by noise in the scanned image by applying a
median filter. In addition, the Artificial Neural Network (ANN) helps to
achieve better classification and gives the best matching class for the
input. The samples used are of high quality, to reduce the complexity of
the recognition process. This approach shows better results in terms of
speed and accuracy. As future work, combinations of English and Malayalam
characters can be recognized.
REFERENCES
[1] Abdul Rahiman M and Rajasree M S, "An Efficient Character Recognition
System for Handwritten Malayalam Characters Based on Intensity
Variations", International Journal of Computer Theory and Engineering,
Vol. 3, No. 3, June 2011.
[2] Jomy John, Pramod K. V, Kannan Balakrishnan, "Unconstrained
Handwritten Malayalam Character Recognition using Wavelet Transform and
Support vector Machine Classifier", International Conference on
Communication Technology and System Design 2011.
[3] Manal A. Abdullah, Lulwah M. Al-Harigy, and Hanadi H. Al-Fraidi,
"Off-Line Arabic Handwriting Character Recognition Using Word
Segmentation", Journalofcomputing.Org.
[4] Journal of Language Technology, Viswabharat@tdil, July 2003.
[5] Hans-Jürgen Winkler, "Hmm-Based Handwritten Symbol Recognition Using
On-Line And Off-Line Features", Institute For Human-Machine-Communication.
[6] Mansi Shah and Gordhan B Jethava, "A Literature Review On Hand Written
Character Recognition", Indian Streams Research Journal.
[7] J. Pradeepa, E. Srinivasana, S. Himavathib, "Neural Network Based
Recognition System Integrating Feature Extraction and Classification for
English Handwritten", International journal of Engineering, Vol.25, No. 2,
pp. 99-106, May 2012.
[8] Malayalam Script Features [Online]. Available:
http://scriptsource.org/scr/Mlym
[9] S.M. Milky Mahmud, Nazib Shahrier, A.S.M Delowar Hossain, Md. Tareque
Mohmud Chowdhury, Md.Abdus Sattar, "An Efficient Segmentation Scheme for
the Recognition of Printed Bangla characters", Proceedings of ICCIT, 2003,
pp 283-286.