Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 256 (2025) 198–205
CENTERIS – International Conference on ENTERprise Information Systems / ProjMAN –
International Conference on Project MANagement / HCist – International Conference on
Health and Social Care Information Systems and Technologies 2024
Hand Gesture Recognition using Machine Learning
Caminate Na Rang a, Paulo Jerónimo a, Carlos Mora b, Sandra Jardim b,*
a Polytechnic Institute of Tomar, Quinta do Contador, 2300-313 Tomar, Portugal
b Smart Cities Research Center, Polytechnic Institute of Tomar, Quinta do Contador, 2300-313 Tomar, Portugal
Abstract
Sign language recognition is a growing area of research, with applications ranging from gestural communication to controlling
devices using gestures. One of the challenges inherent to sign language recognition is the ability to translate gestures into
meaningful information, such as letters, words or even sentences. Machine Learning, which has emerged as a powerful tool for
solving a wide variety of complex problems, namely in the field of computer vision, plays a key role, enabling computers to
understand and interpret complex gestures. In this paper, we present a Machine Learning model focused on classifying hand
gestures that represent the letters of the Latin alphabet. The objective of this work is to create a solution capable of accurately
identifying which letter of the Latin alphabet is being represented by a hand gesture in an image. Hand gestures are classified
using the Random Forest Machine Learning classification model, which is fed with the vector of features extracted from the
region of interest in the image. To implement the proposed approach, a database of RGB images of hand gestures was created, and
the characteristics of the gestures were extracted using the MediaPipe open-source framework. The solution presents hand gesture
classification precisions by class ranging between 74.4% and 98.8%, with an accuracy of 92.3%, which represents an improvement
over previous approaches.
© 2025 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the CENTERIS - International Conference on ENTERprise Information
Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on Health and Social Care
Information Systems and Technologies
Keywords: Hand gesture recognition; image classification; machine learning; random forest
* Corresponding author. Tel.: +351 249 328 100; fax: +351 249 328 186.
E-mail address: [email protected]
DOI: 10.1016/j.procs.2025.02.112
1. Introduction
In the domain of sign language, a sign encompasses three primary components: manual features, which involve
gestures performed with the hands to convey meaning through hand shape and motion; non-manual features,
including facial expressions or body posture, which can either be integrated into a sign or alter its meaning; and
finger spelling, a method of gesturally spelling out words in the spoken language of the local community [1]. Sign
language is as intricate as any spoken language, with several practical applications, from helping disabled people to
communicate to enabling humans to interact with devices in an intuitive and natural way. Each sign language
comprises numerous signs, each distinguished from the next by subtle variations in hand shape, movement,
positioning, non-manual features, or contextual usage. Frequently, these elements involve swift and intricate
movements, posing considerable challenges in the task of recognizing sign language.
As fundamental components of sign language, hand gestures encompass a set of factors such as hand movement,
shape, orientation, alignment, and positioning of the fingers in relation to the hands and body [2]. This critical
component of sign language interpretation presents its own set of challenges, as it is marked by notable similarities
between gestures across different categories, considerable variations within the same category, and frequent
obstructions in hand shapes [1]. The challenge is to translate complex and variable gestures into comprehensive
legible information, as alphabetic letters, words, or even complete sentences. Using Machine Learning (ML)
techniques, it is possible to develop computer models capable of perceiving these differences and accurately
identifying and classifying different hand gestures [3].
Machine Learning, a branch of Artificial Intelligence (AI) and computer science focusing on using data and
algorithms to enable computational applications/systems to imitate the way that humans learn, gradually improving
their accuracy, has emerged as a revolutionary discipline, transforming the way we approach complex problems in
several areas. In this context, sign language recognition is gaining prominence as a fascinating and increasingly
relevant application [4]. Driven by access to large volumes of data and technological evolution, particularly
regarding computational processing capacity, the ML area has evolved at an impressive pace.
As a subfield of ML, Deep Learning (DL) distinguishes itself by the type of data it uses, as well as by the
methods it employs in the learning process. ML algorithms rely on structured and
labeled data to generate predictions, meaning that specific attributes are identified in the input data and organized
into structured formats. In the case of unstructured data, it is normally pre-processed with a view to transforming it
into a structured layout. On the other hand, DL streamlines the preprocessing steps inherent in traditional machine
learning approaches. These sophisticated algorithms can manipulate and analyze unstructured data, such as texts and
images, excelling in automatic feature extraction, thus reducing dependence on human intervention.
DL algorithms exhibit remarkable complexity, with several types of artificial neural networks tailored to tackle
specific challenges or datasets. Artificial neural networks strive to emulate the functioning of the human brain by
integrating data inputs, weights, and biases. Through this interconnected framework, they aim to effectively identify,
classify, and describe patterns present within the dataset. Among the different types of neural networks,
Convolutional Neural Networks (CNN) have proven to be particularly effective in computer vision tasks, such as
image classification [5].
Due to the ability of ML and DL models to recognize objects in images, several approaches have been proposed for the automatic
recognition of hand gestures. For the most part, existing methods are limited in terms of the
number of hand gestures they are capable of recognizing, and present accuracies and processing times that are still
incompatible with applicability in a real context.
This paper presents a proposed approach for the recognition of static hand gestures, where the classification of
gestures is achieved using the Random Forest ML model. The complete pipeline of a solution is presented, ranging
from capturing images of hand gestures, to recognizing the letter of the Latin alphabet they represent, through
extracting features from the collected images, and creating an annotated dataset, with which the aforementioned
model is trained and tested.
This paper is structured as follows: Section 2 presents a literature review of related work to highlight research
developed on sign language recognition. Section 3 illustrates the methodology behind our model, while Section
4 covers the results and evaluation of the proposed approach. Finally, Section 5 presents the conclusions of
the work developed, also indicating possible research directions to improve the results achieved.
2. Related work
Over the past few years, progress in AI-powered methodologies has brought substantial changes to the domain of
sign language hand gesture recognition. Researchers have used a blend of specialized hardware alongside ML and
DL frameworks to forge ahead with more sophisticated models. For example, Rautaray and Agrawal [6] designed a system for
gestural interaction between the user and a computer in a dynamic environment. The gesture recognition system
employs image processing techniques for detection, segmentation, tracking, and recognition of hand gestures to
convert them into meaningful commands. The proposed interface can be effectively applied to various applications
such as image browsers, games, etc.
Some years later, Haria et al. [7] proposed a markerless hand gesture recognition system for tracking both
static and dynamic hand gestures. In the authors' proposed system, while detected static gestures are translated into
actions, dynamic gestures are utilized for interactions. The results presented by the authors indicate that human-
computer interaction can be achieved with minimal hardware requirements.
Ceolini et al. [3] present a fully neuromorphic sensor fusion approach for hand gesture recognition, consisting of
an event-based vision sensor and three different neuromorphic processors. They utilize the event-based camera,
called DVS, along with two neuromorphic platforms, Loihi and ODIN + MorphIC. EMG signals are recorded using
traditional electrodes and then converted into spikes to be fed into chips. They collected a dataset of five sign
language gestures where visual and electromyographic signals are synchronized.
More recently, Chang et al. [5] developed a hand gesture recognition system aimed at enhancing authentic,
efficient, and effortless human-computer interactions without additional devices, particularly for the speech-impaired
community, which relies solely on hand gestures for communication. The algorithm of this system consists of two
phases. The first phase involves Region of Interest Segmentation based on the color space segmentation technique,
with a predefined color range that removes pixels (hand) from the background region of interest (pixels not in the
desired interest area). The second phase of the system involves inputting the segmented images into a Convolutional
Neural Network (CNN) model for image categorization. For image training, the Python Keras package was used.
The system addressed the need for image segmentation in hand gesture recognition.
Nogales et al. [8] proposed the evaluation of a model with both manual feature extraction and automatic feature
extraction. Manual feature extraction was conducted using statistical functions of central tendency, while automatic
extraction was performed through CNN and BiLSTM. These features were also assessed with classifiers such as
Softmax, ANN, and SVM.
Cruz et al. [9] introduced a Reinforcement Learning (RL) approach to classify EMG-IMU signals obtained using
a Myo Armband sensor. For this purpose, they developed an agent based on the Deep Q-learning (DQN) algorithm
to learn a policy from online experiences for classifying EMG-IMU signals. They then tested the HGR system to
control two different robotic platforms. The first is a three-degree-of-freedom (3DoF) tandem helicopter test bench,
and the second is a six-degree-of-freedom (6DoF) UR5 virtual robot. They employ a designed system for hand
gesture recognition (HGR) and the inertial measurement unit (IMU) integrated into the Myo sensor to command and
control the movement of both platforms. The motion of the helicopter test bench and the UR5 robot is controlled by
a PID controller scheme. Experimental results demonstrate the effectiveness of using the proposed HGR system
based on DQN to control both platforms with rapid and precise responses.
Harini et al. [10] proposed a methodology for hand gesture recognition using Self-Organizing Map (SOM) with
Deep Convolutional Neural Network (DCNN). The experiments were conducted on a dataset consisting of 30 static
gestures and 6 dynamic gestures, and evaluated on an IIITA-ROBITA ISL gesture database to demonstrate
effectiveness. The proposed algorithm was then implemented to control household appliances.
John et al. [11] proposed a DenseNet-based architecture called Multidilated Convolution DenseNet (MDCDN),
which combines multidilated convolution and DenseNet to automatically extract features. The benefits of high-level
deep learning techniques are leveraged for hand gesture recognition. Python is used for architecture evaluation. The
proposed outcome is estimated in terms of accuracy, recall, F-measure, precision, etc., using real datasets ASL, ISL,
Massey, and HSR. Each dataset contains a large number of gesture classes, and their images have an equal amount
of uniform and complex backgrounds.
3. Materials and Methods
This section outlines in detail the methodology followed to develop the hand gesture recognition model. It is
divided into three subsections, detailing the data collection process, image feature extraction, and the hand
gesture recognition model.
3.1. Data Collection
The data collection was carried out by capturing images with the webcam of a Dell Mobile Precision Workstation
7560, an HD RGB camera with a resolution of 0.92 megapixels and a diagonal viewing angle of 74.9 degrees. The
image capture process took place under controlled lighting and surrounding conditions to ensure data consistency
and quality. For each of the 26 letters of the Latin alphabet, 2,000 images of the corresponding hand gestures were
captured, totaling 52,000 images. In this way, the database created is made up of a total of 54,000 RGB images,
where 52,000 represent hand gestures corresponding to the 26 letters of the Latin alphabet and 2,000 correspond
to hand gestures not representing any of these letters. The images have a resolution of 224 x 224 pixels. To ensure
data comprehensiveness, the hand gestures were performed by 5 individuals.
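The paper does not include the capture script itself; the following is a minimal sketch of how such a collection step could be implemented with OpenCV, assuming one folder per class. The folder layout, file names, and class list are illustrative assumptions, not the authors' exact procedure.

```python
# Hedged sketch of the image-capture step (assumed implementation).
# Requires: pip install opencv-python
import os
import cv2

CLASSES = [chr(c) for c in range(ord('A'), ord('Z') + 1)] + ["nothing"]  # 27 classes
IMAGES_PER_CLASS = 2000      # number of images per class reported in the paper
IMG_SIZE = (224, 224)        # image resolution reported in the paper
DATA_DIR = "data"            # hypothetical output folder

cap = cv2.VideoCapture(0)    # default webcam
for label in CLASSES:
    class_dir = os.path.join(DATA_DIR, label)
    os.makedirs(class_dir, exist_ok=True)
    input(f"Show the gesture for '{label}' and press Enter to start capturing...")
    count = 0
    while count < IMAGES_PER_CLASS:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, IMG_SIZE)                     # normalize resolution
        cv2.imwrite(os.path.join(class_dir, f"{label}_{count:04d}.jpg"), frame)
        count += 1
cap.release()
```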
3.2. Dataset
The dataset was created using the MediaPipe open-source framework developed by Google [12], which
allows the processing, analysis, and extraction of information from different types of media, such as images, videos,
and audio. This framework provides a wide range of functionalities for object detection and tracking, facial
recognition, pose estimation, hand detection, among others. For this work, the Hands Landmarks functionality [13]
was used, with which it is possible to detect the keypoint localization of 21 hand-knuckle coordinates within the
detected hand regions (Fig. 1). This functionality is based on two models: a palm detection model, which locates the
hand in the image; and a hand landmark detection model, which identifies specific hand landmarks in the cropped
hand image defined by the palm detection model.
Fig. 1. The 21 hand reference points detectable with MediaPipe framework.
To build the dataset, the collected images were processed with the Hands Landmarks model from the MediaPipe
framework, and the coordinates of the hand landmarks in each image were extracted and structured into an
annotated dataset. Each image was classified according to the alphabet letter represented by the hand. Fig. 2
illustrates the hand gesture images and the corresponding reference points for the letters B, D, and I.
Fig. 2. Keypoints detection for hand gesture corresponding to (a) the letter B; (b) the letter D; (c) the letter I.
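This extraction step can be sketched as follows with MediaPipe's Hands solution. The file layout, the helper name extract_features, and the choice of using only the (x, y) coordinates of the 21 landmarks are assumptions made for illustration; they are not the authors' exact implementation.

```python
# Hedged sketch of the landmark-extraction step using MediaPipe Hands.
# Requires: pip install mediapipe opencv-python numpy
import os
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

def extract_features(image_path, hands):
    """Return a flat vector of the 21 (x, y) hand landmarks, or None if no hand is found."""
    image = cv2.imread(image_path)
    if image is None:
        return None
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))  # MediaPipe expects RGB
    if not results.multi_hand_landmarks:
        return None                          # no hand detected in this image
    landmarks = results.multi_hand_landmarks[0].landmark
    return np.array([[lm.x, lm.y] for lm in landmarks]).flatten()    # 42 values

features, labels = [], []
with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    for label in sorted(os.listdir("data")):                         # one folder per class
        for name in os.listdir(os.path.join("data", label)):
            vec = extract_features(os.path.join("data", label, name), hands)
            if vec is not None:
                features.append(vec)
                labels.append(label)

# Persist the annotated dataset of landmark coordinates and class labels.
np.savez("hand_landmarks_dataset.npz", X=np.array(features), y=np.array(labels))
```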
3.3. Hand Gesture Recognition Model
To classify hand gestures, the Random Forest ML model [14] was used, an algorithm widely applied to solve
classification and regression problems. Random Forest belongs to the category of ensemble methods, where
different models are combined to obtain a single result. This characteristic makes the algorithms more robust and
complex, which leads to a higher computational cost, usually accompanied by better results. More specifically,
Random Forest is a combination of many decision trees based on the Bagging (Bootstrap Aggregating) method,
in which a set of base learners is generated, each trained on a bootstrap sample, and their outputs are combined to predict a result.
The main idea behind the Random Forest model is combining many decision trees into a big forest via the
Bagging method, making predictions based on the outputs of those trees.
The key points of Random Forest Model are the following:
• Bootstrap Sampling: During training, multiple training samples are created using the bootstrap sampling technique.
This involves randomly selecting instances from the training set with replacement. These samples are used to
train individual trees.
• Decision Tree Construction: A decision tree is constructed for each training sample. Often, the created decision
trees are limited in depth and have randomness in node splitting decisions.
• Majority Voting: When making a prediction, each tree in the ensemble provides a prediction, and the most
frequent class or value is chosen as the final prediction (for classification problems), or the average is calculated
(for regression problems).
• Variance Reduction: Random Forest helps reduce variance compared to a single decision tree, making it less
prone to overfitting. This is achieved by combining multiple independent trees.
In the context of this study, the Random Forest model is fed with the vector of features extracted from hand
gesture images along with the corresponding class, where each class corresponds to a letter of the Latin alphabet.
An additional class, "nothing", was also considered, which covers images of hand gestures for which no
corresponding letter can be identified.
To train the classification model, 67% of the collected images were used, corresponding to approximately 35,000
images. To ensure the minimization of bias and the maximization of the representativeness of the training dataset, a
random selection of images was carried out, balanced across the 27 existing classes.
The model was tested with the remaining images, which represent 33% of the total, corresponding to
approximately 17,000 images.
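A minimal training sketch with scikit-learn is shown below. Only the stratified 67%/33% split follows the paper; the hyperparameters (e.g., the number of trees), file names, and the use of joblib for persistence are assumptions.

```python
# Hedged training sketch using scikit-learn.
# Requires: pip install scikit-learn numpy joblib
import numpy as np
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = np.load("hand_landmarks_dataset.npz")   # dataset built in the previous sketch
X, y = data["X"], data["y"]

# 67% training / 33% testing, balanced (stratified) across the 27 classes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42)

# Random Forest: an ensemble of bagged decision trees with majority voting.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
joblib.dump(model, "hand_gesture_rf.joblib")   # reused later by the real-time application
```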
Fig. 3. Model architecture (a) training phase; (b) testing phase.
Caminate Na Rang et al. / Procedia Computer Science 256 (2025) 198–205 203
To evaluate the performance of the model in real and dynamic situations, an application was developed in Python
that uses the integrated camera of a computer, with which hand gestures are captured in real time through the
integration of the MediaPipe framework. The application makes use of the pre-trained Random Forest model,
classifying the image corresponding to the user's hand gesture, the result of which can be viewed in real time. This
practical approach, which goes beyond a reserved set of test images, demonstrates the capacity of the classification model
in situations very close to real ones, providing a robust assessment of its effectiveness.
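A possible structure for such a real-time demo is sketched below, assuming the model trained in the previous sketch is loaded with joblib; the window handling, overlay style, and quit key are illustrative choices rather than the authors' implementation.

```python
# Hedged sketch of the real-time classification application.
# Requires: pip install mediapipe opencv-python numpy joblib scikit-learn
import cv2
import numpy as np
import joblib
import mediapipe as mp

model = joblib.load("hand_gesture_rf.joblib")   # pre-trained Random Forest
mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)
with mp_hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            vec = np.array([[p.x, p.y] for p in lm]).flatten().reshape(1, -1)
            letter = model.predict(vec)[0]       # predicted letter (or "nothing")
            cv2.putText(frame, str(letter), (30, 60),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
        cv2.imshow("Hand gesture recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):    # press 'q' to quit
            break
cap.release()
cv2.destroyAllWindows()
```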
4. Results and Discussion
The performance of the developed model was evaluated not only on test images, but also in real-time use
situations. Fig. 4 illustrates the visualization of the hand gesture classification results provided by the developed
application.
Fig. 4. Hand gesture classification results.
To evaluate the performance of the developed classification system, the metrics precision, recall and F1 Score
were calculated for each class. Additionally, to provide a more concise summary of the classification system
performance, macro- and micro-averaging metrics were also computed.
Considering, for each class, TP the number of true positives (hand gestures correctly classified), FN the number
of false negatives (hand gestures considered not to belong to the class in which they should be classified), and FP
the number of false positives (hand gestures classified in a wrong class), the metrics precision, recall and F1 Score
are defined by equations 1, 2 and 3, respectively.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (2)$$

$$F1\ \mathrm{Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3)$$
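On a held-out test set, these per-class metrics can be obtained directly, for example with scikit-learn's classification_report; the variable names below reuse the hypothetical training sketch given earlier and are assumptions.

```python
# Per-class precision, recall and F1 score (equations 1-3) on the test split.
from sklearn.metrics import classification_report

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, digits=3))
```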
Table 1 presents the classification performance evaluation metrics for each class.
Table 1. Classification evaluation metrics per class. All values are presented as a percentage (%).
Class  Precision  Recall  F1 Score    Class  Precision  Recall  F1 Score    Class  Precision  Recall  F1 Score
A 92.8 97.7 95.2 J 96.4 93.6 94.9 S 79.7 75.0 77.3
B 96.7 98.0 97.3 K 97.6 93.7 95.6 T 74.4 73.6 74.0
C 96.8 96.5 96.6 L 98.8 97.1 97.9 U 88.6 94.9 91.6
D 92.3 98.6 95.3 M 82.8 84.0 83.4 V 96.1 95.9 96.0
E 96.2 97.7 97.0 N 83.7 82.6 83.1 W 98.3 96.9 97.6
F 98.0 96.6 97.3 O 96.5 96.8 96.6 X 98.5 98.5 98.5
G 97.1 76.6 85.6 P 90.5 92.0 91.3 Y 98.3 90.2 94.1
H 80.7 97.7 88.4 Q 91.9 90.4 91.1 Z 98.6 95.0 96.7
I 93.8 96.5 95.1 R 97.6 91.7 94.6 “nothing” 81.7 93.8 87.3
204 Caminate Na Rang et al. / Procedia Computer Science 256 (2025) 198–205
Regarding the overall performance of the classification system, and considering N the total number of classes,
the macro- and micro-averaged precision and recall were computed according to equations 4, 5, 6, and 7,
respectively.
$$\mathrm{Precision}_{\text{Macro-average}} = \frac{\mathrm{Precision}_{\text{Class }A} + \mathrm{Precision}_{\text{Class }B} + \dots + \mathrm{Precision}_{\text{Class }nothing}}{N} \qquad (4)$$

$$\mathrm{Precision}_{\text{Micro-average}} = \frac{TP_{\text{Class }A} + \dots + TP_{\text{Class }nothing}}{(TP_{\text{Class }A} + FP_{\text{Class }A}) + \dots + (TP_{\text{Class }nothing} + FP_{\text{Class }nothing})} \qquad (5)$$

$$\mathrm{Recall}_{\text{Macro-average}} = \frac{\mathrm{Recall}_{\text{Class }A} + \mathrm{Recall}_{\text{Class }B} + \dots + \mathrm{Recall}_{\text{Class }nothing}}{N} \qquad (6)$$

$$\mathrm{Recall}_{\text{Micro-average}} = \frac{TP_{\text{Class }A} + \dots + TP_{\text{Class }nothing}}{(TP_{\text{Class }A} + FN_{\text{Class }A}) + \dots + (TP_{\text{Class }nothing} + FN_{\text{Class }nothing})} \qquad (7)$$
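These macro- and micro-averaged values correspond to scikit-learn's average='macro' and average='micro' options; a short sketch follows, reusing the hypothetical y_test and y_pred variables from the earlier examples.

```python
# Macro- and micro-averaged precision and recall (equations 4-7).
from sklearn.metrics import precision_score, recall_score

for avg in ("macro", "micro"):
    p = precision_score(y_test, y_pred, average=avg)
    r = recall_score(y_test, y_pred, average=avg)
    print(f"{avg}-average precision: {p:.3f}, {avg}-average recall: {r:.3f}")
```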
Table 2 presents the macro- and micro-averaged evaluation metrics of the developed classification system.
Table 2. System performance evaluation.
Evaluation Metric           Value
Macro-average precision     92.4%
Micro-average precision     92.3%
Macro-average recall        92.3%
Micro-average recall        92.3%
By analyzing the results in Table 2, we find that, as expected, the values of micro-average precision and micro-
average recall are the same. This derives from the fact that a false negative for a given class corresponds to a false
positive for another class.
The results obtained demonstrate a remarkable performance of the Random Forest model in recognizing most of
the hand gestures representing the letters of the Latin alphabet. The accuracy of 92.3%, which equals micro-average
precision and recall, is a positive indicator of the success of the Random Forest model in classifying hand gestures,
which, as mentioned, is one of the components of sign language. However, detailed analysis by class reveals
variations in performance according to different hand gestures.
When examining performance by class, we observe that classes S and T present precisions below 80%, while
class G has a similarly low recall value. The same can be observed for classes M and N, whose precision values are
just above 80%. Such results are justifiable given the similarity between hand gestures corresponding to different letters,
such as M and N, or S and T, which the model has some difficulty distinguishing.
Another challenge to consider is the position of the fingers, which, for some letters, may vary slightly from person to
person.
Comparing the results obtained with previous studies on hand gesture recognition, we can state that the approach
proposed in this paper achieved competitive performance. However, it is important to highlight that direct
comparison with previous studies should not be taken in absolute terms, as it depends on several factors such as the
image databases, experimental configurations, and/or algorithms used.
The practical implications of these results are significant, as the precise recognition of hand gestures has
applications in several areas, such as the communication of individuals unable to communicate verbally, human-
machine interaction, virtual reality, among others. The performance obtained by the proposed model reveals its
potential to be incorporated into application systems and devices in a real context, in more or less complex
situations.
5. Conclusions and Future Work
Being the oldest method of human communication, sign language is a form of non-verbal communication that
uses various parts of the body, where interpretation focuses on hand gestures, facial emotions, and body posture. In
addition to their importance in interpreting sign language, static hand gestures have application in several areas, such
as human-machine interaction, augmented reality, the aviation industry, among others. This applicability is one of
the factors that has driven the interest of the scientific community in the study and development of efficient
approaches for automatic hand gesture recognition. The performance of these approaches must be analyzed not only
from the point of view of the accuracy achieved in gesture recognition, but also regarding their processing time,
which must be compatible with applications in a real context.
In this paper we present a solution for hand gesture recognition, using the Random Forest ML classification
model. To implement the proposed approach, a database of 54,000 RGB images of hand gestures was created, from
which characteristics were extracted using the MediaPipe framework. The solution presents hand gesture
classification precisions by class that vary between 74.4% and 98.8%, with an accuracy of 92.3%, representing an
improvement compared to previous approaches. Despite these results, there is still room to improve the recognition
accuracy of some hand gestures, namely those corresponding to the letters of the Latin alphabet G, M, N, S and T.
Possible directions for future research include increasing the size of the dataset, investigating different image
preprocessing techniques, as well as exploring different machine/deep learning algorithms.
Acknowledgements
This work has been funded by the Portuguese Foundation for Science and Technology (FCT), under the Project
UIDB/05567/2020.
References
[1] Alaghband, Marie, Hamid Reza Maghroor and Ivan Garibay. (2023) “A survey on sign language literature.” Machine Learning with
Applications 14: 100504. https://doi.org/10.1016/j.mlwa.2023.100504.
[2] Oudah, Munir, Ali Al-Naji and Javaan Chahl. (2020) “Hand Gesture Recognition Based on Computer Vision: A Review of Techniques.”
Journal of Imaging 6 (8): 73. https://doi.org/10.3390/jimaging6080073.
[3] Ceolini, Enea, Charlotte Frenkel, Sumit Bam Shrestha, Gemma Taverni, Lyes Khacef, Melika Payvand and Elisa Donati. (2020) “Hand-
Gesture Recognition Based on EMG and Event-Based Camera Sensor Fusion: A Benchmark in Neuromorphic Computing.” Frontiers in
Neuroscience 14:637. https://doi.org/10.3389/fnins.2020.00637.
[4] Eid, Ahmed and Friedhelm Schwenker. (2023) “Visual Static Hand Gesture Recognition Using Convolutional Neural Network.” Algorithms
16 (8): 361. https://doi.org/10.3390/a16080361.
[5] Chang Victor, Rahman Olamide Eniola, Lewis Golightly and Qianwen Ariel Xu. (2023) “An Exploration into Human–Computer Interaction:
Hand Gesture Recognition Management in a Challenging Environment.” SN Computer Science 4:441. https://doi.org/10.1007/s42979-023-
01751-y.
[6] Rautaray, Siddharth S. and Anupam Agrawal. (2012) “Real Time Gesture Recognition System for Interaction in Dynamic Environment.”
Procedia Technology 4:595–599. https://doi.org/10.1016/j.protcy.2012.05.095.
[7] Haria, Aashni, Archanasri Subramanian, Nivedhitha Asokkumar, Shristi Poddar, and Jyothi S Nayak. (2017) “Hand Gesture Recognition for
Human Computer Interaction.” Procedia Computer Science 115:367–374. https://doi.org/10.1016/j.procs.2017.09.092.
[8] Nogales, Rubén E. and Marco E. Benalcázar. (2023) “Hand Gesture Recognition Using Automatic Feature Extraction and Deep Learning
Algorithms with Memory.” Big Data and Cognitive Computing 7:102. https://doi.org/10.3390/bdcc7020102.
[9] Cruz, Patricio J., Juan Pablo Vásconez, Ricardo Romero, Alex Chico, Marco E. Benalcázar, Robin Álvarez, Lorena Isabel Barona López and
Ángel Leonardo Valdivieso Caraguay. (2023) “A Deep Q-Network based hand gesture recognition system for control of robotic platforms.”
Scientific Reports 13:2045–2322. https://doi.org/10.1038/s41598-023-34540-x.
[10] Harini, K. and S. Uma Maheswari. (2023) “A novel static and dynamic hand gesture recognition using self organizing map with deep convolutional
neural network.” Automatika 64:1128–1140. https://doi.org/10.1080/00051144.2023.2251229.
[11] John, Jogi and Shrinivas Deshpande. (2023) “Static hand gesture recognition using multi-dilated DenseNet-based deep learning architecture.”
The Imaging Science Journal 71:221–243. https://doi.org/10.1080/13682199.2023.2179965.
[12] Google Developers. MediaPipe Framework, available at https://developers.google.com/mediapipe.
[13] Google Developers. MediaPipe Framework: Hand landmarks detection guide, available at
https://developers.google.com/mediapipe/solutions/vision/hand_landmarker.
[14] Breiman, Leo. (2001) “Random forests.” Machine Learning 45 (1): 5–32.