SDS 2
By
Junaid Sajid 2021-BS-AI-018
Zalwisha Khan 2021-BS-AI-019
Muhammad Abdullah 2021-BS-AI-020
Supervised by
Dr Uzair Saeed
Application Evaluation History
Comments (by committee): *Include the ones given at scope time, both in documentation and presentation.
Comment (by committee): Make it more clear.
Action Taken: Using two languages to make it more clear and easy to understand.
Supervised by
Dr Uzair Saeed
Signature: __________________
Abstract
There are many sign languages used around the world by millions of deaf people and by those who have hearing difficulties. The most commonly used standard is ASL (American Sign Language). The communication gap between signers and non-signers has existed for years and is now being narrowed by techniques that automate the detection of sign gestures. In this project, the user can capture hand images through a web camera, and the system then identifies which gesture or hand sign is being performed. The gesture can be translated into text or voice. For deaf users, spoken language is in turn transformed into sign language. The captured hand-gesture images and voice input are processed using image processing tools and machine learning algorithms. Additionally, the system incorporates voice recognition to enhance user interaction and provides a place where users can also learn sign languages. The final solution integrates these components into a user-friendly interface.
CHAPTER 1: INTRODUCTION
1.1 Introduction
Gestures play a significant role in communication, and for deaf and mute people sign language plays an important role in daily life. Sign language has its own vocabulary, syntax, and meaning, which a hearing person cannot understand without basic knowledge of sign language, and this causes many communication problems. Using deep learning algorithms and image processing, we can classify sign language gestures into their corresponding words or letters. Computer vision (CV) enables machines to interpret and process body movements and hand gestures visually. It tracks, detects, and analyzes body and hand movements in real time to translate them into meaningful output. Algorithms like CNNs identify hand movements by focusing on the shape and position of the fingers, and the system then extracts key features from the gestures to differentiate between signs. Once a sign is recognized, it is mapped to meaningful text, and the text is passed to a text-to-speech engine that converts it into voice, and vice versa [1], [2]. A user can also learn sign languages; different sign language courses will be available.
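As a minimal illustration of the final step of this pipeline, the sketch below speaks a recognized gesture label aloud. It assumes the pyttsx3 text-to-speech library, which is not part of the project's stated requirements and stands in for whatever engine the system actually uses.

```python
# Minimal sketch: speaking a recognized gesture label aloud.
# pyttsx3 is an assumption here; the project may use a different TTS engine.
import pyttsx3

def speak(text: str) -> None:
    """Convert recognized text (e.g., a gesture label) into voice output."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

if __name__ == "__main__":
    recognized_label = "HELLO"   # hypothetical output of the gesture classifier
    speak(recognized_label)
```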
Sign language consists of many gestures that include various hand and arm movements. Existing systems lack real-time responsiveness and may not always be readily available, leading to communication challenges for the deaf and hard-of-hearing community. They provide facilities to learn sign language but do not efficiently interpret sign language to voice in real time. Some existing systems use dedicated hardware to convert sign language into voice, and such systems are limited to one or two sign languages. To use these systems, a person must already have knowledge of sign language.
1.4 Proposed System
The scope of the voice and sign language recognition system is to facilitate communication between spoken language users and sign language users. The aim is to build a multimodal system that enables real-time processing of both voice and gesture inputs: handle simultaneous input from voice and hand gestures and combine both into an actionable response; support multiple languages if required (starting with a single language and extending later); and provide a user-friendly interface so that users can interact with the system without detailed knowledge of how it works.
Constraints:
• Overcome the gap between sign language users and spoken language users.
• Improve access to public services.
• Facilitate accessibility in different sectors.
• Create employment opportunities.
• Reduce the need for interpreters.
• Enable real-time communication.
1.8.2 Problem
• Subscription prices.
Finding the right balance between affordability and profitability is difficult. Incorrect pricing can result in either underuse or financial loss.
• Customer retention.
If users find the app confusing to use or are not satisfied with its performance, they may cancel their subscription.
1.8.3 Solution
1.8.4 Customers
• Educational institutions.
Educational centres with students who require special assistance in communication need real-time translation tools that support a sign language user environment by enabling communication between students with and without disabilities.
• Healthcare sectors.
Hospitals, clinics, and healthcare professionals that need quick and efficient
communication with patients.
• Government agencies.
Organizations offering services to the public, including people with disabilities, need tools that allow seamless interaction with them.
1.8.5 Competitors
Limitations
• This app only supports a limited number of languages.
• It focuses only on converting spoken or text input into sign language; it does not support voice recognition or sign-language-to-text/voice output.
• Designed only for students.
• Available only as a mobile app.
Saksham app
Limitations
• Limited vocabulary: The app mainly covers basic words and phrases, which
makes it less useful for complex conversations.
• Inconsistent translation: It sometimes struggles to accurately translate certain
phrases or words due to the nuances of sign language.
Limitations:
• It doesn't support sign language, so it's not a complete solution for all users.
In contrast, our project:
• Supports a dual recognition system for both voice input and hand sign language recognition.
• Uses real-time AI-based hand and voice recognition.
• Is designed for individuals, institutions, and businesses.
• Is offered at both mobile and enterprise level.
1.8.6 Marketing Plan
• Digital marketing
Publish blog content, run targeted ads on Google and social media platforms,
and collaborate with influencers and organizations.
• Partnerships and collaborations.
Partner with non-profit organizations, and companies interested in improving
workplace accessibility.
• Workshops.
Organize workshops for schools, healthcare providers, and businesses to demonstrate the system's benefits. Showcase the system at relevant exhibitions to attract potential partners and customers.
• Promotional trials and offers.
Offer limited trials to allow organizations and individuals to experience the system.
1.8.7 Revenue
• Subscription model.
Charge a user fee for continuous access to the software. Offer a basic free version with essential features and charge for advanced features.
• Courses.
Offer multiple sign language courses for learners.
Strengths
• Dual recognition system.
Combining voice and hand sign language recognition enhances communication.
• Inclusivity and accessibility focus.
Support for deaf, mute, and hearing users makes the system highly inclusive.
• Multi-platform deployment.
Availability on different devices offers flexibility.
• Real-time performance.
Fast performance reduces communication delay.
Weaknesses
• Potential technical challenges.
Gesture detection errors or voice misrecognition, particularly in noisy environments.
• Limited initial language support.
At the initial stage only a few languages are supported; language coverage will need to expand.
Opportunities
• Technology partnerships
Collaboration with hardware and software manufacturers to enhance functionality.
• Global market expansion
Opportunity to deploy the system in the global market so that international users can also use the app.
Threats
• Market saturation risk
Entry of new competitors or alternative solutions could impact market share.
• Competition from established platforms
Competitors like Google, Microsoft, and Hand Talk pose a threat.
• Technology evolution
Rapid advancements in AI may require continuous updates to remain competitive.
• Feature: User-friendly interface.
Attribute: Features visual aids for better usability.
Benefit: No advanced knowledge is required to operate the app.
• Feature: Secure data handling.
Attribute: Handles user data securely.
Benefit: Builds user trust by protecting personal information.
The project aims to bridge communication gaps between sign language users and non-
sign language speakers using advanced technology. Recognizing the critical role of
sign language for the deaf and mute community, the system utilizes deep learning and
computer vision to translate gestures into voice output and text. Objectives include
facilitating real-time communication across various sectors, such as healthcare and
education. Current systems struggle with real-time responsiveness and often require
users to know sign language. This project proposes a dual-function system that
converts gestures to voice and vice versa. The scope encompasses multi-language
support and a user-friendly interface, ensuring accessibility for all. Assumptions
include efficient noise cancellation and data security, while constraints highlight the
need for high-performance hardware and internet connectivity.
CHAPTER 2: LITERATURE REVIEW/BACKGROUND
AND EXISTING WORK
2.1 Background
Sign language is a structured form of hand motion used in non-verbal communication, because it is not possible to talk to everybody with one language or with simple hand gestures. Sign language is useful in non-verbal communication when the person you’re speaking with cannot speak or hear; in this case, our project facilitates two-way communication. Different authors use different techniques for sign language recognition and classification, including 3D CNNs, Long Short-Term Memory networks combined with CNNs, and the object detection algorithm YOLOv5 for detecting hand movements. These models give impressive performance in detecting dynamic hand gestures, but they used a small dataset, so the reported accuracy is 82%, and the model supports only one-way communication. Our proposed system uses a CNN model for detecting hand gestures while maintaining good accuracy. [8], [9]
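As a rough sketch of the kind of CNN classifier the proposed system relies on, the following PyTorch model maps a cropped grayscale hand image to one of several gesture classes. The layer sizes and class count are illustrative assumptions, not the project's final architecture.

```python
# Minimal sketch of a CNN gesture classifier (illustrative layer sizes).
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    def __init__(self, num_classes: int = 26):   # e.g., ASL letters; an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # for 64x64 inputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)        # extract spatial features
        x = torch.flatten(x, 1)     # flatten for the linear layer
        return self.classifier(x)   # class scores per gesture

# Example: a batch containing one 64x64 grayscale hand image.
logits = GestureCNN()(torch.randn(1, 1, 64, 64))
print(logits.shape)   # torch.Size([1, 26])
```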
The most common sign languages are American Sign Language (ASL), British Sign Language (BSL), and French Sign Language (FSL). Many countries have their own versions of sign language, such as Russian Sign Language (RSL) and Japanese Sign Language (JSL), each with its own vocabulary, structure, and unique features. Hand gesture recognition involves complex processes such as motion analysis, pattern recognition, and motion modeling. Different viewpoints cause gestures to appear differently in 2D space; some researchers use wrist bands or colored gloves to assist hand segmentation, which reduces the complexity of the segmentation process. Different evaluation criteria are used to measure the performance of a system and its overall accuracy. [10] Processing sign languages through a system involves several techniques, including computer vision, gesture recognition, voice recognition, and machine learning algorithms. [11]
One article uses a deep learning model for detecting and recognizing words from Indian Sign Language (ISL). The authors use Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models to identify signs from ISL video frames. The model was trained on a custom dataset, IISL2020, which includes 11 static signs and 630 samples. The best-performing model, consisting of a single LSTM layer followed by a GRU, achieved around 97% accuracy over 11 different signs. [1]
Another article describes a desktop application that translates American Sign Language (ASL) into text in real time. The system uses a Convolutional Neural Network (CNN) for classification, achieving an accuracy of 96.3% for the 26 letters of the alphabet. The application processes hand images through a filter and classifier to predict the class of hand gestures, facilitating communication for individuals with hearing or speech impairments. The researchers used the ASL Hand Sign Dataset from Kaggle, which contains 24 classes with Gaussian blur applied. The model uses a CNN with two layers: the first layer processes the hand image to predict and detect frames, while the second layer checks and predicts symbols or letters that look similar. The model was trained on 30,526 training images and 8,958 test images over 20 epochs to optimize accuracy and reduce loss. The application uses a live camera to capture hand gestures, which are then processed and converted to text in real time. The system includes an autocorrect feature using the Hunspell library to improve accuracy. Performance was measured using a confusion matrix, achieving a final accuracy of 96.3%. [12][13][14]
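The preprocessing step described above (Gaussian blur applied to the hand images before classification) can be reproduced with OpenCV. The snippet below is only a hedged illustration of that idea; the kernel size, target resolution, and sample file name are assumptions, not the cited authors' exact code.

```python
# Illustrative preprocessing: grayscale, Gaussian blur, and resize of a hand image.
# Kernel size and target resolution are assumptions chosen for demonstration only.
import cv2

def preprocess(frame_bgr, size=(64, 64)):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # drop color information
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)          # smooth sensor noise
    return cv2.resize(blurred, size) / 255.0             # normalize to [0, 1]

image = cv2.imread("hand_sample.jpg")   # hypothetical sample image path
if image is not None:
    x = preprocess(image)
    print(x.shape)   # (64, 64)
```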
2.1.4 Feature Extraction
To process sign language, key points on the hands, such as fingertips and joints, are detected and analyzed. These key points help the computer understand the shape and movement of the hands, which are essential for recognizing signs. Key features are also extracted from the speech signal to represent its acoustic properties: the signal is converted into a spectrogram, which is then passed through a CNN model to extract features.
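As a hedged illustration of the hand key point detection described here, the snippet below extracts 21 landmarks from a single webcam frame. It assumes the MediaPipe Hands solution together with OpenCV; MediaPipe is not among the project's listed requirements and stands in for whatever detector the system actually uses.

```python
# Minimal sketch: detect 21 hand key points (fingertips, joints) in one webcam frame.
# MediaPipe is an assumption here; the project's pipeline may use a different detector.
import cv2
import mediapipe as mp

def extract_keypoints(frame_bgr):
    """Return a list of (x, y, z) landmark coordinates, or None if no hand is found."""
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        if not result.multi_hand_landmarks:
            return None
        return [(lm.x, lm.y, lm.z) for lm in result.multi_hand_landmarks[0].landmark]

cap = cv2.VideoCapture(0)   # web camera
ok, frame = cap.read()
cap.release()
if ok:
    print(extract_keypoints(frame))
```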
American Sign Language (ASL) is an expressive visual language used by deaf and mute people. It has its own grammar, syntax, and vocabulary, making it different from English despite sharing the same geographical region.
• Facial expression
In ASL, facial expressions and head movements play an essential role in grammar. These non-manual signals are as important as hand gestures in creating meaning.
• Grammar and Structure
ASL has its own set of grammatical rules, separate from English. In ASL, the typical sentence structure is Subject-Object-Verb (SOV), which differs from the English Subject-Verb-Object (SVO) structure.
• Finger Spelling
ASL incorporates a system for spelling out words that don’t have specific signs. Each letter of the English alphabet has a unique hand gesture, and words are spelled out by forming these letter shapes sequentially.
British Sign Language (BSL) is the primary sign language used by Deaf communities in the United Kingdom. Like ASL, BSL is a visual and spatial language, but it is entirely different in its grammar, syntax, and vocabulary. BSL is not a variant of English but a fully developed language with its own grammar and linguistic features.
• Distinct Vocabulary
BSL has its own unique signs for each word; although ASL and BSL both exist alongside English, the two sign languages are entirely different from each other.
• Facial Expressions and Body Posture
As in ASL, facial expressions, eye contact, and body posture are critical for effective communication in BSL. Non-manual features help clarify meaning and convey emotion.
• Word Order and Syntax
Similar to ASL, BSL often follows a Topic-Comment structure, where the topic is introduced first, followed by a comment.
• Fingerspelling
Like ASL, BSL uses fingerspelling for proper names, places, and words that do not have established signs. BSL fingerspelling uses a two-handed British manual alphabet, which is different from the one-handed ASL alphabet.
1. Saksham
Overview: Saksham is an app developed by the Saksham foundation in Pakistan to
bridge the communication gap between the deaf and the hearing world. The app
translates text or voice into Pakistan Sign Language (PSL). It serves as a dictionary and
translator to make communication easier for people who are deaf or hard of hearing.
2. Google's Live Transcribe
Overview: Google's Live Transcribe app is available in many languages, including Urdu. It transcribes speech into text in real time, allowing people with hearing impairments to follow conversations. While it does not directly involve sign language, it helps bridge the communication gap by providing written transcriptions of spoken words.
Our Model:
Our model is designed to facilitate communication with persons who have hearing or speaking difficulties through multiple sign languages such as ASL, PSL, and BSL. The idea is to develop a multilingual platform capable of reaching a wide range of users from different linguistic backgrounds and making it easier for them to communicate and be understood. Our model will recognize hand gestures and convert them into text or voice so that a non-signing person can understand what the sign language user is trying to say. To facilitate two-way communication, voice is also converted into hand sign language for better understanding of what a hearing person is trying to say to the sign language user. Including these languages will help us reach the global deaf community and help them communicate effectively in their native sign languages. Existing systems only offer one-way communication, and most of them provide courses to learn sign language rather than real-time communication. The core of our model lies in its simplicity. The app also features real-time translation of common phrases, which enhances the learning experience and makes it practical for daily use.
Hand Talk App: Real-time text-to-spoken and sign language translation using animated 3D characters. Languages: multiple spoken and sign languages. Users: deaf and hearing individuals. Communication: one-way (text/voice to sign language).
Our Model: Multilingual platform supporting real-time two-way communication. Languages: ASL, PSL, BSL. Users: global deaf and hearing communities. Communication: two-way (gesture ↔ voice/sign).
2.3 Literature Summary
Hand gesture recognition involves complex stages such as motion analysis, pattern
recognition, and feature extraction. Key points on the hand, such as fingertips and
joints, are identified to capture the gesture's shape and motion. CNN models are
commonly used to analyze these features, ensuring accurate recognition. For real-time
applications, advanced algorithms are employed to process gestures continuously,
enabling seamless communication. Some systems incorporate additional tools like
wristbands or gloves to simplify the segmentation process.
Speech recognition systems use microphones to capture audio, which is then filtered to
remove background noise and enhance clarity. Natural Language Processing (NLP)
models predict word sequences based on recognized phonemes. When integrated with
sign language systems, these models enable two-way communication by converting
spoken words into sign gestures and vice versa.
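As a hedged illustration of the speech capture step described above, the snippet below records audio from the microphone and transcribes it. It assumes the SpeechRecognition package on top of PyAudio; the project's actual speech pipeline may differ.

```python
# Minimal sketch: capture microphone audio and transcribe it to text.
# SpeechRecognition (with PyAudio as the microphone backend) is an assumption here.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)   # simple background-noise handling
    print("Speak now...")
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio)     # cloud-based recognizer (assumption)
    print("Recognized:", text)
except sr.UnknownValueError:
    print("Speech was not understood.")
```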
CHAPTER 3: REQUIREMENTS ANALYSIS
3.1 Stakeholders List (Actors)
• Project Manager
Manages project planning, coordination, and monitoring. Ensures that the project is completed on schedule, within budget, and to quality standards. Stays in touch with all parties and resolves any issues that emerge throughout the project.
• Software Development Team
Develops and implements key functions, including voice recognition and sign language recognition. Develops and upgrades algorithms for gesture detection, machine learning, and natural language processing. Provides recurring reports on development progress and manages technical issues.
• Machine Learning Engineers
Create, train, and upgrade machine learning models for speech and sign language recognition. Assemble and preprocess data to improve the validity of the recognition algorithms. Work jointly with the software team to integrate AI models into the system.
• UI and UX Designers
Create an intuitive and accessible user interface that considers the needs of those who have hearing or speech disabilities. Organize usability testing to make sure that the platform is easy to operate. Focus on visual elements such as sign language symbols, controls, and text-to-speech or voice-to-text interfaces.
• Quality Assurance (QA) Team
Tests software functionality, performance, and accuracy. Verifies that the sign and speech recognition systems fulfill the required criteria. Cooperates with end users to collect feedback, fix errors, and improve the application's robustness.
• Sponsors or Investors
Provide financial support for the project. Monitor project milestones and give advice to ensure that the project aligns with company goals.
3.2 Requirements Elicitation
For the project “Voice and Hand Sign Language Recognition,” elicitation ensures that the system meets the needs of end users, namely individuals with hearing or speech disabilities. The goal is to collect functional, non-functional, and domain-specific requirements to guide the system’s development, including the identification of accurate voice command interpretation and support for various sign languages.
Various methods were employed to gather the requirements for the project. The following techniques were selected to ensure extensive and precise data collection:
1. Interviews
2. Observation
• Purpose: Observe how individuals communicate using voice and sign language in real-life scenarios.
• Objective: Recognize gaps in current systems and understand user behaviour.
4. Prototyping
• Objective: Gather user feedback in order to refine requirements and improve the system design iteratively.
5. Document Analysis
• Purpose: Inspect existing datasets, research papers, and technical documentation related to voice and sign language recognition.
• Objective: Leverage prior work and best practices to define system requirements.
6. Focus Groups
• Purpose: State precisely the real-life scenarios where the system comes into use.
• Objective: Define functional requirements, such as interpreting hand signs into text or distinguishing voice commands in noisy environments.
The functional requirements define the central functionalities of the Voice and Hand Sign Language Recognition System. These requirements state the expected behaviour of the system to confirm that it meets user needs.
Expected Behaviour:
Description: The system shall recognize hand signs from individuals and
translate them into speech or text in real-time.
Expected Behaviour:
• Determine and translate gestures from multiple sign languages (e.g., ASL, BSL).
• Accurately identify hand movements and static gestures using a camera.
• Work reliably under different lighting conditions and viewing angles.
Description: The system shall process both hand sign inputs and voice commands concurrently.
Expected Behaviour:
• Prioritize inputs based on user preference (e.g., prioritize voice when both inputs are active).
• Switch between voice and sign recognition seamlessly as required.
4. Translation Output
Description: The system shall translate recognized voice and hand signs into
speech or text output.
Expected Behaviour:
5. User Authentication
Description: The system shall provide a secure user login for customized settings.
Expected Behaviour:
6. Language Support
Description: Multiple languages shall be supported by the system for both sign
and voice recognition.
Expected Behaviour:
7. Customizable Interface
Expected Behaviour:
• Permit users to manage system settings, like font size, output preferences and language.
• Provide users with both auditory and visual feedback for interactions.
Description: The system shall detect any errors during recognition and notify
the users about them.
Expected Behaviour:
• Provide suggestions on how to correct gestures or commands that are not recognized.
• Present clear error messages for system problems and invalid inputs.
9. System Performance
Description: The system shall operate efficiently and give real-time responses.
Expected Behaviour:
• Process hand sign and voice inputs with minimal latency.
• Sustain recognition accuracy of at least 90%.
1. Usability Requirements
• The user interface must be intuitive, easy to use, and accessible to individuals with
different levels of technical proficiency.
• The system must provide visual and auditory feedback for inputs and outputs to increase user interaction.
• Users shall be able to change settings, like language preferences, input modes and font
size to match their needs.
2. Scalability Requirements
3. Reliability Requirements
• The system shall operate continuously without failure, maintaining at least 99.5% uptime.
• The system shall handle unexpected inputs gracefully and provide meaningful error
messages.
4. Portability Requirements
5. Maintainability Requirements
• The system’s codebase shall be modular and well-documented to facilitate updates and
debugging.
• Regular updates shall be provided to improve performance, add features, and fix bugs.
• A version control system shall be used to manage changes in the codebase.
6. Environmental Requirements
• The system shall ensure unbiased recognition of all voices and hand signs, regardless
of gender, ethnicity, or cultural background.
• All collected data shall be used solely for improving system performance and shall not
be shared without user consent.
This section outlines the use cases for the Voice and Hand Sign Language
Recognition System. The use cases describe key interactions between the users and
the system, illustrating how the system supports its functionality.
Use Case 1: Recognize Voice Commands
Main Flow:
3. The system translates the voice into text or performs the associated
action.
Main Flow:
Alternative Flow: If the sign is not recognized, the system displays an error
message and suggests corrective actions.
Use Case 3: Translate Inputs into Outputs
Goal: Convert recognized inputs (voice or hand signs) into outputs (text or
speech).
Main Flow:
Alternative Flow: If translation fails, the system requests the user to repeat
the input.
Main Flow:
Use Case 5: Courses
Main Flow:
2. The user can log in to their profile and select any sign language to learn.
The agile model was chosen for the development of speech and sign language
recognition systems. Due to its cyclical and adaptive nature, it is very suitable for the
dynamic needs of the project and the need for frequent feedback.
• Dynamic Demands
The project includes novel features like audio and sign language recognition, which may require continuous refinement based on user feedback or technological progress. Agile’s flexibility ensures the system matures in response to stakeholder requirements.
• Stakeholder Cooperation
• Iterative Delivery
• Focus on Quality
Continuous testing in every iteration ensures that the system is reliable and meets performance standards. Feedback loops improve precision, usability, and accessibility with every sprint.
3.5.2 Software Requirement
Operating system
Windows 10
Programming language
Python
Voice Recognition
• PyAudio
Hand Sign Language Recognition
• OpenCV
Data Processing & Analysis
• Matplotlib
• PyTorch
Web Frameworks
• Django
Development Tools
Table 3.1: Software requirements

Requirement   Version
Python        3.11.4
PyAudio       0.2.13
OpenCV        4.8.0
Matplotlib    3.8.0
PyTorch       3.9
Django        5.1.4
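To verify a development environment against the table above, a small check like the one below can be used. It is a convenience sketch, not part of the specified system; note that PyTorch is imported as torch and OpenCV as cv2.

```python
# Quick check (illustrative) that the packages from Table 3.1 are importable.
import importlib

packages = ["cv2", "pyaudio", "matplotlib", "torch", "django"]
for name in packages:
    try:
        module = importlib.import_module(name)
        print(f"{name}: {getattr(module, '__version__', 'version attribute not exposed')}")
    except ImportError:
        print(f"{name}: not installed")
```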
CHAPTER 4: SOFTWARE DESIGN SPECIFICATION
4.1 Design Models
• Voice Recognition:
Objects like Audio Input, Signal Processor, and Voice Recognition Model encapsulate specific tasks. This enables a modular representation of processes such as feature extraction, model inference, and output formatting.
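A hedged sketch of how these objects might be composed is shown below; the class and method names mirror the design description rather than the project's actual codebase, and the processing steps are placeholders.

```python
# Illustrative object model for the voice recognition pipeline.
# Class and method names follow the design description; they are not the real code.
from dataclasses import dataclass

@dataclass
class AudioInput:
    samples: list   # raw audio samples captured from the microphone

class SignalProcessor:
    def extract_features(self, audio: AudioInput) -> list:
        # Placeholder feature extraction (e.g., spectrogram computation in practice).
        return audio.samples

class VoiceRecognitionModel:
    def infer(self, features: list) -> str:
        # Placeholder inference step; a trained model would run here.
        return "recognized text"

def recognize(audio: AudioInput) -> str:
    """Feature extraction -> model inference -> formatted output."""
    features = SignalProcessor().extract_features(audio)
    return VoiceRecognitionModel().infer(features).strip()

print(recognize(AudioInput(samples=[0.0, 0.1, 0.2])))
```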
1. Data Preparation
2. Model Development
• Voice Recognition Model
▪ Develop a deep learning model for voice-to-text
▪ Fine-tune using a pre-trained speech recognition model
• Hand Gesture Recognition Model
▪ Train a CNN-based model for static gestures.
▪ Fine-tune CNN models for dynamic gestures
• Model Optimization
Implement techniques like pruning to improve model performance on edge
devices
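Pruning, as mentioned here, could be applied with PyTorch's built-in pruning utilities. The snippet below is a hedged example that prunes 30% of the weights of a single linear layer; the sparsity level and the layer itself are arbitrary choices for illustration.

```python
# Illustrative weight pruning with PyTorch's pruning utilities.
# The 30% sparsity level and the single layer are arbitrary choices for this sketch.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(128, 64)

# Zero out the 30% of weights with the smallest magnitude (unstructured L1 pruning).
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Make the pruning permanent by removing the reparameterization mask.
prune.remove(layer, "weight")

sparsity = float((layer.weight == 0).sum()) / layer.weight.numel()
print(f"Fraction of zeroed weights: {sparsity:.2f}")
```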
3. System Integration
1. Application Layer
Purpose: Handles user interaction and delivers the system’s functionalities.
Responsibilities:
• Provide an intuitive and accessible User Interface (UI) for
communication.
• Allow users to input voice or gestures and receive corresponding
outputs.
• Manage user settings, preferences, and feedback collection.
Components:
• Graphical User Interface (GUI)
• Input/Output Control (microphone, camera, speaker, screen)
2. Presentation Layer
Responsibilities:
Components:
3. Processing Layer
Responsibilities:
Components:
• Translation Engine
Responsibilities:
• Log user interactions and model usage for performance improvements.
• Enable training and updates for recognition models using stored data.
Components:
• APIs
5. Communication Layer
Responsibilities:
Components:
6. Input/Output Layer
Purpose: Interface with external hardware for data collection and output
delivery.
Responsibilities:
• Capture raw input data (voice via microphone and gestures via camera).
Components:
4.3.1 Block Diagram
1. Input Layer:
2. Preprocessing Layer:
3. Recognition Modules:
▪ Hand Sign Recognition: Employs computer vision algorithms to
detect and classify hand gestures.
4. Translation Engine:
5. Output Layer:
• Output Delivery:
The translated content is routed to the appropriate output device for user
communication.
4.3.2 Component Diagram
1. Input System:
2. Processing System:
3. Translation System:
• Sign-to-Text Engine: Maps recognized gestures into text or
commands for output.
5. Output System:
The system is structured into six core layers, each containing subsystems and modules
designed for specific functions. The layers ensure clarity in operations and scalability
for future enhancements.
1. Presentation Layer (User Interface)
This layer serves as the system's interface for user interaction. It includes:
• Input Interfaces:
• Output Interfaces:
▪ Speaker:
▪ Screen/Display:
• Workflow Manager:
• Communication Manager:
3. Recognition Layer
This layer focuses on converting raw user inputs into machine-readable formats.
• Multi-Language Support:
Both modules are designed to support multiple spoken and sign languages (e.g.,
ASL, BSL, or ISL for sign language; English, Spanish, or Hindi for voice).
4. Translation Layer
This layer handles the conversion of recognized inputs into meaningful outputs.
Converts transcribed speech into sign language. Uses animated avatars or
graphical displays to depict sign language gestures. Incorporates contextual
understanding to improve the accuracy of sign generation.
Stores pre-trained models for ASR, gesture recognition, and translation. Allows
periodic updates for enhanced accuracy and additional language support.
6. Integration Layer
This layer connects hardware and external services with the software.
• Device Drivers:
• API Gateway:
▪ Gesture recognition libraries for specialized sign languages.
1. Input Capture:
• Audio signals are processed to remove noise, then transcribed into text.
3. Translation:
4. Output Generation:
5. Error Management:
• If inputs are unclear, the system requests clarification from the user or
switches to a fallback mode (e.g., text-based interaction).
6. Data Storage:
• Translations, user preferences, and logs are stored for future reference
and system improvements.
4.4 Data Representation
• Voice Input Module: Processes user-provided speech into actionable text using STT.
• Hand Sign Input Module: Captures gestures, extracts features, and translates them to
text.
• Translation Module: Processes text for interpretation or conversion into the desired
output.
• Output Modules: Converts processed data into either spoken words via TTS or visual
sign language using a virtual avatar.
• Gesture: Captures hand gesture details with a unique GestureID and links to
languages.
• Voice Input: Represents the spoken input, associated with languages and a
unique VoiceID.
• Translation: Converts the source language (text, gesture, or voice) into a target
type (text, sign, or video).
• Output Modules: Includes "Text/Video" and "Sign" output, with relevant IDs
and languages.
• User Class: Contains attributes like userID, name, and output format
preferences, along with methods to set and get preferences.
• Output Class: Manages the generation of output content (text, video, or sign)
in the desired format.
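A hedged Python sketch of the User and Output classes described above is given below; attribute and method names follow this description and are illustrative assumptions, not the project's final data model.

```python
# Illustrative User and Output classes based on the class descriptions above.
# Attribute and method names are assumptions drawn from this section, not real code.
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: int
    name: str
    preferences: dict = field(default_factory=lambda: {"output_format": "text"})

    def set_preference(self, key: str, value: str) -> None:
        self.preferences[key] = value

    def get_preference(self, key: str) -> str:
        return self.preferences.get(key, "")

@dataclass
class Output:
    content: str
    output_format: str   # "text", "video", or "sign"

    def generate(self) -> str:
        # In the real system this would render text, a video, or an animated sign.
        return f"[{self.output_format}] {self.content}"

user = User(user_id=1, name="Alice")
user.set_preference("output_format", "sign")
print(Output(content="HELLO", output_format=user.get_preference("output_format")).generate())
```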
Significance:
• Highlights the reusability and scalability of the components for different types
of input and output.
• Root System:
The entire system is represented as the root, encompassing all major modules.
• Modules:
Input Module: Handles user inputs (voice and gestures), preprocessing, and
feature extraction.
Translation Module: Maps inputs to appropriate outputs using a language
model and data mapping processes.
• Leaf Nodes:
• Input Stage:
The system receives input from the user in two forms. Voice Input: the user speaks into a microphone. Gesture Input: the user performs gestures in front of a camera.
• Preprocessing Stage:
Voice Input: Processes the audio signal to remove noise and extracts
transcribable data. Gesture Input: Analyzes captured images or videos to
recognize gestures.
• Recognition Stage:
• Translation Stage:
Converts the recognized input (text, gesture, or voice) into the desired output type: text, speech, or sign. Uses language models, translation databases, and gesture mapping.
4.5.1 Flowchart
• Input Stage:
• Preprocessing Stage:
For voice input, the system preprocesses the audio to remove noise and
enhance clarity.
For gesture input, the system processes image or video data to recognize
hand movements.
• Recognition Stage:
• Translation Stage:
Translates the recognized input (text or gesture) into the desired output format
(text, sign, or speech).
• Output Stage:
• Input Stage:
▪ The InputModule forwards the raw input data to the Preprocessor for
cleaning and feature extraction.
• Output Generation:
References
[1] D. Kothadiya, C. Bhatt, K. Sapariya, K. Patel, A. B. Gil-González, and J. M.
Corchado, “Deepsign: Sign Language Detection and Recognition Using Deep
Learning,” Electronics (Switzerland), vol. 11, no. 11, Jun. 2022, doi:
10.3390/electronics11111780.
[2] P. V.V and P. R. Kumar, “Segment, Track, Extract, Recognize and Convert Sign
Language Videos to Voice/Text,” International Journal of Advanced Computer
Science and Applications, vol. 3, no. 6, 2012, doi:
10.14569/ijacsa.2012.030608.
[4] K. Singh, M. Shahi, and N. Gupta, “Conversion of Sign Language to Text,” Int J Res Appl Sci Eng Technol, vol. 12, no. 5, 2024, doi: 10.22214/ijraset.2024.60947.
[5] W. Li, H. Pu, and R. Wang, “Sign Language Recognition Based on Computer
Vision,” in 2021 IEEE International Conference on Artificial Intelligence and
Computer Applications, ICAICA 2021, 2021. doi:
10.1109/ICAICA52286.2021.9498024.
[6] S. R. Pandit, J. Pawar, R. Pawar, P. Pote, and P. Pote, “Sign Language Recognition using Deep Learning,” International Journal of Advanced Research in Science, Communication and Technology, 2023, doi: 10.48175/ijarsct-10797.
[9] H. Adhikari, M. S. Bin Jahangir, I. Jahan, M. S. Mia, and M. R. Hassan, “A
Sign Language Recognition System for Helping Disabled People,” in 2023 5th
International Conference on Sustainable Technologies for Industry 5.0, STI
2023, Institute of Electrical and Electronics Engineers Inc., 2023. doi:
10.1109/STI59863.2023.10465011.
[10] M. J. Cheok, Z. Omar, and M. H. Jaward, “A review of hand gesture and sign
language recognition techniques,” International Journal of Machine Learning
and Cybernetics, vol. 10, no. 1, pp. 131–153, Jan. 2019, doi: 10.1007/s13042-
017-0705-5.
[14] D. Shah, “Sign Language Recognition for Deaf and Dumb,” Int J Res Appl Sci
Eng Technol, vol. 9, no. 5, 2021, doi: 10.22214/ijraset.2021.34770.