Lipreading Using a Comparative Machine Learning Approach

Ziad Thabet, Faculty of Computer Science, Misr International University, Cairo, Egypt, [email protected]
Amr Nabih, Faculty of Computer Science, Misr International University, Cairo, Egypt, [email protected]
Karim Azmi, Faculty of Computer Science, Misr International University, Cairo, Egypt, [email protected]
Youssef Samy, Faculty of Computer Science, Misr International University, Cairo, Egypt, [email protected]
Ghada Khoriba, Faculty of Computers and Information, Helwan University, Cairo, Egypt, [email protected]
Mai Elshehaly, School of Computing, University of Leeds, Leeds, UK, [email protected]
Abstract—Lipreading is the process of interpreting spoken words by observing lip movement. It plays a vital role in human communication and speech understanding, especially for hearing-impaired individuals. Automated lipreading approaches have recently been used in applications such as biometric identification, silent dictation, forensic analysis of surveillance camera capture, and communication with autonomous vehicles. However, lipreading is a difficult process that poses several challenges to human- and machine-based approaches alike. This is because the large number of phonemes in human language is visually represented by a smaller number of lip movements (visemes). Consequently, the same viseme may represent several phonemes, which confuses any lipreader. In this paper, we present a detailed study of machine learning approaches for the real-time visual recognition of spoken words. Our focus on real-time performance is motivated by the recent trend of using lipreading in autonomous vehicles. Nine different classifiers were implemented and tested for lipreading, and their confusion matrices over different groups of words are reported. The three best-performing classifiers were Gradient Boosting, Support Vector Machine (SVM), and logistic regression, with accuracies of 64.7%, 63.5%, and 59.4%, respectively.

Index Terms—Lipreading, Classification, Autonomous Vehicles, Speech Recognition.

I. INTRODUCTION

Lipreading, widely known as visual speech recognition (VSR), is a process that aims to interpret and understand spoken words using only the visual signal produced by lip movement. Lipreading plays a crucial role in both human-human and human-computer interaction. For example, people use lipreading in their daily conversations to understand one another in noisy environments and in situations where the audio speech signal is not readily comprehensible. For the same reason, the skill of lipreading has long been mastered by individuals with hearing impairment: it enables them to understand speech and maintain social activities without relying on the perception of sounds.

The recent advent of novel machine learning and signal processing approaches has increased researchers' interest in automating the process of lipreading. This attention is motivated by the promising results of lipreading in application areas such as human-computer interaction, forensic analysis of surveillance camera capture, biometric identification, silent dictation, and autonomous vehicles [1].

However, the recognition of lip motion presents several challenges to linear classifiers, mainly because the features used in classification are calculated from a sequence of shapes that the lips take, known as "visemes". The number of visemes that the lips can take is between 10 and 14 [2], whereas the number of phonemes (i.e., acoustic sounds) that can be produced by these visemes exceeds 50. This mismatch between the visual and audio signals creates new horizons in machine learning research: it motivates the quest for improved visual features and classifiers to bridge the gap between what has been spoken and what is visually perceived.

In this paper, we present LipDrive: a novel system for visual speech recognition that targets autonomous vehicles as an application. We focus on autonomous vehicles because of the thriving nature of this application area and the possibilities that lipreading offers within it. A human-computer interaction approach is taken to characterize the challenges and opportunities of lipreading in facilitating communication between humans and autonomous vehicles, especially in noisy car environments. Furthermore, we present a comparative analysis of nine different linear classifiers that we tested in LipDrive. Their performance was studied in lipreading using raw visual features as well as a preprocessed feature set. Through our experimental results, we aim to propose a set of guidelines that can steer the choice of classification method and preprocessing steps for researchers working in the area of lipreading.

The main contributions of this paper can be summarized as follows:
• A novel lipreading system called LipDrive that is to be deployed in an autonomous vehicle setting
• A comparative analysis of nine classifiers for lipreading
• Experimental results using preprocessed and raw visual features for classification
• A set of design guidelines for visual speech recognition

Section II provides a full description of the state of the art in lipreading research. The rest of the paper is organized as follows: Section III describes the LipDrive system, Section IV outlines the experimental approach, and Section V presents the results of our comparative analysis. Finally, Section VI offers our concluding remarks and lays the foundation for our future work.

II. RELATED WORK

Assael et al. [3] showed that lip movements can be extracted while speaking and converted into written text, and that the conversion process can operate at the sentence level instead of the word level. They cited several difficulties encountered during their experiments, such as designing and learning the facial features and predicting the sentence itself, and they applied different deep learning approaches to extract the lip movements and classify the spoken words.

Chung et al. [4] showed that lip recognition systems can understand spoken words using only visual features, and that such systems can help recognize spoken words in corrupted videos that lack audio. Aiming to build a system that reads lips independently of the speaker, they collected a large dataset from TV broadcasts and built deep learning architectures that effectively learn and recognize hundreds of words.

Garg et al. [5] discussed different methods for predicting words and phrases from videos without their audio tracks. They also noted that visual lipreading is important in human-computer interaction and can replace audio speech recognition, which may struggle in noisy environments and with the variation of inputs caused by different people speaking with different accents. They concatenated a fixed number of images as input to a pre-trained VGGNet model, used nearest-neighbor interpolation to normalize the number of images per sequence, and fed the features extracted by the VGGNet model to LSTM and RNN models to classify the word.

Rathee [6] defined lipreading as the recognition of lip movement patterns while speaking and noted that visual speech recognition has motivated researchers towards lipreading, since speech recognition systems face major problems in noisy environments and lipreading can help hearing- or speech-impaired people communicate normally with others. The proposed automated lipreading algorithm consists of two main steps: feature extraction and word classification. Feature extraction comprises five steps: video acquisition, face and mouth detection, intensity equalization, keypoint extraction, and geometric feature extraction. Word classification is done using a Learning Vector Quantization neural network.

Lesani et al. [7] introduced lip authentication as a new method for mobile phone security. This method could be used in mobile banking applications to ensure the security of customers' accounts: the mobile phone camera captures the lip movements and sends them to lipreading algorithms that classify the security word, such as a password.

III. SYSTEM OVERVIEW

In order to reach high accuracy with real-time recognition of spoken words, the LipDrive system consists of six different data processing stages. Figure 1 depicts these six stages in the form of a pipeline, and a detailed description of each stage is given in this section.

Fig. 1. System Overview

A. Image Acquisition

The image acquisition stage receives a raw video as input. This video captures a spoken word within a specific environment. This stage aims to create a sequence of frames, or images, from the captured video and to reduce the effect of environmental factors on the quality of the frames in the sequence.

The captured video is first sliced into individual frames using the OpenCV Python library. Next, the acquired frames are converted to gray scale, also using OpenCV. The resulting frames are then passed along to the next stage for feature extraction.
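The following Python sketch illustrates this stage, assuming OpenCV is installed and imported as cv2; the video file name in the usage comment is illustrative only.

import cv2

def acquire_frames(video_path):
    """Slice a video into gray-scale frames (a sketch of the acquisition stage)."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:  # no more frames in the video
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    capture.release()
    return frames

# Example usage (illustrative file name):
# gray_frames = acquire_frames("about_00001.mp4")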
B. Feature Extraction

The goal of the feature extraction stage is to reduce the size of the images that are received from the acquisition stage. This step is motivated by the famous curse of dimensionality in machine learning [8]. Namely, if we were to use the original images as input to the classifier, each pixel would represent a feature. The number of pixels is typically large and varies with image resolution and camera quality, which threatens the reliability and efficiency of reading the lips from the received images. Furthermore, the majority of the captured pixels are irrelevant to the classifier.
Therefore, instead of passing videos and pictures, we extract only the needed features from the videos. This is realized by passing the gray-scale images to the feature extraction stage using DLib, a modern C++ library that implements a multitude of machine learning algorithms [9]. The "Shape Predictor 68 Face Landmarks" model is used to detect the human face in each image and to extract the 68 landmarks of the face. These landmarks represent points on the mouth, nose, eyes, and so forth, as shown in Figure 2. The number of landmarks is then reduced to the twenty points of each frame that represent the features of the lips, as shown in Figure 3. The points of each frame are then translated to the Z-order by calculating a z-value that maps a 2D point (x, y) to a one-dimensional value. This value is calculated by interleaving the binary representations of its coordinate values.

Fig. 2. Face Detection using DLib

Fig. 3. Lip Feature Extraction
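A minimal sketch of this step is shown below, assuming the dlib Python bindings and the publicly distributed shape_predictor_68_face_landmarks.dat model file; the mouth corresponds to landmark indices 48-67, and interleave_bits is an illustrative Morton (Z-order) encoding.

import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def interleave_bits(x, y, bits=16):
    """Morton (Z-order) code: interleave the bits of x and y into one integer."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def lip_features(gray_frame):
    """Return Z-order values for the 20 mouth landmarks (points 48-67) of the first detected face."""
    faces = detector(gray_frame)
    if not faces:
        return None  # no face detected in this frame
    shape = predictor(gray_frame, faces[0])
    mouth = [shape.part(i) for i in range(48, 68)]
    return [interleave_bits(p.x, p.y) for p in mouth]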
C. Cropping

Taking into consideration the various positions of the user in front of the camera, which yield different positions of the speaker's facial landmarks, we crop the image to the mouth level to reduce the environmental variability in the extracted features. In addition, the distance between the speaker's face and the camera can vary from one speaker to another, which makes the extracted features ambiguous at times. To unify them, all images are normalized to the same width and to a proportional height. Equation 1 defines the calculation of a normalized point (p*_x, p*_y) from each point (p_x, p_y) on the lips, in which a normalization scale is defined as θ:

(p*_x, p*_y) = (p_x × R, p_y × (H × R) / H)    (1)

where R = θ / (X_max − p_x) and H = Y_max − p_y.
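As a worked sketch of Equation (1), the helper below applies the normalization to a single lip point; the function name and the example values of θ, X_max, and Y_max are illustrative, and the formula is kept exactly as printed.

def normalize_point(px, py, x_max, y_max, theta):
    """Normalize a lip landmark (px, py) according to Equation (1)."""
    r = theta / (x_max - px)   # R = theta / (X_max - p_x)
    h = y_max - py             # H = Y_max - p_y
    # Note: (h * r) / h reduces algebraically to r; kept to mirror Equation (1).
    return px * r, py * (h * r) / h

# Example with illustrative values: theta = 100, (X_max, Y_max) = (320, 240).
# print(normalize_point(150, 200, 320, 240, 100))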
D. Concatenation

Individual frames are passed through the face detection, feature extraction, cropping, and normalizing processes described above. However, since a word is rarely classified from an individual frame, we concatenate the frames back to form a sequence of feature vectors (Figure 4). This process creates a training dataset that has the sequence of feature vectors as input and the spoken word as the class label. For example, if the word "ABOUT" is captured in 10 frames, each of which contains 20 features, this leads to 200 features that form the sequence for that word.

Fig. 4. Frames Concatenation
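A minimal sketch of the concatenation step, assuming per-frame feature lists such as those produced in the feature extraction stage; NumPy is used only for convenience.

import numpy as np

def concatenate_sequence(frame_features):
    """Flatten a list of per-frame feature vectors into one sequence vector.

    For example, 10 frames of 20 lip features each become a single
    200-dimensional vector labelled with the spoken word.
    """
    return np.concatenate([np.asarray(f, dtype=float) for f in frame_features])

# Example: 10 frames x 20 features -> one 200-dimensional training sample.
# sample = concatenate_sequence(per_frame_lip_features)  # shape (200,)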
E. Training and Validation

In this stage, sequences of feature vectors are collected for a number of words. We strive to create a large enough training dataset, so for each word we consider a number of videos that capture the same word as spoken by different individuals. The feature vector sequences for a set of words are then fed into the classifier for training and model generation purposes. Next, a different dataset is fed to the classifier for validation.

F. Classification

Once a classifier model is built and validated during the previous stage, we get to the point of real-time lipreading. In this stage, we extract the same features from the user's face. We note here that videos are captured via a portable device and streamed continuously to the server for feature extraction. The extracted features are then fed to the classifier model so that it can predict the spoken word, which is then translated into a command. Finally, the server responds by executing the intended command.
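The sketch below condenses the training, validation, and classification stages using scikit-learn; the randomly generated arrays and the COMMANDS mapping are placeholders standing in for the concatenated sequence vectors, word labels, and vehicle commands described above.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data: 100 sequences of 200 features each, labelled with 5 words.
rng = np.random.default_rng(0)
words = ["ABOUT", "AROUND", "ATTACK", "BENEFITS", "BETWEEN"]
X_train, y_train = rng.random((100, 200)), rng.choice(words, 100)
X_val, y_val = rng.random((30, 200)), rng.choice(words, 30)

# Training and validation.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))

# Classification: a streamed video reduced to one 200-dimensional sequence
# vector is classified, and the word is mapped to an (illustrative) command.
COMMANDS = {word: "command_" + word.lower() for word in words}
sequence = rng.random((1, 200))
predicted_word = model.predict(sequence)[0]
print(predicted_word, "->", COMMANDS[predicted_word])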
IV. DESIGN OF EXPERIMENTS

The six-stage approach is tested with nine different classifiers. The purpose of our experiments is to gain a deeper understanding of the strengths and weaknesses of the classifiers while attempting to classify different words with varying visual and phonetic similarities.

A. Dataset

In order to cover a breadth of training words and to have a large dictionary, we have to work on a large-scale dataset. In our experiments, we used a benchmark dataset that consists of about one million instances of 500 different words spoken by different speakers [10]. Each word has 1000 training videos, 50-100 validation videos, and 50-100 testing videos. All videos are 29 frames long with a duration of 1.16 seconds, and the spoken word occurs in the middle of the video. In addition, the speakers' positions are not fixed: their faces are not always looking directly at the camera, some speakers talk facing someone next to them, and some faces are far from the camera, which makes the dataset more challenging.

B. Procedure

Using our large-scale dataset, composed of sequences of vectors and labels, we tested different classifiers in order to fit the data. After extracting the lip features, we went through several stages. First, we used 5 words from our training dataset to feed each classifier; those words are "ABOUT", "AROUND", "ATTACK", "BENEFITS", and "BETWEEN". These words were chosen due to their visual and phonetic similarity. For example, "ABOUT" and "AROUND" have almost the same lip movements, which makes it more challenging for the classifiers to distinguish them. Second, we predicted different sequences for the same 5 words, but from our testing dataset. Third, we calculated the accuracy of each classifier using predefined scoring functions. Fourth, we visualized each classifier's confusion matrix.
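A sketch of this procedure with scikit-learn is given below; the classifier list mirrors the experiments reported in the next section, and the randomly generated arrays are placeholders for the concatenated sequences of the five chosen words.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis, LinearDiscriminantAnalysis
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

classifiers = {
    "Naive Bayes": GaussianNB(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "SGD": SGDClassifier(),
    "MLP": MLPClassifier(solver="lbfgs", max_iter=500),
    "AdaBoost": AdaBoostClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Gradient Boosting": GradientBoostingClassifier(),
}

# Placeholder data standing in for the concatenated lip-feature sequences.
rng = np.random.default_rng(1)
words = ["ABOUT", "AROUND", "ATTACK", "BENEFITS", "BETWEEN"]
X_train, y_train = rng.random((200, 200)), rng.choice(words, 200)
X_test, y_test = rng.random((50, 200)), rng.choice(words, 50)

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)
    print(name, accuracy_score(y_test, predictions))
    print(confusion_matrix(y_test, predictions, labels=words))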
V. EXPERIMENTAL RESULTS

In this section, the accuracy achieved by each classifier is reported, the confusion matrix resulting from each is visualized, and some general guidelines are discussed based on our findings.
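The confusion matrices shown in the figures below can be rendered with scikit-learn and matplotlib; a minimal sketch follows, with purely illustrative counts that are not the paper's results.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

words = ["ABOUT", "AROUND", "ATTACK", "BENEFITS", "BETWEEN"]
# Illustrative counts only: rows are true words, columns are predicted words.
cm = np.array([
    [30,  5,  2,  1, 12],
    [ 6, 28,  3,  2, 11],
    [ 2,  4, 35,  5,  4],
    [ 1,  3,  6, 36,  4],
    [10,  8,  2,  3, 27],
])
ConfusionMatrixDisplay(cm, display_labels=words).plot(cmap="Blues")
plt.title("Confusion matrix (illustrative)")
plt.show()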
1) Experiment 1 - Naive Bayes (NB): We used the Naive Bayes classifier, which is based on Bayes' theorem for object classification, and we obtained an accuracy of 26.6%. Figure 5 depicts the confusion matrix for this experiment; as it shows, Naive Bayes performs very poorly when dealing with a large number of features.

Fig. 5. Naive Bayes' Confusion Matrix

2) Experiment 2 - Quadratic Discriminant Analysis (QDA): We used Quadratic Discriminant Analysis, which fits class densities to the data and is based on Bayes' theorem, and we obtained an accuracy of 32.3%. Figure 6 depicts the confusion matrix for this experiment.

Fig. 6. Quadratic Discriminant Analysis's Confusion Matrix

3) Experiment 3 - SGDClassifier: We used the SGDClassifier and obtained an accuracy of 45.9%. Figure 7 depicts the confusion matrix for this experiment.

Fig. 7. SGDClassifier's Confusion Matrix
4) Experiment 4 - Multi-Layer Perceptron Classifier (MLP): We used the Multi-Layer Perceptron, a neural network classifier that optimizes the log-loss function using LBFGS. We obtained an accuracy of 48.3%. Figure 8 depicts the confusion matrix for this experiment.

Fig. 8. Multi-Layer Perceptron Classifier's Confusion Matrix

5) Experiment 5 - AdaBoost Classifier: We used the AdaBoost classifier, which fits a model on the training dataset and then fits additional copies of that model on the same data with the weights of misclassified instances adjusted. We obtained an accuracy of 54.5%. Figure 9 depicts the confusion matrix for this experiment.

Fig. 9. AdaBoost Classifier's Confusion Matrix

6) Experiment 6 - Linear Discriminant Analysis Classifier (LDA): We used the Linear Discriminant Analysis classifier, which fits class densities to the data and is based on Bayes' theorem. We obtained an accuracy of 56.1%. Figure 10 depicts the confusion matrix for this experiment.

Fig. 10. Linear Discriminant Analysis Classifier's Confusion Matrix

7) Experiment 7 - Logistic Regression Classifier (LR): We used the Logistic Regression classifier, which analyzes independent variables to determine an outcome. We obtained an accuracy of 59.4%. Figure 11 depicts the confusion matrix for this experiment.
Fig. 11. Logistic Regression Classifier's Confusion Matrix

8) Experiment 8 - Support Vector Machine Classifier (SVM): We used the Support Vector Machine classifier, which analyzes data for classification and regression analysis. We obtained an accuracy of 63.5%. Figure 12 depicts the confusion matrix for this experiment.

Fig. 12. Support Vector Machine's Confusion Matrix

9) Experiment 9 - Gradient Boosting Classifier: We used the Gradient Boosting classifier, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. We obtained an accuracy of 64.7%. Figure 13 depicts the confusion matrix for this experiment.

Fig. 13. Gradient Boosting Classifier's Confusion Matrix

A. Discussion of Results

The results show that most of the classifiers confuse the words "ABOUT" and "BETWEEN", and that the accuracies of the classifiers are fairly close to one another; this is due to the small number of words being trained. However, as the number of words to be trained increases, the accuracy of the linear classifiers starts to decrease in proportion to the number of words. Thus, neural network classifiers become essential for a large-scale dataset, and this was clear when we started to use the large dataset with the MLP classifier. Meanwhile, the use of CNNs is recommended in order to obtain promising results. In addition, to ensure high accuracy with real-time processing, we recommend using RNN and LSTM classifiers.

VI. CONCLUSION AND FUTURE WORK

Lipreading is a new way of enhancing speech recognition; however, there are several constraints on reaching high accuracy. One of these constraints is the varying lighting conditions that the camera may face, since the lipreading process is mainly conducted under ideal lighting conditions. In addition, the position of the speaker's face relative to the camera matters: the speaker has to look directly at the camera to ensure clear detection. Not only the position of the speaker but also the distance between the speaker and the camera has to be close enough to detect the lips clearly, so the speaker should avoid being too far away when delivering a command. Furthermore, working at the phoneme level would widen the set of words that can be detected and would make detection easier. We also believe that using audio-visual methods would increase the accuracy, meaning that we can depend on both the sound and the lip movements to ensure high accuracy. Extracting features using the discrete wavelet transform (DWT) may also give a considerable boost to classifier accuracy, since it reduces the number of dimensions being processed.
REFERENCES

[1] A. Hassanat, "Visual speech recognition," arXiv preprint arXiv:1409.1411, 2014.
[2] Y. Lan, B.-J. Theobald, R. Harvey, E.-J. Ong, and R. Bowden, "Improving visual features for lip-reading," in Auditory-Visual Speech Processing 2010, 2010.
[3] Y. M. Assael, B. Shillingford, S. Whiteson, and N. de Freitas, "LipNet: end-to-end sentence-level lipreading," 2016.
[4] J. S. Chung and A. Zisserman, "Lip reading in the wild," in Asian Conference on Computer Vision, pp. 87-103, Springer, 2016.
[5] M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu, "Towards better analysis of deep convolutional neural networks," IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 1, pp. 91-100, 2017.
[6] N. Rathee, "A novel approach for lip reading based on neural network," in Computational Techniques in Information and Communication Technologies (ICCTICT), 2016 International Conference on, pp. 421-426, IEEE, 2016.
[7] F. S. Lesani, F. F. Ghazvini, and R. Dianat, "Mobile phone security using automatic lip reading," in e-Commerce in Developing Countries: With focus on e-Business (ECDC), 2015 9th International Conference on, pp. 1-5, IEEE, 2015.
[8] P. Domingos, "A few useful things to know about machine learning," Communications of the ACM, vol. 55, no. 10, pp. 78-87, 2012.
[9] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face detection in color images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 696-706, 2002.
[10] J. S. Chung and A. Zisserman, "Lip reading in the wild," in Asian Conference on Computer Vision, 2016.