
International Journal of Scientific Research in Engineering and Management (IJSREM)

Volume: 07 Issue: 12 | December - 2023 SJIF Rating: 8.176 ISSN: 2582-3930

AUTOMATIC SPEECH RECOGNITION USING DEEP NEURAL NETWORKS

Shiwani Singhal, Muskan Deswal, Er. Shafalii Sharma
Computer Science Engineering, Chandigarh University, Mohali, Punjab
[email protected], [email protected], [email protected]

Abstract - This research concentrates on the progression and present status of automatic speech recognition systems powered by deep neural networks. It discusses model architectures, training approaches, evaluation of model efficacy, and recent advancements specific to deep neural networks applied in automatic speech recognition models. It considers the challenges faced in crafting these speech recognition models, such as data scarcity and the necessity for adaptability. Our exploration traces the evolution of automatic speech recognition through deep neural networks, presenting valuable insights aimed at propelling the domain of speech recognition for diverse applications, spanning from smart devices to healthcare.

KEYWORDS - Automatic Speech Recognition, Deep Neural Networks, Language Modeling, Robustness to Noise, Speech Modelling

I. INTRODUCTION

The Automatic Speech Recognition model plays a pivotal role in converting spoken language to text and enabling seamless interactions between humans and machines. This evolution has revolutionized communication systems by empowering voice-controlled devices, language translation, and accessibility tools for those with hearing impairments. ASR models have extensive applications across various sectors, including healthcare, entertainment, and many more.

The core objective of research in Automatic Speech Recognition utilizing Deep Neural Networks is to enhance the accuracy, efficiency, and dependability of speech recognition models. Constructing deep neural network architectures aids in accurately capturing intricate speech patterns, contextual cues, noise reduction, and subtleties.

II. LITERATURE REVIEW

Several research initiatives investigating automatic speech recognition through deep neural networks have notably progressed communication and human-machine interaction. Prior to commencing this ASR study utilizing DNNs, extensive reviews of the literature were conducted, encompassing the developments, methodologies, challenges, and future trajectories within this domain. The evolution of ASR has seen a significant transition from rule-based models to statistical methods and the integration of neural networks. Ensuring the accuracy of models, gauged through word and character error rates, remains a critical focus. An end-to-end methodology utilizing recurrent neural networks and attention mechanisms

© 2023, IJSREM | www.ijsrem.com DOI: 10.55041/IJSREM27313 | Page 1



has been introduced for automatic speech recognition. The LAS model has significantly influenced the evolution of speech recognition models. Progress in natural language processing and diverse fields has substantially contributed to advancements in machine translation.

III. CHARACTERISTICS OF ASR

Automatic speech recognition models have three main dimensions for characterization: speaker dependence, vocabulary semantics, and speech continuity. They can either be speaker-dependent, necessitating training for each speaker, or speaker-independent, using various speech examples to recognize new speakers. In terms of speech continuity, there are four different types of systems. These comprise isolated word recognition systems, connected word recognition systems, continuous speech recognition systems, and word spotting systems. Automatic speech recognition models can face different types of errors, such as insertion errors, substitution errors, and deletion errors.

Figure: Structure of ASR system

IV. RESEARCH METHODOLOGIES

Creating a mathematical model for an Automatic Speech Recognition (ASR) system using Deep Neural Networks (DNNs) involves fundamental components. ASR systems typically consist of three key parts: an acoustic model, a language model, and a decoding algorithm.

A. Input Representation:
Let X = (x_1, x_2, ..., x_T) be the input speech signal, where x_t is the feature vector at time t.

B. Feature Extraction:
Extract the features from the raw signal, e.g., using Mel-Frequency Cepstral Coefficients. Let F(X) = (f_1, f_2, ..., f_T) be the feature sequence.

C. Neural Network Architecture:
Define a deep neural network with L layers. Let W^(l) and b^(l) represent the weights and biases of layer l, respectively.

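As a concrete illustration of components C through E, the forward pass through the L layers and the output probabilities can be sketched in Python. This is a minimal NumPy sketch, assuming a tanh activation, arbitrary layer sizes, and random parameters; none of these choices come from the paper.

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, weights, biases):
    """Propagate one feature vector through an L-layer network.

    weights[l] and biases[l] play the role of W^(l) and b^(l) in the text;
    the final layer yields posterior probabilities over the N units.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(W @ h + b)           # h(l) = g(W(l) h(l-1) + b(l)), g = tanh
    z = weights[-1] @ h + biases[-1]     # final affine layer
    return softmax(z)                    # y_i, the probability of unit i

# Toy dimensions: a 13-dim feature vector, one hidden layer of 32, N = 5 units.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((32, 13)), rng.standard_normal((5, 32))]
biases = [np.zeros(32), np.zeros(5)]
y = forward(rng.standard_normal(13), weights, biases)  # sums to (numerically) 1
```

The softmax at the output is what turns the final-layer activations into the posterior probabilities over phoneme or subword units described below.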

The output of layer l is given by

h^(l) = g(W^(l) h^(l-1) + b^(l)),

where g is the activation function.

D. Input Layer:
The input layer takes the feature sequence F(X) as its input.

E. Output Layer:
The output layer generates probabilities representing the likelihood of each phoneme or subword unit, known as posterior probabilities. If there are N units, the output is

y_i = exp(z_i) / Σ_j exp(z_j), for i = 1, ..., N,

where z = W^(L) h^(L-1) + b^(L) and y_i is the probability of unit i.

F. Training:
Define a training dataset D = {(X^(1), Y^(1)), ..., (X^(M), Y^(M))} of speech-transcription pairs. Minimize the cross-entropy loss function:

J = - Σ_i Σ_j t_j^(i) log y_j^(i),

where y_j^(i) is the predicted probability of unit j for training example i and t_j^(i) is the corresponding target.

G. Inference:
When provided with a new input sequence X, the ASR system utilizes the trained neural network to predict the sequence of units.

H. Decoding:
A decoding algorithm is employed to convert the sequence of predicted units into the final recognized text.

V. APPLICATIONS OF ASR

A. Voice assistants:
Voice assistants employ automatic speech recognition to convert spoken language into text, enabling interaction with machines through voice commands. Deep neural networks play a crucial role in enhancing the accuracy of voice recognition and understanding natural speech patterns.

B. Call centres:
In call centres, automatic speech recognition is used to boost customer service quality and operational efficiency. It automates the transcription of conversations between customers and agents, enabling prompt service delivery.

C. Language translations:
The ASR model converts spoken language into text and utilizes deep neural network architectures to enhance the text for translation. Machine translation models are then applied to interpret speech in various languages. This cooperation between ASR and DNNs allows for immediate translation.

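The inference and decoding steps described in Section IV (G and H) reduce, in the simplest case, to a frame-wise argmax over the posterior probabilities followed by collapsing consecutive repeats. The sketch below follows a CTC-style greedy decoder; the paper does not specify a decoding algorithm, so the blank symbol and the tiny unit inventory are assumptions for illustration.

```python
import numpy as np

def greedy_decode(posteriors, units):
    """Pick the most likely unit per frame, collapse consecutive repeats,
    and drop the blank symbol (CTC-style greedy decoding)."""
    best = np.argmax(posteriors, axis=1)   # most likely unit index per frame
    out, prev = [], None
    for idx in best:
        if idx != prev and units[idx] != "<blank>":
            out.append(units[idx])
        prev = idx
    return "".join(out)

units = ["<blank>", "h", "i"]
# Five frames of made-up posteriors over the three units: h, h, blank, i, i.
post = np.array([[0.1, 0.8, 0.1],
                 [0.2, 0.7, 0.1],
                 [0.9, 0.05, 0.05],
                 [0.1, 0.1, 0.8],
                 [0.1, 0.2, 0.7]])
text = greedy_decode(post, units)  # -> "hi"
```

In a full system, this greedy pass would be replaced by a beam search that also consults the language model named in Section IV.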

VI. CHALLENGES FACED

A. Data scarcity:
The restricted data accessible for
automatic speech recognition models creates a
challenge due to the lack of diverse training
data. This limitation impacts the system's
accuracy in precisely transcribing speech into
text, especially when handling various accents,
languages, and speaking styles.
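A common mitigation is to augment the available recordings, for example by mixing in noise at a controlled signal-to-noise ratio so that one utterance yields several training variants. This is a minimal NumPy sketch; the white Gaussian noise model and the 10 dB target are illustrative assumptions.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Mix white Gaussian noise into a waveform at a target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))   # SNR = 10 log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# One second of a 440 Hz tone at 16 kHz stands in for a real utterance.
rng = np.random.default_rng(42)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = add_noise(clean, snr_db=10, rng=rng)   # degraded copy at ~10 dB SNR
```

Speed perturbation and synthetic speech, also mentioned in Section VIII, serve the same purpose of widening the effective training distribution.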
B. Robustness to noise:
In speech processing, maintaining accuracy despite background noise or disruptions signifies the system's robustness. Achieving this involves employing methods like noise reduction and robust feature extraction to enhance the precision and accuracy of speech recognition in noisy environments.

C. Model complexity:
The size and complexity of neural network architectures play a role in the intricacy and evolution of model development and implementation. More intricate models often encompass a higher number of parameters and intricate components, potentially necessitating larger datasets and heightened computational resources.

VII. RESULTS OF THE RESEARCH

Progress in automatic speech recognition has significantly boosted the accuracy and robustness of DNN-driven ASR models. These models have showcased better word error rates and character error rates than conventional systems.

Figure 7.1: The STT demonstration

Enhancements in deep neural network structures have enabled their capacity to manage vast datasets and intricate tasks, thus improving the system's scalability. Furthermore, there have been advancements in DNN architectures and methodologies. These DNN-based ASR models find utility across sectors like healthcare, education, and smart devices.

VIII. FUTURE DIRECTIONS

In the domain of automatic speech recognition using deep neural networks, there is extensive exploration yet to be done. Data augmentation plays a vital role, especially in low-resource ASR situations characterized by limited training data. Techniques such as introducing noise, adjusting speed, or creating synthetic data significantly contribute to enhancing ASR models. It is crucial to underscore the significance of robustness and effective noise management within the realm of automatic speech recognition. A robust ASR model must possess the ability to precisely transcribe speech, even when confronted with noisy environments.


IX. CONCLUSION

In conclusion, this research paper thoroughly investigates automatic speech recognition systems utilizing deep neural networks. It explores the transition from traditional ASR approaches to the substantial impact of DNN-based models, marking a transformative shift in speech recognition. The analysis of architectural paradigms and training strategies highlights significant improvements in the accuracy and adaptability of ASR models. Despite these advancements, challenges like scarce data, noise resilience, and speaker variability are recognized.

REFERENCES

[1] Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., Casper, J., Catanzaro, B., Cheng, Q., Chen, G. and Chen, J., 2016, June. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning (pp. 173-182). PMLR.
[2] Chan, W., Jaitly, N., Le, Q. and Vinyals, O., 2016, March. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4960-4964). IEEE.
[3] Cui, X., Goel, V. and Kingsbury, B., 2015. Data augmentation for deep neural network acoustic modelling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(9), pp.1469-1477.
[4] Du, J., Wang, Q., Gao, T., Xu, Y., Dai, L.R. and Lee, C.H., 2014. Robust speech recognition with speech-enhanced deep neural networks. In Fifteenth Annual Conference of the International Speech Communication Association.
[5] Espana-Bonet, C. and Fonollosa, J.A.R., 2016. Automatic speech recognition with deep neural networks for impaired speech. In Advances in Speech and Language Technologies for Iberian Languages: Third International Conference, IberSPEECH 2016, Lisbon, Portugal, November 23-25, 2016, Proceedings 3 (pp. 97-107). Springer International Publishing.
[6] Fantaye, T.G., Yu, J. and Hailu, T.T., 2020. Investigation of automatic speech recognition systems via the multilingual deep neural network modelling methods for a very low-resource language, Chaha. Journal of Signal and Information Processing, 11(1), pp.1-21.
[7] Fendji, J.L.K.E., Tala, D.C., Yenke, B.O. and Atemkeng, M., 2022. Automatic speech recognition using limited vocabulary: A survey. Applied Artificial Intelligence, 36(1), p.2095039.
[8] Gulati, A., Qin, J., Chiu, C.C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y. and Pang, R., 2020. Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100.
[9] Han, K., He, Y., Bagchi, D., Fosler-Lussier, E. and Wang, D., 2015. Deep neural network-based spectral feature mapping for robust speech recognition. In Sixteenth Annual Conference of the International Speech Communication Association.
[10] Iosifova, O., Iosifov, I., Sokolov, V.Y., Romanovskyi, O. and Sukaylo, I., 2021. Analysis of automatic speech recognition methods. Cybersecurity Providing in Information and Telecommunication Systems, 2923, pp.252-257.
[11] Mukhamadiyev, A., Khujayarov, I., Djuraev, O. and Cho, J., 2022. Automatic speech recognition method based on deep learning approaches for Uzbek language. Sensors, 22(10), p.3683.
[12] Nassif, A.B., Shahin, I., Attili, I., Azzeh, M. and Shaalan, K., 2019. Speech recognition using deep neural networks: A systematic review. IEEE Access, 7, pp.19143-19165.
[13] Palaz, D. and Collobert, R., 2015. Analysis of CNN-based speech recognition system using raw speech as input. Idiap.
[14] Pardede, H.F., Yuliani, A.R. and Sustika, R., 2018. Convolutional neural network and feature transformation for distant speech recognition. International Journal of Electrical and Computer Engineering, 8(6), p.5381.
[15] Qian, Y., Bi, M., Tan, T. and Yu, K., 2016. Very deep convolutional neural networks for noise robust speech recognition. IEEE/ACM


Transactions on Audio, Speech, and Language Processing, 24(12), pp.2263-2276.
[16] Sarma, M., 2017. Speech recognition using deep neural network trends. International Journal of Intelligent Systems Design and Computing, 1(1-2), pp.71-86.
[17] Sim, K.C., Qian, Y., Mantena, G., Samarakoon, L., Kundu, S. and Tan, T., 2017. Adaptation of deep neural network acoustic models for robust automatic speech recognition. New Era for Robust Speech Recognition: Exploiting Deep Learning, pp.219-243.
[18] Serizel, R. and Giuliani, D., 2014. Deep neural network adaptation for children's and adults' speech recognition. pp.344-348.
[19] Soundarya, M., Karthikeyan, P.R. and Thangarasu, G., 2023, March. Automatic speech recognition trained with convolutional neural network and predicted with recurrent neural network. In 2023 9th International Conference on Electrical Energy Systems (ICEES) (pp. 41-45). IEEE.
[20] Toledano, D.T., Fernández-Gallego, M.P. and Lozano-Diez, A., 2018. Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT. PLoS One, 13(10), p.e0205355.
[21] Tong, S., Garner, P.N. and Bourlard, H., 2017. An investigation of deep neural networks for multilingual speech recognition training and adaptation. pp.714-718.
[22] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.
[23] Weng, C., Yu, D., Seltzer, M.L. and Droppo, J., 2015. Deep neural networks for single-channel multi-talker speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(10), pp.1670-1679.
[24] Yao, K., Yu, D., Seide, F., Su, H., Deng, L. and Gong, Y., 2012. Adaptation of context-dependent deep neural networks for automatic speech recognition. In 2012 IEEE Spoken Language Technology Workshop (SLT) (pp. 366-369). IEEE.
[25] Yu, D., Siniscalchi, S.M., Deng, L. and Lee, C.H., 2012, March. Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4169-4172). IEEE.
