Incorporating Knowledge Sources Into Statistical Speech Recognition

This document provides an overview of a book about incorporating knowledge sources into statistical speech recognition. The book presents a graphical framework called GFIKS that allows various knowledge sources to be incorporated into hidden Markov models (HMMs) for acoustic modeling in automatic speech recognition (ASR) systems. The framework uses Bayesian networks to represent the probabilistic relationships between different knowledge sources, such as background noise, accent, gender, and phonetic knowledge. This allows a simplified joint probability model to be constructed and estimated using limited training data, while maintaining performance improvements over traditional HMM-based ASR. The book evaluates the approach on large-vocabulary continuous speech recognition tasks.

Incorporating Knowledge Sources into Statistical Speech Recognition

Lecture Notes in Electrical Engineering


Incorporating Knowledge Sources into Statistical Speech Recognition. Sakti, Sakriani; Markov, Konstantin; Nakamura, Satoshi; Minker, Wolfgang. 978-0-387-85829-6
Intelligent Technical Systems. Martínez Madrid, Natividad; Seepold, Ralf E.D. (Eds.). 978-1-4020-9822-2
Languages for Embedded Systems and their Applications. Radetzki, Martin (Ed.). 978-1-4020-9713-3
Multisensor Fusion and Integration for Intelligent Systems. Lee, Sukhan; Ko, Hanseok; Hahn, Hernsoo (Eds.). 978-3-540-89858-0
Designing Reliable and Efficient Networks on Chips. Murali, Srinivasan. 978-1-4020-9756-0
Trends in Communication Technologies and Engineering Science. Ao, Sio-Iong; Huang, Xu; Wai, Ping-kong Alexander (Eds.). 978-1-4020-9492-7
Functional Design Errors in Digital Circuits: Diagnosis, Correction and Repair. Chang, Kai-hui; Markov, Igor; Bertacco, Valeria. 978-1-4020-9364-7
Traffic and QoS Management in Wireless Multimedia Networks: COST 290 Final Report. Koucheryavy, Y.; Giambene, G.; Staehle, D.; Barcelo-Arroyo, F.; Braun, T.; Siris, V. (Eds.). 978-0-387-85572-1
Proceedings of the 3rd European Conference on Computer Network Defense. Siris, V.; Ioannidis, S.; Anagnostakis, K.; Trimintzios, P. (Eds.). 978-0-387-85554-7
Intelligentized Methodology for Arc Welding Dynamical Processes: Visual Information Acquiring, Knowledge Modeling and Intelligent Control. Chen, Shan-Ben; Wu, Jing. 978-3-540-85641-2
Proceedings of the European Computing Conference: Volume 2. Mastorakis, Nikos; Mladenov, Valeri; Kontargyri, Vassiliki T. (Eds.). 978-0-387-84818-1
Proceedings of the European Computing Conference: Volume 1. Mastorakis, Nikos; Mladenov, Valeri; Kontargyri, Vassiliki T. (Eds.). 978-0-387-84813-6
Electronics System Design Techniques for Safety Critical Applications. Sterpone, Luca. 978-1-4020-8978-7
Data Mining and Applications in Genomics. Ao, Sio-Iong. 978-1-4020-8974-9
Continued after index

Sakriani Sakti Konstantin Markov Satoshi Nakamura Wolfgang Minker

Incorporating Knowledge Sources into Statistical Speech Recognition

Sakriani Sakti
NICT/ATR Spoken Language Communication Research Laboratories, Keihanna Science City, Kyoto, Japan

Konstantin Markov
NICT/ATR Spoken Language Communication Research Laboratories, Keihanna Science City, Kyoto, Japan

Satoshi Nakamura
NICT/ATR Spoken Language Communication Research Laboratories, Keihanna Science City, Kyoto, Japan

Wolfgang Minker
University of Ulm, Ulm, Germany

ISBN 978-0-387-85829-6
e-ISBN 978-0-387-85830-2
DOI: 10.1007/978-0-387-85830-2

Library of Congress Control Number: 2008942803

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper.

springer.com

This book is dedicated to our parents and families for their support and endless love

Preface

State-of-the-art automatic speech recognition (ASR) systems use statistical data-driven methods based on hidden Markov models (HMMs). Although such approaches have proved to be efficient choices, ASR systems often perform much worse than human listeners, especially in the presence of unexpected acoustic variability. To improve performance, we usually rely on collecting more data to train more detailed models. However, such resources are rarely available, since variability in speech arises from many different factors, and thus a huge amount of training data would be required to cover all possible variabilities. In other words, it is not enough to handle these variabilities by relying solely on statistical models. The systems need additional knowledge about speech that could help to handle these sources of variability; otherwise, only a limited level of success can be achieved. Many researchers are aware of this problem, and various attempts have been made to integrate knowledge-based and statistical approaches more explicitly. However, incorporating various additional knowledge sources often leads to a complicated model, where achieving optimal performance is not feasible due to insufficient resources or data sparseness. As a result, input space resolution may be lost due to non-robust estimates and the increased number of unseen patterns. Moreover, decoding with large models may also become cumbersome and sometimes even impossible. This book addresses the problem of developing efficient ASR systems that can maintain a balance between utilizing wide-ranging knowledge of speech variability and keeping the training/recognition effort feasible, while also improving speech recognition performance. It provides an efficient general framework to incorporate additional knowledge sources into state-of-the-art statistical ASR systems.
The framework can be applied to many existing ASR problems with their respective model-based likelihood functions in flexible ways. Since there are various types of knowledge sources from different domains, it may be difficult to formulate a probabilistic model without learning the dependencies between the sources. To solve such problems in a unified way, the


work reported in this book adopts the Bayesian network (BN) framework. This approach allows the probabilistic relationships between information sources to be learned. Another advantage of the BN framework lies in the fact that it facilitates the decomposition of the joint probability density function (PDF) into a linked set of local conditional PDFs, based on the junction tree algorithm. Consequently, a simplified form of the model can be constructed and reliably estimated using a limited amount of training data. This book focuses on the acoustic modeling problem, arguably the central part of any speech recognition system. The incorporation of various knowledge sources, including background noise, accent, gender, and wide phonetic knowledge information, into the modeling is also discussed. Such an application often suffers from data sparseness and memory constraints. First, the additional knowledge sources are incorporated at the HMM state distribution; then, they are incorporated at the level of HMM phonetic modeling. The presented approaches are experimentally verified on large-vocabulary continuous-speech recognition (LVCSR) tasks. The book closes with a summary of the described methods and the results of the evaluations.
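As a toy illustration of the decomposition idea sketched above (a minimal sketch with hypothetical names and parameter values, not code from the book), the joint PDF of an observation X, an HMM state Q, and one extra knowledge variable K such as gender can be factored as P(X, K | Q) = P(X | Q, K) P(K | Q), so the usual state output likelihood P(X | Q) is recovered by marginalizing K out of the local conditional PDFs:

```python
import math

def gaussian_pdf(x, mean, var):
    """One-dimensional Gaussian density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical parameters for a single HMM state: the knowledge variable K
# (here gender, "female"/"male") selects a different Gaussian P(X | Q, K),
# and P(K | Q) weights them -- the BN factorization
# P(X, K | Q) = P(X | Q, K) * P(K | Q).
state = {
    "female": {"p_k_given_q": 0.5, "mean": 1.0, "var": 0.5},
    "male":   {"p_k_given_q": 0.5, "mean": -1.0, "var": 0.5},
}

def state_output_likelihood(x):
    """P(x | Q): marginalize the knowledge variable K out of the local PDFs."""
    return sum(k["p_k_given_q"] * gaussian_pdf(x, k["mean"], k["var"])
               for k in state.values())

print(state_output_likelihood(0.0))
```

In the HMM/BN models discussed in the book, this role is played by Gaussians indexed by knowledge values such as gender, accent, noise type, or wide phonetic context; the two-component case above merely shows how a set of small local conditional PDFs recombines into a single state likelihood without ever estimating the full joint model directly.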

Contents

1 Introduction and Book Overview  1
1.1 Automatic Speech Recognition - A Way of Human-Machine Communication  1
1.2 Approaches to Speech Recognition  4
1.2.1 Knowledge-based Approaches  4
1.2.2 Corpus-based Approaches  6
1.3 State-of-the-art ASR Performance  7
1.4 Studies on Incorporating Knowledge Sources  10
1.4.1 Sources of Variability in Speech  10
1.4.2 Existing Ways of Incorporating Knowledge Sources  12
1.4.3 Major Challenges to Overcome  15
1.5 Book Outline  16
2 Statistical Speech Recognition  19
2.1 Pattern Recognition Overview  19
2.2 Theory of Hidden Markov Models  22
2.2.1 Markov Chain  22
2.2.2 General Form of an HMM  23
2.2.3 Principle Cases of HMM  25
2.3 Pattern Recognition for HMM-Based ASR Systems  35
2.3.1 Front-end Feature Extraction  36
2.3.2 HMM-Based Acoustic Model  43
2.3.3 Pronunciation Lexicon  49
2.3.4 Language Model  50
2.3.5 Search Algorithm  51
3 Graphical Framework to Incorporate Knowledge Sources  55
3.1 Graphical Model Representation  56
3.1.1 Probability Theory  56
3.1.2 Graphical Model  59
3.1.3 Junction Tree Algorithm  63

3.2 Procedure of GFIKS  68
3.2.1 Causal Relationship between Information Sources  70
3.2.2 Direct Inference on Bayesian Network  71
3.2.3 Junction Tree Decomposition  72
3.2.4 Junction Tree Inference  75
3.3 Practical Issues of GFIKS  75
3.3.1 Types of Knowledge Sources  75
3.3.2 Different Levels of Incorporation  76
4 Speech Recognition Using GFIKS  79
4.1 Applying GFIKS at the HMM State Level  79
4.1.1 Causal Relationship between Information Sources  80
4.1.2 Inference  81
4.1.3 Enhancing Model Reliability  81
4.1.4 Training and Recognition Issues  82
4.2 Applying GFIKS at the HMM Phonetic-unit Level  83
4.2.1 Causal Relationship between Information Sources  83
4.2.2 Inference  85
4.2.3 Enhancing the Model Reliability  85
4.2.4 Deleted Interpolation  86
4.2.5 Training and Recognition Issues  86
4.3 Experiments with Various Knowledge Sources  87
4.3.1 Incorporating Knowledge at the HMM State Level  87
4.3.2 Incorporating Knowledge at the HMM Phonetic-unit Level  116
4.4 Experiments Summary and Discussion  132
5 Conclusions and Future Directions  139
5.1 Conclusions  139
5.1.1 Theoretical Issues  139
5.1.2 Application Issues  140
5.1.3 Experimental Issues  141
5.2 Future Directions: A Roadmap to a Spoken Language Dialog System  142
A Speech Materials  145
A.1 AURORA TIDigit Corpus  145
A.2 TIMIT Acoustic-Phonetic Speech Corpus  146
A.3 Wall Street Journal Corpus  148
A.4 ATR Basic Travel Expression Corpus  150
A.5 ATR English Database Corpus  150

B ATR Software Tools  153
B.1 Generic Properties of ATRASR  153
B.2 Data Preparation  153
B.3 SSS Data Generating Tools  155
B.4 Acoustic Model Training Tools  155
B.5 Language Model Training Tools  157
B.6 Recognition Tools  157
C Composition of Bayesian Wide-phonetic Context  163
C.1 Proof Using Bayes's Rule  163
C.2 Variants of Bayesian Wide-phonetic Context Model  164
D Statistical Significance Testing  169
D.1 Statistical Hypothesis Testing  169
D.2 The Use of the Sign Test for ASR  172

References  175
Index  189

List of Figures

1.1 A machine that recognizes the speech waveform of a human utterance as "Good night."  2
1.2 Knowledge-based ASR system.  4
1.3 Speech spectrogram reading, which corresponds to the word sequence "Good night."  5
1.4 Corpus-based statistical ASR system.  6
1.5 2003 NIST benchmark ASR test history (After Pallett, 2003, © 2003 IEEE).  7
1.6 TC-STAR ASR evaluation campaign (After Choukri, 2007, © TC-STAR).  8
1.7 S curve of ASR technology progress and the predicted performance from combining deep knowledge with a statistical approach.  9
1.8 Incorporating knowledge into a corpus-based statistical ASR system.  16
2.1 Pattern recognition: Establishing a mapping from multidimensional measurement space X to three-class target decision space Y.  20
2.2 Pattern recognition approach for ASR: Establishing a mapping from measurement space X of the speech signal to target decision space Y of word strings.  22
2.3 Simple three-state Markov chain for daily weather.  22
2.4 HMM of the daily weather, where there is no deterministic meaning on any state.  24
2.5 Left-to-right HMM of the daily weather.  25
2.6 Process flow on a trellis diagram of a 3-state HMM with time length T.  26
2.7 Forward probability function representation (for j=1).  27
2.8 Backward probability function representation (for i=1).  28
2.9 Example of finding the best path on a trellis diagram using the Viterbi algorithm.  30
2.10 Graphical interpretation of the EM algorithm.  31
2.11 Forward-backward probability function representation.  32
2.12 A generic automatic speech recognition system, composed of five components: feature extraction, acoustic model, pronunciation lexicon, language model, and search algorithm.  36
2.13 Source-filter model of the speech signal x[n] = e[n] * h[n].  37
2.14 Source-filter separation by cepstral analysis.  37
2.15 (a) A windowed speech waveform. (b) The spectrum of Figure 2.15(a). (c) The resulting cepstrum. (d) The Fourier transform of the low-quefrency component.  38
2.16 MFCC feature extraction technique, which generates a 25-dimensional feature vector xt for each frame.  41
2.17 A summary of the feature extraction process, producing a feature vector that corresponds to one point in a multi-dimensional space.  43
2.18 Discrete HMM observation density, where the emission statistics or HMM state output probabilities are represented by discrete symbols.  44
2.19 Continuous GMM, where the continuous observation space is modeled using mixture Gaussians (state-specific). They are weighted and added to compute the emission statistic likelihoods (HMM state output probabilities).  45
2.20 Structure example of the monophone /a/ HMM acoustic model.  46
2.21 Structure example of the triphone /a-, a, a+/ HMM acoustic model.  47
2.22 Shared-state structures of the triphone /a-, a, a+/ HMM acoustic model.  47
2.23 An example of a phonetic decision tree for the HMM state of the triphone with the central phoneme /ay/.  48
2.24 Contextual splitting and temporal splitting of the SSS algorithm (After Jitsuhiro, 2005).  49
2.25 Example of a tree-based pronunciation lexicon.  50
2.26 Multi-level probability estimation of statistical ASR.  52
3.1 Incorporating knowledge into a corpus-based statistical ASR system.  55
3.2 Two equivalent models that can be obtained from each other through arc reversal of Bayes's rule, since P(a,b) = P(b,a).  60
3.3 Graphical representation of P(a | b1, b2, ..., bn).  60
3.4 Three BNs with different arrow directions over the same random variables a, b, and c. They appear in the case of serial, diverging, and converging connections, respectively.  61

3.5 Example of a BN topology describing conditional relationships among a, b, c, d, e, f, g, and h.  63
3.6 Moral and triangulated graph of Figure 3.5.  64
3.7 Junction graph of Figure 3.5.  66
3.8 The resulting junction tree.  66
3.9 Clique C1 = [a, b, d] in the original graph of Figure 3.5.  67
3.10 General procedure of GFIKS (graphical framework to incorporate additional knowledge sources).  69
3.11 (a) BN topology describing the conditional relationship between data D and model M. (b) BN topology describing the conditional relationship among D, M, and additional knowledge K.  70
3.12 Examples of BN topologies describing the conditional relationship among data D, model M, and several knowledge sources K1, K2, ..., KN.  71
3.13 (a) BN topology describing the conditional relationship among D, M, K1, and K2. (b) Moral and triangulated graph of Figure 3.13(a). (c) Equivalent BN topology. (d) Moral and triangulated graph of Figure 3.13(c). (e) Junction tree of Figure 3.13(d).  73
3.14 (a) Equivalent BN topology of the BN shown in Figure 3.12(a). (b) Corresponding junction tree.  74
3.15 Incorporating knowledge sources at the HMM state level (denoted by a small box) and the phonetic-unit level (denoted by a large box).  77
4.1 (a) Applying GFIKS at the HMM state level. (b) BN topology structure describing the conditional relationship between HMM state Q and observation vector X.  80
4.2 BN topology structure after incorporating additional knowledge sources K1, K2, ..., KN in the HMM state distribution P(X, Q) (assuming that all K1, K2, ..., KN are independent given Q).  81
4.3 Example of observation space modeling by BN, where each value of Ki corresponds to a different Gaussian.  82
4.4 (a) Applying GFIKS at the HMM phonetic-unit level. (b) BN topology structure describing the conditional relationship between the HMM phonetic model and observation segment Xs.  84
4.5 BN topology structure after incorporating additional knowledge sources K1, K2, ..., KN in the HMM phonetic model (assuming that all K1, K2, ..., KN are independent given the phonetic model).  84
4.6 Rescoring procedure with the composition models.  87
4.7 BN topology structure showing the conditional relationship among HMM state Q, observation vector X, and the additional knowledge source of gender information G.  88

4.8 Recognition accuracy rates of the proposed HMM/BN, which are comparable with those of other systems from the Hub and Spoke Paradigm for Continuous Speech Recognition Evaluation for the primary condition of the WSJ Hub2-5k task.  93
4.9 BN topology structure describing the conditional relationship between HMM state Q, observation vector X, and additional knowledge sources of noise type N and SNR value S.  94
4.10 Comparison of different systems: HMM, DBN (Bilmes et al., 2001), and proposed HMM/BN.  98
4.11 BN topologies of the left state (a), center state (b), and right state (c) of LR-HMM/BN for modeling a pentaphone context /a--, a-, a, a+, a++/.  99
4.12 BN topologies of the left state (a), center state (b), and right state (c) of LRC-HMM/BN for modeling a pentaphone context /a--, a-, a, a+, a++/.  100
4.13 Observation space modeling by BN, where a different value of the second following context CR corresponds to a different Gaussian.  101
4.14 Knowledge-based phoneme classes of the observation space.  102
4.15 Determining the distance metric by Euclidean distance.  103
4.16 Data-driven phoneme classes of the observation space.  103
4.17 Recognition accuracy rates of pentaphone LR-HMM/BN using knowledge-based second preceding and following context clustering.  106
4.18 Recognition accuracy rates of pentaphone LRC-HMM/BN using knowledge-based second preceding and following context clustering.  107
4.19 Recognition accuracy rates of pentaphone LR-HMM/BN and LRC-HMM/BN using data-driven Gaussian clustering.  108
4.20 Comparing recognition accuracy rates of triphone HMM and pentaphone HMM/BN models with a fixed and a varied number of mixture components per state, but having the same 15 mixture components per state on average.  109
4.21 Topology of fLRC-HMM/BN for modeling a pentaphone context /a--, a-, a, a+, a++/, where the state PDF has additional variables CL and CR representing the second preceding and following contexts, respectively.  110
4.22 (a) fLRCG-HMM/BN topology with additional knowledge G, CL, and CR. (b) fLRCA-HMM/BN topology with additional variables A, CL, and CR. (c) fLRCAG-HMM/BN topology with additional knowledge A, G, CL, and CR.  111
4.23 Recognition accuracy rates of proposed HMM/BN models having identical numbers of 5, 10, and 20 mixture components per state.  113

4.24 Comparing recognition accuracy rates of different systems: triphone HMM baseline, pentaphone HMM baseline, and the proposed pentaphone HMM/BN models having the same five mixture components per state.  115
4.25 BN topology structure describing the conditional relationship among Xs, the phonetic model, CL, and CR.  116
4.26 (a) Equivalent BN topology. (b) Moral and triangulated graph of Figure 4.26(a). (c) Junction tree of Figure 4.26(b).  117
4.27 (a) Conventional triphone model. (b) Conventional pentaphone model. (c) Bayesian pentaphone model composition C1L3R3, consisting of the preceding/following triphone-context unit and center-monophone unit.  119
4.28 Rescoring procedure with pentaphone composition models: C1L3R3 or C3L4R4.  120
4.29 N-best rescoring mechanism.  121
4.30 Recognition accuracy rates of the Bayesian triphone model.  122
4.31 Recognition accuracy rates of Bayesian pentaphone models.  124
4.32 Relative reductions in WER by the Bayesian triphone C1L2R2 model from the monophone baseline and by the Bayesian pentaphone C1L3R3 model from the triphone baseline.  125
4.33 Recognition accuracy rates of the conventional pentaphone C5 and the proposed Bayesian pentaphone C1L3R3 models with different amounts of training data.  126
4.34 BN topology structure describing the conditional relationship among Xs, the phonetic model, CL, CR, A, and G.  127
4.35 (a) Equivalent BN topology of Figure 4.34. (b) Moral and triangulated graph of Figure 4.35(a). (c) Corresponding junction tree.  127
4.36 Rescoring procedure with the accent-gender-dependent pentaphone composition models: C1L3R3, C1L3R3-A, C1L3R3-G, and C1L3R3-AG.  128
4.37 Comparing recognition accuracy rates of different systems: triphone HMM baseline, pentaphone HMM baseline, and proposed pentaphone models having the same 5, 10, and 20 mixture components per state.  130
4.38 Comparing recognition accuracy rates of different systems: triphone HMM baseline, pentaphone HMM baseline, and proposed models incorporating knowledge sources at the HMM state and phonetic-unit levels.  136
5.1 Roadmap to a spoken language dialog system incorporating other knowledge sources at higher ASR levels.  143
B.1 The ATRASR phoneme-based SSS data creation for phone-unit model training.  156
B.2 The ATRASR topology training for each phone acoustic-unit model.  158
B.3 The ATRASR embedded training for a whole HMnet.  159
B.4 The recognition process using ATRASR tools.  160
C.1 Bayesian pentaphone model composition. (a) C5, the conventional pentaphone model. (b) Bayesian C1L3R3, composed of the preceding/following triphone-context unit and center-monophone unit. (c) Bayesian C3L4R4, composed of the preceding/following tetraphone-context unit and center-triphone-context unit. (d) Bayesian C1Lsk3Rsk3, composed of the preceding/following skip-triphone-context unit and center-monophone unit. (e) Bayesian C1C3Csk3, composed of the center skip-triphone-context unit, center triphone-context unit, and center-monophone unit.  167
D.1 The distribution of the population according to the null hypothesis (H0 is true), with the upper tail of the rejection region for P.  171

List of Tables

4.1 English phoneme set . . . 90
4.2 1993 Hub and Spoke CSR evaluation on Hub 2: 5k read WSJ task (Kubala et al., 1994; Pallett et al., 1994) . . . 92
4.3 HMM/BN system performance on Hub 2: 5k read WSJ task . . . 93
4.4 Recognition accuracy rates (%) for proposed HMM/BN on AURORA2 task . . . 97
4.5 Knowledge-based phoneme classes based on manner of articulation . . . 101
4.6 Recognition accuracy rates (%) for proposed pentaphone HMM/BN model using fLRC-HMM/BN (see Figure 4.22) on a test set of matching accents with different numbers of mixture components . . . 114
4.7 Recognition accuracy rates (%) for proposed pentaphone HMM/BN model using fLRC-HMM/BN (see Figure 4.22) on a test set of mismatched accents with 15 mixture components . . . 115
4.8 Recognition accuracy rates (%) for proposed Bayesian pentaphone C1L3R3-AG (see Eq. (4.30)) on a test set of matching accents with different numbers of mixture components . . . 131
4.9 Recognition accuracy rates (%) for proposed Bayesian pentaphone C1L3R3-AG model (see Eq. (4.30)) on a test set of mismatched accents with 15 mixture components . . . 132
4.10 Summary of incorporating various knowledge sources at the HMM state level . . . 134
4.11 Summary of incorporating various knowledge sources at the HMM phonetic unit level . . . 135
A.1 Dialect distribution of speakers . . . 147
A.2 Speech materials of TIMIT database . . . 148
A.3 Statistics on the TIMIT database . . . 148
A.4 Text sentence materials of ATR English speech database . . . 151
A.5 Speech materials of ATR English speech database . . . 151

Glossary

AM  Acoustic model
ARPA  Advanced Research Projects Agency
ASR  Automatic speech recognition
A-STAR  Asian speech translation advanced research
ATR  Advanced Telecommunication Research
AUS  Australian
BN  Bayesian network
BRT  British
BTEC  Basic travel expression corpus
BU  Boston University
C1  Center monophone unit
C3  Center triphone context
Csk3  Center skip-triphone context
C5  Center pentaphone context
CCCC  CSR corpus coordinating committee
CNRS-LIMSI  France's National Center for Scientific Research
CPD  Conditional probability distribution
CPT  Conditional probability table
CSR  Continuous speech recognition
C-STAR  Consortium for speech translation advanced research
CU  Cambridge University
DAG  Directed acyclic graph
DARPA  Defense Advanced Research Projects Agency
DBN  Dynamic Bayesian network
DCT  Discrete cosine transform
DEL  Deletions
DI  Deleted interpolation
DSR  Distributed speech recognition
EDB  English database
ELRA  European language resources association


EM  Expectation-maximization
EPPS  European Parliament Plenary Sessions
fLRC-HMM/BN  Full HMM/BN for left, right and center state
fLRCA-HMM/BN  Full HMM/BN for left, right and center state, including accent dependency
fLRCAG-HMM/BN  Full HMM/BN for left, right and center state, including accent and gender dependency
fLRG-HMM/BN  Full HMM/BN for left, right and center state, including gender dependency
FFT  Fast Fourier transform
GDHMM  Gender-dependent Hidden Markov model
GFIKS  Graphical framework to incorporate additional knowledge sources
GIHMM  Gender-independent Hidden Markov model
GMM  Gaussian mixture model
HMM  Hidden Markov model
ICASSP  International conference on acoustics, speech and signal processing
ICSI  International Computer Science Institute
ICSLP  International conference on spoken language processing
IEEE  Institute of Electrical and Electronics Engineers
IEICE  Institute of Electronics, Information and Communication Engineers
Imp  Improvement
INS  Insertions
L3  Left triphone context
L4  Left tetraphone context
LM  Language model
LPC  Linear prediction coefficients
LRC-HMM/BN  HMM/BN for left, right and center state
LR-HMM/BN  HMM/BN for left and right state
Lsk3  Left skip-triphone context
LVCSR  Large-vocabulary continuous-speech recognition
MAD  Machine translation aided dialogue
MAP  Maximum a posteriori
MDL  Minimum description length
MFCC  Mel-frequency cepstral coefficients
MIT  Massachusetts Institute of Technology
ML  Maximum likelihood
MLLR  Maximum likelihood linear regression
MSG  Modulation-filtered spectrogram
MT  Machine translation
NIST  National Institute of Standards and Technology
NOVO  Noise voice composition
PDF  Probability density function


PLP  Perceptual linear prediction
PMC  Parallel model combination
R3  Right triphone context
R4  Right tetraphone context
Rel  Relative
Resc  Rescoring
Rsk3  Right skip-triphone context
S2ST  Speech-to-speech translation
SD  Speaker dependent
SI  Speaker independent
SIL  Silence
SLC  Spoken Language Communication
SNR  Signal-to-noise ratio
SSS  Successive state splitting
STQ  Speech processing, transmission and quality
SUB  Substitutions
SWB  Switchboard
TC-STAR  Technology and corpora for speech to speech translation research
TI  Texas Instruments
US  United States
VQ  Vector quantization
WER  Word error rate
WFST  Weighted finite state transducers
WSJ  Wall Street Journal
