Discrete-Time Processing
of Speech Signals
John R. Deller, Jr.
Michigan State University
John H. L. Hansen
University of Colorado at Boulder
John G. Proakis
Northeastern University
IEEE Signal Processing Society, Sponsor
IEEE
The Institute of Electrical and Electronics Engineers, Inc., New York

WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION
New York · Chichester · Weinheim · Brisbane · Singapore · Toronto

Contents
Preface to the IEEE Edition xvii
Preface xix
Acronyms and Abbreviations xxiii
I Signal Processing Background
1 Propaedeutic 3
1.0 Preamble 3
1.0.1 The Purpose of Chapter 1 3
1.0.2 Please Read This Note on Notation 4
1.0.3 For People Who Never Read Chapter 1 (and Those Who Do) 5
1.1 Review of DSP Concepts and Notation 6
1.1.1 "Normalized Time and Frequency" 6
1.1.2 Singularity Signals 9
1.1.3 Energy and Power Signals 9
1.1.4 Transforms and a Few Related Concepts 10
1.1.5 Windows and Frames 16
1.1.6 Discrete-Time Systems 20
1.1.7 Minimum-, Maximum-, and Mixed-Phase Signals and Systems 24
1.2 Review of Probability and Stochastic Processes 29
1.2.1 Probability Spaces 30
1.2.2 Random Variables 33
1.2.3 Random Processes 42
1.2.4 Vector-Valued Random Processes 52
1.3 Topics in Statistical Pattern Recognition 55
1.3.1 Distance Measures 56
1.3.2 The Euclidean Metric and "Prewhitening" of Features 58
1.3.3 Maximum Likelihood Classification 63
1.3.4 Feature Selection and Probabilistic Separability Measures 66
1.3.5 Clustering Algorithms 70
1.4 Information and Entropy 73
1.4.1 Definitions 73
1.4.2 Random Sources 77
1.4.3 Entropy Concepts in Pattern Recognition 78
1.5 Phasors and Steady-State Solutions 79
1.6 Onward to Speech Processing 81
1.7 Problems 85
Appendices: Supplemental Bibliography 90
1.A Example Textbooks on Digital Signal Processing 90
1.B Example Textbooks on Stochastic Processes 90
1.C Example Textbooks on Statistical Pattern
Recognition 91
1.D Example Textbooks on Information Theory 91
1.E Other Resources on Speech Processing 92
1.E.1 Textbooks 92
1.E.2 Edited Paper Collections 92
1.E.3 Journals 92
1.E.4 Conference Proceedings 93
1.F Example Textbooks on Speech and Hearing
Sciences 93
1.G Other Resources on Artificial Neural
Networks 94
1.G.1 Textbooks and Monographs 94
1.G.2 Journals 94
1.G.3 Conference Proceedings 95
II Speech Production and Modeling
2 Fundamentals of Speech Science 99
2.0 Preamble 99
2.1 Speech Communication 100
2.2 Anatomy and Physiology of the Speech Production
System 101
2.2.1 Anatomy 101
2.2.2 The Role of the Vocal Tract and Some
Elementary Acoustical Analysis 104
2.2.3 Excitation of the Speech System and the
Physiology of Voicing 110
2.3 Phonemics and Phonetics 115
2.3.1 Phonemes Versus Phones 115
2.3.2 Phonemic and Phonetic Transcription 116
2.3.3 Phonemic and Phonetic Classification 117
2.3.4 Prosodic Features and Coarticulation 137
2.4 Conclusions 146
2.5 Problems 146
3 Modeling Speech Production 151
3.0 Preamble 151
3.1 Acoustic Theory of Speech Production 151
3.1.1 History 151
3.1.2 Sound Propagation 156
3.1.3 Source Excitation Model 159
3.1.4 Vocal-Tract Modeling 166
3.1.5 Models for Nasals and Fricatives 186
3.2 Discrete-Time Modeling 187
3.2.1 General Discrete-Time Speech Model 187
3.2.2 A Discrete-Time Filter Model for Speech
Production 192
3.2.3 Other Speech Models 197
3.3 Conclusions 200
3.4 Problems 201
3.A Single Lossless Tube Analysis 203
3.A.1 Open and Closed Terminations 203
3.A.2 Impedance Analysis, T-Network, and Two-Port Network 206
3.B Two-Tube Lossless Model of the Vocal Tract 211
3.C Fast Discrete-Time Transfer Function
Calculation 217
III Analysis Techniques
4 Short-Term Processing of Speech 225
4.1 Introduction 225
4.2 Short-Term Measures from Long-Term
Concepts 226
4.2.1 Motivation 226
4.2.2 “Frames” of Speech 227
4.2.3 Approach 1 to the Derivation of a Short-Term Feature and Its Two Computational Forms 227
4.2.4 Approach 2 to the Derivation of a Short-Term Feature and Its Two Computational Forms 231
4.2.5 On the Role of “1/N” and Related Issues 234
4.3 Example Short-Term Features and
Applications 236
4.3.1 Short-Term Estimates of Autocorrelation 236
4.3.2 Average Magnitude Difference Function 244
4.3.3 Zero Crossing Measure 245
4.3.4 Short-Term Power and Energy Measures 246
4.3.5 Short-Term Fourier Analysis 257
4.4 Conclusions 262
4.5 Problems 263
5 Linear Prediction Analysis 266
5.0 Preamble 266
5.1 Long-Term LP Analysis by System
Identification 267
5.1.1 The All-Pole Model 267
5.1.2 Identification of the Model 270
5.2 How Good Is the LP Model? 280
5.2.1 The “Ideal” and “Almost Ideal” Cases 280
5.2.2 “Nonideal” Cases 281
5.2.3 Summary and Further Discussion 287
5.3 Short-Term LP Analysis 290
5.3.1 Autocorrelation Method 290
5.3.2 Covariance Method 292
5.3.3 Solution Methods 296
5.3.4 Gain Computation 325
5.3.5 A Distance Measure for LP Coefficients 327
5.3.6 Preemphasis of the Speech Waveform 329
5.4 Alternative Representations of the LP
Coefficients 331
5.4.1 The Line Spectrum Pair 331
5.4.2 Cepstral Parameters 333
5.5 Applications of LP in Speech Analysis 333
5.5.1 Pitch Estimation 333
5.5.2 Formant Estimation and Glottal Waveform
Deconvolution 336Contents xi
5.6 Conclusions 342
5.7 Problems 343
5.A Proof of Theorem 5.1 348
5.B The Orthogonality Principle 350
6 Cepstral Analysis 352
6.1 Introduction 352
6.2 "Real" Cepstrum 355
6.2.1 Long-Term Real Cepstrum 355
6.2.2 Short-Term Real Cepstrum 364
6.2.3 Example Applications of the stRC to Speech
Analysis and Recognition 366
6.2.4 Other Forms and Variations on the stRC
Parameters 380
6.3 Complex Cepstrum 386
6.3.1 Long-Term Complex Cepstrum 386
6.3.2 Short-Term Complex Cepstrum 393
6.3.3 Example Application of the stCC to Speech
Analysis 394
6.3.4 Variations on the Complex Cepstrum 397
6.4 A Critical Analysis of the Cepstrum and
Conclusions 397
6.5 Problems 401
IV Coding, Enhancement and Quality Assessment
7 Speech Coding and Synthesis 409
7.1 Introduction 410
7.2 Optimum Scalar and Vector Quantization 410
7.2.1 Scalar Quantization 417
7.2.2 Vector Quantization 425
7.3 Waveform Coding 434
7.3.1 Introduction 434
7.3.2 Time Domain Waveform Coding 435
7.3.3 Frequency Domain Waveform Coding 451
7.3.4 Vector Waveform Quantization 457
7.4 Vocoders 459
7.4.1 The Channel Vocoder 460
7.4.2 The Phase Vocoder 462
7.4.3 The Cepstral (Homomorphic) Vocoder 462
7.4.4 Formant Vocoders 469
7.4.5 Linear Predictive Coding 471
7.4.6 Vector Quantization of Model Parameters 485
7.5 Measuring the Quality of Speech Compression Techniques 488
7.6 Conclusions 489
7.7 Problems 490
7.A Quadrature Mirror Filters 494
8 Speech Enhancement 501
8.1 Introduction 501
8.2 Classification of Speech Enhancement Methods 504
8.3 Short-Term Spectral Amplitude Techniques 506
8.3.1 Introduction 506
8.3.2 Spectral Subtraction 506
8.3.3 Summary of Short-Term Spectral Magnitude Methods 516
8.4 Speech Modeling and Wiener Filtering 517
8.4.1 Introduction 517
8.4.2 Iterative Wiener Filtering 517
8.4.3 Speech Enhancement and All-Pole Modeling 521
8.4.4 Sequential Estimation via EM Theory 524
8.4.5 Constrained Iterative Enhancement 525
8.4.6 Further Refinements to Iterative Enhancement 527
8.4.7 Summary of Speech Modeling and Wiener Filtering 528
8.5 Adaptive Noise Canceling 528
8.5.1 Introduction 528
8.5.2 ANC Formalities and the LMS Algorithm 530
8.5.3 Applications of ANC 534
8.5.4 Summary of ANC Methods 541
8.6 Systems Based on Fundamental Frequency Tracking 541
8.6.1 Introduction 541
8.6.2 Single-Channel ANC 542
8.6.3 Adaptive Comb Filtering 545
8.6.4 Harmonic Selection 549
8.6.5 Summary of Systems Based on Fundamental Frequency Tracking 551
8.7 Performance Evaluation 552
8.7.1 Introduction 552
8.7.2 Enhancement and Perceptual Aspects of
Speech 552
8.7.3 Speech Enhancement Algorithm
Performance 554
8.8 Conclusions 556
8.9 Problems 557
8.A The INTEL System 561
8.B Addressing Cross-Talk in Dual-Channel ANC 565
9 Speech Quality Assessment 568
9.1 Introduction 568
9.1.1 The Need for Quality Assessment 568
9.1.2 Quality Versus Intelligibility 570
9.2 Subjective Quality Measures 570
9.2.1 Intelligibility Tests 572
9.2.2 Quality Tests 575
9.3 Objective Quality Measures 580
9.3.1 Articulation Index 582
9.3.2 Signal-to-Noise Ratio 584
9.3.3 Itakura Measure 587
9.3.4 Other Measures Based on LP Analysis 588
9.3.5 Weighted-Spectral Slope Measures 589
9.3.6 Global Objective Measures 590
9.3.7 Example Applications 591
9.4 Objective Versus Subjective Measures 593
9.5 Problems 595
V Recognition
10 The Speech Recognition Problem 601
10.1 Introduction 601
10.1.1 The Dream and the Reality 601
10.1.2 Discovering Our Ignorance 604
10.1.3 Circumventing Our Ignorance 605
10.2 The "Dimensions of Difficulty" 606
10.2.1 Speaker-Dependent Versus Speaker-Independent
Recognition 607
10.2.2 Vocabulary Size 607
10.2.3 Isolated-Word Versus Continuous-Speech Recognition 608
10.2.4 Linguistic Constraints 614
10.2.5 Acoustic Ambiguity and Confusability 619
10.2.6 Environmental Noise 620
10.3 Related Problems and Approaches 620
10.3.1 Knowledge Engineering 620
10.3.2 Speaker Recognition and Verification 621
10.4 Conclusions 621
10.5 Problems 621
11 Dynamic Time Warping 623
11.1 Introduction 623
11.2 Dynamic Programming 624
11.3 Dynamic Time Warping Applied to IWR 634
11.3.1 DTW Problem and Its Solution Using DP 634
11.3.2 DTW Search Constraints 638
11.3.3 Typical DTW Algorithm: Memory and
Computational Requirements 649
11.4 DTW Applied to CSR 651
11.4.1 Introduction 651
11.4.2 Level Building 652
11.4.3 The One-Stage Algorithm 660
11.4.4 A Grammar-Driven Connected-Word Recognition
System 669
11.4.5 Pruning and Beam Search 670
11.4.6 Summary of Resource Requirements for DTW
Algorithms 671
11.5 Training Issues in DTW Algorithms 672
11.6 Conclusions 674
11.7 Problems 674
12 The Hidden Markov Model 677
12.1 Introduction 677
12.2 Theoretical Developments 679
12.2.1 Generalities 679
12.2.2 The Discrete Observation HMM 684
12.2.3 The Continuous Observation HMM 705
12.2.4 Inclusion of State Duration Probabilities in the Discrete Observation HMM 709
12.2.5 Scaling the Forward-Backward Algorithm 715
12.2.6 Training with Multiple Observation Sequences 718
12.2.7 Alternative Optimization Criteria in the Training of HMMs 720
12.2.8 A Distance Measure for HMMs 722
12.3 Practical Issues 723
12.3.1 Acoustic Observations 723
12.3.2 Model Structure and Size 724
12.3.3 Training with Insufficient Data 728
12.3.4 Acoustic Units Modeled by HMMs 730
12.4 First View of Recognition Systems Based on HMMs 734
12.4.1 Introduction 734
12.4.2 IWR Without Syntax 735
12.4.3 CSR by the Connected-Word Strategy Without Syntax 738
12.4.4 Preliminary Comments on Language Modeling Using HMMs 740
12.5 Problems 740
13 Language Modeling 745
13.1 Introduction 745
13.2 Formal Tools for Linguistic Processing 746
13.2.1 Formal Languages 746
13.2.2 Perplexity of a Language 749
13.2.3 Bottom-Up Versus Top-Down Parsing 751
13.3 HMMs, Finite State Automata, and "Regular Grammars" 754
13.4 A "Bottom-Up" Parsing Example 759
13.5 Principles of "Top-Down" Recognizers 764
13.5.1 Focus on the Linguistic Decoder 764
13.5.2 Focus on the Acoustic Decoder 770
13.5.3 Adding Levels to the Linguistic Decoder 772
13.5.4 Training the Continuous-Speech Recognizer 775
13.6 Other Language Models 779
13.6.1 N-Gram Statistical Models 779
13.6.2 Other Formal Grammars 785
13.7 IWR As "CSR" 789
13.8 Standard Databases for Speech-Recognition Research 790
13.9 A Survey of Language-Model-Based Systems 791
13.10 Conclusions 801
13.11 Problems 801
14 The Artificial Neural Network 805
14.1 Introduction 805
14.2 The Artificial Neuron 808
14.3 Network Principles and Paradigms 813
14.3.1 Introduction 813
14.3.2 Layered Networks: Formalities and Definitions 815
14.3.3 The Multilayer Perceptron 819
14.3.4 Learning Vector Quantizer 834
14.4 Applications of ANNs in Speech Recognition 837
14.4.1 Presegmented Speech Material 837
14.4.2 Recognizing Dynamic Speech 839
14.4.3 ANNs and Conventional Approaches 841
14.4.4 Language Modeling Using ANNs 845
14.4.5 Integration of ANNs into the Survey Systems of Section 13.9 845
14.5 Conclusions 846
14.6 Problems 847

Index 899
[Stray illustration page from Chapter 12:]

FIGURE 12.15. (a) A three-state HMM, with matrices A₁, B₁, trained in the conventional (e.g., F-B algorithm) manner. (b) The "same" three-state HMM with tied states, matrices A₂, B₂, where the observation probabilities are "tied": b(k|1) = b(k|2) ∀k. (c) An "interpolated" model derived from the models of (a) and (b): Ā = εA₁ + (1 − ε)A₂, B̄ = εB₁ + (1 − ε)B₂.
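The interpolated model of Figure 12.15(c) is simply an element-wise convex combination of the two models' parameter matrices. A minimal sketch of that combination in Python/NumPy (the function name and matrix shapes are illustrative assumptions, not from the book):

import numpy as np

def interpolate_hmm(A1, B1, A2, B2, eps):
    """Convex combination of two HMM parameter sets, per Figure 12.15(c):
    A_bar = eps*A1 + (1 - eps)*A2,  B_bar = eps*B1 + (1 - eps)*B2.
    A1, A2 are state-transition matrices; B1, B2 are observation-probability
    matrices. For 0 <= eps <= 1, each row remains a probability distribution."""
    A_bar = eps * np.asarray(A1, dtype=float) + (1.0 - eps) * np.asarray(A2, dtype=float)
    B_bar = eps * np.asarray(B1, dtype=float) + (1.0 - eps) * np.asarray(B2, dtype=float)
    return A_bar, B_bar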