Automatic Speech Recognition: Introduction
Peter Bell
Automatic Speech Recognition — ASR Lecture 1
11 January 2020
ASR Lecture 1 Automatic Speech Recognition: Introduction 1
Automatic Speech Recognition — ASR
Course details
Lectures: About 18 lectures, delivered live on Teams for now
Labs: Weekly lab sessions – using Python, OpenFst
(openfst.org) and later Kaldi (kaldi-asr.org)
Lab sessions will start in Week 3 – exact format TBA.
Assessment:
First five lab sessions worth 10%
Coursework, building on the lab sessions, worth 40%
Open book exam in April or May worth 50%
People:
Course organiser: Peter Bell
Guest lecturers: Hiroshi Shimodaira and Yumnah Mohammied
TA: Andrea Carmantini
Demonstrators: Chau Luu and Electra Wallington
http://www.inf.ed.ac.uk/teaching/courses/asr/
Your background
If you have taken:
Speech Processing and either of (MLPR or MLP): perfect!
Either of (MLPR or MLP) but not Speech Processing (probably you are from Informatics): you’ll require some speech background:
A couple of the lectures will cover material that was in Speech Processing
Some additional background study (including material from Speech Processing)
Speech Processing but neither of (MLPR or MLP) (probably you are from SLP): you’ll require some machine learning background (especially neural networks):
A couple of introductory lectures on neural networks provided for SLP students
Some additional background study
Labs
Series of weekly labs using Python, OpenFst and Kaldi
They count towards 10% of the course credit
Labs start week 3 – exact arrangements TBA
You will need to work in pairs
Labs 1-5 will give you hands-on experience of using HMM
algorithms to build your own ASR system
These labs are an important prerequisite for the coursework –
take advantage of the demonstrator support!
Later optional labs will introduce you to Kaldi recipes for
training acoustic models – useful if you will be doing an
ASR-related research project
What is speech recognition?
Speech-to-text transcription
Transform recorded audio into a sequence of words
Just the words, no meaning... but we do need to deal with acoustic
ambiguity: “Recognise speech?” or “Wreck a nice beach?”
Speaker diarization: Who spoke when?
Speech recognition: what did they say?
Paralinguistic aspects: how did they say it? (timing,
intonation, voice quality)
Speech understanding: what does it mean?
Why is
speech recognition
difficult?
From a linguistic perspective
Many sources of variation:
Speaker: tuned for a particular speaker, or speaker-independent? Adaptation to speaker characteristics
Environment: noise, competing speakers, channel conditions (microphone, phone line, room acoustics)
Style: continuously spoken or isolated? Planned monologue or spontaneous conversation?
Vocabulary: machine-directed commands, scientific language, colloquial expressions
Accent/dialect: recognise the speech of all speakers who speak a particular language
Other paralinguistics: emotional state, social class, ...
Language spoken: an estimated 7,000 languages, most with limited training resources; code-switching; language change
From a machine learning perspective
As a classification problem: very high-dimensional output space
As a sequence-to-sequence problem: very long input sequence (although limited re-ordering between acoustic and word sequences)
Data is often noisy, with many “nuisance” factors of variation
Very limited quantities of training data available (in terms of words) compared to text-based NLP
Manual speech transcription is very expensive (10x real time)
Hierarchical and compositional nature of speech production and comprehension makes it difficult to handle with a single model
The speech recognition problem
We generally represent recorded speech as a sequence of acoustic feature vectors (observations), X, and the output word sequence as W
At recognition time, our aim is to find the most likely W given X
To achieve this, statistical models are trained using a corpus of labelled training utterances (Xn, Wn)
Representing recorded speech (X)
Represent a recorded utterance as a sequence of feature vectors
Reading: Jurafsky & Martin section 9.3
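The slides don’t fix a particular feature type here, but the framing step behind any such feature-vector representation can be sketched in plain NumPy. The function name is mine, and the frame/hop sizes are typical 25 ms / 10 ms values at a 16 kHz sampling rate:

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a waveform into overlapping, windowed frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Apply a Hamming window to each frame before any spectral analysis
    return frames * np.hamming(frame_len)

x = np.random.randn(16000)   # one second of audio at 16 kHz
X = frame_signal(x)
print(X.shape)               # (98, 400): 98 frames of 400 samples each
```

Each row of X would then be mapped to a feature vector (e.g. MFCCs, covered in the reading), giving the observation sequence X used throughout the course.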
Labelling speech (W)
Labels may be at different levels: words, phones, etc.
Labels may be time-aligned – i.e. the start and end times of an
acoustic segment corresponding to a label are known
Reading: Jurafsky & Martin chapter 7 (especially sections 7.4, 7.5)
Two key challenges
In training the model:
Aligning the sequences Xn and Wn for each training utterance – e.g. aligning the acoustic frames x1 x2 x3 x4 ... to word units (w1 w2 : NO RIGHT), phone units (p1 ... p5 : n oh r ai t), or grapheme units (g1 ... g7 : n o r i g h t)
In performing recognition:
Searching over all possible output sequences W to find the most likely one
The hidden Markov model (HMM) provides a good solution to both problems
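To see why alignment is the hard part, here is a back-of-the-envelope count (mine, not from the slides): if each label must cover a contiguous, non-empty run of frames, the number of monotonic alignments of T frames to K labels is a binomial coefficient, since we choose K−1 boundaries among the T−1 gaps between frames:

```python
from math import comb

def num_alignments(T, K):
    """Monotonic alignments of T frames to K labels, each label
    covering a contiguous non-empty run of frames."""
    return comb(T - 1, K - 1)

print(num_alignments(4, 2))     # 3 ways to split x1..x4 between two words
print(num_alignments(1000, 30)) # astronomically many for a real utterance
```

Enumerating alignments explicitly is hopeless; the HMM’s dynamic-programming algorithms sum or maximise over them efficiently instead.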
The Hidden Markov Model
[Diagram: observation sequence x1 x2 x3 x4 ...]
A simple but powerful model for mapping a sequence of continuous observations to a sequence of discrete outputs
It is a generative model for the observation sequence
There are efficient algorithms for training (forward–backward) and recognition-time decoding (Viterbi)
Later in the course we will also look at newer all-neural, fully-differentiable “end-to-end” models
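As a sketch of the decoding side, here is a minimal Viterbi implementation for a discrete-observation HMM. The toy probabilities are invented for illustration; real ASR HMMs use continuous observation densities, but the recursion is the same:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence for discrete observations obs.
    pi: initial probs (S,); A: transition probs (S,S); B: emission probs (S,V)."""
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))           # best log-prob of any path ending in each state
    psi = np.zeros((T, S), dtype=int)  # back-pointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)   # scores[i, j]: from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):      # follow back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy 2-state HMM with 2 observation symbols
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1], pi, A, B))    # [0, 0, 1]
```

The same dynamic-programming trellis, with max replaced by sum, underlies the forward–backward training algorithm.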
Hierarchical modelling of speech
[Diagram: a generative model maps the utterance W (“No right”) to words (NO RIGHT), words to subword units (n oh r ai t), and subword units to the acoustics X via HMMs]
“Fundamental Equation of Statistical Speech Recognition”
If X is the sequence of acoustic feature vectors (observations) and W denotes a word sequence, the most likely word sequence W∗ is given by

    W∗ = arg max_W P(W | X)

Applying Bayes’ Theorem:

    P(W | X) = p(X | W) P(W) / p(X)
             ∝ p(X | W) P(W)

    W∗ = arg max_W p(X | W) P(W)

where p(X | W) is the acoustic model and P(W) is the language model
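A toy illustration of the argmax (all numbers below are invented for the example): even when the acoustic model slightly prefers an acoustically plausible but nonsensical hypothesis, the language-model prior can rescue the sensible one:

```python
# Tiny hypothesis set with made-up acoustic likelihoods p(X|W) and LM priors P(W)
candidates = {
    "recognise speech":   {"acoustic": 1e-42, "lm": 1e-5},
    "wreck a nice beach": {"acoustic": 3e-42, "lm": 1e-8},
}

def decode(candidates):
    """W* = argmax_W p(X|W) P(W) over an explicit candidate set."""
    return max(candidates,
               key=lambda w: candidates[w]["acoustic"] * candidates[w]["lm"])

print(decode(candidates))   # "recognise speech": the LM outweighs the acoustics
```

In a real recogniser the candidate set is far too large to enumerate, which is exactly the search problem discussed earlier.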
Speech Recognition Components

    W∗ = arg max_W p(X | W) P(W)

Use an acoustic model, language model, and lexicon to obtain the most probable word sequence W∗ given the observed acoustics X
[Diagram: recorded speech X → signal analysis → search over the space defined by the acoustic model p(X | W) and the language model P(W), both trained on data → decoded text W∗ (transcription)]
Phones and Phonemes
Phonemes
abstract unit defined by linguists based on contrastive role in
word meanings (eg “cat” vs “bat”)
40–50 phonemes in English
Phones
speech sounds defined by the acoustics
many allophones of the same phoneme (eg /p/ in “pit” and
“spit”)
limitless in number
Phones are usually used in speech recognition – but there is no
conclusive evidence that they are the basic units of speech
perception
Possible alternatives: syllables, automatically derived units, ...
(Slide taken from Martin Cooke from long ago)
Evaluation
How accurate is a speech recognizer?
String edit distance
Use dynamic programming to align the ASR output with a
reference transcription
Three types of error: insertions, deletions, substitutions
Word error rate (WER) sums the three types of error. If there
are N words in the reference transcript, and the ASR output
has S substitutions, D deletions and I insertions, then:

    WER = 100 · (S + D + I) / N %        Accuracy = (100 − WER)%
Speech recognition evaluations: common training and
development data, release of new test sets on which different
systems may be evaluated using word error rate
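The WER computation can be sketched directly from the definition, using dynamic-programming edit distance over word sequences (the function name is mine):

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein alignment of hypothesis against reference."""
    ref, hyp = ref.split(), hyp.split()
    # d[i][j]: edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[-1][-1] / len(ref)

print(wer("recognise speech", "wreck a nice beach"))  # 200.0
```

Note that because insertions are counted, WER can exceed 100%, as in this example.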
Next Lecture
[Diagram: the recognition pipeline again – recorded speech → signal analysis → acoustic model and language model, trained on data → search → decoded text (transcription)]
Example: recognising TV broadcasts
Reading
Jurafsky and Martin (2008). Speech and Language Processing
(2nd ed.): Chapter 7 (esp 7.4, 7.5) and Section 9.3.
General interest:
The Economist Technology Quarterly, “Language: Finding a
Voice”, Jan 2017.
http://www.economist.com/technology-quarterly/2017-05-01/language
The State of Automatic Speech Recognition: Q&A with
Kaldi’s Dan Povey, Jul 2018.
https://medium.com/descript/the-state-of-automatic-speech-recognition-q-a-with-kaldis-dan-povey-c860aada9b85