DD2119 Speech and Speaker Recognition course

Authors: Anna Sánchez Espunyes, Kenza Bouzid

Solutions to the labs of the DD2119 Speech and Speaker Recognition (SSR) course at KTH. Each lab contains implementations of speech recognition algorithms, along with notebooks of experiments.

Lab1 - Feature Extraction

The objective is to experiment with different features commonly used for speech analysis and recognition.

Tasks:

compute Mel Filterbank and MFCC features step-by-step
examine features
evaluate the correlation between features
compare utterances with Dynamic Time Warping
illustrate the discriminative power of the features with respect to words
perform hierarchical clustering of utterances
train and analyze a Gaussian Mixture Model of the feature vectors
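
As an illustration of the utterance comparison task, here is a minimal sketch of Dynamic Time Warping in plain Python. It is not taken from the lab code: the function names `dtw` and `euclidean`, the three allowed steps (diagonal, vertical, horizontal) and the absence of any length normalisation are assumptions made for this example. Each utterance is assumed to be a list of feature vectors, for instance one MFCC vector per frame.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw(x, y, dist=euclidean):
    """Return the accumulated DTW distance between x and y and the warping path.

    x and y are sequences of feature vectors (one vector per frame).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] holds the best accumulated cost of aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])

    # Backtrack from the last cell to recover the frame-to-frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return acc[n][m], path


# Toy one-dimensional "utterances": y repeats the middle frame of x.
x = [[0.0], [1.0], [2.0]]
y = [[0.0], [1.0], [1.0], [2.0]]
distance, path = dtw(x, y)
```

Computing this distance for every pair of utterances gives a distance matrix, which is the kind of input a hierarchical clustering routine can work from.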

Lab2 - Hidden Markov Models with Gaussian Emissions

Objectives:

implement the algorithms for the evaluation and decoding of Hidden Markov Models (HMMs)
use your implementation to perform isolated word recognition
implement the algorithms for training Gaussian Hidden Markov Models (G-HMMs)
explain the meaning of the forward, backward and state posterior probabilities evaluated on speech utterances
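
To make the forward, backward and state posterior probabilities concrete, the following is a small log-domain sketch in plain Python. It is only an illustration under simplifying assumptions, not the lab implementation: the names `forward`, `backward` and `state_posteriors` are chosen for this example, the emission log-likelihoods are assumed to be precomputed per frame and state, and the model has no explicit exit state.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward(log_emlik, log_pi, log_A):
    """log_alpha[t][j] = log P(o_1..o_t, s_t = j)."""
    T, N = len(log_emlik), len(log_pi)
    alpha = [[log_pi[j] + log_emlik[0][j] for j in range(N)]]
    for t in range(1, T):
        alpha.append([
            logsumexp([alpha[t - 1][i] + log_A[i][j] for i in range(N)])
            + log_emlik[t][j]
            for j in range(N)
        ])
    return alpha


def backward(log_emlik, log_A):
    """log_beta[t][i] = log P(o_{t+1}..o_T | s_t = i)."""
    T, N = len(log_emlik), len(log_A)
    beta = [[0.0] * N for _ in range(T)]  # log(1) at the last frame
    for t in range(T - 2, -1, -1):
        beta[t] = [
            logsumexp([
                log_A[i][j] + log_emlik[t + 1][j] + beta[t + 1][j]
                for j in range(N)
            ])
            for i in range(N)
        ]
    return beta


def state_posteriors(log_alpha, log_beta):
    """Return log_gamma[t][i] = log P(s_t = i | o_1..o_T) and the log-likelihood."""
    loglik = logsumexp(log_alpha[-1])
    log_gamma = [
        [a + b - loglik for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(log_alpha, log_beta)
    ]
    return log_gamma, loglik


# Toy 2-state model with 3 frames of made-up emission likelihoods.
log = math.log
log_pi = [log(0.6), log(0.4)]
log_A = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
log_emlik = [[log(0.5), log(0.1)], [log(0.4), log(0.3)], [log(0.1), log(0.6)]]

log_alpha = forward(log_emlik, log_pi, log_A)
log_beta = backward(log_emlik, log_A)
log_gamma, loglik = state_posteriors(log_alpha, log_beta)
```

For each frame, the exponentiated posteriors sum to one, and `loglik` is the log-likelihood of the whole observation sequence under the model.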

Tasks:

The overall task is to implement and test methods for isolated word recognition:

combine phonetic HMMs into word HMMs using a lexicon
implement the forward-backward algorithm
use it to compute the log-likelihood of spoken utterances given a Gaussian HMM
perform isolated word recognition
implement the Viterbi algorithm, and use it to compute the Viterbi path and its likelihood
compare and comment on the Viterbi and forward likelihoods
implement the Baum-Welch algorithm to update the parameters of the emission probability distributions
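
The Viterbi step in the list above can be sketched as follows. This is a minimal log-domain example in plain Python, not the lab code: the function name `viterbi`, the toy model and its made-up emission likelihoods are assumptions for illustration, and the model has no explicit exit state.

```python
import math


def viterbi(log_emlik, log_pi, log_A):
    """Return the log-likelihood of the best state path and the path itself."""
    T, N = len(log_emlik), len(log_pi)
    # delta[t][j]: best log-probability of any path ending in state j at frame t.
    delta = [[log_pi[j] + log_emlik[0][j] for j in range(N)]]
    backptr = [[0] * N]
    for t in range(1, T):
        row, ptr = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] + log_A[i][j])
            row.append(delta[t - 1][best_i] + log_A[best_i][j] + log_emlik[t][j])
            ptr.append(best_i)
        delta.append(row)
        backptr.append(ptr)

    # Backtrack from the most likely final state.
    last = max(range(N), key=lambda j: delta[-1][j])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return delta[-1][last], path


# Toy 2-state model with 3 frames of made-up emission likelihoods.
log = math.log
log_pi = [log(0.6), log(0.4)]
log_A = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
log_emlik = [[log(0.5), log(0.1)], [log(0.4), log(0.3)], [log(0.1), log(0.6)]]

viterbi_loglik, best_path = viterbi(log_emlik, log_pi, log_A)
```

The Viterbi log-likelihood accounts for the single best state sequence only, so it can never exceed the forward log-likelihood, which sums over all state sequences. For isolated word recognition, either score can be computed for each word model and the word with the highest score selected.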

Lab3 - Phoneme Recognition with Deep Neural Networks

Objectives:

create phonetic annotations of speech recordings using predefined phonetic models
use software libraries to define and train Deep Neural Networks (DNNs) for phoneme recognition
explain the difference between HMM and DNN training
compare which speech features are more suitable for each model and explain why

Tasks:

using predefined Gaussian-emission HMM phonetic models, create time-aligned phonetic transcriptions of the TIDIGITS database
define appropriate DNN models for phoneme recognition using Keras
train and evaluate the DNN models on a frame-by-frame recognition score
repeat the training by varying model parameters and input features
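
The frame-by-frame recognition score mentioned above can be sketched as a simple frame accuracy. The example below is plain Python and only illustrative: the function name `frame_accuracy` and the toy posteriors are assumptions, and in practice the posteriors would be the per-frame outputs of the trained network (for example from Keras `model.predict`), with targets taken from the forced alignment.

```python
def frame_accuracy(posteriors, targets):
    """Return the fraction of correctly classified frames and the predictions.

    posteriors: one list of class scores per frame.
    targets: one integer class index per frame.
    """
    if len(posteriors) != len(targets):
        raise ValueError("posteriors and targets must have the same number of frames")
    # Pick the highest-scoring class for every frame.
    predicted = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    correct = sum(p == t for p, t in zip(predicted, targets))
    return correct / len(targets), predicted


# Toy example: 4 frames, 3 classes, made-up scores.
posteriors = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
targets = [0, 1, 2, 1]
accuracy, predicted = frame_accuracy(posteriors, targets)
```

Comparing this score across runs gives one way to judge the effect of varying the model parameters and the input features.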
