This repository contains the source code, experiments, and resources for the Master's thesis: "Effective Training of Neural Networks for Automatic Speech Recognition".
- Degree: Master of Information Technology (NMAL)
- Institution: Faculty of Information Technology, Brno University of Technology (FIT BUT)
- Author: Matej Horník ([email protected])
- Supervisor: Ing. Alexander Polok ([email protected])
- Year: 2025
- Thesis Link: Official Thesis Page (VUT) (Link will be fully active after defense)
This project systematically investigates efficient training strategies for encoder-decoder Transformer models in Automatic Speech Recognition (ASR). It explores initialization techniques, the role of adapter layers, Parameter-Efficient Fine-tuning (PEFT) methods like LoRA and DoRA, and the impact of domain-specific pre-training, primarily using the LibriSpeech and VoxPopuli datasets.
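As background for the PEFT experiments, the core idea behind LoRA can be sketched in a few lines of NumPy: the pretrained weight stays frozen and only a low-rank update `B @ A` is trained. This is a minimal illustration under assumed shapes, not the thesis implementation (which uses standard PEFT tooling); all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                    # hidden size and low-rank bottleneck (r << d)
W = rng.normal(size=(d, d))    # frozen pretrained weight (never updated)

# LoRA trains only A and B; B starts at zero so the initial update is a no-op.
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))
alpha = 16                     # scaling hyperparameter

def lora_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d))
# With B = 0 the adapted layer matches the frozen layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)
# Trainable parameters (A and B) are far fewer than in W itself.
print(A.size + B.size, "trainable vs", W.size, "frozen")
```

DoRA extends this idea by additionally decomposing the weight into magnitude and direction, but the frozen-weight-plus-low-rank-update structure above is the common core.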
The code includes scripts for model creation, fine-tuning (leveraging Hugging Face Transformers), evaluation, and implementations of the various experimental setups discussed in the thesis.
One result of this work is a Wav2Vec2-BART (base) model fine-tuned on English VoxPopuli, achieving a Word Error Rate (WER) of 8.85% on the test set.
You can find the model, along with usage instructions and a detailed model card, on the Hugging Face Hub: `BUT-FIT/wav2vec2-base_bart-base_voxpopuli-en`
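For reference, the WER reported above is the word-level Levenshtein (edit) distance divided by the number of reference words. A minimal pure-Python sketch (illustrative only; the thesis experiments rely on standard evaluation tooling):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

# 1 substitution (sat -> sit) + 1 deletion (the) over 6 reference words
print(wer("the cat sat on the mat", "the cat sit on mat"))  # → 0.3333...
```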
This project uses Poetry for dependency management and packaging.
- Clone the repository:
  ```bash
  git clone https://github.com/hornikmatej/thesis_mit.git
  cd thesis_mit
  ```
- Install Poetry: If you don't have Poetry installed, follow the instructions on the official Poetry website.
- Install dependencies: This command will create a virtual environment and install all the necessary packages defined in `pyproject.toml` and `poetry.lock`.
  ```bash
  poetry install
  ```
- Activate the virtual environment:
  ```bash
  poetry shell
  ```

You are now in the project's virtual environment with all dependencies available.
The repository is structured to facilitate the reproduction of experiments:
- The main training script for sequence-to-sequence ASR models is `run_speech_recognition_seq2seq.py`.
- Specific experiment configurations and launch commands are organized within shell scripts, primarily in the `run_scripts/` directory (e.g., `run_scripts/voxpopuli_best.sh`).
- The `src/` directory contains custom modules for model creation, specialized trainers, data handling, etc.
- Ensure you have the necessary datasets downloaded or accessible (e.g., via Hugging Face Datasets caching). Preprocessing scripts or arguments might be needed, as detailed in the thesis or individual run scripts.
Please refer to the thesis document and the comments within the scripts for detailed instructions on running specific experiments.
If you use code or findings from this thesis in your research, please consider citing:
@mastersthesis{Hornik2025EffectiveTraining,
author = {Horník, Matej},
title = {Effective Training of Neural Networks for Automatic Speech Recognition},
school = {Brno University of Technology, Faculty of Information Technology},
year = {2025},
supervisor = {Polok, Alexander},
type = {Master's Thesis},
note = {Online. Available at: \url{https://www.vut.cz/en/students/final-thesis/detail/164401} and code at \url{https://github.com/hornikmatej/thesis_mit}}
}