hornikmatej/thesis_mit

Effective Training of Neural Networks for Automatic Speech Recognition

License: MIT · Python 3.10+ · Poetry · Hugging Face Transformers · PyTorch

This repository contains the source code, experiments, and resources for the Master's thesis: "Effective Training of Neural Networks for Automatic Speech Recognition".


Master's Thesis Details


About This Project

This project systematically investigates efficient training strategies for encoder-decoder Transformer models in Automatic Speech Recognition (ASR). It explores initialization techniques, the role of adapter layers, Parameter-Efficient Fine-tuning (PEFT) methods like LoRA and DoRA, and the impact of domain-specific pre-training, primarily using the LibriSpeech and VoxPopuli datasets.
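To give a feel for why low-rank methods such as LoRA reduce trainable parameters, here is a minimal back-of-the-envelope sketch (not code from this repository, and the dimensions are illustrative, not taken from the thesis models): LoRA parameterizes the update to a frozen d×k weight as a rank-r product B·A, so it trains r·(d+k) parameters instead of d·k.

```python
# Illustrative sketch of the LoRA parameter saving, not thesis code.
# A frozen weight W has shape (d, k); LoRA trains B (d, r) and A (r, k)
# instead of a full update Delta W of shape (d, k).

def lora_trainable_params(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full_update_params, lora_params) for a d x k weight."""
    full = d * k        # training Delta W directly
    lora = r * (d + k)  # training B and A instead
    return full, lora

full, lora = lora_trainable_params(d=768, k=768, r=8)
print(full, lora)  # 589824 12288 -> roughly a 48x reduction for this layer
```

The same counting argument applies per adapted layer; the actual ranks and target modules used in the experiments are given in the thesis and run scripts.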

The code includes scripts for model creation, fine-tuning (leveraging Hugging Face Transformers), evaluation, and implementations of the various experimental setups discussed in the thesis.


Key Outcomes & Model

A key outcome of this work is a Wav2Vec2-BART (base) model fine-tuned on English VoxPopuli, achieving a Word Error Rate (WER) of 8.85% on the test set.
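For reference, WER is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The evaluation in this repository presumably uses a library implementation; the sketch below is just the definition, for readers unfamiliar with the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # one substitution over three words
```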

Hugging Face Model

You can find the model, along with usage instructions and a detailed model card, on the Hugging Face Hub: BUT-FIT/wav2vec2-base_bart-base_voxpopuli-en


Setup and Installation

This project uses Poetry for dependency management and packaging.

  1. Clone the repository:

    git clone https://github.com/hornikmatej/thesis_mit.git
    cd thesis_mit
  2. Install Poetry: If you don't have Poetry installed, follow the instructions on the official Poetry website.

  3. Install dependencies: This command will create a virtual environment and install all the necessary packages defined in pyproject.toml and poetry.lock.

    poetry install
  4. Activate the virtual environment:

    poetry shell

    You are now in the project's virtual environment with all dependencies available.


Running Experiments

The repository is structured to facilitate the reproduction of experiments:

  • The main training script for sequence-to-sequence ASR models is run_speech_recognition_seq2seq.py.
  • Specific experiment configurations and launch commands are organized within shell scripts, primarily in the run_scripts/ directory (e.g., run_scripts/voxpopuli_best.sh).
  • The src/ directory contains custom modules for model creation, specialized trainers, data handling, etc.
  • Ensure you have the necessary datasets downloaded or accessible (e.g., via Hugging Face Datasets caching). Preprocessing scripts or arguments might be needed as detailed in the thesis or individual run scripts.

Please refer to the thesis document and the comments within the scripts for detailed instructions on running specific experiments.


Citation

If you use code or findings from this thesis in your research, please consider citing:

@mastersthesis{Hornik2025EffectiveTraining,
  author       = {Horník, Matej},
  title        = {Effective Training of Neural Networks for Automatic Speech Recognition},
  school       = {Brno University of Technology, Faculty of Information Technology},
  year         = {2025},
  supervisor   = {Polok, Alexander},
  type         = {Master's Thesis},
  note         = {Online. Available at: \url{https://www.vut.cz/en/students/final-thesis/detail/164401} and code at \url{https://github.com/hornikmatej/thesis_mit}}
}

✌️

About

Repository for a Master's thesis at FIT BUT.
