🗣️ Whisper-Clinical: Fine-Tuning Whisper for Dysarthric & Aphasic Speech

Parameter-efficient fine-tuning of OpenAI's Whisper for clinical speech recognition — targeting dysarthria (TORGO) with cross-domain evaluation on aphasia (AphasiaBank).

ASR models trained on healthy speech fail catastrophically on clinical populations. Whisper Large-v3 achieves ~5% WER on LibriSpeech but 45–74% WER on dysarthric speech. This project closes that gap using LoRA adaptation with minimal compute.
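For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The sketch below is a minimal standalone illustration, not the code in scripts/evaluate.py. The function name and the simple lowercase-and-split normalization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization for this sketch: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of three reference words.
print(word_error_rate("open the door", "open a door"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.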

Quick Start

git clone https://github.com/DavidBarbera/whisper-clinical-speech.git
cd whisper-clinical-speech

# Full pipeline (RunPod / cloud GPU):
./run_all.sh              # prepare → train → evaluate → push

# Or step by step:
./run_all.sh --prepare-only   # download and prepare TORGO
./run_all.sh --debug          # quick 2-epoch sanity check
./run_all.sh --skip-push      # full training without HF upload
./run_all.sh --push-only      # push adapter to HuggingFace
./run_all.sh --eval-only      # evaluate existing adapter

Project Structure

whisper-clinical-speech/
├── configs/
│   └── lora_config.yaml              # All hyperparameters
├── scripts/
│   ├── prepare_torgo.py              # Download & split TORGO dataset
│   ├── train.py                      # LoRA fine-tuning with HF Trainer
│   ├── evaluate.py                   # WER + SemScore evaluation
│   └── push_to_hub.py                # Upload adapter + model card to HF
├── src/
│   └── logger.py                     # Shared logging & run management
├── outputs/                          # ALL generated outputs
│   ├── logs/                         # One .log file per run
│   ├── checkpoints/                  # One directory per training run
│   │   └── latest → ...             # Symlink to most recent training
│   ├── results/                      # One directory per evaluation run
│   │   ├── latest_baseline_test → ..
│   │   └── latest_lora_test → ...
│   └── hub/                          # Staging area for HF push
├── run_all.sh                        # One-command pipeline
├── data/                             # Downloaded datasets (gitignored)
├── .vscode/                          # Shared IDE configuration
├── .gitignore
├── requirements.txt
├── LICENSE
└── README.md
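
The tree lists configs/lora_config.yaml as the home of all hyperparameters; its contents are not reproduced here. As a rough illustration of why LoRA needs little compute, the sketch below counts the trainable weights that two low-rank factors add to a single square projection matrix. The rank of 8 is a hypothetical value chosen for the example, not one taken from the config. The hidden size of 768 is that of openai/whisper-small.

```python
def lora_added_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_model = 768  # hidden size of openai/whisper-small
rank = 8       # hypothetical rank, not read from lora_config.yaml

full = d_model * d_model                      # weights in one frozen projection
added = lora_added_params(d_model, d_model, rank)  # trainable LoRA weights

print(f"frozen projection: {full:,} weights")
print(f"LoRA factors:      {added:,} weights ({added / full:.1%} of the projection)")
```

Only the low-rank factors receive gradients; the original projection stays frozen, which is why the adapter can be published separately from the base model.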

Use the Model

import librosa
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the base model, then attach the LoRA adapter on top of it
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(model, "dbarbera/whisper-small-torgo-dysarthria-lora")
model.eval()  # inference mode: disables dropout

# Whisper expects 16 kHz mono audio; librosa resamples on load
audio, sr = librosa.load("path/to/audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# No gradients are needed for transcription
with torch.no_grad():
    generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
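
Whisper's feature extractor works on 30-second windows, so a longer recording passed through the snippet above would be cut off at 30 seconds. One simple workaround, sketched here under that assumption, is to split the samples into consecutive windows and transcribe each one in turn. The helper name is hypothetical and is not part of this repository. A non-overlapping split can also cut a word in half at a boundary, so treat this as a starting point only.

```python
SAMPLE_RATE = 16_000   # Hz, matching the sr=16000 used when loading audio
WINDOW_SECONDS = 30    # length of the window Whisper operates on


def split_into_windows(samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Split a 1-D sequence of audio samples into consecutive fixed-length windows.

    The final window may be shorter than the rest. Works on lists and on
    NumPy arrays, since it relies only on len() and slicing.
    """
    window = sample_rate * window_seconds
    return [samples[start:start + window] for start in range(0, len(samples), window)]


# 70 seconds of silence as a stand-in for a real recording.
chunks = split_into_windows([0.0] * (SAMPLE_RATE * 70))
print([len(chunk) / SAMPLE_RATE for chunk in chunks])
```

Each chunk can then be passed to the processor and model.generate in the same way as the single audio array above, and the decoded strings joined with spaces.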

About the Author

David Barbera — PhD in Cognitive Neuroscience (UCL), specializing in speech recognition for clinical populations. Built a CE-marked Class II medical device for aphasia rehabilitation.

🌐 davidbarbera.github.io · 🎓 Google Scholar · 💻 GitHub · 🤗 HuggingFace

License

MIT. Model weights are released under the same license as OpenAI Whisper (MIT).
