Parameter-efficient fine-tuning of OpenAI's Whisper for clinical speech recognition — targeting dysarthria (TORGO) with cross-domain evaluation on aphasia (AphasiaBank).
ASR models trained on healthy speech fail catastrophically on clinical populations. Whisper Large-v3 achieves ~5% WER on LibriSpeech but 45–74% WER on dysarthric speech. This project closes that gap using LoRA adaptation with minimal compute.
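LoRA keeps each pretrained weight matrix W (d_out × d_in) frozen and learns a low-rank update, h = Wx + (α/r)·BAx, where B is d_out × r and A is r × d_in. Each adapted matrix therefore adds only r·(d_in + d_out) trainable parameters. The sketch below counts these for whisper-small (d_model = 768, 12 encoder and 12 decoder layers) under assumed settings: rank r = 32 on the `q_proj` and `v_proj` projections of every attention block. The rank, the target modules, and the helper names are illustrative assumptions, not the values in `configs/lora_config.yaml`. The 244M base size is the approximate figure reported for Whisper small.

```python
from dataclasses import dataclass

# Approximate parameter count reported for Whisper small (used only for the ratio).
WHISPER_SMALL_PARAMS = 244_000_000


@dataclass(frozen=True)
class AdaptedLinear:
    """A frozen linear layer W of shape (d_out, d_in) that receives a LoRA adapter."""
    d_in: int
    d_out: int


def lora_params(layer: AdaptedLinear, rank: int) -> int:
    # LoRA trains B (d_out x r) and A (r x d_in); W itself stays frozen.
    return rank * (layer.d_in + layer.d_out)


def whisper_attention_targets(d_model: int, encoder_layers: int, decoder_layers: int,
                              modules: tuple[str, ...] = ("q_proj", "v_proj")) -> list[AdaptedLinear]:
    # Hypothetical target selection: every listed projection in every attention block.
    # Encoder layers have one attention block (self-attention); decoder layers have
    # two (self-attention and cross-attention). All projections are d_model x d_model.
    blocks = encoder_layers + 2 * decoder_layers
    return [AdaptedLinear(d_model, d_model) for _ in range(blocks * len(modules))]


if __name__ == "__main__":
    # Assumed settings: whisper-small dimensions, rank 32 on q_proj and v_proj.
    targets = whisper_attention_targets(d_model=768, encoder_layers=12, decoder_layers=12)
    trainable = sum(lora_params(t, rank=32) for t in targets)
    share = 100 * trainable / WHISPER_SMALL_PARAMS
    print(f"{len(targets)} adapted matrices, {trainable:,} trainable parameters "
          f"(~{share:.2f}% of ~{WHISPER_SMALL_PARAMS:,})")
```

Under these assumptions the script reports 72 adapted matrices and 3,538,944 trainable parameters, roughly 1.45% of the base model. The count scales linearly with both the rank and the number of targeted projections.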
```bash
git clone https://github.com/DavidBarbera/whisper-clinical-speech.git
cd whisper-clinical-speech

# Full pipeline (RunPod / cloud GPU):
./run_all.sh                  # prepare → train → evaluate → push

# Or step by step:
./run_all.sh --prepare-only   # download and prepare TORGO
./run_all.sh --debug          # quick 2-epoch sanity check
./run_all.sh --skip-push      # full training without HF upload
./run_all.sh --push-only      # push adapter to HuggingFace
./run_all.sh --eval-only      # evaluate existing adapter
```

```
whisper-clinical-speech/
├── configs/
│   └── lora_config.yaml      # All hyperparameters
├── scripts/
│   ├── prepare_torgo.py      # Download & split TORGO dataset
│   ├── train.py              # LoRA fine-tuning with HF Trainer
│   ├── evaluate.py           # WER + SemScore evaluation
│   └── push_to_hub.py        # Upload adapter + model card to HF
├── src/
│   └── logger.py             # Shared logging & run management
├── outputs/                  # ALL generated outputs
│   ├── logs/                 # One .log file per run
│   ├── checkpoints/          # One directory per training run
│   │   └── latest → ...      # Symlink to most recent training
│   ├── results/              # One directory per evaluation run
│   │   ├── latest_baseline_test → ...
│   │   └── latest_lora_test → ...
│   └── hub/                  # Staging area for HF push
├── run_all.sh                # One-command pipeline
├── data/                     # Downloaded datasets (gitignored)
├── .vscode/                  # Shared IDE configuration
├── .gitignore
├── requirements.txt
├── LICENSE
└── README.md
```
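`evaluate.py` reports word error rate: the word-level edit distance between hypothesis and reference divided by the number of reference words, WER = (S + D + I) / N, where S, D and I are substitutions, deletions and insertions. Because insertions are counted, WER can exceed 100%. A minimal, dependency-free sketch of corpus-level WER follows. The lowercase-and-strip-punctuation normalization is an assumption; the project's script may rely on a library such as jiwer and a different text normalizer. SemScore is not covered here.

```python
import re


def normalize(text: str) -> list[str]:
    # Assumed normalization: lowercase, replace punctuation (except apostrophes) with spaces, split.
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    # Levenshtein distance over words = minimum substitutions + deletions + insertions.
    prev = list(range(len(hypothesis) + 1))  # empty reference: j insertions
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)   # empty hypothesis: i deletions
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def wer(references: list[str], hypotheses: list[str]) -> float:
    # Corpus-level WER: total word errors divided by total reference words.
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned one-to-one")
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = normalize(ref)
        errors += word_errors(ref_words, normalize(hyp))
        total += len(ref_words)
    if total == 0:
        raise ValueError("references contain no words")
    return errors / total


if __name__ == "__main__":
    refs = ["The cat sat on the mat.", "Hello, world!"]
    hyps = ["the cat sat on mat", "hello word"]
    # 1 deletion + 1 substitution over 8 reference words → WER = 25.00%
    print(f"WER = {wer(refs, hyps):.2%}")
```

Pooling errors over the whole test set, rather than averaging per-utterance WER, keeps a handful of very short utterances from dominating the score when utterance lengths vary widely.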
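To transcribe a recording with the released adapter loaded on top of the base model:

```python
import librosa
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the base Whisper model and processor, then attach the LoRA adapter from the Hub
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(model, "dbarbera/whisper-small-torgo-dysarthria-lora")
model.eval()

# Whisper expects 16 kHz mono audio; librosa resamples on load
audio, sr = librosa.load("path/to/audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Force English transcription (TORGO is an English corpus) rather than relying on language detection
generated_ids = model.generate(input_features=inputs.input_features, language="en", task="transcribe")
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

David Barbera, PhD in Cognitive Neuroscience (UCL), specializing in speech recognition for clinical populations. Built a CE-marked Class II medical device for aphasia rehabilitation.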
🌐 davidbarbera.github.io · 🎓 Google Scholar · 💻 GitHub · 🤗 HuggingFace
License: MIT. Model weights are released under the same license as OpenAI Whisper (MIT).