DavidBarbera/whisper-clinical-speech


🗣️ Whisper-Clinical: Fine-Tuning Whisper for Dysarthric & Aphasic Speech

HuggingFace Model · License: MIT · Python 3.10+

Parameter-efficient fine-tuning of OpenAI's Whisper for clinical speech recognition — targeting dysarthria (TORGO) with cross-domain evaluation on aphasia (AphasiaBank).
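Parameter efficiency here comes from LoRA (Low-Rank Adaptation). Instead of updating a full d × k weight matrix W, LoRA learns a low-rank update ΔW = BA, with B of shape d × r and A of shape r × k. Each adapted matrix therefore adds r(d + k) trainable weights instead of d·k. The arithmetic below is a sketch with illustrative values: d = k = 768 matches Whisper-small's hidden size, and the rank r = 8 is an assumption, not necessarily the value in configs/lora_config.yaml.

```python
def lora_params(d: int, k: int, r: int) -> int:
    """Trainable weights LoRA adds to one d x k matrix: B (d x r) plus A (r x k)."""
    return r * (d + k)


def full_params(d: int, k: int) -> int:
    """Weights updated when fine-tuning the same d x k matrix in full."""
    return d * k


# Illustrative values: one 768 x 768 attention projection, rank 8 (assumed).
d, k, r = 768, 768, 8
added = lora_params(d, k, r)
full = full_params(d, k)
print(added, full, f"{added / full:.2%}")  # 12288 589824 2.08%
```

The base weights stay frozen, so only the adapter weights need to be stored and uploaded. This is why the Hub repository holds an adapter rather than a full model.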

ASR models trained on healthy speech fail catastrophically on clinical populations: Whisper Large-v3 achieves ~5% WER on LibriSpeech but 45–74% WER on dysarthric speech. This project addresses that gap with LoRA adaptation, which freezes the base model and trains only small low-rank adapter matrices, so it needs minimal compute.
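Word error rate (WER) counts the word-level substitutions S, deletions D and insertions I needed to align a hypothesis with its reference, divided by the number of reference words N: WER = (S + D + I) / N. Insertions are unbounded, so WER can exceed 100%. The following is a minimal sketch using word-level Levenshtein distance. It applies no text normalization (casing, punctuation), so its numbers may differ from those produced by scripts/evaluate.py.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to align the first i-1 reference words with the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


print(f"{wer('the cat sat on the mat', 'the cat sat on mat'):.3f}")  # 0.167 (1 deletion / 6 words)
print(f"{wer('the cat sat', 'the bat sat'):.3f}")  # 0.333 (1 substitution / 3 words)
```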

Quick Start

git clone https://github.com/DavidBarbera/whisper-clinical-speech.git
cd whisper-clinical-speech

# Full pipeline (RunPod / cloud GPU):
./run_all.sh              # prepare → train → evaluate → push

# Or step by step:
./run_all.sh --prepare-only   # download and prepare TORGO
./run_all.sh --debug          # quick 2-epoch sanity check
./run_all.sh --skip-push      # full training without HF upload
./run_all.sh --push-only      # push adapter to HuggingFace
./run_all.sh --eval-only      # evaluate existing adapter

Project Structure

whisper-clinical-speech/
├── configs/
│   └── lora_config.yaml              # All hyperparameters
├── scripts/
│   ├── prepare_torgo.py              # Download & split TORGO dataset
│   ├── train.py                      # LoRA fine-tuning with HF Trainer
│   ├── evaluate.py                   # WER + SemScore evaluation
│   └── push_to_hub.py                # Upload adapter + model card to HF
├── src/
│   └── logger.py                     # Shared logging & run management
├── outputs/                          # ALL generated outputs
│   ├── logs/                         # One .log file per run
│   ├── checkpoints/                  # One directory per training run
│   │   └── latest → ...              # Symlink to most recent training run
│   ├── results/                      # One directory per evaluation run
│   │   ├── latest_baseline_test → ..
│   │   └── latest_lora_test → ...
│   └── hub/                          # Staging area for HF push
├── run_all.sh                        # One-command pipeline
├── data/                             # Downloaded datasets (gitignored)
├── .vscode/                          # Shared IDE configuration
├── .gitignore
├── requirements.txt
├── LICENSE
└── README.md
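The outputs/ layout keeps one directory per run, plus a latest symlink that points at the most recent one. The sketch below shows one way such run management could work. It assumes timestamped directory names and a POSIX filesystem. new_run_dir is a hypothetical helper, not the actual interface of src/logger.py.

```python
import os
from datetime import datetime
from pathlib import Path


def new_run_dir(root: Path, name: str = "run") -> Path:
    """Create a timestamped run directory under root and repoint root/"latest" at it.

    Hypothetical helper: assumes a POSIX filesystem with symlinks and atomic rename.
    """
    root.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = root / f"{name}_{stamp}"
    suffix = 1
    while run_dir.exists():  # avoid clobbering a run started in the same second
        run_dir = root / f"{name}_{stamp}_{suffix}"
        suffix += 1
    run_dir.mkdir()

    # Build the new link under a temporary name, then rename it over "latest"
    # so readers never observe a missing or half-written link.
    tmp_link = root / f".latest.tmp.{os.getpid()}"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(run_dir.name, target_is_directory=True)  # relative target
    os.replace(tmp_link, root / "latest")
    return run_dir
```

For example, new_run_dir(Path("outputs/checkpoints"), "train") would create outputs/checkpoints/train_<timestamp>/ and update outputs/checkpoints/latest. Because the link target is relative, the symlink stays valid if the whole outputs/ tree is moved.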

Use the Model

from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel
import librosa
import torch

# Load the base Whisper model and processor, then attach the LoRA adapter
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(model, "dbarbera/whisper-small-torgo-dysarthria-lora")
model.eval()

# Whisper expects 16 kHz mono audio; librosa resamples on load
audio, _ = librosa.load("path/to/audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Transcribe without tracking gradients
with torch.no_grad():
    generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
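By default, WhisperProcessor pads or truncates every input to 30 seconds (480,000 samples at 16 kHz). A single processor call therefore transcribes only the first 30 s of a longer recording. The following is a minimal sketch of splitting a long recording into consecutive windows, each of which can go through the same processor and generate calls. chunk_audio is a hypothetical helper. Fixed cuts can split a word at a window boundary, so overlapping windows or voice-activity-based segmentation would be more robust.

```python
def chunk_audio(audio, sr: int = 16000, chunk_s: float = 30.0):
    """Split a 1-D sample sequence (list or NumPy array) into windows of at most chunk_s seconds."""
    if sr <= 0 or chunk_s <= 0:
        raise ValueError("sr and chunk_s must be positive")
    size = int(sr * chunk_s)  # 480,000 samples for 30 s at 16 kHz
    return [audio[start:start + size] for start in range(0, len(audio), size)]


# 70 s of silence at 16 kHz splits into 30 s + 30 s + 10 s windows.
chunks = chunk_audio([0.0] * (70 * 16000))
print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 10.0]

# Each window could then be transcribed as in the snippet above, e.g.:
# for c in chunks:
#     inputs = processor(c, sampling_rate=16000, return_tensors="pt")
#     ...model.generate(**inputs), decode, then join the texts
```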

About the Author

David Barbera — PhD in Cognitive Neuroscience (UCL), specializing in speech recognition for clinical populations. Built a CE-marked Class II medical device for aphasia rehabilitation.

🌐 davidbarbera.github.io · 🎓 Google Scholar · 💻 GitHub · 🤗 HuggingFace

License

MIT. Model weights are released under the same license as OpenAI Whisper (MIT).
