This repository contains a FastAPI application that provides Swedish speech-to-text transcription, forced alignment, and speaker diarization using the whisperx library. The speech-to-text (Whisper) and forced-alignment models are fine-tuned by KBLab at Kungliga biblioteket (the National Library of Sweden). See https://www.kb.se/samverkan-och-utveckling/nytt-fran-kb/nyheter-samverkan-och-utveckling/2025-02-20-valtranad-ai-modell-forvandlar-tal-till-text.html for more details.
Everything has been tested on a laptop with an NVIDIA RTX 4090 GPU.
CI enforces linting with Ruff (`ruff check .`).
- Transcription: Transcribes audio files using the KBLab/kb-whisper-large Whisper model (or a smaller variant if desired).
- Alignment: Aligns the transcribed text with the audio to provide word-level timestamps.
- Speaker Diarization: Identifies different speakers in the audio and labels the transcribed segments accordingly.
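The three stages above can be sketched with the whisperx API roughly as follows. This is a sketch, not the application's actual code: the function name is hypothetical, and the exact whisperx calls (e.g. `DiarizationPipeline`, `compute_type`, `batch_size`) follow whisperx's documented interface and may differ between versions.

```python
def transcribe_audio(audio_file: str, hf_token: str,
                     min_speakers: int = 2, max_speakers: int = 2,
                     device: str = "cuda"):
    """Sketch of the transcribe -> align -> diarize pipeline."""
    import whisperx  # heavy import kept local so the sketch reads standalone

    # 1. Transcription with the KBLab fine-tuned Whisper model
    model = whisperx.load_model("KBLab/kb-whisper-large", device,
                                compute_type="float16")
    audio = whisperx.load_audio(audio_file)
    result = model.transcribe(audio, batch_size=16)

    # 2. Forced alignment to get word-level timestamps
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device)
    result = whisperx.align(result["segments"], align_model, metadata,
                            audio, device)

    # 3. Speaker diarization; attach speaker labels to each segment
    diarize_model = whisperx.DiarizationPipeline(use_auth_token=hf_token,
                                                 device=device)
    diarize_segments = diarize_model(audio, min_speakers=min_speakers,
                                     max_speakers=max_speakers)
    return whisperx.assign_word_speakers(diarize_segments, result)
```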
- NVIDIA GPU with CUDA drivers (for optimal performance)
- Docker
- Docker Compose
- Clone the repository:

  `git clone https://github.com/joenaess/swe-trans-dia.git`
- Create a `.env` file in the root directory and add your Hugging Face token:

  `HUGGINGFACE_TOKEN=your_huggingface_token`
- Download the models for the container (your token must have been granted access to https://huggingface.co/pyannote/speaker-diarization-3.1):

  `python model_dl.py`
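The contents of `model_dl.py` are not shown here; a downloader of this kind can be sketched with `huggingface_hub`'s `snapshot_download`. The repo list and the `./models` cache directory are assumptions for illustration:

```python
def download_models(hf_token: str, cache_dir: str = "./models") -> None:
    """Pre-fetch model weights so the container can start without network access."""
    from huggingface_hub import snapshot_download  # local import: needs huggingface_hub

    # Assumed repo list: the KBLab Whisper model and the gated pyannote pipeline.
    for repo_id in ("KBLab/kb-whisper-large", "pyannote/speaker-diarization-3.1"):
        snapshot_download(repo_id=repo_id, token=hf_token, cache_dir=cache_dir)
```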
- Build the Docker image:

  `docker-compose build --no-cache`
- Start the FastAPI server:

  `docker-compose up -d`
- Run the tests:

  `pytest tests/`
- Send a POST request to the `/transcribe/` endpoint with your audio file:

  `curl -X POST -F "[email protected]" "http://localhost:8000/transcribe/?min_speakers=2&max_speakers=2"`
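The same request can be issued from Python. Building the URL is pure standard library; the helper name below is hypothetical, and the audio must be sent as a multipart field named `file`, matching the curl command's `-F "file=@..."`:

```python
from urllib.parse import urlencode

def build_transcribe_url(base: str = "http://localhost:8000",
                         min_speakers: int = 2, max_speakers: int = 2) -> str:
    """Build the /transcribe/ URL with speaker bounds for diarization."""
    query = urlencode({"min_speakers": min_speakers, "max_speakers": max_speakers})
    return f"{base}/transcribe/?{query}"

# e.g. with the third-party requests library (not included here):
# requests.post(build_transcribe_url(), files={"file": open("audio.wav", "rb")})
```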
The API documentation is automatically generated by FastAPI and can be accessed at http://localhost:8000/docs (Swagger UI) or http://localhost:8000/redoc (ReDoc).