joenaess/swe-trans-dia

Swedish Transcription and Diarization with WhisperX FastAPI API

This repository contains a FastAPI application that provides Swedish speech-to-text transcription, forced alignment, and speaker diarization using the whisperx library. The speech-to-text (Whisper) and forced-alignment models are fine-tuned by KB-Lab at Kungliga Biblioteket (the National Library of Sweden). See https://www.kb.se/samverkan-och-utveckling/nytt-fran-kb/nyheter-samverkan-och-utveckling/2025-02-20-valtranad-ai-modell-forvandlar-tal-till-text.html for more details. Everything has been tested on my laptop with an RTX 4090 GPU. CI enforces linting with ruff (ruff check .).

Features

  • Transcription: Transcribes audio files using the KBLab/kb-whisper-large Whisper model (or a smaller variant if desired).
  • Alignment: Aligns the transcribed text with the audio to provide word-level timestamps.
  • Speaker Diarization: Identifies different speakers in the audio and labels the transcribed segments accordingly.
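
To make the diarization step more concrete, here is a simplified sketch of how speaker labels can be attached to transcribed segments: each segment gets the speaker whose turns overlap it for the longest time. This is an illustration only, not whisperx's actual implementation; the function names and the dictionary layout (start, end, text, speaker) are assumptions chosen for the example.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds of the overlap between two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each segment with the speaker that overlaps it the most.

    segments: list of dicts with "start", "end", "text" (hypothetical layout)
    turns:    list of dicts with "start", "end", "speaker" (hypothetical layout)
    """
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + shared
        # Segments with no overlapping turn are left unlabeled (None).
        speaker = max(totals, key=totals.get) if totals else None
        labeled.append({**seg, "speaker": speaker})
    return labeled


# Made-up example data, not real model output.
segments = [
    {"start": 0.0, "end": 2.0, "text": "Hej, hur mår du?"},
    {"start": 2.5, "end": 4.0, "text": "Bra, tack."},
]
turns = [
    {"start": 0.0, "end": 2.2, "speaker": "SPEAKER_00"},
    {"start": 2.2, "end": 4.5, "speaker": "SPEAKER_01"},
]

labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(seg["speaker"], seg["text"])
```

With word-level timestamps from the alignment step, the same overlap rule could be applied per word instead of per segment, which gives finer speaker boundaries.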

Requirements

  • NVIDIA GPU with CUDA drivers (for optimal performance)
  • Docker
  • Docker Compose

Installation

  1. Clone the repository:

    git clone https://github.com/joenaess/swe-trans-dia.git
    
  2. Create a .env file in the root directory and add your Hugging Face token:

    HUGGINGFACE_TOKEN=your_huggingface_token
    
  3. Download the models for the container. Your Hugging Face token must already have been granted access to the pyannote model at https://huggingface.co/pyannote/speaker-diarization-3.1:

    python model_dl.py
    
  4. Build the Docker image:

    docker-compose build --no-cache
    
  5. Start the FastAPI server:

    docker-compose up -d
    
  6. Run tests:

    pytest tests/
    
  7. Send a POST request to the /transcribe/ endpoint with your audio file:

    curl -X POST -F "[email protected]" "http://localhost:8000/transcribe/?min_speakers=2&max_speakers=2"
    
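
The request in the last step can also be sent from Python. The sketch below uses only the standard library and makes several assumptions that should be checked against http://localhost:8000/docs: the form field name ("file"), the JSON response body, and the helper names are all hypothetical and are not taken from this repository.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import parse, request


def build_transcribe_url(base_url, min_speakers=None, max_speakers=None):
    """Build the /transcribe/ URL, adding only the query parameters that are set."""
    params = {}
    if min_speakers is not None:
        params["min_speakers"] = min_speakers
    if max_speakers is not None:
        params["max_speakers"] = max_speakers
    url = base_url.rstrip("/") + "/transcribe/"
    if params:
        url += "?" + parse.urlencode(params)
    return url


def encode_multipart(field_name, filename, payload):
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path, base_url="http://localhost:8000", field_name="file",
                    min_speakers=None, max_speakers=None):
    """POST an audio file to the running server and return the decoded response.

    field_name="file" and the JSON response are assumptions, see the note above.
    """
    audio = Path(path)
    body, content_type = encode_multipart(field_name, audio.name, audio.read_bytes())
    req = request.Request(
        build_transcribe_url(base_url, min_speakers, max_speakers),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With the server from step 5 running, a call such as transcribe_file("meeting.wav", min_speakers=2, max_speakers=2) would send the same request as the curl command, where meeting.wav stands in for your own audio file.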

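Since the model download in step 3 depends on the token from step 2, a small pre-flight check can catch a missing or placeholder token early. The sketch below parses .env-style text with the standard library; how model_dl.py actually reads the token is not shown in this README, so the helper names and the parsing rules here are assumptions.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_token(values, name="HUGGINGFACE_TOKEN"):
    """Return True if the token is set and is not the README placeholder."""
    token = values.get(name, "")
    return bool(token) and token != "your_huggingface_token"


def check_env_file(path=".env"):
    """Read the .env file from disk and report whether a usable token is present."""
    env_path = Path(path)
    if not env_path.is_file():
        return False
    return has_token(parse_env(env_path.read_text(encoding="utf-8")))
```

Running check_env_file() from the repository root before python model_dl.py returns False if the file is missing or still contains the placeholder value.
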
API Documentation

The API documentation is automatically generated by FastAPI and can be accessed at:

http://localhost:8000/docs
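
FastAPI also serves the machine-readable OpenAPI schema, by default at /openapi.json. Assuming this application keeps that default, the following sketch lists the available endpoints from the schema; the helper names are hypothetical.

```python
import json
from urllib import request

# Keys of an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


def fetch_schema(base_url="http://localhost:8000"):
    """Download the schema from a running server (assumes the default /openapi.json)."""
    with request.urlopen(base_url.rstrip("/") + "/openapi.json") as resp:
        return json.load(resp)
```

With the server running, list_operations(fetch_schema()) should include the POST operation for /transcribe/ along with any other routes the application defines.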

About

Speech to text transcription particularly for Swedish and speaker diarization
