A flexible speech recognition toolkit supporting multiple backends (Whisper, Faster-Whisper, WhisperX, SpeechRecognition, Vosk) with CLI and Gradio web interface.

kaka-lin/multi-asr-toolkit

Multi-ASR Toolkit

Multi-ASR Toolkit is a flexible and extensible speech recognition toolkit supporting multiple backend engines such as Whisper, Faster-Whisper, WhisperX, SpeechRecognition, and Vosk. It provides both a command-line interface and a web-based interface via Gradio, facilitating easy transcription of audio files using various ASR models.

Requirements

Install

  1. Python packages

    $ pip3 install -r requirements.txt
  2. ffmpeg

    # Ubuntu
    $ sudo apt install ffmpeg
    
    # Mac
    $ brew install ffmpeg

    For Windows, refer to the linked guide: ffmpeg install
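Before running the toolkit, it can help to confirm that ffmpeg is actually reachable on your PATH. The snippet below is a minimal sketch using only the Python standard library; the helper names `find_executable` and `require_ffmpeg` are hypothetical and are not part of this repository.

```python
import shutil
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the full path of `name` if it is found on PATH, otherwise None."""
    return shutil.which(name)


def require_ffmpeg() -> str:
    """Return the path to ffmpeg, or raise if it is not installed (hypothetical helper)."""
    path = find_executable("ffmpeg")
    if path is None:
        raise RuntimeError(
            "ffmpeg was not found on PATH; install it with apt, brew, "
            "or the Windows guide before transcribing audio."
        )
    return path
```

Calling `require_ffmpeg()` at startup would fail early with a readable message instead of a decoding error later on.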

Usage

Using the command-line interface (CLI)

# python app.py --mode cli <wav/mp3 file> 
$ python app.py --mode cli data/test.mp3 

# python app.py --mode cli <wav/mp3 file> --backend <asr backend> --language <language> --model-size <model size>
$ python app.py --mode cli data/test.mp3 --backend faster-whisper --language en --model-size base
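To transcribe several files from a script, the CLI call can be wrapped in a small helper. This is only a sketch: it uses the flags shown above (`--mode`, `--backend`, `--language`, `--model-size`), while the function names `build_cli_command` and `transcribe_file` are hypothetical, and it assumes (unverified) that `app.py` prints the transcript to standard output.

```python
import subprocess
import sys
from typing import List, Optional


def build_cli_command(
    audio_path: str,
    backend: Optional[str] = None,
    language: Optional[str] = None,
    model_size: Optional[str] = None,
    script: str = "app.py",
) -> List[str]:
    """Build the argument list for one CLI run; optional flags are added only if set."""
    cmd = [sys.executable, script, "--mode", "cli", audio_path]
    if backend:
        cmd += ["--backend", backend]
    if language:
        cmd += ["--language", language]
    if model_size:
        cmd += ["--model-size", model_size]
    return cmd


def transcribe_file(audio_path: str, **options: Optional[str]) -> str:
    """Run the CLI and return whatever it wrote to stdout.

    Assumption: the transcript is printed to stdout; adjust if app.py
    writes its result elsewhere.
    """
    result = subprocess.run(
        build_cli_command(audio_path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.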

Using the web application (built with Gradio)

$ python3 app.py

❓ Tips & Tricks

yt-dlp Authentication Error

  • Open a new incognito/private window and log in to your YouTube account.

  • In the same tab, open https://www.youtube.com/robots.txt, ensuring that only this tab is using the login session.

  • Use a browser extension (e.g., "cookies.txt" for Chrome) to export the youtube.com cookies for this session to a file named cookies.txt, and then immediately close the incognito window.

  • Use the manually exported cookies.txt in Python.

    Place your cookies.txt at a fixed path, then specify it in ydl_opts like this:

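    ydl_opts = {
        'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4',
        'outtmpl': filepath,      # output path, defined elsewhere
        'verbose': True,          # for debugging; can be removed in production
        'merge_output_format': 'mp4',
        # The Python API option is 'cookiefile' ('--cookies' is the CLI flag)
        'cookiefile': '<Path>/youtube_cookies.txt',
    }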
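If the error persists, one thing worth checking is whether the exported file actually contains youtube.com cookies and is in the Netscape format. The sketch below does that check with the standard library's `MozillaCookieJar` and then passes the file to yt-dlp through the `cookiefile` option of its Python API. The helper names (`count_youtube_cookies`, `build_ydl_opts`, `download`) are hypothetical, and `yt_dlp` is imported lazily so the check itself needs no third-party package.

```python
from http.cookiejar import MozillaCookieJar
from typing import Any, Dict


def count_youtube_cookies(cookie_path: str) -> int:
    """Count cookies for youtube.com (or a subdomain) in a Netscape-format cookies file."""
    jar = MozillaCookieJar(cookie_path)
    # Keep session and expired cookies so the count reflects the file contents.
    jar.load(ignore_discard=True, ignore_expires=True)
    count = 0
    for cookie in jar:
        domain = cookie.domain.lstrip(".")
        if domain == "youtube.com" or domain.endswith(".youtube.com"):
            count += 1
    return count


def build_ydl_opts(output_path: str, cookie_path: str) -> Dict[str, Any]:
    """Build yt-dlp options; 'cookiefile' is the Python API counterpart of --cookies."""
    return {
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4",
        "outtmpl": output_path,
        "merge_output_format": "mp4",
        "cookiefile": cookie_path,
    }


def download(url: str, output_path: str, cookie_path: str) -> None:
    """Download one video with the exported cookies (requires the yt-dlp package)."""
    import yt_dlp  # third-party; imported here so the helpers above work without it

    if count_youtube_cookies(cookie_path) == 0:
        raise ValueError(f"No youtube.com cookies found in {cookie_path}")
    with yt_dlp.YoutubeDL(build_ydl_opts(output_path, cookie_path)) as ydl:
        ydl.download([url])
```

A count of zero usually means the export captured the wrong site or the wrong browser session, in which case repeating the export steps is more useful than retrying the download.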

Demucs part and usage

🙏 Reference
