Multi-ASR Toolkit

Multi-ASR Toolkit is a flexible and extensible speech recognition toolkit supporting multiple backend engines such as Whisper, Faster-Whisper, WhisperX, SpeechRecognition, and Vosk. It provides both a command-line interface and a web-based interface via Gradio, facilitating easy transcription of audio files using various ASR models.

Requirements

Python 3.10 or higher (required by Gradio)
pygame
pydub
ffmpeg: for converting mp3 to wav
PyTorch 2.1+, TensorFlow 2.6+
transformers
SpeechRecognition
Whisper
Faster Whisper
MLX Whisper
demucs
yt-dlp

Install

Python packages
```
$ pip3 install -r requirements.txt
```

ffmpeg

# Ubuntu
$ sudo apt install ffmpeg

# Mac
$ brew install ffmpeg

For Windows, refer to the ffmpeg install guide.
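
Before running the toolkit, it can help to confirm that ffmpeg is reachable from Python. The following is a minimal sketch using only the standard library; the helper names `ffmpeg_available` and `build_mp3_to_wav_cmd` are illustrative and are not part of this repository.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def build_mp3_to_wav_cmd(src: str, dst: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that converts `src` to `dst`.

    -y overwrites `dst` if it already exists; -i names the input file.
    ffmpeg infers the WAV output format from the `.wav` extension of `dst`.
    """
    return ["ffmpeg", "-y", "-i", src, dst]


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found:", shutil.which("ffmpeg"))
        print("example command:", build_mp3_to_wav_cmd("data/test.mp3", "data/test.wav"))
    else:
        print("ffmpeg not found; install it with one of the commands above")
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion.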

Usage

Using the command-line interface (CLI)

# python app.py --mode cli <wav/mp3 file> 
$ python app.py --mode cli data/test.mp3 

# python app.py --mode cli <wav/mp3 file> --backend <asr backend> --language <language> --model-size <model size>
$ python app.py --mode cli data/test.mp3 --backend faster-whisper --language en --model-size base
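
To transcribe several files in one go, the CLI can be driven from a small Python script. This is a sketch that only uses the flags shown above (`--mode`, `--backend`, `--language`, `--model-size`); the helper `build_cli_command` is hypothetical, and it assumes the script runs from the repository root, next to `app.py`.

```python
import shlex
import sys


def build_cli_command(audio_path, backend=None, language=None, model_size=None):
    """Build the argument list for one CLI transcription run.

    Optional flags are appended only when a value is given, so the
    toolkit's own defaults apply otherwise.
    """
    cmd = [sys.executable, "app.py", "--mode", "cli", audio_path]
    if backend:
        cmd += ["--backend", backend]
    if language:
        cmd += ["--language", language]
    if model_size:
        cmd += ["--model-size", model_size]
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually transcribe.
    for path in ["data/test.mp3"]:
        cmd = build_cli_command(path, backend="faster-whisper", language="en", model_size="base")
        print(shlex.join(cmd))
```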

Using the web application (built with `Gradio`)

$ python3 app.py
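
For readers unfamiliar with Gradio, the sketch below shows roughly how an audio-to-text interface can be wired up. It is not this project's `app.py`: `transcribe_stub` is a placeholder that stands in for a real ASR backend, and `build_demo` is a hypothetical name.

```python
def transcribe_stub(audio_path):
    """Placeholder for a real ASR backend call.

    Gradio passes the uploaded file's path (or None if nothing was uploaded).
    """
    if not audio_path:
        return "No audio provided."
    return f"(transcription of {audio_path} would appear here)"


def build_demo():
    """Create a minimal Gradio interface: one audio input, one text output."""
    import gradio as gr  # imported lazily so the stub works without Gradio installed

    return gr.Interface(
        fn=transcribe_stub,
        inputs=gr.Audio(type="filepath"),
        outputs="text",
        title="ASR demo (sketch)",
    )


if __name__ == "__main__":
    build_demo().launch()
```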

❓Tips & Tricks

yt-dlp Authentication Error

Open a new incognito/private window and log in to your YouTube account.
In the same tab, open https://www.youtube.com/robots.txt, ensuring that only this tab is using the login session.
Use a browser extension (e.g., "cookies.txt" for Chrome) to export the youtube.com cookies for this session to a file named cookies.txt, and then immediately close the incognito window.

Using the manually exported cookies.txt in Python.

Place your cookies.txt in a fixed path, and then specify it in ydl_opts like this:

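ydl_opts = {
    'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4',
    'outtmpl': filepath,  # output path template for the download
    'verbose': True,  # for debugging; can be removed in production
    'merge_output_format': 'mp4',
    # The Python API reads the cookie file from 'cookiefile' ('--cookies' is the CLI flag)
    'cookiefile': '<Path>/youtube_cookies.txt',
}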
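
An end-to-end version of the steps above might look like the following sketch. It assumes the exported file is in Netscape cookie format (yt-dlp expects the first line to be `# Netscape HTTP Cookie File` or `# HTTP Cookie File`) and passes it through the `cookiefile` option of yt-dlp's Python API. The helper names `looks_like_cookie_file`, `build_ydl_opts`, and `download` are illustrative only.

```python
from pathlib import Path

# First-line headers that mark a Netscape-format cookies file.
_COOKIE_HEADERS = ("# Netscape HTTP Cookie File", "# HTTP Cookie File")


def looks_like_cookie_file(path):
    """Return True if `path` exists and starts with a Netscape cookie header."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("r", encoding="utf-8", errors="replace") as fh:
        first_line = fh.readline().strip()
    return first_line in _COOKIE_HEADERS


def build_ydl_opts(output_path, cookie_path):
    """Build yt-dlp options that authenticate with an exported cookies file."""
    return {
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4",
        "outtmpl": str(output_path),
        "merge_output_format": "mp4",
        "cookiefile": str(cookie_path),
    }


def download(url, output_path, cookie_path):
    """Download `url` with yt-dlp, failing early if the cookies file looks wrong."""
    if not looks_like_cookie_file(cookie_path):
        raise ValueError(f"{cookie_path} is missing or not a Netscape-format cookies file")
    import yt_dlp  # imported lazily so the helpers above work without yt-dlp installed

    with yt_dlp.YoutubeDL(build_ydl_opts(output_path, cookie_path)) as ydl:
        ydl.download([url])
```

Checking the header first gives a clearer error than a failed login when the browser extension exported the cookies in a different format such as JSON.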

reference: yt-dlp/wiki/Extractors