VoiceOver AI

Overview

VoiceOver AI is a speech-to-text and text-to-speech pipeline for video files. It extracts the audio track, transcribes the speech, translates the transcript into another language, and converts the translated text back into speech. It uses OpenAI's Whisper model for automatic speech recognition (ASR), googletrans for translation, and indic-transliteration for transliteration.

Features

Video Upload & Processing: Users can upload video files, from which the audio track is extracted for transcription.
Automatic Speech Recognition (ASR): Uses OpenAI Whisper for high-accuracy transcription.
Text Processing & Translation: Supports multiple languages, using googletrans for translation and indic-transliteration for transliteration.
Optional 720p Resizing: Videos can be resized to 720p before processing.
Interactive UI: Uses ipywidgets to provide an interactive interface inside the Jupyter Notebook.

Installation

Prerequisites

Python 3.8+
FFmpeg (for video processing)
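Both prerequisites can be checked before installing the Python dependencies. The sketch below assumes FFmpeg is expected on the `PATH` as the `ffmpeg` executable; `check_prerequisites` is a hypothetical helper, not part of this repository.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper, not part of the VoiceOver AI codebase.
    """
    problems = []
    # Python 3.8+ is listed as a prerequisite.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # Assumption: FFmpeg is invoked as "ffmpeg" from the PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg executable not found on PATH")
    return problems


if __name__ == "__main__":
    for issue in check_prerequisites():
        print("Missing prerequisite:", issue)
```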

Steps

Clone the repository:

```
git clone https://github.com/yourusername/VoiceOverAI.git
cd VoiceOverAI
```

Install dependencies:
```
pip install -r requirements.txt
```

Usage

Upload a Video File
- Run the Jupyter Notebook.
- Upload a video file via the UI.
- Choose whether to resize the video to 720p.
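The optional 720p resize can be done with FFmpeg's `scale` filter. This is a sketch, not the project's implementation: it assumes FFmpeg is on the `PATH`, and the helper names `build_resize_command` and `resize_to_720p` are hypothetical. A width of `-2` tells FFmpeg to keep the aspect ratio while rounding the width to an even number, which H.264 encoding with the common yuv420p pixel format requires.

```python
import subprocess
from pathlib import Path


def build_resize_command(src, dst, height=720):
    """Build an FFmpeg argument list that rescales a video to `height` pixels.

    scale=-2:<height> preserves the aspect ratio and keeps the width even.
    The audio stream is copied without re-encoding.
    """
    return [
        "ffmpeg",
        "-y",                         # overwrite dst if it already exists
        "-i", str(src),
        "-vf", f"scale=-2:{height}",
        "-c:a", "copy",               # keep the original audio as-is
        str(dst),
    ]


def resize_to_720p(src, dst=None):
    """Hypothetical helper: write a 720p copy of `src` and return its path."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_name(f"{src.stem}_720p{src.suffix}")
    subprocess.run(build_resize_command(src, dst), check=True)
    return dst
```

For example, `resize_to_720p("talk.mp4")` would write `talk_720p.mp4` next to the input. Building the argument list separately from running it makes the command easy to inspect or test without FFmpeg installed.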
Extract and Transcribe Audio
- The system extracts audio from the uploaded video.
- OpenAI Whisper is used for speech-to-text conversion.
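A minimal sketch of these two steps, assuming FFmpeg is on the `PATH` and the `openai-whisper` package from the dependency list is installed. Whisper resamples its input to 16 kHz mono internally, so extracting audio in that format up front mainly keeps the intermediate file small. The helper names and the `"base"` model size are illustrative choices, not taken from the project.

```python
import subprocess
from pathlib import Path


def build_extract_audio_command(video_path, wav_path, sample_rate=16000):
    """FFmpeg arguments that drop the video stream and write 16-bit mono PCM WAV."""
    return [
        "ffmpeg", "-y",
        "-i", str(video_path),
        "-vn",                      # ignore the video stream
        "-ac", "1",                 # downmix to mono
        "-ar", str(sample_rate),    # 16 kHz, the rate Whisper works at
        "-acodec", "pcm_s16le",     # uncompressed 16-bit WAV
        str(wav_path),
    ]


def extract_audio(video_path, wav_path=None):
    """Hypothetical helper: extract the audio track of `video_path` to a WAV file."""
    video_path = Path(video_path)
    wav_path = Path(wav_path) if wav_path else video_path.with_suffix(".wav")
    subprocess.run(build_extract_audio_command(video_path, wav_path), check=True)
    return wav_path


def transcribe(wav_path, model_name="base"):
    """Transcribe with OpenAI Whisper and return its result dict.

    The dict contains "text" (full transcript), "segments" (timed chunks)
    and "language" (detected language code).
    """
    import whisper  # imported lazily so the FFmpeg helpers work without it

    model = whisper.load_model(model_name)
    return model.transcribe(str(wav_path))
```

With these helpers, `transcribe(extract_audio("clip.mp4"))["text"]` would return the full transcript of `clip.mp4`.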
Translation and Voice Generation
- Translate the transcribed text into a target language.
- Convert the translated text into speech.
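One way to sketch translation and voice generation is to translate Whisper's timed segments one by one, so the generated speech can later be lined up with the video. This is an assumption about the design, not the project's confirmed approach. The translator is passed in as a plain function to keep the segment handling independent of any service; `googletrans_translator` wraps `googletrans==4.0.0-rc1` from the dependency list, and `synthesize` uses the Coqui `TTS` package's `TTS(...).tts_to_file(...)` call. All three function names are hypothetical.

```python
def translate_segments(segments, translate):
    """Translate segments one by one, keeping their start/end times.

    segments:  list of dicts with "start", "end" and "text" keys, as in
               Whisper's transcribe(...)["segments"].
    translate: any callable mapping source text to translated text.
    Returns new dicts; the input segments are not modified.
    """
    return [
        {
            "start": seg["start"],
            "end": seg["end"],
            "text": translate(seg["text"].strip()),
        }
        for seg in segments
    ]


def googletrans_translator(dest):
    """Return a callable that translates text into language `dest` with googletrans."""
    from googletrans import Translator  # unofficial Google Translate client

    translator = Translator()
    return lambda text: translator.translate(text, dest=dest).text


def synthesize(text, out_path, model_name, **tts_kwargs):
    """Write speech for `text` to `out_path` (WAV) with Coqui TTS.

    `model_name` must be a Coqui model suited to the target language;
    multilingual models such as XTTS also expect language=... and
    speaker_wav=... keyword arguments, which are forwarded as-is.
    """
    from TTS.api import TTS

    tts = TTS(model_name)
    tts.tts_to_file(text=text, file_path=str(out_path), **tts_kwargs)
    return out_path
```

Injecting the translator also makes the segment logic testable offline, for example with `str.upper` standing in for a real translation service.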

Dependencies

The project requires the following libraries, which are included in requirements.txt:

TTS
numpy==1.24.0
scipy
git+https://github.com/openai/whisper.git
indic-transliteration
jiwer
googletrans==4.0.0-rc1
tensorflow==2.12.0
pickle-mixin
openai-whisper
librosa
matplotlib
nltk
batch-face

Install them with:

```
pip install -r requirements.txt
```
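After installation, `importlib.util.find_spec` can confirm that the main packages are importable without actually importing them; for a top-level module that is not installed it returns `None`. The pip-name-to-module mapping below is an assumption based on each package's usual import name, `pickle-mixin` is left out because its import name is not stated here, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages in requirements.txt (pip name -> module).
EXPECTED_MODULES = {
    "TTS": "TTS",
    "numpy": "numpy",
    "scipy": "scipy",
    "openai-whisper": "whisper",
    "indic-transliteration": "indic_transliteration",
    "jiwer": "jiwer",
    "googletrans": "googletrans",
    "tensorflow": "tensorflow",
    "librosa": "librosa",
    "matplotlib": "matplotlib",
    "nltk": "nltk",
    "batch-face": "batch_face",
}


def missing_packages(expected=EXPECTED_MODULES):
    """Return the pip names whose modules cannot be found (nothing is imported)."""
    return [
        pip_name
        for pip_name, module in expected.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("All packages found." if not missing else "Missing: " + ", ".join(missing))
```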

Technologies Used

Python
OpenAI Whisper (ASR)
TensorFlow
Google Translate (via the unofficial googletrans client)
Librosa (for audio processing)
FFmpeg (for video processing)

Contributing

Contributions are welcome! Feel free to fork the repository and submit a pull request.

License

This project is licensed under the MIT License.
