A set of Python scripts for working with Azure Speech Service and Azure OpenAI for video transcription, text to speech, and text to avatar.
Video to transcription (transcription.py)
- Takes an MP4 video as input
- Strips the audio from the input video
- Creates a transcription from the audio
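The transcription flow above can be sketched as follows. This is a minimal sketch, not the actual code from `transcription.py`: the helper names and the `api_version` string are illustrative, and it assumes FFmpeg is on the `PATH` and the `openai` SDK plus the environment variables described below are available.

```python
import os
import subprocess


def build_strip_audio_cmd(input_mp4: str, output_wav: str) -> list[str]:
    # -vn drops the video stream; FFmpeg infers the WAV format from the extension
    return ["ffmpeg", "-y", "-i", input_mp4, "-vn", output_wav]


def extract_audio(input_mp4: str, output_wav: str) -> None:
    # Shell out to FFmpeg; check=True raises if the conversion fails
    subprocess.run(build_strip_audio_cmd(input_mp4, output_wav), check=True)


def transcribe(audio_path: str) -> str:
    # Assumes an Azure OpenAI Whisper deployment (see setup steps below);
    # the api_version value here is an assumption, not taken from the repo
    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-06-01",
    )
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            model=os.environ["AZURE_OPENAI_WHISPER_DEPLOYMENT"],  # deployment name
            file=f,
        )
    return result.text
```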
Text to speech (tts.py)
- Creates a text-to-speech audio file from a transcription (you can use `transcription.py` to create the text or provide your own)
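A minimal sketch of the TTS call via the `openai` SDK against Azure OpenAI. The helper names are illustrative rather than taken from `tts.py`, and the 4096-character input limit used for chunking is an assumption based on OpenAI's published TTS limit:

```python
import os


def chunk_text(text: str, limit: int = 4096) -> list[str]:
    # Split a long transcription into word-aligned chunks under the
    # (assumed) 4096-character limit of the TTS endpoint
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        if length + len(word) + 1 > limit and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks


def synthesize(text: str, voice: str, out_path: str) -> None:
    # Assumes an Azure OpenAI tts deployment (see setup steps below);
    # the api_version value is an assumption, not taken from the repo
    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-06-01",
    )
    response = client.audio.speech.create(
        model=os.environ["AZURE_OPENAI_TTS_DEPLOYMENT"],  # deployment name
        voice=voice,  # e.g. "alloy"
        input=text,
    )
    response.write_to_file(out_path)
```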
Text to avatar with speech (avatar.py)
- Creates an AI avatar video (with speech) from a transcription (you can use `transcription.py` to create the text or provide your own)
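The avatar step uses the Azure Speech batch avatar synthesis REST API (the `2024-04-15-preview` version configured below). The following is a hedged sketch only: the request-building helper, job id, voice, and the `lisa` / `casual-sitting` character and style are illustrative defaults, not necessarily what `avatar.py` uses.

```python
def build_avatar_request(region: str, job_id: str, text: str,
                         voice: str = "en-US-JennyNeural"):
    # Build the URL and JSON body for a batch avatar synthesis job.
    # Field names follow the 2024-04-15-preview batch synthesis API.
    url = (f"https://{region}.api.cognitive.microsoft.com/avatar/"
           f"batchsyntheses/{job_id}?api-version=2024-04-15-preview")
    payload = {
        "inputKind": "PlainText",
        "inputs": [{"content": text}],
        "synthesisConfig": {"voice": voice},
        "avatarConfig": {
            "talkingAvatarCharacter": "lisa",          # assumed built-in character
            "talkingAvatarStyle": "casual-sitting",    # assumed built-in style
        },
    }
    return url, payload


def submit_avatar_job(region: str, key: str, job_id: str, text: str) -> dict:
    # Submitting the job; polling the same URL until it succeeds and
    # downloading the result video is left out of this sketch
    import requests

    url, payload = build_avatar_request(region, job_id, text)
    response = requests.put(
        url,
        json=payload,
        headers={"Ocp-Apim-Subscription-Key": key},
    )
    response.raise_for_status()
    return response.json()
```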
If you want to use Azure OpenAI voices, you need to set up an Azure OpenAI service.
- Create an Azure OpenAI service in the North Central US or Sweden Central region using the steps outlined in Quickstart: Speech to text with the Azure OpenAI Whisper model
- Obtain the `Endpoint` and `Key` from Azure Portal > the OpenAI service you just created > Keys and Endpoints
- Using Azure OpenAI Studio > Deployments, deploy a `tts-hd` model using the steps outlined in Create and deploy an Azure OpenAI Service resource, and take a note of the deployment name
- Using Azure OpenAI Studio > Deployments, deploy a `Whisper` model using the steps outlined in Create and deploy an Azure OpenAI Service resource, and take a note of the deployment name
- Select a voice to use from https://platform.openai.com/docs/guides/text-to-speech. Use `alloy` if unsure. The `ash`, `coral` and `sage` voices are not yet included in the OpenAI API, so they cannot yet be used in this tool.
If you want to use Azure Speech Service voices, you need to set up an Azure Speech service. These voices are not as natural as the OpenAI ones, but there is a much broader choice of languages, accents and controls.
You can do the same thing this code does with the Azure Speech Service through the Azure Speech Studio:
- Open the Azure Speech Studio
- Go to the Voice Gallery
- Create a `Speech resource` and note the `region` and `resource key`
- Create an `.env` file which contains these environment variables. Replace the placeholder values with the real values from the steps above

```
AZURE_OPENAI_API_KEY=Key
AZURE_OPENAI_ENDPOINT=Endpoint
AZURE_OPENAI_TTS_DEPLOYMENT=TtsDeploymentName
AZURE_OPENAI_WHISPER_DEPLOYMENT=WhisperDeploymentName
AZURE_SPEECH_KEY=Key
AZURE_SPEECH_REGION=swedencentral
AZURE_AVATAR_API_VERSION=2024-04-15-preview
```
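Loading the `.env` file and failing fast on missing settings can be sketched as below. It assumes the `python-dotenv` package; the helper names are illustrative, and the variable names match the list above:

```python
import os

REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_TTS_DEPLOYMENT",
    "AZURE_OPENAI_WHISPER_DEPLOYMENT",
    "AZURE_SPEECH_KEY",
    "AZURE_SPEECH_REGION",
    "AZURE_AVATAR_API_VERSION",
]


def missing_vars(env) -> list[str]:
    # Required variable names that are absent or blank in the given mapping
    return [name for name in REQUIRED_VARS if not env.get(name)]


def load_settings() -> dict:
    # python-dotenv (pip install python-dotenv) copies .env entries into os.environ
    from dotenv import load_dotenv

    load_dotenv()
    missing = missing_vars(os.environ)
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```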
- Create a Python virtual environment: `python3 -m venv video-to-speech-venv`
- Activate the virtual environment using `source video-to-speech-venv/bin/activate` on macOS or `video-to-speech-venv\Scripts\activate` on Windows
- Install dependencies: `pip3 install -r requirements.txt`
- Install FFmpeg for video decoding and encoding: `brew install ffmpeg` on macOS or `choco install ffmpeg` on Windows
- Run `python transcription.py -i <full path to input video file>`, or `python transcription.py --help` for arguments, options and usage
- Run `python tts.py -i <full path to input transcription file> -v alloy` (this will use the OpenAI alloy voice), or `python tts.py --help` for arguments, options and usage
- Run `python avatar.py --help` for arguments, options and usage