A set of Python scripts for working with Azure Speech Service and Azure OpenAI for video transcription, text to speech, and text to avatar.
Video to transcription (transcription.py)
- Takes an MP4 video as input
- Strips the audio from the input video
- Creates a transcription from the audio
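The transcription flow above can be sketched as follows. This is a minimal sketch, not the actual code from `transcription.py`: the helper names and the `api_version` string are illustrative, and it assumes FFmpeg is on the `PATH` and the `openai` SDK plus the environment variables described below are available.

```python
import os
import subprocess


def build_strip_audio_cmd(input_mp4: str, output_wav: str) -> list[str]:
    # -vn drops the video stream; FFmpeg infers the WAV format from the extension
    return ["ffmpeg", "-y", "-i", input_mp4, "-vn", output_wav]


def extract_audio(input_mp4: str, output_wav: str) -> None:
    # Shell out to FFmpeg; check=True raises if the conversion fails
    subprocess.run(build_strip_audio_cmd(input_mp4, output_wav), check=True)


def transcribe(audio_path: str) -> str:
    # Assumes an Azure OpenAI Whisper deployment (see setup steps below);
    # the api_version value here is an assumption, not taken from the repo
    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-06-01",
    )
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            model=os.environ["AZURE_OPENAI_WHISPER_DEPLOYMENT"],  # deployment name
            file=f,
        )
    return result.text
```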
Text to speech (tts.py)
- Creates a text-to-speech audio file from a transcription (you can use `transcription.py` to create the text or provide your own)
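A minimal sketch of the TTS call via the `openai` SDK against Azure OpenAI. The helper names are illustrative rather than taken from `tts.py`, and the 4096-character input limit used for chunking is an assumption based on OpenAI's published TTS limit:

```python
import os


def chunk_text(text: str, limit: int = 4096) -> list[str]:
    # Split a long transcription into word-aligned chunks under the
    # (assumed) 4096-character limit of the TTS endpoint
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        if length + len(word) + 1 > limit and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks


def synthesize(text: str, voice: str, out_path: str) -> None:
    # Assumes an Azure OpenAI tts deployment (see setup steps below);
    # the api_version value is an assumption, not taken from the repo
    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-06-01",
    )
    response = client.audio.speech.create(
        model=os.environ["AZURE_OPENAI_TTS_DEPLOYMENT"],  # deployment name
        voice=voice,  # e.g. "alloy"
        input=text,
    )
    response.write_to_file(out_path)
```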
Text to avatar with speech (avatar.py)
- Creates an AI avatar video (with speech) from a transcription (you can use `transcription.py` to create the text or provide your own)
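The avatar step uses the Azure Speech batch avatar synthesis REST API (the `2024-04-15-preview` version configured below). The following is a hedged sketch only: the request-building helper, job id, voice, and the `lisa` / `casual-sitting` character and style are illustrative defaults, not necessarily what `avatar.py` uses.

```python
def build_avatar_request(region: str, job_id: str, text: str,
                         voice: str = "en-US-JennyNeural"):
    # Build the URL and JSON body for a batch avatar synthesis job.
    # Field names follow the 2024-04-15-preview batch synthesis API.
    url = (f"https://{region}.api.cognitive.microsoft.com/avatar/"
           f"batchsyntheses/{job_id}?api-version=2024-04-15-preview")
    payload = {
        "inputKind": "PlainText",
        "inputs": [{"content": text}],
        "synthesisConfig": {"voice": voice},
        "avatarConfig": {
            "talkingAvatarCharacter": "lisa",          # assumed built-in character
            "talkingAvatarStyle": "casual-sitting",    # assumed built-in style
        },
    }
    return url, payload


def submit_avatar_job(region: str, key: str, job_id: str, text: str) -> dict:
    # Submitting the job; polling the same URL until it succeeds and
    # downloading the result video is left out of this sketch
    import requests

    url, payload = build_avatar_request(region, job_id, text)
    response = requests.put(
        url,
        json=payload,
        headers={"Ocp-Apim-Subscription-Key": key},
    )
    response.raise_for_status()
    return response.json()
```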
If you want to use Azure OpenAI voices, you need to set up an Azure OpenAI service.
- Create an Azure OpenAI service in the North Central US or Sweden Central region using the steps outlined in Quickstart: Speech to text with the Azure OpenAI Whisper model
- Obtain the `Endpoint` and `Key` from Azure Portal > the OpenAI service you just created > Keys and Endpoints
- Using Azure OpenAI Studio > Deployments, deploy a `tts-hd` model using the steps outlined in Create and deploy an Azure OpenAI Service resource, and take a note of the deployment name
- Using Azure OpenAI Studio > Deployments, deploy a `Whisper` model using the steps outlined in Create and deploy an Azure OpenAI Service resource, and take a note of the deployment name
- Select a voice to use from https://platform.openai.com/docs/guides/text-to-speech. Use `alloy` if unsure. The `ash`, `coral` and `sage` voices are not yet included in the OpenAI API, so they cannot yet be used in this tool.
If you want to use Azure Speech Service voices, you need to set up an Azure Speech service. These voices are not as natural as the OpenAI ones, but there is a much broader choice of languages, accents and controls.
You can do the same thing this code does with the Azure Speech Service through the Azure Speech Studio:
- Open the Azure Speech Studio
- Go to the Voice Gallery
- Create a `Speech resource` and note the `region` and `resource key`
- Create an `.env` file which contains these environment variables. Replace the placeholder values with the real values from the steps above

```
AZURE_OPENAI_API_KEY=Key
AZURE_OPENAI_ENDPOINT=Endpoint
AZURE_OPENAI_TTS_DEPLOYMENT=TtsDeploymentName
AZURE_OPENAI_WHISPER_DEPLOYMENT=WhisperDeploymentName
AZURE_SPEECH_KEY=Key
AZURE_SPEECH_REGION=swedencentral
AZURE_AVATAR_API_VERSION=2024-04-15-preview
```
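Loading the `.env` file and failing fast on missing settings can be sketched as below. It assumes the `python-dotenv` package; the helper names are illustrative, and the variable names match the list above:

```python
import os

REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_TTS_DEPLOYMENT",
    "AZURE_OPENAI_WHISPER_DEPLOYMENT",
    "AZURE_SPEECH_KEY",
    "AZURE_SPEECH_REGION",
    "AZURE_AVATAR_API_VERSION",
]


def missing_vars(env) -> list[str]:
    # Required variable names that are absent or blank in the given mapping
    return [name for name in REQUIRED_VARS if not env.get(name)]


def load_settings() -> dict:
    # python-dotenv (pip install python-dotenv) copies .env entries into os.environ
    from dotenv import load_dotenv

    load_dotenv()
    missing = missing_vars(os.environ)
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```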
- Create a Python virtual environment: `python3 -m venv video-to-speech-venv`
- Activate the virtual environment using `source video-to-speech-venv/bin/activate` on macOS or `video-to-speech-venv\Scripts\activate` on Windows
- Install dependencies: `pip3 install -r requirements.txt`
- Install FFmpeg for video decoding and encoding: `brew install ffmpeg` on macOS or `choco install ffmpeg` on Windows
- Run `python transcription.py -i <full path to input video file>`, or `python transcription.py --help` for arguments, options and usage
- Run `python tts.py -i <full path to input transcription file> -v alloy` (this will use the OpenAI alloy voice), or `python tts.py --help` for arguments, options and usage
- Run `python avatar.py --help` for arguments, options and usage