
martinkearn/openai-video-to-speech

openai-video-to-speech

A set of Python scripts that use Azure OpenAI and Azure Speech Service for video transcription, text to speech, and text to avatar.

Video to transcription (transcription.py)

  1. Takes an MP4 video as input
  2. Strips the audio from the input video
  3. Creates a transcription from the audio
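The three steps above can be sketched roughly as follows. This assumes FFmpeg is on the PATH and that an `openai.AzureOpenAI` client calls the Whisper deployment; the helper names (`build_extract_audio_command`, `extract_audio`, `transcribe`) are hypothetical and not necessarily how transcription.py is written.

```python
import subprocess
from pathlib import Path


def build_extract_audio_command(video_path, audio_path):
    """Build an ffmpeg command that drops the video stream and keeps only the audio."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input MP4
        "-vn",             # no video in the output, audio only
        audio_path,        # output format is inferred from the file extension
    ]


def extract_audio(video_path):
    """Run ffmpeg (must be on PATH) and return the path of the extracted MP3 file."""
    audio_path = str(Path(video_path).with_suffix(".mp3"))
    subprocess.run(build_extract_audio_command(video_path, audio_path), check=True)
    return audio_path


def transcribe(audio_path, client, deployment):
    """Transcribe audio with a Whisper deployment.

    `client` is assumed to be an openai.AzureOpenAI instance; with Azure OpenAI,
    `model` is the deployment name (AZURE_OPENAI_WHISPER_DEPLOYMENT).
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=deployment, file=audio_file)
    return result.text
```

Keeping the command builder separate from the `subprocess.run` call makes the FFmpeg arguments easy to inspect or adjust without running FFmpeg.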

Text to speech (tts.py)

  1. Creates a text-to-speech audio file from a transcription (use transcription.py to create the text, or provide your own)
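A minimal sketch of the text-to-speech step with the `openai` package's `AzureOpenAI` client, assuming the environment variables from the .env file described in the Python setup steps. The `api_version` value and the helper names (`tts_request`, `synthesize`) are assumptions, not taken from tts.py; check which API versions your resource supports.

```python
import os
from pathlib import Path

# OpenAI documents a 4,096-character limit on the speech endpoint's input text.
MAX_INPUT_CHARS = 4096


def tts_request(text, voice="alloy"):
    """Build keyword arguments for client.audio.speech.create().

    With Azure OpenAI, `model` is the deployment name of the tts-hd model.
    """
    text = text.strip()
    if not text:
        raise ValueError("the transcription is empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; split it into chunks first")
    return {
        "model": os.environ["AZURE_OPENAI_TTS_DEPLOYMENT"],
        "voice": voice,
        "input": text,
    }


def synthesize(transcription_path, output_path, voice="alloy"):
    """Read a transcription file and write the generated speech (MP3 by default)."""
    from openai import AzureOpenAI  # imported here so tts_request works without the package

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-05-01-preview",  # assumption: any version that supports audio speech
    )
    text = Path(transcription_path).read_text(encoding="utf-8")
    response = client.audio.speech.create(**tts_request(text, voice))
    Path(output_path).write_bytes(response.content)
```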

Text to avatar with speech (avatar.py)

  1. Creates an AI avatar video (with speech) from a transcription (use transcription.py to create the text, or provide your own)
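The avatar step relies on the Azure Speech batch avatar synthesis REST API, with AZURE_AVATAR_API_VERSION selecting the API version. The sketch below follows the publicly documented shape of that API, but the request fields, the `lisa` / `graceful-sitting` avatar and the `en-US-JennyNeural` voice are assumptions to verify against the documentation for your API version, and the helper names are hypothetical.

```python
import json
import os
import urllib.request
import uuid


def avatar_job_request(text, region, api_version, voice="en-US-JennyNeural",
                       character="lisa", style="graceful-sitting"):
    """Return the (url, body) pair for creating a batch avatar synthesis job."""
    job_id = str(uuid.uuid4())  # the client chooses the job ID
    url = (f"https://{region}.api.cognitive.microsoft.com/avatar/batchsyntheses/"
           f"{job_id}?api-version={api_version}")
    body = {
        "inputKind": "PlainText",
        "inputs": [{"content": text}],
        "synthesisConfig": {"voice": voice},
        "avatarConfig": {
            "talkingAvatarCharacter": character,
            "talkingAvatarStyle": style,
        },
    }
    return url, body


def submit_avatar_job(text):
    """PUT the job using the Speech key and region from the environment."""
    url, body = avatar_job_request(
        text,
        region=os.environ["AZURE_SPEECH_REGION"],
        api_version=os.environ["AZURE_AVATAR_API_VERSION"],
    )
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Ocp-Apim-Subscription-Key": os.environ["AZURE_SPEECH_KEY"],
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Synthesis runs asynchronously, so the job is expected to be polled with a GET on the same URL until its `status` reports success, after which the response points to the rendered video.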

Set up Azure OpenAI Service

If you want to use Azure OpenAI voices or create transcriptions with Whisper, you need to set up an Azure OpenAI service.

  1. Create an Azure OpenAI service in the North Central US or Sweden Central region, following the steps in Quickstart: Speech to text with the Azure OpenAI Whisper model
  2. Obtain the endpoint and key from Azure Portal > the OpenAI service you just created > Keys and Endpoints
  3. In Azure OpenAI Studio > Deployments, deploy a tts-hd model following the steps in Create and deploy an Azure OpenAI Service resource, and note the deployment name
  4. In Azure OpenAI Studio > Deployments, deploy a Whisper model following the same steps, and note the deployment name
  5. Select a voice from https://platform.openai.com/docs/guides/text-to-speech (use alloy if unsure). The ash, coral and sage voices are not yet available in the OpenAI API, so they cannot be used with this tool yet.
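If a script accepts the voice as an argument, validating it up front gives a clearer error than a failed API call. A minimal sketch, assuming the six original OpenAI text-to-speech voices listed below are the ones usable here (ash, coral and sage are left out, as the step above notes); `choose_voice` is a hypothetical helper, not part of tts.py.

```python
from typing import Optional

# Assumed set of voices usable with this tool; ash, coral and sage are excluded
# because they are not yet available in the OpenAI API.
SUPPORTED_OPENAI_VOICES = frozenset({"alloy", "echo", "fable", "onyx", "nova", "shimmer"})
DEFAULT_VOICE = "alloy"


def choose_voice(requested: Optional[str] = None) -> str:
    """Return a normalized voice name, falling back to alloy when none is given."""
    if not requested:
        return DEFAULT_VOICE
    voice = requested.strip().lower()
    if voice not in SUPPORTED_OPENAI_VOICES:
        raise ValueError(
            f"unsupported voice {requested!r}; choose one of {sorted(SUPPORTED_OPENAI_VOICES)}"
        )
    return voice
```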

Set up Azure Speech Service

If you want to use Azure Speech Service voices, you need to set up an Azure Speech resource. These voices sound less natural than the OpenAI voices, but they offer a much broader choice of languages, accents and controls.

You can also do what this code does with Azure Speech Service through Azure Speech Studio. To set up the resource:

  1. Open the Azure Speech Studio
  2. Go to the Voice Gallery
  3. Create a Speech resource and note the region and resource key
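The extra controls of Azure Speech voices are usually expressed in SSML. A minimal sketch that wraps a transcription in SSML with a chosen neural voice and speaking rate, then synthesizes it with the Azure Speech SDK (`azure-cognitiveservices-speech`) using the region and key noted above; the default voice, the helper names and the output file name are illustrative assumptions.

```python
import os
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-GB-SoniaNeural", lang="en-GB", rate="default"):
    """Wrap plain text in SSML; `rate` accepts values such as "slow" or "+10%"."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"  # escape &, <, >
        "</voice></speak>"
    )


def speak_to_file(text, output_path="speech.wav", **ssml_options):
    """Synthesize the SSML to a WAV file using the key and region from the environment."""
    import azure.cognitiveservices.speech as speechsdk  # pip install azure-cognitiveservices-speech

    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["AZURE_SPEECH_KEY"],
        region=os.environ["AZURE_SPEECH_REGION"],
    )
    audio_config = speechsdk.audio.AudioOutputConfig(filename=output_path)
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    result = synthesizer.speak_ssml_async(build_ssml(text, **ssml_options)).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"speech synthesis failed: {result.reason}")
```

For example, `speak_to_file("Hello", rate="-10%")` would read the text slightly slower than the voice's default rate.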

Set up Python Environment

  1. Create a .env file containing the following environment variables, replacing the placeholder values with the values from the steps above:
AZURE_OPENAI_API_KEY=Key
AZURE_OPENAI_ENDPOINT=Endpoint
AZURE_OPENAI_TTS_DEPLOYMENT=TtsDeploymentName
AZURE_OPENAI_WHISPER_DEPLOYMENT=WhisperDeploymentName
AZURE_SPEECH_KEY=Key
AZURE_SPEECH_REGION=swedencentral
AZURE_AVATAR_API_VERSION=2024-04-15-preview
  2. Create a Python virtual environment: python3 -m venv video-to-speech-venv
  3. Activate the virtual environment: source video-to-speech-venv/bin/activate on macOS, or video-to-speech-venv\Scripts\activate on Windows
  4. Install dependencies: pip3 install -r requirements.txt
  5. Install FFmpeg for video decoding and encoding: brew install ffmpeg on macOS, or choco install ffmpeg on Windows
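Each script needs only some of these variables. A minimal sketch of loading the .env file and failing early when a variable is missing; it assumes the `python-dotenv` package (the loading step is skipped if it is not installed), and both `load_settings` and the grouping of variables per service are hypothetical, inferred from the setup steps above.

```python
import os

# Assumed grouping of the .env variables by the service that needs them.
REQUIRED_VARS = {
    "openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT",
               "AZURE_OPENAI_TTS_DEPLOYMENT", "AZURE_OPENAI_WHISPER_DEPLOYMENT"],
    "speech": ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"],
    "avatar": ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION", "AZURE_AVATAR_API_VERSION"],
}


def load_settings(service):
    """Load .env (if python-dotenv is available) and return the variables for `service`."""
    try:
        from dotenv import load_dotenv  # python-dotenv; does not override variables already set
        load_dotenv()
    except ImportError:
        pass
    names = REQUIRED_VARS[service]
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}
```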

Create a transcription from a video

  1. Run python transcription.py -i <full path to input video file>, or run python transcription.py --help for arguments, options and usage

Create AI speech audio from transcription

  1. Run python tts.py -i <full path to input transcription file> -v alloy (this uses the OpenAI alloy voice), or run python tts.py --help for arguments, options and usage

Create AI avatar video from transcription

  1. Run python avatar.py --help for arguments, options and usage

About

A Python script which converts an MP4 video to speech using Azure OpenAI services.
