
daredevilNYH/Multi-Language-Audio-Generator

 
 


Multi-Language-Audio-Generator

A Python application that creates a synchronised audio track in multiple languages from a subtitle file. It uses the Microsoft Azure Speech service to dub the video with AI voices. This programme was generated with the help of ChatGPT 4.0.

How It Works

You will need a pre-prepared SRT subtitle file in the chosen language. The programme then:

  1. Automatically detects the language of the SRT file
  2. Extracts the texts and timings from the SRT file
  3. Uses the Microsoft Azure API to generate an individual audio clip (segment) for each extracted text, through the Speech text-to-speech service
  4. Adjusts the speed of each audio clip using the prosody parameter (speed factor) within SSML (Speech Synthesis Markup Language)
    • This is done by comparing the target duration (from the extracted timings) with the default duration (the time the AI voice takes at its default speed), i.e. speed_factor = default_duration / target_duration
    • The speed factor has a lower bound of 1, so a clip is never slowed down. This avoids rare cases of extremely slow playback
  5. Builds the entire track by combining all individual audio clips according to their timings
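
The speed adjustment in step 4 can be sketched as follows. This is a minimal illustration and not the code in main.py: the function names are hypothetical, and expressing the rate as a percentage change is an assumption (Azure SSML accepts a prosody rate in several forms). Only the formula and the lower bound of 1 come from the description above.

```python
from xml.sax.saxutils import escape


def srt_time_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    hours, minutes, rest = stamp.strip().split(":")
    seconds, millis = rest.split(",")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def speed_factor(default_duration: float, target_duration: float) -> float:
    """speed_factor = default_duration / target_duration, never below 1."""
    if target_duration <= 0:
        raise ValueError("target_duration must be positive")
    return max(1.0, default_duration / target_duration)


def build_ssml(text: str, voice: str, factor: float) -> str:
    """Wrap text in SSML, with the prosody rate given as a percentage change."""
    # Derive the locale from the voice name, e.g. 'en-US' from 'en-US-BrianNeural'.
    lang = "-".join(voice.split("-")[:2])
    rate = f"+{(factor - 1) * 100:.0f}%"
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
        f'<voice name="{voice}"><prosody rate="{rate}">{escape(text)}</prosody></voice>'
        "</speak>"
    )


# A subtitle shown for 2 seconds whose default-speed clip lasts 3 seconds.
target = srt_time_to_seconds("00:00:04,000") - srt_time_to_seconds("00:00:02,000")
factor = speed_factor(default_duration=3.0, target_duration=target)
print(build_ssml("Hello, world", "en-US-BrianNeural", factor))
```

The target duration is the difference between a subtitle's end and start times. The default duration has to be measured from a clip synthesised at normal speed, which this sketch takes as a given number.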

Key Features

  • Language auto-detection
  • AI voice customisation using Microsoft Azure Speech service
  • Fully synchronised audio track
  • A chart of speed factors, which helps you fine-tune the SRT file: in the rare cases where a segment plays back very fast, you can move text between subtitle entries (please do not change the timings!) to bring its speed factor down
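
As a text-only companion to the chart, the entries worth editing can also be listed directly. The sketch below is an assumption about how one might do this, not part of main.py: `fast_segments` is a hypothetical helper, the threshold of 1.5 is arbitrary, and the sample factors are made up.

```python
def fast_segments(factors, threshold=1.5):
    """Return (subtitle_index, factor) pairs above the threshold, fastest first."""
    flagged = [(i, f) for i, f in enumerate(factors, start=1) if f > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up speed factors for five subtitle entries.
example = [1.0, 1.2, 2.1, 1.0, 1.7]
for index, factor in fast_segments(example):
    print(f"subtitle {index}: speed factor {factor:.2f}")
```

Entries near the top of the list are the ones where moving some text to a neighbouring subtitle is most likely to help.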

Instructions

  1. Go to Google Colab, sign in with your Google account and create a new notebook
  2. Connect to the server and install the Python libraries by running the commands below in the notebook, one at a time (you will get an error if you try to run them all in one go)
    1. pip install gtts pydub srt
    2. pip install azure-cognitiveservices-speech
    3. pip install langdetect
  3. Copy the `main.py` file from the repo and paste it into your notebook
  4. Locate your Microsoft Azure service account information and fill in the two corresponding fields in the notebook:
    1. speech_key
    2. service_region
  5. Upload your SRT file to the notebook, or use one of the sample SRT files provided in this repo (English, Chinese and Spanish), then copy its file path
  6. Paste the path into the notebook
  7. Download voice_config.json from the repo and upload it to your notebook following the same method as the SRT file.
  8. Hit run. Once it finishes, you will get a file named text_to_speech_audio.wav. You can download it from the notebook and upload it to either your editing software or YouTube Studio, if the audio feature is enabled in your account

Additional Notes

  1. You can browse the voices available in the Microsoft Azure Speech service, copy the name of the one you want and use it to replace the default voice in the configuration file. Please do not add multiple voices for one language.
  2. At the moment, only a couple of languages are included in the configuration file voice_config.json. Feel free to create your own configuration file and add your language so the programme can detect it.
    For example, "en": "en-US-BrianNeural", where "en-US-BrianNeural" is your chosen AI voice name from Azure and "en" is your language code, which is just the first two letters of the voice name.
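
To make the configuration format concrete, here is a minimal sketch of a config with one voice per language and a lookup against it. Only the "en" entry comes from this README; the other two entries, the `pick_voice` helper and the handling of region-tagged codes such as "zh-cn" are assumptions for illustration and may differ from the voice_config.json in the repo.

```python
import json

# Illustrative stand-in for voice_config.json: one voice per language code.
EXAMPLE_CONFIG = """
{
  "en": "en-US-BrianNeural",
  "es": "es-ES-ElviraNeural",
  "zh": "zh-CN-XiaoxiaoNeural"
}
"""


def pick_voice(language: str, config: dict) -> str:
    """Look up the voice for a detected language code.

    Only the part before any hyphen is used, so 'zh-cn' maps to the 'zh' key.
    """
    key = language.split("-")[0].lower()
    try:
        return config[key]
    except KeyError:
        raise KeyError(f"no voice configured for language {key!r}") from None


config = json.loads(EXAMPLE_CONFIG)
print(pick_voice("en", config))
```

Because the config is a JSON object, each language key can appear only once, which matches the advice above to keep a single voice per language.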

About

A simple application to create synchronised audio from an SRT file

Languages

  • Python 100.0%