A Python application that creates synchronised audio in multiple languages from a subtitle file. It uses the Microsoft Azure Speech service to dub videos with AI voices. This programme was generated with the help of ChatGPT 4.0.
You will need a pre-prepared SRT subtitle file in the chosen language. The programme then:
- Automatically detects the language of the SRT file
- Extracts the text and timings from the SRT file
- Uses the Microsoft Azure Speech text-to-speech service to generate an individual audio clip (segment) for each extracted text and timing
- Adjusts the speed of each clip using the prosody rate (speed factor) in SSML (Speech Synthesis Markup Language)
- The speed factor comes from comparing the target duration (from the extracted timings) with the default duration (the time the AI voice takes at its default speed), i.e. `speed_factor = default_duration / target_duration`
- The speed factor is capped at a minimum of 1, so clips are never slowed down; this avoids rare cases of extremely slow playback
- Builds the full track by combining all the individual audio clips according to their timings
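The timing and speed-factor steps above can be sketched as follows. This is a minimal, standard-library-only illustration, not the code in `main.py`: it parses SRT timestamps with a small regex instead of the `srt` package, and the `default_duration` value stands in for the length of a clip synthesised at the voice's default speed (in the real flow that number would come from an Azure synthesis call). The helper names `parse_srt`, `speed_factor` and `build_ssml` are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from xml.sax.saxutils import escape

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)")


@dataclass
class Segment:
    start: timedelta
    end: timedelta
    text: str

    @property
    def target_duration(self) -> float:
        """Seconds this line is allowed to take, according to the SRT timings."""
        return (self.end - self.start).total_seconds()


def _to_timedelta(h: str, m: str, s: str, ms: str) -> timedelta:
    return timedelta(hours=int(h), minutes=int(m), seconds=int(s), milliseconds=int(ms))


def parse_srt(content: str) -> list[Segment]:
    """Extract start, end and text from every SRT block (hypothetical stand-in for `srt`)."""
    segments = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                # Everything after the timing line is the subtitle text
                text = " ".join(ln.strip() for ln in lines[i + 1:])
                segments.append(
                    Segment(_to_timedelta(*parts[:4]), _to_timedelta(*parts[4:]), text)
                )
                break
    return segments


def speed_factor(default_duration: float, target_duration: float) -> float:
    """speed_factor = default_duration / target_duration, never below 1."""
    return max(1.0, default_duration / target_duration)


def build_ssml(text: str, voice: str, lang: str, factor: float) -> str:
    """Wrap one line in SSML; a numeric prosody rate multiplies the default speed."""
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
        f'<voice name="{voice}"><prosody rate="{factor:.2f}">{escape(text)}</prosody></voice>'
        "</speak>"
    )


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,000\nHello & welcome.\n"
    segment = parse_srt(sample)[0]
    # Pretend the voice needs 2.5 s at its default speed; the slot is 2.0 s
    factor = speed_factor(2.5, segment.target_duration)  # 1.25
    print(build_ssml(segment.text, "en-US-BrianNeural", "en-US", factor))
```

Because the factor is floored at 1, a line that already fits its slot is spoken at the default speed and simply finishes early.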
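The final step, placing each clip at its subtitle's start time, amounts to overlaying clips onto a silent track. The repo installs `pydub`, which `main.py` presumably uses for this; the sketch below uses only the standard library (`array` and `wave`) instead, and assumes every clip is already 16-bit mono PCM at one shared sample rate. `build_track` and `write_wav` are hypothetical names.

```python
import array
import sys
import wave


def build_track(clips, total_ms: int, sample_rate: int = 16000) -> array.array:
    """Overlay (start_ms, samples) clips onto a silent 16-bit mono track.

    Overlapping samples are summed and clipped to the int16 range;
    anything that runs past total_ms is cut off.
    """
    total = total_ms * sample_rate // 1000
    track = array.array("h", [0]) * total  # start from silence
    for start_ms, samples in clips:
        offset = start_ms * sample_rate // 1000
        for i, value in enumerate(samples):
            pos = offset + i
            if pos >= total:
                break
            track[pos] = max(-32768, min(32767, track[pos] + value))
    return track


def write_wav(path: str, track: array.array, sample_rate: int = 16000) -> None:
    """Save the assembled track as a mono 16-bit WAV file."""
    data = array.array("h", track)
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(data.tobytes())
```

With `pydub` the same idea is `AudioSegment.silent(duration=total_ms)` followed by `track.overlay(clip, position=start_ms)` for each clip.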
- Language auto-detection
- AI voice customisation using Microsoft Azure Speech service
- Fully synchronised audio track
- A chart of speed factors, so you can fine-tune the SRT file by moving text between lines (please do not change the timings!) and bring down the speed factor in the rare cases where playback is extremely fast
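The speed-factor chart is meant to show which lines are being read unusually fast. As a plain-text stand-in for that chart (the repo's own chart is not reproduced here, and the 1.5 threshold is an arbitrary example), the hypothetical helper below prints one bar per subtitle line and flags the ones worth revisiting:

```python
def speed_factor_report(
    factors: list[float], threshold: float = 1.5, width: int = 30
) -> list[str]:
    """Return one text bar per subtitle line, marking factors above `threshold`."""
    peak = max(factors, default=1.0)
    lines = []
    for index, factor in enumerate(factors, start=1):
        bar = "#" * round(width * factor / peak)  # bar length scaled to the largest factor
        flag = "  <- consider moving text out of this line" if factor > threshold else ""
        lines.append(f"{index:>4} {factor:5.2f} {bar}{flag}")
    return lines


if __name__ == "__main__":
    print("\n".join(speed_factor_report([1.0, 1.12, 1.8, 1.0])))
```

A flagged line has more text than its time slot comfortably allows; moving some of its words into a neighbouring subtitle, while leaving the timings untouched, lowers its factor.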
- Go to Google Colab, sign in with your Google account and create a new notebook
- Connect to the server and install the Python libraries by running the commands below in the notebook one at a time (you will get an error if you try to run them all in one go):
  - `pip install gtts pydub srt`
  - `pip install azure-cognitiveservices-speech`
  - `pip install langdetect`
- Copy the `main.py` file from the repo and paste it into your notebook
- Locate your Microsoft Azure Speech service credentials (the subscription key and region of your Speech resource)
- Upload your SRT file to the notebook, or use one of the sample SRT files in this repo (English, Chinese and Spanish), and copy the file path

- Paste the path into the code in your notebook

- Download `voice_config.json` from the repo and upload it to your notebook the same way as the SRT file.
- Hit run. Once it finishes, you will get a file called `text_to_speech_audio.wav`. You can download it from the notebook and import it into your editing software, or upload it to YouTube Studio if the multi-language audio feature is enabled on your account.

- You can browse the voices available in the Microsoft Azure Speech service, copy the name of the voice you want, and use it to replace the default voice for that language in the configuration file. Please do not add multiple voices for one language.

- At the moment, I have only included a couple of languages in the configuration file `voice_config.json`. Feel free to create your own configuration file and add your language so the programme can detect it.
For example, in `"en": "en-US-BrianNeural"`, `"en-US-BrianNeural"` is your chosen AI voice name from Azure and `"en"` is your language, which is just the first two letters of the voice name.
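Assuming `voice_config.json` is a flat JSON object mapping language codes to voice names, as in the example above, looking up the voice for a detected language could look like the sketch below. Note that `langdetect` returns some codes longer than two letters (for example `zh-cn` and `zh-tw` for Chinese), so this hypothetical `pick_voice` helper trims the detected code to its first two letters to match the config keys; how `main.py` itself handles this is not shown here.

```python
import json


def pick_voice(config_text: str, detected_lang: str) -> str:
    """Return the Azure voice for a detected language code, given voice_config.json contents."""
    voices = json.loads(config_text)
    key = detected_lang.lower()[:2]  # e.g. "zh-cn" -> "zh", matching the two-letter keys
    if key not in voices:
        raise KeyError(f"No voice configured for language {key!r}; add it to voice_config.json")
    return voices[key]


if __name__ == "__main__":
    config = '{"en": "en-US-BrianNeural"}'
    print(pick_voice(config, "en"))
```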


