Audio Description is a technique for narrating what is happening in a video, for the benefit of audience members who are blind or have low vision. It generally takes the form of a second audio track and is available on TV, on streaming services, and in movie theaters. The narration is timed to fit within silent parts of the video so it doesn't overlap the dialog, and it does not increase the length of the program (as would be the case if the video were paused to provide a description).
This project leverages Artificial Intelligence to assist in generating the Audio Description track. First, a description is generated for each scene, along with a transcript of the dialog. Silences are then identified, and the descriptions are rewritten to fit in the gaps. This is presented to the human AD editor as a draft to review and update. Once the script is finalized, the video can be downloaded with Audio Descriptions inserted using Text-To-Speech.
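As a rough illustration of the gap-fitting step (not the project's actual code), a sketch in TypeScript might look like the following. The types, the words-per-second estimate, and the function names are assumptions for illustration only:

```typescript
// Hypothetical sketch: pair scene descriptions with silence gaps long enough to hold them.
interface SilenceGap {
  start: number; // seconds
  end: number;   // seconds
}

interface SceneDescription {
  time: number;  // scene start, seconds
  text: string;
}

// Rough estimate of narration length, assuming ~2.5 spoken words per second of TTS output.
const estimateDuration = (text: string): number =>
  text.trim().split(/\s+/).length / 2.5;

// Find, for each description, the first silence gap after the scene start that is long
// enough; descriptions with no suitable gap are flagged so the model can be asked to
// shorten them when the draft script is rewritten.
function fitDescriptions(gaps: SilenceGap[], scenes: SceneDescription[]) {
  return scenes.map((scene) => {
    const gap = gaps.find(
      (g) => g.start >= scene.time && g.end - g.start >= estimateDuration(scene.text),
    );
    return { ...scene, gap, needsShortening: gap === undefined };
  });
}
```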
We hope that making the AD authoring process faster, and thus less expensive, will result in more inclusive content being created. Providing content with AD tracks is a legal requirement in several countries, and this project can also help media companies meet those requirements.
We'd love to hear what you think, especially if you deploy this solution within your organization. Email [email protected].
Demo videos: `copilot.mp4`, `reindeer-games.mp4`
We are providing this solution as open source to enable content creators to incorporate it into their workflows. The web app allows uploading an MP4 video, generating the draft AD script, editing the script, and generating a new video file with the audio description inserted.
While we provide an end-to-end user experience, aspects such as hosting, authentication, and authorization will differ from customer to customer.
The details below will enable a developer to run the solution on their dev box.
- Azure Subscription: If you don't already have one, you can sign up for a free Azure subscription.
- Azure AI Services: Provides access to Azure Content Understanding, Azure OpenAI, and Speech APIs. When creating the resource, select either West US, Sweden Central, or Australia East as the region.
- Azure Storage Account: Used to store the videos. After creating the account, create a container named "audio-description" and generate a Shared Access Signature for the container. You will also need to enable CORS so the app can retrieve data from blob storage: select CORS from the storage account settings and create a new rule with Allowed Origins set to the URL where the app is running, Allowed Methods set to GET/PUT/OPTIONS/DELETE, Allowed Headers set to *, and Max Age set to 9999. (A command-line alternative is sketched after this list.)
- GPT model: Go into the AI Services resource created above, and deploy a GPT-4o model.
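If you prefer the command line, the container and CORS rule can also be created with the Azure CLI. The commands below are a sketch: `<storage-account>` is a placeholder for your storage account name, and the origin assumes the app is running at the default local Vite URL.

```bash
# Create the container used by the app (assumes you are signed in with `az login`).
az storage container create \
  --name audio-description \
  --account-name <storage-account>

# Allow the web app's origin to call the Blob service from the browser.
az storage cors add \
  --services b \
  --origins http://localhost:5173 \
  --methods GET PUT OPTIONS DELETE \
  --allowed-headers "*" \
  --max-age 9999 \
  --account-name <storage-account>
```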
After cloning this repo, create a file called `.env`. Add lines in the format `key=value` with the following entries:

- `VITE_AI_SERVICES_RESOURCE`: The name of the resource (not the full domain name).
- `VITE_AI_SERVICES_KEY`: Can be copied from the portal.
- `VITE_AI_SERVICES_REGION`: All one word, such as `westus` or `swedencentral`.
- `VITE_STORAGE_ACCOUNT`: The name of the resource (not the full domain name).
- `VITE_BLOB_SAS_TOKEN`: The Shared Access Signature created above. This should be a set of keys and values, such as: `sp=…&st=…&se=…&spr=…&sv=…&sr=…&sig=…`
- `VITE_GPT_DEPLOYMENT`: The name you chose when creating the deployment, such as `gpt-4o`.
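Taken together, a completed `.env` file might look like the following (all values here are placeholders; substitute your own):

```
VITE_AI_SERVICES_RESOURCE=my-ai-services-resource
VITE_AI_SERVICES_KEY=<key copied from the portal>
VITE_AI_SERVICES_REGION=westus
VITE_STORAGE_ACCOUNT=mystorageaccount
VITE_BLOB_SAS_TOKEN=sp=...&st=...&se=...&spr=...&sv=...&sr=...&sig=...
VITE_GPT_DEPLOYMENT=gpt-4o
```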
- In the project directory, run `npm install` to install required packages.
- Make sure the `.env` file created above is in this directory too.
- Run `npm run dev` to run the project locally.
- The URL, such as http://localhost:5173, will be displayed in the terminal. Visit that URL in your browser to view the app.
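In short, from the project directory (assuming the `.env` file is already in place):

```bash
npm install    # install required packages
npm run dev    # start the local dev server; the URL is printed in the terminal
```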
This is just the beginning. We have several ideas for improvements, and the underlying AI models keep getting better. If you have ideas, or code contributions, we'd love to hear from you.
This project is brought to you by the team behind Seeing AI, a visual assistant for the blind community.