
FIRSTinMI/live-captions


Live Captions

This application hosts a web page with captions generated from an input stream on the local machine by either the Google Speech-to-Text API or April-ASR. To add the captions to your stream, add the URL http://localhost:3000/ as a browser input; once the application is running, captions will be sent to the browser input over a websocket.

Multiple inputs can be added, and each can be set to display in a different color. Each input stream has an adjustable threshold, so if you hold the microphone away from your mouth to talk but forget to mute it, you can avoid having that conversation broadcast on the screen. The application also stops streaming to the Google API after about a minute of silence; since every minute of API use costs 1.6 cents per stream, this keeps costs down when transcription isn't needed. It also includes a configurable profanity filter in case the transcription API mishears what someone said.
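The cost math is simple enough to sketch. Using the README's figure of 1.6 cents per stream-minute (and remembering that silence suppression means only active minutes are billed):

```python
def estimated_cost_cents(active_minutes: float, streams: int) -> float:
    """Rough Google Speech-to-Text cost estimate for an event."""
    CENTS_PER_MINUTE = 1.6  # per stream, per the README
    return active_minutes * streams * CENTS_PER_MINUTE

# e.g. a 3-hour event with 2 microphones active about half the time:
print(estimated_cost_cents(90, 2))  # about 288 cents, i.e. roughly $2.88
```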

Usage

  1. Download the latest release
  2. Run the application
  3. Go to the settings page http://localhost:3000/settings.html
  4. Enter your Google API key in the server tab
  5. Create inputs for your microphones in the transcription tab
  6. Click apply to restart the server with the new settings
  7. Adjust the input thresholds to suit your needs
  8. Create a browser input in vMix pointing to http://localhost:3000/ and set it as the top overlay (4)
  9. ?
  10. Success
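Before wiring up vMix, it can help to confirm the captions page is actually being served. A minimal stdlib-only check (the URL is the one from the steps above):

```python
from urllib.request import urlopen
from urllib.error import URLError

def server_is_up(url: str = "http://localhost:3000/", timeout: float = 2.0) -> bool:
    """Return True if the captions page responds with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        return False

print(server_is_up())  # True once the application is running
```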

Local Engine

If you happen to be at a venue with a poor internet connection, you can use the April engine. Its recognition is not as good as Google's, but at least it works consistently. The April engine is currently in beta.

  1. Make sure Python is installed
  2. Open PowerShell and run `pip install april_asr websockets psutil`
  3. Select the April engine on the transcription tab of settings and click apply
  4. Wait for the software to download the model and script
  5. Win
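If the April engine fails to start, a quick way to confirm step 2 worked is to check that all three packages are importable. A small helper sketch:

```python
import importlib.util

def missing_packages(packages):
    """Return the subset of packages that are not importable."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# The April engine needs the three packages installed in step 2:
print(missing_packages(["april_asr", "websockets", "psutil"]))
```

An empty list means everything is installed; otherwise, re-run `pip install` for the names printed.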

Language Support

  • Google v2 supports multilingual transcription. You can use the dropdown to tell it what languages to expect for a given source.
  • Google v1 supports one language per input. If you select multiple languages, it will default to the first one listed.
  • April-ASR ignores the language selection and only supports English.

First-time setup for non-FiM users

This section walks through the steps to set up a Google Cloud account for non-FiM users.

  1. Visit https://console.cloud.google.com/ and open a new project
  2. Note the project ID for later use. (All IDs shown in this demo have been revoked; they are kept visible for clarity.)
  3. Select the "Convert speech to text" product
  4. Enable it
  5. Select Credentials, then Create Credentials -> Service Account
  6. Name the account anything of your choice
  7. Select speech client permissions and select Done
  8. Click on the new account under Service Accounts
  9. Click on "Keys", then "Add Key -> Create New Key", and select type JSON
  10. Open the JSON file that auto-downloads in a text editor
  11. Copy the client_email, private_key, and project_id values to the server tab of the local-captions tool. NOTE: the key names in the JSON vary slightly from the field names in the tool. Change only the values, not the key names.
  12. Go to https://console.cloud.google.com/speech/adaptation-resources/list
  13. Add a new phraseSet containing common phrases, like team names, to improve the transcription
  14. Copy the phraseSet name and paste it into the transcription field, replacing the examples from FiM
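For step 11, the three values can be pulled straight out of the downloaded key file. A small sketch (the sample key text below is made up for illustration):

```python
import json

def extract_credentials(key_json: str) -> dict:
    """Pick out the three fields the server tab asks for from a
    service-account key file's JSON text."""
    key = json.loads(key_json)
    # Map JSON keys to the tool's fields explicitly; the names differ
    # slightly, so don't rename keys, only copy the values.
    return {
        "client_email": key["client_email"],
        "private_key": key["private_key"],
        "project_id": key["project_id"],
    }

# Example with a fabricated key file's contents:
sample = '{"type": "service_account", "project_id": "my-project", "private_key_id": "abc123", "private_key": "-----BEGIN PRIVATE KEY-----\\n...", "client_email": "svc@my-project.iam.gserviceaccount.com"}'
print(extract_credentials(sample)["project_id"])  # prints "my-project"
```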
