This application hosts a web page that displays captions generated by either the Google Speech-to-Text API or April-ASR from an input stream on the local machine. To add captions to your stream, add the URL http://localhost:3000/ as a browser input. Once the application is running, captions are sent to that browser input over a websocket.
Multiple inputs can be added, and each can be set to display in a different color. Each input stream has an adjustable threshold, so if you hold the microphone away from your mouth to talk but forget to mute it, you can avoid having that conversation broadcast on the screen. The application also stops streaming to the Google API after about a minute of silence: every minute of API use costs 1.6 cents per stream, so this avoids paying when nothing is being said. It also includes a configurable profanity filter in case the transcription API mishears what someone said.
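The threshold and silence timeout described above can be pictured with a small sketch. This is not the application's actual code: `InputGate`, its fields, the 0.0 to 1.0 level scale, and the exact 60-second timeout are all assumptions made for illustration. Only the 1.6 cents per stream-minute figure comes from the text above.

```python
from dataclasses import dataclass

# 1.6 cents per minute per stream, the figure quoted above.
COST_PER_STREAM_MINUTE_USD = 0.016


@dataclass
class InputGate:
    """Hypothetical per-input gate; the real tool may work differently."""

    threshold: float                 # levels below this count as silence (assumed 0.0-1.0 scale)
    silence_timeout_s: float = 60.0  # "about a minute" of silence (assumed exact value)
    silent_for_s: float = 0.0        # how long the input has been below the threshold
    streamed_s: float = 0.0          # total seconds the stream was open (billable)

    def feed(self, level: float, chunk_s: float) -> bool:
        """Account for one audio chunk; return True while the API stream stays open."""
        if level >= self.threshold:
            self.silent_for_s = 0.0
        else:
            self.silent_for_s += chunk_s
        stream_open = self.silent_for_s < self.silence_timeout_s
        if stream_open:
            self.streamed_s += chunk_s
        return stream_open

    def cost_usd(self) -> float:
        """Estimated API cost for the time this input was streaming."""
        return self.streamed_s / 60.0 * COST_PER_STREAM_MINUTE_USD


def estimate_cost_usd(streams: int, minutes: float) -> float:
    """Cost if every stream is open for the whole duration."""
    return streams * minutes * COST_PER_STREAM_MINUTE_USD
```

For example, `estimate_cost_usd(2, 60)` gives the cost of two microphones streaming continuously for an hour, which is the upper bound the silence timeout is meant to reduce.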
- Download the latest release
- Run the application
- Go to the settings page http://localhost:3000/settings.html
- Enter your Google API key in the server tab
- Create inputs for your microphones in the transcription tab
- Click apply to restart the server with the new settings
- Adjust the input thresholds to suit your needs
- Create a browser input in vMix pointing to http://localhost:3000/ and set it as the top overlay (4)
- ?
- Success
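If the browser input stays blank after the steps above, it can help to confirm that the caption page is being served at all. The helper below is a minimal sketch using only the Python standard library; `page_is_up` is a hypothetical name, and the check only tests that the URL answers over HTTP, not that the websocket or captions are working.

```python
import urllib.error
import urllib.request


def page_is_up(url: str = "http://localhost:3000/", timeout_s: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `page_is_up()` with no arguments checks the default caption page, and `page_is_up("http://localhost:3000/settings.html")` checks the settings page.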
If you happen to be at a venue with a poor internet connection, you can use the April engine. Its recognition is not as good as Google's, but at least it will work consistently. The April engine is currently in beta.
- Make sure Python is installed
- Open PowerShell and run this command: `pip install april_asr websockets psutil`
- Select the April engine on the transcription tab of settings and click apply
- Wait for the software to download the model and script
- Win
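If the April engine does not start, one thing worth ruling out is a failed `pip install`. The sketch below checks whether the three packages from the install command can be found by the Python interpreter that runs it. `missing_modules` is a hypothetical helper, not part of the tool, and it assumes each package is imported under the same name it is installed with.

```python
import importlib.util

# Package names taken from the pip command above.
REQUIRED_MODULES = ("april_asr", "websockets", "psutil")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means all three packages are visible to that interpreter. If the application uses a different Python installation than the one you run this in, the result may not match.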
- Google v2 supports multilingual transcription. Use the dropdown to tell it which languages to expect for a given source.
- Google v1 supports one language per input. If you select more than one, it defaults to the first one listed.
- April ASR ignores the language selection and only supports English.
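The three rules above can be summarized as a small function. This is only an illustration of the described behavior: the engine identifiers (`"google_v2"`, `"google_v1"`, `"april"`) and the `"en"` code returned for April are assumptions, not values taken from the tool.

```python
def effective_languages(engine: str, selected: list[str]) -> list[str]:
    """Return the languages an engine would actually use for one input."""
    if engine == "google_v2":
        # Multilingual: every selected language is passed along.
        return list(selected)
    if engine == "google_v1":
        # One language per input: only the first selection is used.
        return selected[:1]
    if engine == "april":
        # Selection is ignored; English only ("en" is an assumed code).
        return ["en"]
    raise ValueError(f"unknown engine: {engine!r}")
```

For instance, selecting `["en-US", "es-US"]` yields both languages for Google v2, only `"en-US"` for Google v1, and English regardless of the selection for April.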
This section walks through the steps to set up a Google Cloud account for non-FiM users.
- Visit https://console.cloud.google.com/ and open a new project
- Note the project ID for later use. (All IDs shown in this demo have been revoked. I'm keeping them visible for clarity.)
- Select the "Convert speech to text" product
- Enable it
- Select Credentials then Create Credentials->Service Account
- Name the account anything of your choice
- Select speech client permissions and select Done
- Then click on the new account under Service Accounts
- Click on "Keys" then "Add Key->Create New Key" and select type JSON
- Open the JSON file that downloads automatically in a text editor
- Copy the client_email, private_key, and project_id values to the server tab of the local-captions tool. *NOTE: The key names in the JSON vary slightly from the key names in the tool. Make sure you don't change the key names, and only change the values.*
- Go to https://console.cloud.google.com/speech/adaptation-resources/list
- Add a new phraseSet containing common phrases like team names to improve the transcription
- Copy the phraseSet name and paste it in the transcription field, replacing the examples from FiM
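For the step above that copies values out of the downloaded key file, a short script can pull out just the three fields so nothing else is pasted by mistake. This is a sketch, not part of the tool: `extract_credentials` is a hypothetical helper, and it assumes the file is a standard Google service account JSON key containing `client_email`, `private_key`, and `project_id`.

```python
import json

# The three values the server tab asks for.
REQUIRED_FIELDS = ("client_email", "private_key", "project_id")


def extract_credentials(key_file_text: str) -> dict:
    """Return only the fields needed by the server tab from a key file's JSON text."""
    data = json.loads(key_file_text)
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise KeyError(f"key file is missing: {', '.join(missing)}")
    return {field: data[field] for field in REQUIRED_FIELDS}
```

To use it, read the downloaded file's contents into a string and pass it in. Treat the output as a secret, since it includes the private key.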