(NOTE: work on this hasn't started yet!)
LipSphinx is a web-based lipsync data editor and phoneme recognition/alignment tool built on pocketsphinx-js, designed for use in video games. It is a free, open-source, multiplatform alternative to existing tools that rely on proprietary APIs (e.g. Microsoft SAPI), and it runs entirely in your browser. With the data generated by LipSphinx, you write your own parser for your game engine that translates the phoneme timings into your characters' mouth movements or blendshape weights; a Unity3D plug-in is provided as a starting point.
You can run this online or offline on your own webserver, or use the live demo hosted on GitHub here: LINK
- An actual need for lipsync data. Lipsync doesn't add that much to your game. Plenty of older games used simpler methods: Knights of the Old Republic used pre-animated mouth movements, and Half-Life 1 lowered an NPC's jaw based simply on the volume of the sound, and both worked well enough.
- "Final" voice-over audio files. Do not waste time baking lipsync data for placeholder audio you will later discard. Machine phoneme recognition is an inherently error-laden process; the recognizer only generates a base to work from, and it takes a lot of manual adjustment to get good results.
- A lipsync data parser for your game. This tool only generates phoneme timing data; it is up to you to translate that into your game and/or character animations. Ideally, you would map a mouth position or blendshape to each phoneme.
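As a rough illustration of what such a parser might look like, here is a minimal sketch in Python. Since LipSphinx's output format is not yet finalized (see the note at the top), the data shape assumed here — a list of entries with `phoneme`, `start`, and `end` fields, times in seconds — is hypothetical, as is the `PHONEME_TO_VISEME` table; a real game would map to its own blendshape weights or mouth sprites.

```python
# Hypothetical phoneme -> mouth shape (viseme) table. The phoneme labels and
# the viseme names here are illustrative, not part of any LipSphinx spec.
PHONEME_TO_VISEME = {
    "AA": "open", "AE": "open", "AH": "open",
    "IY": "wide", "EH": "wide",
    "UW": "round", "OW": "round",
    "M": "closed", "B": "closed", "P": "closed",
}

def viseme_at(timings, t):
    """Return the viseme active at time t (seconds).

    Falls back to "closed" during silence, gaps, or unmapped phonemes.
    """
    for entry in timings:
        if entry["start"] <= t < entry["end"]:
            return PHONEME_TO_VISEME.get(entry["phoneme"], "closed")
    return "closed"

# Example timing data for the word "hello" (made-up values).
timings = [
    {"phoneme": "HH", "start": 0.00, "end": 0.08},
    {"phoneme": "AH", "start": 0.08, "end": 0.20},
    {"phoneme": "L",  "start": 0.20, "end": 0.30},
    {"phoneme": "OW", "start": 0.30, "end": 0.45},
]
```

In an engine, you would call something like `viseme_at` once per rendered frame with the current playback time of the voice clip, then drive the corresponding blendshape weight (ideally cross-fading between visemes rather than snapping).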