This is a wrapper of WhisperX for deployment on Replicate.
The most up-to-date documentation is here: https://replicate.com/docs/guides/push-a-model
- Fire up the cheapest GPU machine on Lambda. You should also attach a filesystem.
Note: cog has issues building the image on Lambda ARM64 instances as of October 2025. Use x86_64.
- SSH into the instance, or use Lambda Labs' Cloud IDE and open a terminal.
- Install cog:
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/download/v0.8.6/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
- Clone this repo and cd into it:
cd [filesystem name] # if you attached a filesystem
git clone https://github.com/wordscenes/whisper-x-cog.git
cd whisper-x-cog
- Download the models to the Docker container:
sudo cog run script/download_models.py
If you get `nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.8, please update your driver to a newer version, or use an earlier cuda container: unknown`, then you didn't attach a filesystem. (I guess it runs out of memory or something; the error message is misleading 🤷.)
Note: sudo is necessary; see the Replicate docs.
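For reference, a pre-download script like this usually just loads the models once so their weights get fetched ahead of time instead of on the first prediction. The sketch below only illustrates that idea using the WhisperX v3 API; it is not the actual contents of script/download_models.py, and the model name and language list are assumptions:

```python
# Illustrative sketch only; not the repo's actual script/download_models.py.
# Loading the models once downloads and caches their weights so a prediction
# doesn't have to fetch them at runtime.
import whisperx

DEVICE = "cpu"  # assumption: CPU with int8 is enough just to trigger the downloads

# Whisper weights (model name is an assumption; see the large-v2 notes below).
whisperx.load_model("large-v2", DEVICE, compute_type="int8")

# Alignment models are per-language; this language list is an assumption.
for language_code in ["en", "ja"]:
    whisperx.load_align_model(language_code=language_code, device=DEVICE)
```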
- Test by building the container and running prediction on the included sample file:
sudo cog predict -i [email protected] -i language=en
You can also test the align method by itself using the segments contained in segments.json:
sudo cog predict -i [email protected] -i language=en -i mode=align -i segments="$(cat segments.json)"
Judging manually, the roughly expected word timestamps (in seconds) are:
- testing: .55-.9
- one: .9-1.05
- two: 1.05-1.2
- three: 1.2-1.45
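If you would rather check those numbers programmatically than eyeball them, a sketch like the one below works. It assumes the prediction output has been saved to a JSON file and follows the usual WhisperX aligned shape (a "word_segments" list whose entries carry "word", "start", and "end"); adjust the file name and keys to whatever this wrapper actually returns:

```python
# Rough check of aligned word timings against the expected values above.
# The output file name and key names are assumptions about the wrapper's output.
import json

EXPECTED = {
    "testing": (0.55, 0.90),
    "one": (0.90, 1.05),
    "two": (1.05, 1.20),
    "three": (1.20, 1.45),
}
TOLERANCE = 0.15  # seconds of slack; word boundaries are inherently fuzzy

with open("output.json") as f:  # hypothetical file holding the prediction output
    result = json.load(f)

for seg in result.get("word_segments", []):
    word = seg["word"].strip().lower().strip(".,")
    if word in EXPECTED:
        exp_start, exp_end = EXPECTED[word]
        ok = (abs(seg["start"] - exp_start) <= TOLERANCE
              and abs(seg["end"] - exp_end) <= TOLERANCE)
        status = "OK" if ok else "CHECK"
        print(f"{word}: {seg['start']:.2f}-{seg['end']:.2f} "
              f"(expected ~{exp_start}-{exp_end}) {status}")
```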
You should also double-check the printed module versions to make sure they're what you meant to use.
You should also check with a more difficult file; we do not include one here for copyright reasons, but internally we test with the first 5 minutes of Shrek in Japanese. Send the file to the server with scp (taking care not to put it in the whisper-x-cog directory, as that will include it in the Docker image if you build again):
scp -i <your_key_rsa> <your_audio>.mp3 ubuntu@<machine IP>:/home/ubuntu/<your_audio>.mp3
Transcribing the Shrek clip takes 2m6s, with 36s of that spent on startup. Also note that the first word after the long song is given a timespan that covers the preceding 20s. TODO: Not sure what to do about that right now.
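One possible mitigation for that TODO (not implemented here) is a post-processing pass that caps implausibly long word durations. A minimal sketch, assuming WhisperX-style word segments with "start"/"end" keys and a maximum duration that would need tuning:

```python
# Possible post-processing for the "first word after the song spans the
# previous 20s" artifact: cap any word whose duration is implausibly long
# by pulling its start time up toward its end time. Not part of this repo.
MAX_WORD_DURATION = 2.0  # seconds; an assumption, tune for your content

def clamp_word_durations(word_segments, max_duration=MAX_WORD_DURATION):
    clamped = []
    for seg in word_segments:
        start, end = seg.get("start"), seg.get("end")
        if start is not None and end is not None and end - start > max_duration:
            seg = {**seg, "start": end - max_duration}
        clamped.append(seg)
    return clamped
```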
- Push to Replicate:
sudo cog login
sudo cog push
If you get `name unknown: The model https://replicate.com/wordscenes/whisperx does not exist`, then you forgot to use sudo in cog push!
If you get `You are not logged in to Replicate. Run 'cog login' and try again.`, then you forgot to use sudo in cog login!
- Go to https://replicate.com/wordscenes/whisperx/versions, grab the latest version ID, and replace it in any code that calls this API (unfortunately you can't just call the latest version :( ).
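For example, with the Replicate Python client the version is pinned right in the model reference. The version hash below is a placeholder, and the input names are assumptions (use whatever predict.py actually declares):

```python
# Calling the pushed model with a pinned version ID via the Replicate Python
# client (requires REPLICATE_API_TOKEN in the environment). The version hash
# and input names are placeholders/assumptions, not taken from this repo.
import replicate

output = replicate.run(
    "wordscenes/whisperx:<latest-version-id>",  # paste the ID from the versions page
    input={
        "audio_file": open("test.mp3", "rb"),  # input name and file are assumptions
        "language": "en",
    },
)
print(output)
```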
Whisper model "large-v3" does horribly on the Shrek test, replacing many phrases with "ご視聴ありがとうございました" ("thank you for watching"; others have also reported this hallucination in English). See https://deepgram.com/learn/whisper-v3-results for some validating evidence that this model severely underperforms vs. "large-v2".
faster-whisper (version "0.10.0") degrades accuracy on the Shrek test, omitting one sentence at the end of the clip ("よし、もう満杯だ", roughly "Alright, it's full now"), as well as reducing the expressiveness of the onomatopoeic katakana expressions. It is faster, but we want accuracy.
Note: WhisperX v3 uses faster-whisper by default, so we should consider switching to WhisperX v2.
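A quick way to spot the large-v3 failure mode on the Shrek test is to scan the transcription output for that hallucinated phrase. A small sketch, again assuming WhisperX-style output with a "segments" list of dicts carrying "text", "start", and "end":

```python
# Count segments that are just the "thank you for watching" phrase that
# large-v3 tends to hallucinate on the Shrek test. Assumes WhisperX-style
# output; adjust key names if this wrapper returns a different shape.
HALLUCINATION = "ご視聴ありがとうございました"  # "thank you for watching"

def count_hallucinated_segments(result):
    hits = [s for s in result.get("segments", []) if HALLUCINATION in s.get("text", "")]
    for s in hits:
        print(f"hallucinated segment at {s.get('start')}-{s.get('end')}: {s['text'].strip()}")
    return len(hits)
```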