I am trying to get subtitles for a long audio file, using the medium model. It's the audio from a compilation of a Spanish-language TV series.
My command was this:
pwcpp \
--model "medium-q8_0" \
--language "es" \
--temperature 0.1 \
--print_realtime true \
--output-srt \
./input.wav
I ran it, saw it was doing great, and left for a few hours. I came back to find it had gone crazy at around the 2-hour mark:
[02:03:58.320 --> 02:03:59.320] ¿Qué pasa?
[02:03:59.320 --> 02:04:00.320] ¿Qué pasa?
[02:04:00.320 --> 02:04:01.320] ¿Qué pasa?
[02:04:01.320 --> 02:04:02.320] ¿Qué pasa?
Progress: 37%
[02:04:02.320 --> 02:04:14.320] ¿Qué pasa?
Progress: 38%
[02:04:14.320 --> 02:04:39.320] ¿Qué pasa?
Progress: 38%
...repeat ad nauseam for 2.5 hours
[04:34:56.320 --> 04:35:11.320] ¿Qué pasa?
Progress: 84%
[04:35:11.320 --> 04:35:26.320] ¿Qué pasa?
Progress: 84%
I had previously tried another whisper.cpp-based transcription tool (Memo, using the non-quantized medium model), and the same thing happened after 30 minutes, with a different repeating line.
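To find where the loop starts without scrolling through hours of output, I sketched a small Python helper that scans the printed log. It assumes the `[start --> end] text` line format shown above and treats any run of identical consecutive segments as a loop; the function name and the `min_repeats` threshold are my own choices, not anything pwcpp provides.

```python
import re

# Matches log lines such as: [02:03:58.320 --> 02:03:59.320] ¿Qué pasa?
SEGMENT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$"
)


def find_repetition_start(log_lines, min_repeats=4):
    """Return (start_timestamp, text) of the first run of identical
    consecutive segments at least `min_repeats` long, or None."""
    run_text = None
    run_start = None
    run_len = 0
    for line in log_lines:
        match = SEGMENT_RE.match(line.strip())
        if not match:
            continue  # skip non-segment lines such as "Progress: 37%"
        start, _end, text = match.groups()
        text = text.strip()
        if text == run_text:
            run_len += 1
        else:
            # A different line ends the previous run and starts a new one.
            run_text, run_start, run_len = text, start, 1
        if run_len >= min_repeats:
            return run_start, run_text
    return None
```

Running this over the saved console output gives the timestamp of the first repeated segment, which is roughly where a second run would need to start.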
Is whisper.cpp inherently unreliable for long-running tasks? And could pwcpp provide a way to mitigate this? For example:
- I didn't even get a partial SRT file when I killed pwcpp. Please always write the SRT file, even if it is incomplete.
- Provide the user a way to resume. Let us specify the timestamp to start at and a file to resume from:
--begin-at-timestamp "02:04:39.320" --use-existing-srt ./previousoutput.srt
(or at least just --begin-at-timestamp; I could manually combine the two SRT files.)
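For the manual-combine fallback, here is a minimal Python sketch of what I have in mind. It assumes the second run was made on audio trimmed at the resume point, so its timestamps start at zero and need shifting, and that both files use standard SRT timing (`HH:MM:SS,mmm`, with a comma rather than the dot shown in the console output). All function names are hypothetical; none of this is part of pwcpp.

```python
import re
from datetime import timedelta

TS_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def parse_ts(ts):
    """Convert an SRT timestamp such as '02:04:39,320' to a timedelta."""
    h, m, s, ms = map(int, TS_RE.fullmatch(ts).groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)


def format_ts(td):
    """Convert a timedelta back to SRT timestamp form."""
    total_ms = td // timedelta(milliseconds=1)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(text):
    """Return a list of (start, end, text) cues; cue numbers are discarded."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = (part.split()[0] for part in line.split("-->"))
                cues.append((parse_ts(start), parse_ts(end), "\n".join(lines[i + 1:])))
                break
    return cues


def merge_srt(first, second, offset):
    """Keep cues from `first` that end at or before `offset`, then append
    the cues from `second` shifted by `offset`, renumbering everything."""
    kept = [cue for cue in parse_srt(first) if cue[1] <= offset]
    shifted = [(s + offset, e + offset, t) for s, e, t in parse_srt(second)]
    blocks = [
        f"{n}\n{format_ts(s)} --> {format_ts(e)}\n{t}\n"
        for n, (s, e, t) in enumerate(kept + shifted, start=1)
    ]
    return "\n".join(blocks)
```

Usage would be something like `merge_srt(open("previousoutput.srt").read(), open("second.srt").read(), parse_ts("02:03:58,320"))`, where `second.srt` is a hypothetical file name for the second run's output. Dropping every cue from the first file that ends after the offset also discards the repeated lines.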