@varenc varenc commented Oct 27, 2024

The center channel should contain all the dialogue with less background noise.

Made minor tweaks and changes to support detecting the center channel and applying an additional ffmpeg `-af 'pan=c0=FC'` filter for center channel extraction. Slightly increases accuracy for speech detection in limited testing.

Downside: this may cause worse detection in some cases, or very bad detection if for some reason the dialogue isn't on the FC (front center) channel. It will also cause ffmpeg errors for audio that has 6 or 8 channels but no center channel, such as the 6.0(front) layout.

TODO: Make this a configurable option in the future.

Extra PR note: I don't really think this should get merged as is; I'm just planting the seed that this improvement is reasonable. I think it needs to be a configurable option, and the audio detection should be more robust: it should check for actual 5.1, 5.1(side), 7.1, 7.1(wide), etc. channel layouts instead of just checking whether the audio has 6 or 8 channels. It also needs more confirmation that media really does always have all the subtitled dialogue on the FC channel.
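A minimal sketch of the layout-based check suggested in the note above, written in Python. The function name and the allowlist are hypothetical and are not taken from this PR; the allowlist contains only the four layouts named in the note, and the filter string is the `pan=mono|c0=FC` form used in the ffmpeg command later in this thread. The layout string is assumed to come from something like ffprobe's `channel_layout` stream field.

```python
from typing import Optional

# Layouts named in the PR note. Each of these includes an FC channel in
# ffmpeg's standard layout table. Illustrative allowlist, not a complete one.
LAYOUTS_WITH_CENTER = {"5.1", "5.1(side)", "7.1", "7.1(wide)"}

# Same pan expression as the first pan filter in the verification command.
CENTER_FILTER = "pan=mono|c0=FC"


def center_channel_filter(channel_layout: Optional[str]) -> Optional[str]:
    """Return an ffmpeg audio filter that keeps only FC, or None.

    None means "do not extract the center channel", which is the safe
    fallback for unknown layouts and for layouts such as 6.0(front)
    that have 6 channels but no FC.
    """
    if channel_layout is None:
        return None
    if channel_layout.strip().lower() in LAYOUTS_WITH_CENTER:
        return CENTER_FILTER
    return None
```

Matching on the layout name rather than the channel count means a 6.0(front) stream falls through to the normal path instead of triggering an ffmpeg error.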

…dio from the video file.

The center channel should contain all the dialogue with less background noise.

Made minor tweaks and changes to support detecting center channel and applying
additional ffmpeg `-af 'pan=c0=FC'` filter for center channel extraction.
Slightly increases accuracy for speech detection in limited testing.

Downside: this may cause worse detection in some cases, or very bad detection
if for some reason the dialogue isn't on the FC center channel. Will cause ffmpeg
errors for audio with 6 or 8 channels that don't have a center channel.

TODO: Make this a configurable option in the future.

varenc commented Oct 28, 2024

To confirm that media with 5.1 audio really has all the dialogue on the FC channel, I used this helpful ffmpeg command:

ffmpeg -i <input_media> -af 'asplit[a1][a2];[a1]pan=mono|c0=FC[a1];[a2]pan=mono|c0=0.5*FL+0.5*FR+0*FC+0.707*LFE+0.5*BL+0.5*BR[a2];[a1][a2]amerge=inputs=2' -c:a aac -c:v copy -c:s copy out.mkv

That takes media with 5.1 sound and downmixes the audio to stereo so that just the FC (center) channel is on the left channel, while all the other channels are downmixed to mono and put on the right channel. The coefficients in the second pan filter downmix the 5.1 audio, with 0*FC making sure the FC channel isn't included.

This results in a stereo media file you can listen to with earbuds to confirm that all dialogue is in the left earbud while everything else is in the right earbud. In my brief testing I found all dialogue to always be on the FC channel. A media file like this also demonstrates how FC alone is a cleaner audio source for dialogue, with less background noise.
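For anyone who wants to run the check over several files, here is a rough Python sketch that assembles the same ffmpeg arguments as the command above. The function names and the weights dictionary are hypothetical helpers; the filter graph they produce is intended to be character-for-character the one from the command, and nothing here has been tested beyond that string comparison.

```python
# Downmix weights copied from the second pan filter in the command above.
# FC is 0 so the right channel carries everything except the center channel.
REST_WEIGHTS = {
    "FL": "0.5",
    "FR": "0.5",
    "FC": "0",
    "LFE": "0.707",
    "BL": "0.5",
    "BR": "0.5",
}


def build_filter(weights=REST_WEIGHTS):
    """Build the asplit/pan/amerge graph: FC on the left, the rest on the right."""
    rest = "+".join(f"{weight}*{channel}" for channel, weight in weights.items())
    return (
        "asplit[a1][a2];"
        "[a1]pan=mono|c0=FC[a1];"
        f"[a2]pan=mono|c0={rest}[a2];"
        "[a1][a2]amerge=inputs=2"
    )


def build_command(input_media, output="out.mkv"):
    """Return the ffmpeg argument list; pass it to subprocess.run to execute."""
    return [
        "ffmpeg", "-i", input_media,
        "-af", build_filter(),
        "-c:a", "aac", "-c:v", "copy", "-c:s", "copy",
        output,
    ]
```

Passing the arguments as a list avoids shell quoting problems with the brackets and pipes in the filter graph.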

varenc commented Oct 28, 2024

Update: I found some counterexample media where extracting just the FC (center) channel results in much worse, nonsensical performance. I haven't figured out exactly why yet, but I would guess it's because on this particular media some dialogue is missing from the FC channel. So I definitely wouldn't recommend this as the default behavior.
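Given that counterexample, one way to keep the behavior opt-in is a command-line flag that defaults to off. The sketch below is only an illustration in Python: the flag name `--extract-center-channel` and both helper functions are hypothetical and do not exist in this PR, and a real implementation would presumably still need to check the channel layout before applying the filter.

```python
import argparse


def build_parser():
    """Parser with a hypothetical opt-in flag; extraction is off by default."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--extract-center-channel",
        action="store_true",
        help="use only the FC channel for speech detection (off by default)",
    )
    return parser


def extra_ffmpeg_args(args):
    """Return the extra ffmpeg arguments implied by the parsed options."""
    if args.extract_center_channel:
        # Same pan expression as the first pan filter in the command above.
        return ["-af", "pan=mono|c0=FC"]
    return []
```

With no flag given, the extra argument list is empty and the audio is processed as before.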
