ms-dot-k · 2024-01-22T08:21:17Z

What?

This PR adds a new recipe for training an audio-visual speech recognition (AVSR) model on the Easycom dataset.
The recipe is based on the LRS3 AVSR recipe, which uses a pre-trained AV-HuBERT model (with dumped features).

I also added two data augmentation techniques to espnet2/asr/encoder/avhubert_encoder.py:

acoustic noise perturbation: babble noise is mixed into the audio at a random strength, at the feature level.
modality dropout: the audio or video stream is randomly dropped during training, so that the trained model can still perform audio-visual, audio-only, or visual-only prediction.
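
For readers unfamiliar with these two augmentations, here is a rough sketch of the idea. It is not the code in avhubert_encoder.py: the function names, the zero-masking of a dropped stream, the probabilities `p_drop` and `p_audio`, and the uniform sampling of the noise strength are all assumptions made for illustration, and features are plain nested lists (frames x dims) to keep the example dependency-free.

```python
import random


def modality_dropout(audio, video, p_drop=0.5, p_audio=0.5, rng=random):
    """Randomly zero out one stream of a (frames x dims) feature pair.

    p_drop and p_audio are hypothetical hyperparameters for this sketch.
    """
    if rng.random() >= p_drop:
        return audio, video  # keep both streams (audio-visual input)
    if rng.random() < p_audio:
        # Drop audio: the model sees video only for this sample.
        return [[0.0] * len(frame) for frame in audio], video
    # Drop video: the model sees audio only for this sample.
    return audio, [[0.0] * len(frame) for frame in video]


def add_feature_noise(audio, noise, max_strength=1.0, rng=random):
    """Add noise features to audio features with a random strength."""
    strength = rng.uniform(0.0, max_strength)  # assumed sampling scheme
    mixed = [
        [a + strength * n for a, n in zip(a_frame, n_frame)]
        for a_frame, n_frame in zip(audio, noise)
    ]
    return mixed, strength


if __name__ == "__main__":
    rng = random.Random(0)
    audio = [[1.0, 2.0], [3.0, 4.0]]     # 2 frames x 2 dims (toy values)
    video = [[0.5, 0.5], [0.5, 0.5]]
    babble = [[0.1, -0.1], [0.2, -0.2]]  # stand-in for babble-noise features
    noisy, strength = add_feature_noise(audio, babble, rng=rng)
    a, v = modality_dropout(noisy, video, rng=rng)
    print(strength, a, v)
```

Because a dropped stream is replaced by zeros rather than removed, the shapes the encoder sees stay the same, which is what lets the same model run audio-only or visual-only inference later.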

The dataset is very challenging because of noise and long-distance (far-field) speech.
A previous ASR model (wav2vec 2.0) trained on 60k hours of data achieves 87.5% WER (https://arxiv.org/pdf/2212.11377.pdf). Therefore, visual information can greatly improve performance by complementing audio that is degraded by noise, overlapped speech, and long-distance speech.

The model trained with this recipe uses 1,759 hours of data for pre-training (AV-HuBERT) and 438 hours of data for fine-tuning. Given this amount of data, the current performance seems reasonable.

One possible direction for improving performance is to use more audio-visual data, such as LRS2, VoxCeleb, and AVSpeech.

ftshijt

Very cool extension! Many thanks for the effort.

Could you please also add an entry in egs2/README.md for the dataset?

Also, two minor comments as follows:

egs2/easycom/avsr1/local/data.sh

codecov · 2024-01-26T07:30:35Z

Codecov Report

Attention: 26 lines in your changes are missing coverage. Please review.

Comparison is base (27f292d) 76.11% compared to head (0582547) 76.13%.

| Files | Patch % | Lines |
|---|---|---|
| espnet2/asr/encoder/avhubert_encoder.py | 33.33% | 26 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5630      +/-   ##
==========================================
+ Coverage   76.11%   76.13%   +0.01%     
==========================================
  Files         743      743              
  Lines       69117    69151      +34     
==========================================
+ Hits        52608    52647      +39     
+ Misses      16509    16504       -5

| Flag | Coverage Δ | |
|---|---|---|
| test_configuration_espnet2 | `∅ <ø> (∅)` | |
| test_integration_espnet1 | `62.92% <ø> (+0.14%)` | ⬆️ |
| test_integration_espnet2 | `48.48% <2.56%> (-0.04%)` | ⬇️ |
| test_python_espnet1 | `18.39% <0.00%> (-0.01%)` | ⬇️ |
| test_python_espnet2 | `52.66% <33.33%> (+0.03%)` | ⬆️ |
| test_utils | `22.15% <ø> (-0.01%)` | ⬇️ |

sw005320 · 2024-01-31T14:42:42Z

Thanks a lot!

ms-dot-k and others added 3 commits January 22, 2024 17:07

AVSR recipe for EASYCOM dataset (training with LRS3 dataset)

02de038

Improve training with modality dropout and noise augmentation, Available for audio-only training and inference

3df53b0

Update db.sh

d854971

Add easycom dataset

mergify bot added ESPnet2 Installation labels Jan 22, 2024

pre-commit-ci bot and others added 2 commits January 22, 2024 08:22

[pre-commit.ci] auto fixes from pre-commit.com hooks

afbaaf2

for more information, see https://pre-commit.ci

README

29364e5

mergify bot added the README label Jan 22, 2024

ms-dot-k and others added 3 commits January 22, 2024 18:31

add model

0c02f35

[pre-commit.ci] auto fixes from pre-commit.com hooks

02454c9

for more information, see https://pre-commit.ci

Merge branch 'easycom' of https://github.com/ms-dot-k/espnet into easycom

c12528f

sw005320 requested a review from ftshijt January 22, 2024 12:30

sw005320 added the AV Audio visual processing label Jan 22, 2024

sw005320 added this to the v.202312 milestone Jan 22, 2024

sw005320 reviewed Jan 22, 2024

ms-dot-k and others added 6 commits January 22, 2024 22:35

Reflect the comment

ba4ced1

Reflect the comment

dba195b

Modification according to ci

198012e

Modification according to ci

6b72ef1

Modification according to ci

a7d720e

[pre-commit.ci] auto fixes from pre-commit.com hooks

569d6d7

for more information, see https://pre-commit.ci

ftshijt reviewed Jan 25, 2024

egs2/easycom/avsr1/local/data.sh (two comments, outdated and resolved)

ms-dot-k and others added 5 commits January 26, 2024 14:43

Reflecting comments

7aaab99

[pre-commit.ci] auto fixes from pre-commit.com hooks

15e42c3

for more information, see https://pre-commit.ci

Add new dataset in egs2/README.md

27ae6a2

ci

67e05f4

[pre-commit.ci] auto fixes from pre-commit.com hooks

88fbf17

for more information, see https://pre-commit.ci

ms-dot-k and others added 2 commits January 28, 2024 13:44

for ci test

8c69e87

Merge branch 'master' into easycom

a78511d

[pre-commit.ci] auto fixes from pre-commit.com hooks

0582547

for more information, see https://pre-commit.ci

sw005320 merged commit 348139f into espnet:master Jan 31, 2024

AVSR recipe for Easycom Dataset #5630

sw005320 merged 22 commits into espnet:master from ms-dot-k:easycom

+              |---|---|---|---|---|---|---|---|---|
+              |inference_asr_model_valid.acc.ave/test_with_LRS3|694|8886|70.4|18.6|11.0|5.0|34.6|75.4|
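
As a quick sanity check on the result row above, the error rate can be recomputed from its own columns. The sketch below assumes the row follows the usual sclite-style column order (dataset, Snt, Wrd, Corr, Sub, Del, Ins, Err, S.Err); the header row is not visible in this excerpt, so that ordering is an assumption, and `parse_row` and `word_error_rate` are hypothetical helper names.

```python
# Result row copied from the diff excerpt. Column meaning is assumed:
# dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err
ROW = "|inference_asr_model_valid.acc.ave/test_with_LRS3|694|8886|70.4|18.6|11.0|5.0|34.6|75.4|"


def parse_row(row):
    """Split one pipe-delimited result row into named fields."""
    cells = row.strip().strip("|").split("|")
    keys = ["Corr", "Sub", "Del", "Ins", "Err", "S.Err"]
    parsed = {"dataset": cells[0], "Snt": int(cells[1]), "Wrd": int(cells[2])}
    parsed.update({k: float(c) for k, c in zip(keys, cells[3:])})
    return parsed


def word_error_rate(r):
    """WER in percent: substitutions + deletions + insertions."""
    return r["Sub"] + r["Del"] + r["Ins"]


if __name__ == "__main__":
    r = parse_row(ROW)
    # Corr + Sub + Del should account for every reference word (100%).
    print("reference coverage:", round(r["Corr"] + r["Sub"] + r["Del"], 1))
    print("recomputed WER:", round(word_error_rate(r), 1), "reported Err:", r["Err"])
```

Under that assumed column order the row is internally consistent: the substitution, deletion, and insertion rates sum to the reported Err value.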
+              ## Audio-only Speech Recognition Results (Audio-only) <br> exp/asr_train_avsr_avhubert_large_with_lrs3_noise_extracted_en_bpe1000