
Shikhar-S · 2024-11-30T14:47:12Z

What?

Implementation of the BEATs-BART Audio Captioning system described in this paper.

Our Scores
cider_d: 46.1
spider: 29.8

Paper's comparable scores
cider_d: 45.8
spider: 29.6

(Other numbers for this implementation are in the README; the paper's numbers are in the second-to-last row of Table 2 (w/o Instructor).)
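For readers comparing the two score sets: SPIDEr is conventionally defined as the mean of CIDEr-D and SPICE, so the SPICE value implied by each pair can be backed out. The sketch below assumes that definition and works from the rounded numbers above, so the implied values are approximate and are not figures reported in this PR or the paper. The function name is illustrative.

```python
def implied_spice(cider_d: float, spider: float) -> float:
    """Back out SPICE, assuming SPIDEr = (CIDEr-D + SPICE) / 2."""
    return 2.0 * spider - cider_d


# Scores quoted in the PR description (already rounded to one decimal).
ours = implied_spice(cider_d=46.1, spider=29.8)
paper = implied_spice(cider_d=45.8, spider=29.6)

print(f"implied SPICE (this PR): {ours:.1f}")
print(f"implied SPICE (paper):   {paper:.1f}")
```

Because the inputs are rounded, the implied values carry an uncertainty of roughly a tenth of a point; the README remains the authoritative source for the full metric set.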

This PR also adds two options to the ESPnet HuggingFace decoder: 1) skipping the loading of pre-trained weights, and 2) modifying the pre-trained model's architecture through external configs.
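The two decoder options can be pictured as a config-override step followed by a flag that controls weight loading. The sketch below is a standard-library illustration only: `DecoderConfig`, its fields and default values, and `apply_overrides` are hypothetical stand-ins, not the ESPnet or HuggingFace API.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical stand-in for a pre-trained decoder's config."""

    d_model: int = 768
    decoder_layers: int = 6
    decoder_attention_heads: int = 12
    load_pretrained_weights: bool = True


def apply_overrides(base: DecoderConfig, overrides: Dict[str, Any]) -> DecoderConfig:
    """Return a copy of `base` with externally supplied values applied."""
    known = {f.name for f in fields(base)}
    unknown = set(overrides) - known
    if unknown:
        # Fail loudly on typos instead of silently ignoring them.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    cfg = replace(base, **overrides)
    if cfg.d_model % cfg.decoder_attention_heads != 0:
        raise ValueError("d_model must be divisible by decoder_attention_heads")
    return cfg


base = DecoderConfig()
# External overrides: more attention heads, and skip pre-trained weights.
cfg = apply_overrides(
    base,
    {"decoder_attention_heads": 16, "load_pretrained_weights": False},
)
print(cfg)
```

Returning a new config rather than mutating the base keeps the original defaults available for comparison, which is one reasonable design choice among several.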

Why?

This PR improves the data preparation (training on all 5 captions, text preprocessing) and implements the same decoder as the paper's implementation (more attention heads, no pre-trained weight loading).
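As a rough illustration of "training on all 5 captions", each audio clip can be expanded into one training pair per reference caption, with the text normalized along the way. The sketch below is an assumption about what such a step might look like: the normalization rules (lowercasing, dropping punctuation), the utterance-ID scheme, and the function names are hypothetical and are not taken from the recipe.

```python
import re
from typing import Dict, Iterator, List, Tuple


def normalize_caption(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())


def expand_captions(
    clips: Dict[str, List[str]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (utterance_id, audio_id, caption), one item per caption."""
    for audio_id, captions in clips.items():
        for idx, caption in enumerate(captions, start=1):
            yield f"{audio_id}_cap{idx}", audio_id, normalize_caption(caption)


# Toy data: one clip with five reference captions.
clips = {
    "rain": [
        "Rain falls on a tin roof.",
        "Heavy RAIN, hitting metal!",
        "Water drips steadily onto a surface.",
        "A storm is heard from indoors.",
        "Raindrops patter on a roof.",
    ]
}

pairs = list(expand_captions(clips))
for utt_id, audio_id, caption in pairs:
    print(utt_id, audio_id, caption)
```

With this layout a single clip contributes five training examples instead of one, which is the effect the description refers to.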

Codecov Report

Attention: Patch coverage is 10.39326% with 638 lines in your changes missing coverage. Please review.

Project coverage is 47.48%. Comparing base (cccc290) to head (ec75a49).
Report is 64 commits behind head on master.

Files with missing lines	Patch %	Lines
espnet2/asr/encoder/beats_encoder.py	10.11%	613 Missing ⚠️
...2/asr/decoder/hugging_face_transformers_decoder.py	15.00%	17 Missing ⚠️
espnet2/bin/asr_inference.py	16.66%	5 Missing ⚠️
espnet2/torch_utils/initialize.py	0.00%	2 Missing ⚠️
espnet2/tasks/abs_task.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5967      +/-   ##
==========================================
+ Coverage   45.59%   47.48%   +1.89%     
==========================================
  Files         614      529      -85     
  Lines       54356    47844    -6512     
==========================================
- Hits        24781    22717    -2064     
+ Misses      29575    25127    -4448
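The percentages in this report can be reproduced from the raw counts, which is a quick way to sanity-check it. The sketch below recomputes base and head coverage from the Hits and Misses rows and sums the per-file missing lines. The patch size of 712 lines is inferred from the reported 10.39326% and 638 missing lines; it is not a number Codecov states here.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of executable lines."""
    return 100.0 * hits / lines


# Project-level numbers from the coverage diff (Hits, Hits + Misses).
base = coverage_pct(24781, 24781 + 29575)
head = coverage_pct(22717, 22717 + 25127)

# Per-file missing lines from the "Files with missing lines" table.
missing_per_file = [613, 17, 5, 2, 1]
patch_missing = sum(missing_per_file)

# Inferred, not reported: a 712-line patch with 638 lines missing
# reproduces the quoted patch coverage.
patch_lines = 712
patch_cov = coverage_pct(patch_lines - patch_missing, patch_lines)

print(f"base:  {base:.2f}%")
print(f"head:  {head:.2f}%")
print(f"delta: {head - base:+.2f}%")
print(f"patch: {patch_cov:.5f}% with {patch_missing} lines missing")
```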

Flag	Coverage Δ
test_integration_espnet2	`47.48% <10.39%> (-0.56%)`	⬇️
test_integration_espnetez	`?`


ftshijt

LGTM! thanks for the updates. Please fix the CI issues accordingly~

egs2/clotho_v2/asr1/cmd.sh

Shikhar-S · 2024-12-04T21:47:30Z

> LGTM! thanks for the updates. Please fix the CI issues accordingly~

Done. Please lmk if any other changes are necessary.

sw005320 · 2024-12-04T22:05:31Z

egs2/clotho_v2/asr1/run_upload.sh

Do you need this file?
It has the absolute path.
We also discussed that we don't need to separate an upload shell.
Can you double-check it?

Thanks for catching this. Rechecked and fixed.

sw005320

Can you create a unit test, which covers 1) HF transformers_decoder and 2) beats_encoder? If 1) is tricky, you can try 2) only.

Check https://github.com/espnet/espnet/tree/master/test/espnet2/asr/encoder

Shikhar-S · 2024-12-07T18:25:03Z

Handled all comments and added tests.
The CI fails due to low coverage. However, most of the lines it cites are either covered by other tests or are trivial (like variable assignments).

sw005320 · 2024-12-08T14:05:49Z

egs2/README.md

 | bur_openslr80           | Burmese ASR training dataset                                                                                                     | ASR                     | BUR                   | https://openslr.org/80/                                                                                      |              |
 | catslu               	  | CATSLU-MAPS                                                                                                                      | SLU                     | CMN           	       | https://sites.google.com/view/catslu/home                                                                    |              |
 | catslu_entity        	  | CATSLU                                                                                                                           | SLU/Entity Classifi.    | CMN           	       | https://sites.google.com/view/catslu/home                                                                    |              |
+| clotho_v2                  | Clotho v2.1 dataset for audio captioning | ASR   | ENG                   | https://zenodo.org/records/4783391


Can you change ASR --> AAC?

sw005320 · 2024-12-08T14:06:12Z

Thanks! This is an amazing contribution!

Clotho_v2 Audio Captioning (DCASE 2023 implementation)

Shikhar-S and others added 30 commits September 23, 2024 09:30

add clotho_v2 aac recipe with beats encoder

7dba51a

BEATs encoder with BART decoder for Clotho AAC task

fd6c73e

recipe downloads data now

dfa7aee

add readme

7800c2c

Merge branch 'master' into clotho_asr

5b3374e

[pre-commit.ci] auto fixes from pre-commit.com hooks

d9ac98e

for more information, see https://pre-commit.ci

fix bug in pad, add option to use all layers, use last layer as adapter

1b4b322

fix downloading for train set, change pre-trained model path for delta

aa8910e

fix beats layernorm initialization bug

01e54fa

Merge branch 'master' into clotho_asr

bc2343c

[pre-commit.ci] auto fixes from pre-commit.com hooks

0425cfb

for more information, see https://pre-commit.ci

add local paths to audiocaps and clotho_chatgpt_mixup

64a00c4

Merge branch 'clotho_asr' of github.com:Shikhar-S/espnet into clotho_asr

599b586

remove ipynb ckpts

ac48449

changes for running on babel environment

2303395

Merge branch 'espnet:master' into clotho_asr

255bdba

[pre-commit.ci] auto fixes from pre-commit.com hooks

32d539a

for more information, see https://pre-commit.ci

clean up

af85cd5

Merge branch 'clotho_asr' of github.com:Shikhar-S/espnet into clotho_asr

5fba713

Merge branch 'espnet:master' into clotho_asr

66124af

[pre-commit.ci] auto fixes from pre-commit.com hooks

257ee2a

for more information, see https://pre-commit.ci

fix linting

cd21d78

Merge branch 'clotho_asr' of github.com:Shikhar-S/espnet into clotho_asr

b1933a4

[pre-commit.ci] auto fixes from pre-commit.com hooks

0cf2272

for more information, see https://pre-commit.ci

Merge branch 'master' into clotho_asr

98af5dc

fix remaining linting issue

736f1d2

Merge branch 'clotho_asr' of github.com:Shikhar-S/espnet into clotho_asr

09779ac

handle comments https://github.com/espnet/espnet/pull/5915\#pullreque…

2e098be

…streview-2437805856

[pre-commit.ci] auto fixes from pre-commit.com hooks

2a6f343

for more information, see https://pre-commit.ci

handle comment: https://github.com/espnet/espnet/pull/5915\#discussio…

92d2e57

…n_r1844016571

sw005320 added the Recipe label Dec 1, 2024

fix ci errors: line length, and transformers import

dded1d5

remove unnecessary runner files and imports

79cf497

ftshijt reviewed Dec 3, 2024

egs2/clotho_v2/asr1/cmd.sh (outdated, resolved)

Shikhar-S and others added 3 commits December 3, 2024 07:39

restore default cmd.sh

da52e55

restore default slurm.conf

27f5c9d

Merge branch 'master' into clotho_dcase23

a0e82d8

Fhrozen modified the milestones: v.202412, v.202503 Dec 4, 2024

sw005320 reviewed Dec 4, 2024

Shikhar-S and others added 10 commits December 6, 2024 10:16

remove upload script

9fdef3a

rename variables, remove unnecessary utility function from hf decoder…

1515739

… and correct assertion in beats

hf decoder tests: overriding config and skipping param loading

ceed0e9

add test for bets: fwd-bkwd, config override

2acac65

Merge branch 'espnet:master' into clotho_dcase23

2b02da2

Merge branch 'clotho_dcase23' of github.com:Shikhar-S/espnet into clo…

dd0bc08

…tho_dcase23

restore run.sh to default values

c76215a

satisfy ci

41bffd3

line len

1946883

fix hf asr inference: var renaming

ec75a49

sw005320 merged commit efd0d59 into espnet:master Dec 8, 2024

sw005320 reviewed Dec 8, 2024

Shikhar-S deleted the clotho_dcase23 branch December 8, 2024 21:16

Shikhar-S mentioned this pull request Mar 2, 2025

Audioverse, BEATs pre-training and Dasheng audio encoder #6052

Open

Shikhar-S pushed a commit to Shikhar-S/espnet that referenced this pull request Mar 13, 2025

Merge pull request espnet#5967 from Shikhar-S/clotho_dcase23

bb515d9

Clotho_v2 Audio Captioning (DCASE 2023 implementation)

Clotho_v2 Audio Captioning (DCASE 2023 implementation) #5967
sw005320 merged 63 commits into espnet:master from Shikhar-S:clotho_dcase23

codecov bot commented Dec 2, 2024

4 participants
