Clotho Audio Captioning #5915

Closed
Shikhar-S wants to merge 33 commits into espnet:master from Shikhar-S:clotho_asr

Conversation

@Shikhar-S
Contributor

@Shikhar-S Shikhar-S commented Sep 30, 2024

What?

  1. Add the Clotho audio captioning recipe.
  2. Add the BEATs encoder to ESPnet.
  3. Add configs for the BEATs encoder, BART decoder model.
  4. Add a script to evaluate results using the FENSE metric.
  5. Add data preparation scripts for AudioCaps and the clotho_chatgpt mixup as described in this paper.

Audio captioning Results

cider_d : 0.39208390185921266
spice : 0.1247477210504762
spider : 0.25841581145484444
sbert_sim : 0.5130076380936723
fer : 0.03636363636363636
fense : 0.49523599610873387
meteor : 0.17313377768322902
rouge_l : 0.3479915684386986
fer.add_tail_prob : 0.04684687778353691
fer.repeat_event_prob: 0.06736405938863754
fer.repeat_adv_prob : 0.0016691883793100715
fer.remove_conj_prob: 0.11576957255601883
fer.remove_verb_prob: 0.19993385672569275
fer.error_prob : 0.3197185695171356
spider_fl : 0.24773266080817882
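
As a quick sanity check on the numbers above: SPIDEr is defined as the mean of CIDEr-D and SPICE, so the reported `spider` value should follow from the other two. The sketch below only re-does that arithmetic with the values copied from the list; it is not part of the recipe's evaluation script.

```python
import math

# Scores copied from the results listed above.
cider_d = 0.39208390185921266
spice = 0.1247477210504762
reported_spider = 0.25841581145484444

# SPIDEr is the arithmetic mean of CIDEr-D and SPICE.
spider = (cider_d + spice) / 2

print(f"recomputed spider = {spider}")
print("matches reported value:", math.isclose(spider, reported_spider, rel_tol=1e-9))
```

The same shortcut does not apply to `fense`, which is aggregated per caption from `sbert_sim` and the fluency-error detector, so it cannot be recovered exactly from the corpus-level averages.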


Why?

This is a step towards open-sourcing the winning implementation for the DCASE AAC challenge.

@Shikhar-S
Contributor Author

Shikhar-S commented Sep 30, 2024

I am still working on finding the correct hyperparameters for the BEATs encoder + BART decoder model. Here is the detailed report on current issues. Please feel free to add comments.

@mergify
Contributor

mergify bot commented Nov 1, 2024

This pull request is now in conflict :(

@mergify mergify bot added the conflicts label Nov 1, 2024
@mergify mergify bot removed the conflicts label Nov 5, 2024
@Shikhar-S
Contributor Author

Shikhar-S commented Nov 5, 2024

In the earlier version, the LayerNorm bias and variance were being re-initialized after the pre-trained model had been loaded. This revision fixes that.
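
The ordering problem described above can be reproduced in a few lines. The following is a minimal sketch, not the code from this PR: `make_encoder` and `reinit_layernorm` are hypothetical stand-ins, and a tiny `nn.Sequential` takes the place of the BEATs encoder. In PyTorch the learnable LayerNorm parameters are named `weight` and `bias`, so those are what the sketch inspects.

```python
import torch
from torch import nn


def reinit_layernorm(module: nn.Module) -> None:
    """Hypothetical init hook: reset every LayerNorm to weight=1, bias=0."""
    if isinstance(module, nn.LayerNorm):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


def make_encoder() -> nn.Module:
    """Hypothetical toy model standing in for the real encoder."""
    return nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4))


# Stand-in for a pre-trained checkpoint with non-default LayerNorm values.
pretrained = make_encoder()
with torch.no_grad():
    pretrained[1].weight.fill_(0.5)
    pretrained[1].bias.fill_(0.25)
state = pretrained.state_dict()

# Problematic order: load the checkpoint, then run the init hook.
# The hook overwrites the pre-trained LayerNorm parameters.
buggy = make_encoder()
buggy.load_state_dict(state)
buggy.apply(reinit_layernorm)

# Safer order: run the init hook first, load the checkpoint last.
fixed = make_encoder()
fixed.apply(reinit_layernorm)
fixed.load_state_dict(state)

buggy_kept = torch.equal(buggy[1].weight, pretrained[1].weight)
fixed_kept = torch.equal(fixed[1].weight, pretrained[1].weight)
print("pre-trained LayerNorm kept (load, then init):", buggy_kept)
print("pre-trained LayerNorm kept (init, then load):", fixed_kept)
```

A comparison like the one at the end is a cheap regression check: after model construction, assert that a few pre-trained tensors still equal the checkpoint values.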

@Shikhar-S Shikhar-S marked this pull request as ready for review November 5, 2024 15:30
@Shikhar-S
Contributor Author

Updated the README and the paths to the Hugging Face models. Please feel free to review now.

@sw005320 sw005320 added this to the v.202412 milestone Nov 14, 2024
@sw005320
Contributor

@Shikhar-S, please fix the CI error
https://github.com/espnet/espnet/actions/runs/11830412428/job/32963877215?pr=5915

@sw005320
Contributor

@Jungjee, can you review this PR?

@sw005320 sw005320 requested a review from ftshijt November 14, 2024 17:25
@codecov

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 9.96979% with 596 lines in your changes missing coverage. Please review.

Project coverage is 38.22%. Comparing base (c07ed8e) to head (e4af8fc).
Report is 16 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| espnet2/asr/encoder/beats_encoder.py | 9.84% | 595 Missing ⚠️ |
| espnet2/tasks/abs_task.py | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5915       +/-   ##
===========================================
+ Coverage   14.97%   38.22%   +23.25%     
===========================================
  Files         827      564      -263     
  Lines       77263    50856    -26407     
===========================================
+ Hits        11570    19442     +7872     
+ Misses      65693    31414    -34279     

| Flag | Coverage Δ |
|---|---|
| test_integration_espnetez | 38.22% <9.96%> (?) |
| test_python_espnetez | ? |
| test_utils | ? |

@sw005320
Contributor

@ftshijt, can you also review this PR?

Collaborator

@ftshijt ftshijt left a comment

Thanks for the update and your contribution! I left a few minor comments as follows:

@sw005320
Contributor

Is it necessary to split the recipe with pre-training and fine-tuning?
As an "all-in-one" recipe, it is better to handle both in a single run.sh

@Shikhar-S
Contributor Author

> Is it necessary to split the recipe with pre-training and fine-tuning? As an "all-in-one" recipe, it is better to handle both in a single run.sh

I see your point, and it should be doable in a single script. I will update the scripts so that they run end to end, from data preparation to printing the final fine-tuned model's numbers.

@Shikhar-S
Contributor Author

Closing this PR; I have added a better implementation in #5967.

@Shikhar-S Shikhar-S closed this Nov 30, 2024
@Shikhar-S Shikhar-S deleted the clotho_asr branch December 8, 2024 21:16