adapter modules update #5034

Closed
jwrh wants to merge 50 commits into espnet:master from jwrh:master

Conversation

@jwrh
Contributor

@jwrh jwrh commented Mar 19, 2023

Hi, this is the first part of FindAdaptNet PR update -- the adapter module.

@sw005320 sw005320 requested a review from simpleoier March 21, 2023 12:24
@sw005320 sw005320 added New Features ASR Automatic speech recognition labels Mar 21, 2023
@sw005320 sw005320 added this to the v.202303 milestone Mar 21, 2023
Collaborator

@simpleoier simpleoier left a comment

Thanks for the PR! I left some comments.

@jwrh jwrh mentioned this pull request Mar 26, 2023
@@ -0,0 +1,43 @@
## Use adapters for ASR in ESPnet2
Collaborator

Can you also add something about adapters in the espnet top-level README and the tutorial doc as well?

Contributor Author

Sure thing, I will add a section in the main README.

Contributor Author

@jwrh jwrh Apr 8, 2023

  • section added in top level README - ASR

Contributor Author

Hi, where would be a good place to add in the tutorial docs? Right now I have the full tutorial for adapters in adapter_utils/README.md.

Contributor Author

Hi, just a quick reminder that the docs have been updated to include documentation of the adapter modules and how to use them. Could you give the PR in its current state a quick look?

@sw005320
Contributor

@simpleoier, could I ask you to review this again?

Collaborator

@simpleoier simpleoier left a comment

Correct me if I'm wrong, but I feel this PR is not complete. The adapter function cannot be used after this PR alone. For example, when adding adapter-related configs, where are they used?

@@ -0,0 +1,43 @@
## Use adapters for ASR in ESPnet2
self,
orig_dim: int,
down_dim: int,
layer_norm: str = None,
Collaborator

Did you compare the performance of these options? If not, I think we can use 1 as the default.
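For context, the `orig_dim`/`down_dim` signature under discussion matches the usual bottleneck-adapter pattern: down-project, apply a non-linearity, up-project, and add a residual connection. The sketch below is illustrative only, not the PR's actual implementation; the class name, the ReLU choice, and the boolean `use_layer_norm` switch (the PR's signature takes a `layer_norm: str` instead) are assumptions.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Sketch of a bottleneck adapter: LayerNorm (optional), down-projection,
    non-linearity, up-projection, then a residual connection to the input."""

    def __init__(self, orig_dim: int, down_dim: int, use_layer_norm: bool = True):
        super().__init__()
        self.layer_norm = nn.LayerNorm(orig_dim) if use_layer_norm else nn.Identity()
        self.down = nn.Linear(orig_dim, down_dim)
        self.up = nn.Linear(down_dim, orig_dim)
        self.activation = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, orig_dim); the output keeps the same shape,
        # so the adapter can be dropped into an existing encoder layer.
        residual = x
        x = self.layer_norm(x)
        x = self.up(self.activation(self.down(x)))
        return x + residual
```

Because the output shape equals the input shape, only the small `down`/`up` projections add trainable parameters.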

@sw005320
Contributor

Can you add unit tests and integration tests?

@jwrh
Contributor Author

jwrh commented Jul 8, 2023

Hi, I've added some unit tests (tentative) for the adapter modules. Integration tests, I think, can be added in the training-scheme PR, as discussed earlier here.

Collaborator

@simpleoier simpleoier left a comment

Thanks for the updates! Please address the CI test errors first.

Collaborator

Can you clean up this config a bit? For example, the amount of whitespace used for each indent, unnecessary empty lines in each block, etc.

Contributor Author

will do!

Modified fairseq's TransformerSentenceEncoderLayer for wav2vec2 with adapters.
Link:
https://github.com/facebookresearch/fairseq/blob/
Collaborator

Minor comment: why a new line for this?

activation_fn: str = "relu",
layer_norm_first: bool = False,
adapter_down_dim: int = 192,
) -> None:
Collaborator

Please add some comments to explain the arguments here. So new users can easily understand what to tune.
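One way to address this request is an `Args:` docstring on the constructor. The stub below is only an illustration; the class name and the argument descriptions are assumptions about what these parameters mean, not text from the PR.

```python
class AdapterTransformerLayer:  # illustrative stub, not the PR's actual class
    def __init__(
        self,
        activation_fn: str = "relu",
        layer_norm_first: bool = False,
        adapter_down_dim: int = 192,
    ) -> None:
        """Transformer encoder layer with a bottleneck adapter (sketch).

        Args:
            activation_fn: activation used in the feed-forward block,
                e.g. "relu" or "gelu" (assumed semantics).
            layer_norm_first: if True, apply layer norm before attention
                (pre-norm); otherwise after (post-norm) (assumed semantics).
            adapter_down_dim: bottleneck dimension of the adapter; smaller
                values mean fewer trainable parameters (assumed semantics).
        """
        self.activation_fn = activation_fn
        self.layer_norm_first = layer_norm_first
        self.adapter_down_dim = adapter_down_dim
```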

self_attn_padding_mask: torch.Tensor = None,
need_weights: bool = False,
att_args=None,
) -> torch.Tensor:
Collaborator

Ditto. It's better to add the shape information of each input tensor; it may help users debug.
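A shape-annotated signature along the lines of this suggestion might look as follows. The argument names are copied from the snippet above, but the shapes, the mask semantics, and the stub body are assumptions for illustration only.

```python
from typing import Optional

import torch


def forward(
    x: torch.Tensor,  # (batch, time, embed_dim)
    self_attn_padding_mask: Optional[torch.Tensor] = None,  # (batch, time); True at padded positions (assumed)
    need_weights: bool = False,
    att_args=None,
) -> torch.Tensor:  # (batch, time, embed_dim), same shape as the input
    # Stub body: a real layer would run self-attention + feed-forward here.
    return x
```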


return

# freeze all layers
Collaborator

A minor suggestion: would it be cleaner / easier to understand if you merged this and the following parameter-freezing lines?
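Merging the separate freezing loops, as suggested, could reduce to a single pass over `named_parameters()`. This is a hedged sketch: selecting adapter parameters by an `"adapter"` substring in the parameter name is an assumption about the PR's naming, not its actual logic.

```python
import torch.nn as nn


def freeze_non_adapter_params(model: nn.Module) -> None:
    """Freeze every parameter except those belonging to adapter modules.

    Adapter parameters are identified here by an (assumed) 'adapter'
    substring in the fully-qualified parameter name.
    """
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```

One loop makes the trainable/frozen split visible at a glance, instead of spreading it across several freezing blocks.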

Note this is done for model training, so modifications would be at __stage 11__ of ESPnet2 recipes.
### Prerequisites
1. Install [S3PRL](https://github.com/s3prl/s3prl) by `tools/installers/install_s3prl.sh`.
2. Since wav2vec2 is needed, [fairseq](https://github.com/pytorch/fairseq) should be installed by `tools/installers/install_fairseq.sh`.
Collaborator

Can you double-check this? I think s3prl no longer depends on fairseq to use adapters. If necessary, you may need to update your adapter code a bit.

Contributor Author

Thanks for the catch! Yes, the fairseq part is indeed unnecessary, since s3prl has a prototype of TransformerSentenceEncoderLayer in its codebase that I can refer to when testing the whole module (originally, fairseq was imported to use TransformerSentenceEncoderLayer at the testing stage). I will remove this prerequisite in the next commit.

from espnet2.asr.frontend.adapter_utils import *


def test_add_adapters_wav2vec2():
Collaborator

Does this code support other SSL models, e.g., HuBERT / WavLM? Another concern is that we need to download the wav2vec2 checkpoint if we use this. It will slow down the CI tests a lot and may cause issues if downloading fails. Can you change some configs to avoid it?

Contributor Author

Currently the code does not. I am certainly looking to support those! Do you think it's better to go with wav2vec2 only for now, or to support those models within this PR?

The current code gets around downloading the wav2vec2 checkpoint by simulating with only 3 TransformerSentenceEncoderLayers and adding adapters to 2 of them. Does this work?
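The checkpoint-free testing idea described above can be sketched roughly as below. To stay self-contained, a plain `nn.TransformerEncoderLayer` stands in for fairseq's `TransformerSentenceEncoderLayer`, and `attach_adapter` is a hypothetical helper, not the PR's API.

```python
import torch
import torch.nn as nn


def attach_adapter(layer: nn.Module, orig_dim: int, down_dim: int) -> nn.Module:
    """Hypothetical helper: wrap a layer so a residual bottleneck adapter
    runs on its output."""
    adapter = nn.Sequential(
        nn.Linear(orig_dim, down_dim), nn.ReLU(), nn.Linear(down_dim, orig_dim)
    )

    class Wrapped(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer, self.adapter = layer, adapter

        def forward(self, x):
            y = self.layer(x)
            return y + self.adapter(y)  # residual adapter on top of the layer

    return Wrapped()


def test_add_adapters_without_checkpoint():
    # Simulate the encoder with 3 small layers instead of downloading a
    # full wav2vec2 checkpoint, and attach adapters to 2 of them.
    dim = 16
    layers = [
        nn.TransformerEncoderLayer(d_model=dim, nhead=2, batch_first=True)
        for _ in range(3)
    ]
    layers = [
        attach_adapter(layer, dim, 8) if i < 2 else layer
        for i, layer in enumerate(layers)
    ]
    x = torch.randn(2, 5, dim)
    for layer in layers:
        x = layer(x)
    assert x.shape == (2, 5, dim)
```

This keeps CI fast and network-free, at the cost of not exercising the real pretrained weights.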

Collaborator

Thanks for the information.

  1. Can you please remind me whether you support other models in your paper?
  2. OK. If it skips downloading the checkpoint during CI, that is good. But another concern is how you ensure that it stays compatible if fairseq updates the related implementation.

Contributor Author

@jwrh jwrh Jul 24, 2023

1. We did also support HuBERT in our paper; do you prefer integrating that part of the code in this PR as well?
2. Yes, the problem of updated implementations is very real, and it seems (from my view) that we could either:

  • pull the wav2vec2.0 model and assume we don't know its internal implementation. This way we do not need to worry about fairseq updating the implementation, but it is costly to pull the model and run it;
  • or keep the current approach, which gets around the need to pull the whole model, but assumes knowledge of wav2vec2's implementation and may fail if the implementation is updated.

Which one do we prefer? Any other way of testing is greatly appreciated!

@kan-bayashi kan-bayashi modified the milestones: v.202307, v.202312 Aug 3, 2023
@kan-bayashi kan-bayashi modified the milestones: v.202310, v.202312 Oct 25, 2023
@kan-bayashi kan-bayashi modified the milestones: v.202312, v.202405 Feb 6, 2024
@Fhrozen Fhrozen modified the milestones: v.202409, v.202412 Oct 1, 2024
@Fhrozen Fhrozen modified the milestones: v.202412, v.202503 Dec 4, 2024
@Fhrozen Fhrozen modified the milestones: v.202503, v.202506 Mar 27, 2025
@github-actions

This PR is stale because it has been open for 90 days with no activity.
It will be closed if no further activity occurs.
Thank you for your contributions.

@github-actions github-actions bot added the Stale For probot label Jun 27, 2025
@github-actions

This PR is closed. Please re-open if needed.

@github-actions github-actions bot closed this Jul 12, 2025
5 participants