Add OWSM-CTC #5933

Merged

sw005320 merged 17 commits into espnet:master from pyf98:owsm-ctc-pr on Nov 12, 2024

Conversation

@pyf98 (Collaborator) commented Oct 22, 2024

What?

This PR adds OWSM-CTC: https://aclanthology.org/2024.acl-long.549/

TODO:

  • Add OWSM-CTC model code
  • Add OWSM-CTC recipe
  • Verify loading pre-trained model
  • Write unit tests

codecov bot commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 14.08451% with 61 lines in your changes missing coverage. Please review.

Project coverage is 48.03%. Comparing base (ba092ad) to head (ce075c4).
Report is 24 commits behind head on master.

Files with missing lines Patch % Lines
espnet2/train/preprocessor.py 15.00% 51 Missing ⚠️
...et/nets/pytorch_backend/transformer/subsampling.py 9.09% 10 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (ba092ad) and HEAD (ce075c4). Click for more details.

HEAD has 9 fewer uploads than BASE
Flag BASE (ba092ad) HEAD (ce075c4)
test_python_espnet2 4 0
test_integration_espnetez 3 0
test_utils 2 0
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5933      +/-   ##
==========================================
- Coverage   55.60%   48.03%   -7.58%     
==========================================
  Files         824      528     -296     
  Lines       76042    47144   -28898     
==========================================
- Hits        42286    22647   -19639     
+ Misses      33756    24497    -9259     
Flag Coverage Δ
test_integration_espnet2 48.03% <14.08%> (?)
test_integration_espnetez ?
test_python_espnet2 ?
test_utils ?

Flags with carried forward coverage won't be shown.

@pyf98 pyf98 changed the title [WIP] Add OWSM-CTC Add OWSM-CTC Oct 28, 2024
@sw005320 sw005320 added this to the v.202412 milestone Oct 28, 2024
@sw005320 (Contributor) commented:

@jctian98, can you review this PR?


The training data follows the same format as the encoder-decoder OWSM v3.1, except that timestamps are removed from the `text` file. Please first follow the `egs2/owsm_v3.1/s2t1` recipe to prepare OWSM data, and then convert `text` into the new format by running `python local/convert_owsm_data.py` (the path to the BPE tokenizer needs to be modified to your path).
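The conversion step above rewrites each line of the `text` file to drop the timestamp tokens. As a minimal illustration only (the assumed timestamp format `<0.00>` and the example utterance are hypothetical; the actual logic lives in `local/convert_owsm_data.py`), the core of such a conversion might look like:

```python
import re

def strip_timestamps(text_line: str) -> str:
    """Remove OWSM-style timestamp tokens (assumed format: <0.00>, <12.34>, ...)
    from one line of the `text` file, keeping other special tokens intact."""
    cleaned = re.sub(r"<\d+\.\d+>", "", text_line)
    # Normalize the whitespace left behind by the removed tokens.
    return " ".join(cleaned.split())

print(strip_timestamps("utt1 <en><asr><0.00> hello world <1.28>"))
# "utt1 <en><asr> hello world"
```

The real script additionally needs the BPE tokenizer path, as noted above; this sketch only shows the timestamp-removal idea.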

## Pre-trained Model
Review comment (Contributor):

I think it's OK to use your own style here, but if we also included our usual configuration details, as written in the other README.md files, it would be more informative and more reproducible.

@sw005320 (Contributor) commented:

@jctian98, this is a reminder.
Can you review this PR?

@jctian98 (Collaborator) commented:

Sorry for my delay, will review it by the end of tomorrow.

@jctian98 (Collaborator) commented Nov 1, 2024

The code quality is very good; nice job, @pyf98!
I just added a few comments requesting some further clarification.

Additional comments:
(1) Since we use FlashAttention, which was not previously included in ESPnet, can we also add an installer for flash-attn?
(1.1) Update: sorry, I didn't notice we already have an installer for FlashAttention; we can skip this.
(2) It seems we intentionally included some legacy code from the previous ASR/S2T codebase.
(2.1) Where a modification resulted in a duplicated file, it would be better to add comments clarifying the major differences (e.g., the E-Branchformer encoder and its layers). If it doesn't cost too much effort, could we consider merging them into the existing modules, or using an inherited class?
(2.2) Some previous modules are included, but I'm not sure whether they are justified by test cases and real experiments. For example, the code supports more than 10 kinds of encoder architectures, but probably only 1-2 of them are used in practice; and we include the LM in CTC inference, but our recipe doesn't train an LM.

The details are good; I just raised these comments at the philosophy level. Any solution is fine with me, and thanks for the contribution!
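The "inherited class" option in (2.1) can be sketched as follows. This is a toy illustration, not ESPnet code: both class names and the arithmetic bodies are hypothetical placeholders standing in for the shared E-Branchformer logic and the OWSM-CTC-specific change.

```python
class EBranchformerEncoderLayer:
    """Stand-in for an existing ESPnet module (simplified placeholder)."""

    def forward(self, x):
        return x + 1  # placeholder for the original layer computation


class OWSMCTCEncoderLayer(EBranchformerEncoderLayer):
    """Hypothetical variant: inherit the shared logic and override only the
    part that actually differs, instead of copying the whole file."""

    def forward(self, x):
        h = super().forward(x)  # reuse the original computation
        return h * 2            # placeholder for the variant-specific change
```

With this pattern, a reader can see at a glance exactly which behavior the new model changes, and upstream fixes to the base class are picked up automatically.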

@sw005320 (Contributor) commented:

@pyf98, any update?
We have a lot of follow-up activities depending on this PR.

@pyf98 (Collaborator, Author) commented Nov 11, 2024

Thanks for all the comments. I have fixed them.

The LM integration is not used now, but I'm keeping it because it is theoretically possible to integrate an LM.
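For context on point (2.2) above: OWSM-CTC decodes with CTC, where an external LM could in principle be added via shallow fusion (adding a weighted LM log-probability to the CTC score during beam search). A minimal sketch of the standard CTC greedy collapse rule, which is what runs when no LM is used (this is an illustration of the general algorithm, not ESPnet's actual implementation):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Frame-level argmax predictions -> collapsed token sequence.
print(ctc_greedy_decode([0, 1, 1, 0, 2, 2, 0, 3]))  # [1, 2, 3]
```

Keeping the unused LM hook therefore costs little, while leaving the door open for shallow fusion later.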

@sw005320 (Contributor) commented:

Thanks, @pyf98.
@jctian98, is it OK?

@jctian98 (Collaborator) commented:

I think it's ok! Thanks for the response! @pyf98

@sw005320 (Contributor) commented:

Thanks, @pyf98!

@sw005320 sw005320 merged commit 5971b1f into espnet:master Nov 12, 2024
@pyf98 pyf98 deleted the owsm-ctc-pr branch November 12, 2024 14:58
Shikhar-S pushed a commit to Shikhar-S/espnet that referenced this pull request Mar 13, 2025
3 participants