doc: update OWSM data preparation instructions #6026

Merged
sw005320 merged 4 commits into espnet:master from kalvinchang:patch-2
Jan 28, 2025

@kalvinchang
Contributor

What?

  • Updating OWSM data preparation instructions

Why?

  • I ran into issues while preparing the OWSM data and want to ensure future users do not run into the same issues

@sw005320 sw005320 requested a review from pyf98 January 23, 2025 16:48
@sw005320 sw005320 added this to the v.202503 milestone Jan 23, 2025
@sw005320 sw005320 added the OWSM Open Whisper-style Speech Model label Jan 23, 2025
@codecov

codecov bot commented Jan 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 12.70%. Comparing base (ef34ad8) to head (30794ca).
Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (ef34ad8) and HEAD (30794ca).

HEAD has 1 upload less than BASE

| Flag | BASE (ef34ad8) | HEAD (30794ca) |
|------|----------------|----------------|
| test_utils | 1 | 0 |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6026      +/-   ##
==========================================
- Coverage   20.64%   12.70%   -7.95%     
==========================================
  Files          93      858     +765     
  Lines       10195    80496   +70301     
==========================================
+ Hits         2105    10226    +8121     
- Misses       8090    70270   +62180     
| Flag | Coverage Δ | |
|------|------------|---|
| test_python_espnetez | 12.70% <ø> | (?) |
| test_utils | ? | |

@sw005320
Contributor

@pyf98, is this PR OK to merge?

- This should not contain any special tokens except for `<na>`. In the example above, take the text between `<sop>` and `<sos>` and put it here.
- `text.ctc` contains the ASR transcript without any special token, which is used for the CTC loss. For ASR utterances, this can be derived from `text`, but for ST utterances, this is in a different language. If the ASR transcription is not available, `<na>` will be used.

- This should not contain any special tokens. Just take the text between `<task>` and `<eos>` and put it here (no timestamps).
Collaborator

> Just take the text between `<task>` and `<eos>` and put it here (no timestamps).

For ASR, yes. For ST, however, `text.ctc` is a different text: it holds the ASR transcript, not the translation.

Contributor Author

good catch! just fixed
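
For readers following the thread above, here is a minimal Python sketch of the two derivations it discusses: the `text.prev` rule (text between `<sop>` and `<sos>`) and the `text.ctc` rule (text between the task token and `<eos>`, without timestamps, for ASR only). It is not code from the PR or from ESPnet. The function names, the `<asr>` / `<st_...>` task token spellings, the `<0.00>`-style timestamp pattern, and the example sequence are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed shape of a timestamp token, e.g. <0.00> or <12.34> (not confirmed by the PR).
TIMESTAMP = re.compile(r"<\d+\.\d+>")
NA = "<na>"


def derive_text_prev(sequence: str) -> str:
    """Return the text between <sop> and <sos>, or <na> if there is none."""
    match = re.search(r"<sop>(.*?)<sos>", sequence)
    prev = match.group(1).strip() if match else ""
    return prev or NA


def derive_text_ctc(sequence: str, asr_transcript: Optional[str] = None) -> str:
    """Return the CTC target for one utterance.

    ASR: the text between the task token and <eos>, with timestamps removed.
    ST:  the target is a translation, so the ASR transcript has to be
         supplied separately; <na> is used when it is not available.
    """
    # Hypothetical task token spellings: <asr> and <st_xxx>.
    match = re.search(r"<(asr|st_\w+)>(.*?)<eos>", sequence)
    if match is None:
        raise ValueError("no task token followed by <eos> found")
    task, body = match.group(1), match.group(2)
    if task == "asr":
        # Drop timestamp tokens and normalize whitespace.
        return " ".join(TIMESTAMP.sub(" ", body).split())
    return asr_transcript.strip() if asr_transcript else NA


# Hypothetical full sequences, written only to exercise the functions.
asr_seq = "<sop> good morning <sos><eng><asr><0.00> hello world<2.40><eos>"
st_seq = "<sop> <sos><eng><st_deu><0.00> hallo welt<2.40><eos>"
```

With these inputs, `derive_text_prev(asr_seq)` yields `good morning` and `derive_text_ctc(asr_seq)` yields `hello world`. For `st_seq`, `derive_text_ctc` returns `<na>` unless a source-language transcript is passed in, which mirrors the ASR versus ST distinction raised in the review.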

@sw005320
Contributor

Is it OK to merge this PR?

@pyf98
Collaborator

pyf98 commented Jan 28, 2025

LGTM!

@sw005320 sw005320 merged commit 29a6e5a into espnet:master Jan 28, 2025
38 checks passed
Shikhar-S pushed a commit to Shikhar-S/espnet that referenced this pull request Mar 13, 2025
doc: update OWSM data preparation instructions

Labels

ESPnet2 OWSM Open Whisper-style Speech Model README

3 participants