doc: update OWSM data preparation instructions #6026

kalvinchang · 2025-01-23T16:41:37Z

What?

Updating OWSM data preparation instructions

Why?

I ran into issues while preparing the OWSM data and want to ensure that future users do not run into the same issues.


egs2/TEMPLATE/s2t1/README.md

codecov · 2025-01-23T21:14:09Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 12.70%. Comparing base (ef34ad8) to head (30794ca).
Report is 1 commit behind head on master.

❗ There is a different number of reports uploaded between BASE (ef34ad8) and HEAD (30794ca).

HEAD has 1 fewer upload than BASE.

| Flag | BASE (ef34ad8) | HEAD (30794ca) |
|---|---|---|
| test_utils | 1 | 0 |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6026      +/-   ##
==========================================
- Coverage   20.64%   12.70%   -7.95%     
==========================================
  Files          93      858     +765     
  Lines       10195    80496   +70301     
==========================================
+ Hits         2105    10226    +8121     
- Misses       8090    70270   +62180

| Flag | Coverage Δ |
|---|---|
| test_python_espnetez | `12.70% <ø> (?)` |
| test_utils | `?` |

Flags with carried forward coverage won't be shown.


sw005320 · 2025-01-28T14:53:24Z

@pyf98, is this PR OK to merge?

pyf98 · 2025-01-28T15:05:02Z

egs2/TEMPLATE/s2t1/README.md

+    - This should not contain any special tokens except for `<na>`. In the example above, take the text between `<sop>` and `<sos>` and put it here.
 - `text.ctc` contains the ASR transcript without any special token, which is used for the CTC loss. For ASR utterances, this can be derived from `text`, but for ST utterances, this is in a different language. If the ASR transcription is not available, `<na>` will be used.
-
+    - This should not contain any special tokens. Just take the text between `<task>` and `<eos>` and put it here (no timestamps).


> Just take the text between `<task>` and `<eos>` and put it here (no timestamps).

For ASR, yes. But for ST, `text.ctc` is not the same as `text`: `text.ctc` is the ASR transcript.

good catch! just fixed
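For illustration, the rule settled in this thread (for ASR utterances `text.ctc` can be derived from `text`; for ST utterances it must be the separate ASR transcript, with `<na>` when none is available) can be sketched in Python. This is a minimal sketch, not ESPnet code: `make_text_ctc` and `strip_special_tokens` are hypothetical helpers. It assumes every special token (language, task, timestamp, `<eos>`) is a single angle-bracket token with no spaces inside, so stripping all of them leaves the same words as taking the text between the task token and `<eos>` without timestamps. The token names used in the usage lines are illustrative only.

```python
import re
from typing import Optional

# Assumption: special tokens look like <eng>, <asr>, <0.00>, <eos>
# (angle brackets, no whitespace inside), and real words never do.
SPECIAL_TOKEN = re.compile(r"<[^<>\s]+>")


def strip_special_tokens(text: str) -> str:
    """Remove all angle-bracket tokens (incl. timestamps) and normalize spaces."""
    return " ".join(SPECIAL_TOKEN.sub(" ", text).split())


def make_text_ctc(text: str, is_asr: bool, asr_transcript: Optional[str] = None) -> str:
    """Build one text.ctc entry (hypothetical helper).

    ASR utterance: the CTC target is derived from `text` itself.
    ST utterance: `text` is in another language, so the ASR transcript
    must be passed in separately; `<na>` is used if it is missing.
    """
    if is_asr:
        source = text
    elif asr_transcript is not None:
        source = asr_transcript
    else:
        return "<na>"
    # Choice made in this sketch: fall back to <na> if nothing is left.
    return strip_special_tokens(source) or "<na>"


if __name__ == "__main__":
    print(make_text_ctc("<eng><asr><0.00> hello world<1.20><eos>", is_asr=True))
    print(make_text_ctc("<deu><st_eng><0.00> hello world<1.20><eos>", is_asr=False))
```

The ST case deliberately ignores `text`, since translating back from the target language would not give the spoken words the CTC loss is meant to align with.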

sw005320 · 2025-01-28T20:32:53Z

Is it OK to merge this PR?

pyf98 · 2025-01-28T20:35:59Z

LGTM!

doc: update OWSM data preparation instructions

ae69bce

mergify bot added ESPnet2 README labels Jan 23, 2025

sw005320 requested a review from pyf98 January 23, 2025 16:48

sw005320 added this to the v.202503 milestone Jan 23, 2025

sw005320 added the OWSM Open Whisper-style Speech Model label Jan 23, 2025

doc: clarify the format of text, text.prev, and text.ctc in OWSM data prep

1cfe8df

pyf98 reviewed egs2/TEMPLATE/s2t1/README.md three times on Jan 23, 2025 (all threads outdated and resolved).

doc: fix OWSM v3.1 tips

30794ca

pyf98 reviewed Jan 28, 2025

fix: clarify text vs text.ctc for speech translation (OWSM)

3ad8d0d

sw005320 merged commit 29a6e5a into espnet:master Jan 28, 2025
38 checks passed

Shikhar-S pushed a commit to Shikhar-S/espnet that referenced this pull request Mar 13, 2025

Merge pull request espnet#6026 from kalvinchang/patch-2

9abb7e4

doc: update OWSM data preparation instructions

sw005320 merged 4 commits into espnet:master from kalvinchang:patch-2
