Add phonemized LibriTTS ASR recipe #5466

Merged
ftshijt merged 2 commits into espnet:master from akreal:phonemized-libritts on Oct 15, 2023

Conversation

@akreal
Contributor

@akreal akreal commented Oct 8, 2023

What?

ASR recipe for LibriTTS with phonemized transcriptions.

Why?

As per discussion in #5393

See also

The system is similar to the system described in this paper.

I'll add results and model link in a week or two.

@mergify mergify bot added the ESPnet2 label Oct 8, 2023
@codecov

codecov bot commented Oct 8, 2023

Codecov Report

Merging #5466 (f76251b) into master (71dc9a3) will decrease coverage by 1.84%.
Report is 240 commits behind head on master.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5466      +/-   ##
==========================================
- Coverage   77.14%   75.31%   -1.84%     
==========================================
  Files         684      707      +23     
  Lines       62713    64942    +2229     
==========================================
+ Hits        48383    48913     +530     
- Misses      14330    16029    +1699     
| Flag | Coverage Δ |
|---|---|
| test_configuration_espnet2 | ∅ <ø> (∅) |
| test_integration_espnet1 | 65.67% <ø> (+0.13%) ⬆️ |
| test_integration_espnet2 | 48.76% <ø> (-0.31%) ⬇️ |
| test_python_espnet1 | 19.27% <ø> (-0.68%) ⬇️ |
| test_python_espnet2 | 51.31% <ø> (-1.00%) ⬇️ |
| test_utils | 23.10% <ø> (+<0.01%) ⬆️ |


see 49 files with indirect coverage changes


@sw005320 sw005320 added Recipe ASR Automatic speech recognition labels Oct 8, 2023
@sw005320 sw005320 requested a review from ftshijt October 8, 2023 17:58
@sw005320
Contributor

sw005320 commented Oct 8, 2023

Many thanks!
This is really useful.

Collaborator

@ftshijt ftshijt left a comment

Thanks for the contribution. Some minor comments/questions:

./asr.sh \
    --lang en \
    --ngpu 2 \
    --nbpe 100 \

Collaborator

Is a BPE size of 100 an empirically good number?

Contributor Author

@akreal akreal Oct 14, 2023

Yes, it provided the best average phone error rate in the preliminary experiments:

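| Token type | LS dev | LS test | VCTK dev | VCTK test | Avg. |
|---|---|---|---|---|---|
| Char | 7.5 | 7.4 | 7.9 | 11.7 | 8.63 |
| BPE 100 | 7.4 | 7.2 | 6.6 | 10.7 | 7.98 |
| BPE 200 | 7.0 | 6.9 | 7.2 | 11.1 | 8.05 |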

I guess a larger BPE size makes the model too biased towards the words appearing in LibriTTS.
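
The "Avg." column in the table above is consistent with an unweighted mean over the four evaluation sets. A minimal sketch that recomputes it and picks the lowest average, assuming that is how the column was derived (the dictionary layout and the `average` helper are illustrative, not part of the recipe):

```python
# Phone error rates (%) copied from the table above, in the order
# LS dev, LS test, VCTK dev, VCTK test.
per = {
    "Char": [7.5, 7.4, 7.9, 11.7],
    "BPE 100": [7.4, 7.2, 6.6, 10.7],
    "BPE 200": [7.0, 6.9, 7.2, 11.1],
}


def average(values):
    # Unweighted mean; assumed to match the "Avg." column.
    return sum(values) / len(values)


averages = {name: average(values) for name, values in per.items()}

# The setting with the lowest average phone error rate.
best = min(averages, key=averages.get)

for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")
print("lowest average:", best)
```

Note that BPE 200 is slightly better on both LS sets, so the preference for BPE 100 comes from the VCTK columns.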

text_phn = "".join(tokens).replace("<space>", " ")
otext.write(f"{utt} {text_phn}\n")

os.replace(f"{idir}/text.phn", f"{idir}/text")

Collaborator

Maybe consider keeping the original text.

Contributor Author

done!
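
The revised code is not shown in this thread. One way the suggestion could be addressed is to move the original `text` file aside before `text.phn` takes its place. The sketch below is an assumption, not the merged change: the `text.orig` name, the `install_phonemized_text` and `tokens_to_text` helpers, and the sample tokens are all hypothetical.

```python
import os
import tempfile


def tokens_to_text(tokens):
    # Same join as in the reviewed snippet: concatenate the tokens, then
    # turn the "<space>" marker back into a literal space.
    return "".join(tokens).replace("<space>", " ")


def install_phonemized_text(idir, backup_name="text.orig"):
    # Hypothetical helper: keep the original transcriptions under
    # `backup_name`, then let text.phn take the place of text.
    os.replace(f"{idir}/text", f"{idir}/{backup_name}")
    os.replace(f"{idir}/text.phn", f"{idir}/text")


with tempfile.TemporaryDirectory() as idir:
    # Toy data directory with one utterance (made-up content).
    with open(f"{idir}/text", "w", encoding="utf-8") as f:
        f.write("utt1 hello world\n")
    with open(f"{idir}/text.phn", "w", encoding="utf-8") as otext:
        tokens = ["h", "ə", "l", "oʊ", "<space>", "w", "ɜː", "l", "d"]
        otext.write(f"utt1 {tokens_to_text(tokens)}\n")

    install_phonemized_text(idir)
    files = sorted(os.listdir(idir))
    print(files)
```

Keeping the backup next to `text` means the word-level transcriptions stay available for later inspection without rerunning data preparation.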

@akreal akreal force-pushed the phonemized-libritts branch from 3800a13 to be22248 on October 14, 2023 11:26
@mergify mergify bot added the README label Oct 14, 2023
@akreal akreal changed the title from "[WIP] Add phonemized LibriTTS ASR recipe" to "Add phonemized LibriTTS ASR recipe" Oct 14, 2023
@akreal
Contributor Author

akreal commented Oct 14, 2023

Thanks for the review, @ftshijt !
I've addressed the comments and added a README.

@ftshijt
Collaborator

ftshijt commented Oct 15, 2023

Looks very cool! Many thanks for your contribution.

@ftshijt ftshijt merged commit 72fd7bf into espnet:master Oct 15, 2023
@akreal akreal deleted the phonemized-libritts branch October 29, 2023 20:28