Add phonemized LibriTTS ASR recipe #5466
Codecov Report
@@ Coverage Diff @@
## master #5466 +/- ##
==========================================
- Coverage 77.14% 75.31% -1.84%
==========================================
Files 684 707 +23
Lines 62713 64942 +2229
==========================================
+ Hits 48383 48913 +530
- Misses 14330 16029 +1699
See 49 files with indirect coverage changes.
Many thanks!
ftshijt
left a comment
Thanks for the contribution. Some minor comments/questions:
./asr.sh \
    --lang en \
    --ngpu 2 \
    --nbpe 100 \
Is a BPE size of 100 an empirically good choice?
Yes, it provided the best average phone error rate in the preliminary experiments:
| | LS dev | LS test | VCTK dev | VCTK test | Avg. |
| --- | --- | --- | --- | --- | --- |
| Char | 7.5 | 7.4 | 7.9 | 11.7 | 8.63 |
| BPE 100 | 7.4 | 7.2 | 6.6 | 10.7 | 7.98 |
| BPE 200 | 7.0 | 6.9 | 7.2 | 11.1 | 8.05 |
I guess a larger BPE size makes the model too biased towards the words appearing in LibriTTS.
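As a quick sanity check on the averages reported above, the following sketch recomputes them from the per-set phone error rates and picks the lowest. The numbers are copied from the table; the dictionary layout and the names `results` and `average_rate` are illustrative assumptions, not part of the recipe.

```python
# Phone error rates (%) copied from the table in this thread.
# The data layout and helper name are assumptions made for illustration.
results = {
    "Char": {"LS dev": 7.5, "LS test": 7.4, "VCTK dev": 7.9, "VCTK test": 11.7},
    "BPE 100": {"LS dev": 7.4, "LS test": 7.2, "VCTK dev": 6.6, "VCTK test": 10.7},
    "BPE 200": {"LS dev": 7.0, "LS test": 6.9, "VCTK dev": 7.2, "VCTK test": 11.1},
}


def average_rate(scores):
    """Unweighted mean over the four evaluation sets."""
    return sum(scores.values()) / len(scores)


averages = {name: average_rate(scores) for name, scores in results.items()}

# Lower phone error rate is better, so take the minimum average.
best = min(averages, key=averages.get)

for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")
print("best:", best)
```

Note that BPE 200 is slightly better on both LS sets, while BPE 100 wins on VCTK and on the unweighted average.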
text_phn = "".join(tokens).replace("<space>", " ")
otext.write(f"{utt} {text_phn}\n")

os.replace(f"{idir}/text.phn", f"{idir}/text")
Maybe consider keeping the original text.
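One possible way to act on this suggestion is sketched below: copy the existing `text` file aside before the phonemized version replaces it. This is only a sketch and not necessarily what the PR ended up doing; the backup name `text.orig` and the helper names are hypothetical.

```python
import os
import shutil


def tokens_to_text(tokens):
    """Join phone tokens into one string, turning <space> markers into spaces."""
    return "".join(tokens).replace("<space>", " ")


def replace_text_keep_original(idir, backup_name="text.orig"):
    """Move text.phn over text, keeping the old text as a backup.

    backup_name is a hypothetical file name chosen for this sketch.
    """
    text_path = os.path.join(idir, "text")
    phn_path = os.path.join(idir, "text.phn")
    # Preserve the original (orthographic) transcriptions first.
    shutil.copy2(text_path, os.path.join(idir, backup_name))
    # Then swap in the phonemized transcriptions, as the diff above does.
    os.replace(phn_path, text_path)
```

After the call, `text` holds the phonemized transcriptions and the backup file holds the original ones, so later stages can still look up the orthographic text.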
3800a13 to be22248
for more information, see https://pre-commit.ci
Thanks for the review, @ftshijt!
Looks very cool! Many thanks for your contribution.
What?
ASR recipe for LibriTTS with phonemized transcriptions.
Why?
As per discussion in #5393
See also
The system is similar to the system described in this paper.
I'll add results and model link in a week or two.