New Recipe of tts2+aishell3 #5849
ftshijt merged 16 commits into espnet:master from Tsukasane:tts2_aishell3
Conversation
ftshijt left a comment
Thanks for the contribution. Very cool update (I'm especially impressed by the detailed doc in README.md).
I added my comments as follows:
    --inference_config "${inference_config}" \
    --train_set "${train_set}" \
    --valid_set "${valid_set}" \
    --test_sets "${test_sets}" \

Suggested change: remove
    --test_sets "${test_sets}" \
Given that the model is used to generate teacher labels, we do not need this line.
    # if you want to use officially provided phoneme text (better for the quality)
    train_set=train_no_dev_phn
    valid_set=dev_phn
    test_sets="dev_phn test_phn"

Suggested change: remove
    test_sets="dev_phn test_phn"
egs2/aishell3/tts2/local/data.sh
Outdated
    @@ -0,0 +1,97 @@
    #!/usr/bin/env bash
As the preparation is the same (correct me if I'm wrong), you can simply use a symlink.
There are minor modifications compared to tts1/local/data.sh, as commented at line 37 (add mkdir) and line 82 (train_phn_no_dev -> train_no_dev_phn).
Since they are general fixes, how about directly modifying the original file and using a symlink here?
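The symlink approach being suggested can be sketched with throwaway paths (the real link would point egs2/aishell3/tts2/local/data.sh at the shared tts1 script; directory names below are illustrative only):

```shell
# Build a tiny stand-in layout: one canonical data.sh under tts1,
# linked from tts2 instead of duplicated.
demo=$(mktemp -d)
mkdir -p "$demo/tts1/local" "$demo/tts2/local"
printf '#!/usr/bin/env bash\necho prepared\n' > "$demo/tts1/local/data.sh"
# relative symlink, as recipes in the repo typically use
ln -s ../../tts1/local/data.sh "$demo/tts2/local/data.sh"
bash "$demo/tts2/local/data.sh"
```

Because the link is relative, it survives checking the repo out at a different root, and any general fix lands in one file.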
espnet2/tts2/espnet_model.py
Outdated
    speech,
    speech_lengths,
    feats_lengths=discrete_feats_lengths,
    feats_lengths=discrete_feats_lengths, #

Suggested change:
-   feats_lengths=discrete_feats_lengths, #
+   feats_lengths=discrete_feats_lengths,
    ##########################################################
    #                 OTHER TRAINING SETTING                 #
    ##########################################################
    num_iters_per_epoch: 500  # number of iterations per epoch

The number of iters per epoch seems small to me (please double check).
    ##########################################################
    #                 OTHER TRAINING SETTING                 #
    ##########################################################
    num_iters_per_epoch: 100  # number of iterations per epoch

The number of iters per epoch seems very small to me. Please double check.
    threshold: 0.5             # threshold to stop the generation
    maxlenratio: 10.0          # maximum length of generated samples = input length * maxlenratio
    minlenratio: 0.0           # minimum length of generated samples = input length * minlenratio
    use_att_constraint: false  # Whether to use attention constraint, which is introduced in Deep Voice 3
    backward_window: 1         # Backward window size in the attention constraint
    forward_window: 3          # Forward window size in the attention constraint

Since it is for teacher forcing, the config here is not used at all. We may simply remove it (but you may want to add use_teacher_forcing: false here).
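Under this suggestion the decode config would shrink to roughly the following (a sketch of what remains, not the exact file):

```yaml
# decode config used only to produce teacher labels; the generation-length
# and attention-constraint options above are unused in this mode
use_teacher_forcing: false
```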
egs2/aishell3/tts2/README.md
Outdated
    vim path/to/train_hubert.txt
    :r path/to/dev_hubert.txt
    :r path/to/test_hubert.txt
    :w path/to/newfile_all.txt
    :q!

My understanding is that cat would be easier?
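The vim :r/:w sequence above collapses to a single cat invocation. A self-contained demo with placeholder files standing in for the README's token lists (real paths are the path/to/*_hubert.txt placeholders):

```shell
# Stand-in files for the train/dev/test hubert token lists
demo=$(mktemp -d)
printf 'trn\n' > "$demo/train_hubert.txt"
printf 'dev\n' > "$demo/dev_hubert.txt"
printf 'tst\n' > "$demo/test_hubert.txt"
# the suggestion: one cat instead of interactive vim edits
cat "$demo/train_hubert.txt" "$demo/dev_hubert.txt" "$demo/test_hubert.txt" \
    > "$demo/newfile_all.txt"
```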
egs2/aishell3/tts2/README.md
Outdated
    </table>

    * CER is currently unfilled since it requires an additional asr model.

For CER evaluation, you can use whisper-large.
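Filling in the CER column then amounts to transcribing the synthesized audio with the ASR model and scoring hypotheses against the reference text by character-level edit distance. A minimal, self-contained sketch of the scoring half (the whisper-large transcription step is assumed and not shown; `cer` is a hypothetical helper, not an espnet API):

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    r, h = list(ref), list(hyp)
    # prev[j] = edit distance between r[:i-1] and h[:j], rolled over i
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution / match
        prev = cur
    return prev[len(h)] / max(len(r), 1)
```

For Mandarin text this character-level definition is the natural unit, which is why CER rather than WER is reported for aishell-3.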
ftshijt left a comment
Thanks for making it work. Please fix the few issues listed below to wrap up the project.
egs2/aishell3/tts2/tts.sh
Outdated
    @@ -0,0 +1,1215 @@
    #!/usr/bin/env bash

Again, please use a symlink for the template instead of copying it.
espnet2/bin/tts2_inference.py
Outdated
    @typechecked
    # @typechecked NOTE(yiwen) --output_dir "${_logdir}"/output.JOB \ format like this cannot pass typecheck, but it is str

Thanks! In that case, you may consider changing
    output_dir: str
to
    output_dir: Union[Path, str]
The CI test failure might come from the additional lines handling the audio sampling rate. You may consider running the local test of mini_an4 located at https://github.com/espnet/espnet/blob/master/ci/test_integration_espnet2.sh#L223-L232 to identify the issue.
    def load_audio(self, path: str, ref_len: Optional[int] = None):
        wav, sr = sf.read(path)
-       assert sr == self.sample_rate, sr
+       # assert sr == self.sample_rate, sr
Sorry if I missed it, but is there a specific reason you would like to comment out this line?
I think if the audio sample rate (sr) is different from the expected sample rate, this assertion cannot pass. But that mismatch is the reason we added the audio resampling here? (I followed PR #5795.)
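The resample-instead-of-assert idea being discussed can be sketched as follows. This is an assumption about the approach, not the PR's actual code: the real `load_audio` reads with `sf.read`, whereas this sketch takes the array directly to stay self-contained, and it uses scipy's polyphase resampler rather than whatever PR #5795 uses.

```python
import numpy as np
from scipy.signal import resample_poly

def to_target_rate(wav: np.ndarray, sr: int, target_sr: int) -> np.ndarray:
    """Resample to target_sr instead of asserting sr == target_sr."""
    if sr == target_sr:
        return wav
    # polyphase resampling by the reduced ratio target_sr / sr
    g = np.gcd(sr, target_sr)
    return resample_poly(wav, target_sr // g, sr // g)
```

With this in place the assertion becomes unnecessary: any input rate is coerced to the k-means feature extractor's expected rate.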
egs2/TEMPLATE/tts2/tts2.sh
Outdated
    # (en hubert)
    # s3prl_conf="{upstream=${s3prl_upstream_name}}"
    # kmeans_feature_type=s3prl
    # kmeans_feature_conf="{type=${kmeans_feature_type},conf={s3prl_conf=${s3prl_conf},download_dir=ckpt,multilayer_feature=False,layer=${feature_layer}}}"
    # (zh hubert)
    s3prl_conf="{upstream=${s3prl_upstream_name},path_or_url=TencentGameMate/chinese-hubert-large}"

Changing the default behavior is not recommended. Please keep the English hubert option and add an additional argument for the newly introduced one.
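One way to follow this, sketched with assumed names: `hubert_path_or_url` is a hypothetical new template argument (the actual option name in tts2.sh may differ), empty by default so the English hubert behavior is unchanged.

```shell
s3prl_upstream_name=hubert_large_ll60k  # placeholder upstream name
hubert_path_or_url=""                   # hypothetical new argument; empty keeps the English default
if [ -n "${hubert_path_or_url}" ]; then
    # e.g. --hubert_path_or_url TencentGameMate/chinese-hubert-large for the zh model
    s3prl_conf="{upstream=${s3prl_upstream_name},path_or_url=${hubert_path_or_url}}"
else
    s3prl_conf="{upstream=${s3prl_upstream_name}}"
fi
```

Recipes that need the Chinese hubert then opt in explicitly instead of the template's default changing under existing recipes.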
Okay, I ran it. It seems there was some type mismatch. The mini_an4 test can now finish successfully locally. I will update the code to see whether it can pass the CI now.
Codecov Report
All modified and coverable lines are covered by tests ✅

    @@            Coverage Diff             @@
    ##           master    #5849      +/-   ##
    ==========================================
    - Coverage   54.88%   47.93%    -6.95%
    ==========================================
      Files         776      500      -276
      Lines       71428    44688    -26740
    ==========================================
    - Hits        39203    21423    -17780
    + Misses      32225    23265     -8960

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Many thanks for the updates! LGTM!

@Tsukasane, it is not a critical bug, but we have the following issue, which would be caused by your PR. I checked https://github.com/espnet/espnet/blob/master/egs2/aishell3/tts2/tts.sh and found that you specified the absolute path.

What?
Implement discrete TTS on the aishell-3 corpus. Create a new recipe.
Make minor adjustments to the template.
Why?
Enhance the supported recipes by implementing discrete TTS on a Mandarin dataset.
Minor adjustments for successful execution.
Facilitate code checking.