New Recipe of tts2+aishell3 #5849

Merged
ftshijt merged 16 commits into espnet:master from Tsukasane:tts2_aishell3 on Aug 22, 2024

@Tsukasane
Contributor

What?

Implement discrete TTS on the AISHELL-3 corpus as a new recipe.
Make minor adjustments to the template.

Why?

Extend the supported recipes to cover discrete TTS on a Mandarin dataset.
Make the minor adjustments needed for the recipe to run successfully.

Facilitate code checking.

@ftshijt added the Recipe and TTS (Text-to-speech) labels on Jul 26, 2024
@ftshijt added this to the v.202405 milestone on Jul 26, 2024
Collaborator

@ftshijt left a comment

Thanks for the contribution. Very cool update (I'm especially impressed by the detailed doc in README.md).

I added my comments as follows:

--inference_config "${inference_config}" \
--train_set "${train_set}" \
--valid_set "${valid_set}" \
--test_sets "${test_sets}" \
Collaborator

Suggested change (remove this line):
--test_sets "${test_sets}" \

Since this model is only used to generate teacher labels, we do not need this line.

# if you want to use officially provided phoneme text (better for the quality)
train_set=train_no_dev_phn
valid_set=dev_phn
test_sets="dev_phn test_phn"
Collaborator

Suggested change
test_sets="dev_phn test_phn"

@@ -0,0 +1,97 @@
#!/usr/bin/env bash
Collaborator

As the preparation is the same (correct me if I'm wrong), you can simply use a symlink.

Contributor Author

There are minor modifications compared to tts1/local/data.sh, as noted in the comments at line 37 (add mkdir) and line 82 (train_phn_no_dev -> train_no_dev_phn).

Collaborator

Since they are general fixes, how about directly modifying the original file and using a symlink here?

Contributor Author

Okay.

  speech,
  speech_lengths,
- feats_lengths=discrete_feats_lengths,
+ feats_lengths=discrete_feats_lengths, #
Collaborator

Suggested change
- feats_lengths=discrete_feats_lengths, #
+ feats_lengths=discrete_feats_lengths,

Collaborator

Please revert this change.

  speech,
  speech_lengths,
- feats_lengths=discrete_feats_lengths,
+ feats_lengths=discrete_feats_lengths, #
Collaborator

Suggested change
- feats_lengths=discrete_feats_lengths, #
+ feats_lengths=discrete_feats_lengths,

##########################################################
# OTHER TRAINING SETTING #
##########################################################
num_iters_per_epoch: 500 # number of iterations per epoch
Collaborator

The number of iterations per epoch seems small to me (please double-check).

##########################################################
# OTHER TRAINING SETTING #
##########################################################
num_iters_per_epoch: 100 # number of iterations per epoch
Collaborator

The value of num_iters_per_epoch seems very small to me. Please double-check.

Comment on lines +10 to +15
threshold: 0.5 # threshold to stop the generation
maxlenratio: 10.0 # maximum length of generated samples = input length * maxlenratio
minlenratio: 0.0 # minimum length of generated samples = input length * minlenratio
use_att_constraint: false # Whether to use attention constraint, which is introduced in Deep Voice 3
backward_window: 1 # Backward window size in the attention constraint
forward_window: 3 # Forward window size in the attention constraint
Collaborator

Since this is for teacher forcing, the config here is not used at all. We can simply remove it (but you may want to add use_teacher_forcing: false here).

Comment on lines +78 to +82
vim path/to/train_hubert.txt
:r path/to/dev_hubert.txt
:r path/to/test_hubert.txt
:w path/to/newfile_all.txt
:q!
Collaborator

My understanding is that cat would be easier?
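
For readers who would rather do the merge from a Python script, the concatenation that `cat` performs can be sketched as below. This is a minimal sketch and not part of the recipe: `concat_files` is a hypothetical helper, and the paths in the usage note simply mirror the placeholder paths in the README snippet above.

```python
import shutil
from pathlib import Path
from typing import Iterable, Union


def concat_files(sources: Iterable[Union[str, Path]], dest: Union[str, Path]) -> None:
    """Write the contents of each source file to dest, in order (like `cat a b > c`)."""
    with open(dest, "wb") as out:
        for src in sources:
            # Stream each file in binary mode so large files are not loaded whole.
            with open(src, "rb") as f:
                shutil.copyfileobj(f, out)
```

A call such as `concat_files(["path/to/train_hubert.txt", "path/to/dev_hubert.txt", "path/to/test_hubert.txt"], "path/to/newfile_all.txt")` would then correspond to the shell one-liner `cat path/to/train_hubert.txt path/to/dev_hubert.txt path/to/test_hubert.txt > path/to/newfile_all.txt`.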

</table>


* CER is currently unfilled since it requires an additional asr model.
Collaborator

For CER evaluation, you can use whisper-large.

Collaborator

@ftshijt left a comment

Thanks for making it work. Please fix the few issues listed below to wrap up the project.

@@ -0,0 +1,1215 @@
#!/usr/bin/env bash
Collaborator

Again, please use a symlink for the template instead of copying it.



- @typechecked
+ # @typechecked NOTE(yiwen) --output_dir "${_logdir}"/output.JOB \ format like this cannot pass typecheck, but it is str
Collaborator

Thanks! In that case, you may consider changing
output_dir: str to output_dir: Union[Path, str]
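
A minimal sketch of the suggested signature change, assuming a simplified stand-in for the function under review. `prepare_output_dir` is a hypothetical name, and the `@typechecked` decorator is left out so the example needs only the standard library; the point is that an annotation of `Union[Path, str]` accepts either form, and `Path(...)` normalizes both to a single type inside the function.

```python
from pathlib import Path
from typing import Union


def prepare_output_dir(output_dir: Union[Path, str]) -> Path:
    """Accept either a str or a Path and return a normalized Path."""
    # Path(...) leaves Path inputs equivalent and parses str inputs.
    return Path(output_dir)
```

With this shape, the rest of the function body can rely on `Path` methods regardless of how the caller passed the argument.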

@ftshijt
Collaborator

ftshijt commented Aug 21, 2024

The CI test failure might come from the additional lines that handle the audio sampling rate. You may consider running the local mini_an4 test located at https://github.com/espnet/espnet/blob/master/ci/test_integration_espnet2.sh#L223-L232 to identify the issue.

  def load_audio(self, path: str, ref_len: Optional[int] = None):
      wav, sr = sf.read(path)
-     assert sr == self.sample_rate, sr
+     # assert sr == self.sample_rate, sr
Collaborator

Sorry, I may have missed it, but is there a specific reason you would like to comment out this line?

Contributor Author

I think that if the audio's sample rate (sr) differs from the expected sample rate (self.sample_rate), this assertion cannot pass. But that mismatch is exactly why we added the audio resampling here. (I followed PR #5795.)
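
One way to reconcile the two positions is to resample first and keep the check afterwards, so that the assertion guards the audio actually passed downstream rather than the raw file. The sketch below illustrates only that control flow. `resample_linear` and `ensure_sample_rate` are hypothetical helpers, and plain linear interpolation stands in for a real resampler (it has no anti-aliasing filter), so this is not the code used in this PR or in PR #5795.

```python
from typing import List, Sequence, Tuple


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int) -> List[float]:
    """Naive linear-interpolation resampler (illustration only, no anti-aliasing)."""
    if orig_sr == target_sr or len(samples) == 0:
        return list(samples)
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr  # fractional position in the input signal
        lo = min(int(pos), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def ensure_sample_rate(
    samples: Sequence[float], sr: int, expected_sr: int
) -> Tuple[List[float], int]:
    """Resample on mismatch, then assert the invariant the original check protected."""
    if sr != expected_sr:
        samples, sr = resample_linear(samples, sr, expected_sr), expected_sr
    assert sr == expected_sr, sr
    return list(samples), sr
```

Under this arrangement the assertion can stay enabled: it never fires for inputs that were resampled, and it still catches a resampling step that failed to update the rate.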

Comment on lines +595 to +600
# (en hubert)
# s3prl_conf="{upstream=${s3prl_upstream_name}}"
# kmeans_feature_type=s3prl
# kmeans_feature_conf="{type=${kmeans_feature_type},conf={s3prl_conf=${s3prl_conf},download_dir=ckpt,multilayer_feature=False,layer=${feature_layer}}}"
# (zh hubert)
s3prl_conf="{upstream=${s3prl_upstream_name},path_or_url=TencentGameMate/chinese-hubert-large}"
Collaborator

Changing the default behavior is not recommended. Please keep the English HuBERT option as the default and add an additional argument for the newly introduced option.
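
The request can be sketched as follows: the existing configuration stays the default, and a custom checkpoint is selected only when an extra argument is passed. This is an illustrative Python sketch of the logic, not the shell code in the template. `build_s3prl_conf` is a hypothetical helper, and its `path_or_url` parameter is an assumption named after the key used in the diff above.

```python
from typing import Optional


def build_s3prl_conf(upstream: str, path_or_url: Optional[str] = None) -> str:
    """Build the s3prl_conf string; omit path_or_url unless one is given."""
    fields = [f"upstream={upstream}"]
    if path_or_url:
        # Only non-default checkpoints add this key, so existing recipes are unaffected.
        fields.append(f"path_or_url={path_or_url}")
    return "{" + ",".join(fields) + "}"
```

Called with only an upstream name, the helper produces the same string shape as the commented-out English configuration; passing `path_or_url="TencentGameMate/chinese-hubert-large"` adds the extra key for the Chinese model.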

@Tsukasane
Contributor Author

> The CI test might be from the additional lines in audio sampling rate. You may consider running local test of mini_an4 located at https://github.com/espnet/espnet/blob/master/ci/test_integration_espnet2.sh#L223-L232 to identify the issue

Okay, I ran it. There seemed to be a type mismatch. The mini_an4 test now finishes successfully locally. I will update the code to see whether it passes CI now.

@codecov

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.93%. Comparing base (e0dd1cf) to head (b8d56cf).
Report is 375 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (e0dd1cf) and HEAD (b8d56cf).

HEAD has 6 uploads less than BASE

| Flag | BASE (e0dd1cf) | HEAD (b8d56cf) |
|------|------|------|
| test_utils | 4 | 0 |
| test_python_espnet2 | 2 | 0 |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5849      +/-   ##
==========================================
- Coverage   54.88%   47.93%   -6.95%     
==========================================
  Files         776      500     -276     
  Lines       71428    44688   -26740     
==========================================
- Hits        39203    21423   -17780     
+ Misses      32225    23265    -8960     

| Flag | Coverage Δ |
|------|------|
| test_integration_espnet2 | 47.93% <ø> (?) |
| test_python_espnet2 | ? |
| test_utils | ? |


@ftshijt
Collaborator

ftshijt commented Aug 22, 2024

Many thanks for the updates! LGTM!

@ftshijt merged commit 38cc9e8 into espnet:master on Aug 22, 2024
@sw005320
Contributor

@Tsukasane, it is not a critical bug, but we have the following issue, which appears to be caused by your PR:
https://github.com/espnet/espnet/actions/runs/10573141689/job/29339792526?pr=5862#step:8:812
Can you fix it?

I checked https://github.com/espnet/espnet/blob/master/egs2/aishell3/tts2/tts.sh and found that you specified an absolute path.
This should be fixed.
