Tsukasane · 2024-07-26T01:18:43Z

What?

Implement discrete TTS on the AISHELL-3 corpus by creating a new recipe.
Make minor adjustments to the template.

Why?

Extend the supported recipes with discrete TTS on a Mandarin dataset.
The minor template adjustments are needed for the recipe to run successfully.

Facilitate code checking.

ftshijt

Thanks for the contribution. Very cool update (I'm especially impressed by the detailed doc in README.md)

I added my comments as follows:

ftshijt · 2024-07-26T07:46:19Z

egs2/aishell3/tts2/run_train_teacher.sh

+    --inference_config "${inference_config}" \
+    --train_set "${train_set}" \
+    --valid_set "${valid_set}" \
+    --test_sets "${test_sets}" \


Suggested change

-    --test_sets "${test_sets}" \

Given that the model is used to generate teacher labels, we do not need this line.

ftshijt · 2024-07-26T07:46:26Z

egs2/aishell3/tts2/run_train_teacher.sh

+# if you want to use officially provided phoneme text (better for the quality)
+train_set=train_no_dev_phn
+valid_set=dev_phn
+test_sets="dev_phn test_phn"


Suggested change

-test_sets="dev_phn test_phn"

ftshijt · 2024-07-26T07:48:56Z

egs2/aishell3/tts2/local/data.sh

@@ -0,0 +1,97 @@
+#!/usr/bin/env bash


As the preparation is the same (correct me if I'm wrong), you can simply use a symlink.

There are minor modifications compared to tts1/local/data.sh, as commented at line 37 (add mkdir) and line 82 (train_phn_no_dev -> train_no_dev_phn).

Since they are general fixes, how about modifying the original file directly and using a symlink here?

ftshijt · 2024-07-26T07:50:57Z

espnet2/tts2/espnet_model.py

                speech,
                speech_lengths,
-                feats_lengths=discrete_feats_lengths,
+                feats_lengths=discrete_feats_lengths,  #


Suggested change

-                feats_lengths=discrete_feats_lengths,  #
+                feats_lengths=discrete_feats_lengths,

Revert back

ftshijt · 2024-07-26T07:51:12Z

espnet2/tts2/espnet_model.py

                speech,
                speech_lengths,
-                feats_lengths=discrete_feats_lengths,
+                feats_lengths=discrete_feats_lengths,  #


Suggested change

-                feats_lengths=discrete_feats_lengths,  #
+                feats_lengths=discrete_feats_lengths,

ftshijt · 2024-07-26T07:59:02Z

egs2/aishell3/tts2/conf/train_teacher.yaml

+##########################################################
+#                OTHER TRAINING SETTING                  #
+##########################################################
+num_iters_per_epoch: 500    # number of iterations per epoch


The number of iterations per epoch seems small to me (please double-check).

ftshijt · 2024-07-26T08:00:33Z

egs2/aishell3/tts2/conf/train_fastspeech2.yaml

+##########################################################
+#                OTHER TRAINING SETTING                  #
+##########################################################
+num_iters_per_epoch: 100  # number of iterations per epoch


num_iters_per_epoch seems very small to me. Please double-check.

ftshijt · 2024-07-26T08:06:23Z

egs2/aishell3/tts2/conf/decode_teacher.yaml

+threshold: 0.5            # threshold to stop the generation
+maxlenratio: 10.0         # maximum length of generated samples = input length * maxlenratio
+minlenratio: 0.0          # minimum length of generated samples = input length * minlenratio
+use_att_constraint: false # Whether to use attention constraint, which is introduced in Deep Voice 3
+backward_window: 1        # Backward window size in the attention constraint
+forward_window: 3         # Forward window size in the attention constraint


Since this is for teacher forcing, the config here is not used at all. We may simply remove it (but you may want to add use_teacher_forcing: false here).

ftshijt · 2024-07-26T08:08:31Z

egs2/aishell3/tts2/README.md

+  vim path/to/train_hubert.txt
+  :r path/to/dev_hubert.txt
+  :r path/to/test_hubert.txt
+  :w path/to/newfile_all.txt
+  :q!


My understanding is that cat would be easier?
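
For reference, the vim steps above amount to `cat path/to/train_hubert.txt path/to/dev_hubert.txt path/to/test_hubert.txt > path/to/newfile_all.txt`. If the merge has to happen inside a Python script instead, a minimal sketch could look like the following. `concat_files` is a hypothetical helper, not part of the recipe.

```python
import shutil
from pathlib import Path
from typing import Iterable


def concat_files(sources: Iterable[Path], dest: Path) -> None:
    """Write each source file to dest in order, like `cat a b c > dest`."""
    with open(dest, "wb") as out:
        for src in sources:
            with open(src, "rb") as f:
                # Stream the bytes so large token files are not loaded at once.
                shutil.copyfileobj(f, out)


if __name__ == "__main__":
    # Paths taken from the README snippet; adjust to the actual dump directory.
    concat_files(
        [
            Path("path/to/train_hubert.txt"),
            Path("path/to/dev_hubert.txt"),
            Path("path/to/test_hubert.txt"),
        ],
        Path("path/to/newfile_all.txt"),
    )
```

Like `cat`, this adds nothing between files, so each input is expected to end with a newline.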

ftshijt · 2024-07-26T08:10:10Z

egs2/aishell3/tts2/README.md

+</table>
+
+
+* CER is currently unfilled since it requires an additional asr model.


For CER evaluation, you can use whisper-large

ftshijt

Thanks for making it work. Please fix a few issues listed for wrapping up the project.

egs2/aishell3/tts1/local/data.sh

ftshijt · 2024-08-15T04:45:05Z

egs2/aishell3/tts2/tts.sh

@@ -0,0 +1,1215 @@
+#!/usr/bin/env bash


Again, please use a symlink for the template instead of copying it.

ftshijt · 2024-08-15T04:47:20Z

espnet2/bin/tts2_inference.py



-@typechecked
+# @typechecked NOTE(yiwen) --output_dir "${_logdir}"/output.JOB \    format like this cannot pass typecheck, but it is str


Thanks! In that case, you may consider changing
output_dir: str to output_dir: Union[Path, str]
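
A minimal sketch of that suggestion, assuming the value can arrive as either a string or a `Path`. `prepare_output_dir` is a hypothetical helper for illustration, not the actual function in tts2_inference.py: annotate the argument as `Union[Path, str]` and normalize it once at the top.

```python
import tempfile
from pathlib import Path
from typing import Union


def prepare_output_dir(output_dir: Union[Path, str]) -> Path:
    """Accept a str or a Path, create the directory, and return it as a Path."""
    output_dir = Path(output_dir)  # no-op for Path, conversion for str
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Both call styles are valid under the widened annotation.
        print(prepare_output_dir(f"{tmp}/output.1"))
        print(prepare_output_dir(Path(tmp) / "output.2"))
```

With the wider annotation, a runtime type checker has no reason to reject either form, so the decorator would not need to be commented out.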

espnet2/bin/tts_inference.py

ftshijt · 2024-08-21T07:21:23Z

The CI failure might come from the additional lines handling the audio sampling rate. You may consider running the mini_an4 test locally (located at https://github.com/espnet/espnet/blob/master/ci/test_integration_espnet2.sh#L223-L232) to identify the issue.

ftshijt · 2024-08-21T07:21:53Z

egs2/TEMPLATE/asr1/pyscripts/feats/ssl_feature_utils.py

    def load_audio(self, path: str, ref_len: Optional[int] = None):
        wav, sr = sf.read(path)
-        assert sr == self.sample_rate, sr
+        # assert sr == self.sample_rate, sr


Sorry, I may have missed it, but is there a specific reason you would like to comment out this line?

I think that if the audio's sample rate (sr) differs from self.sample_rate, this assertion cannot pass. But isn't that mismatch the reason we added audio resampling here? (I followed PR #5795.)
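
To make the trade-off in this thread concrete, here is a minimal sketch of resampling on mismatch instead of asserting. It is not the code in ssl_feature_utils.py: `match_sample_rate` is a hypothetical helper, and plain linear interpolation stands in for a real resampler (it has no anti-aliasing filter, so it is for illustration only).

```python
from typing import List, Tuple


def match_sample_rate(
    wav: List[float], sr: int, target_sr: int
) -> Tuple[List[float], int]:
    """Return audio at target_sr, resampling only when the rates differ."""
    if sr == target_sr or len(wav) == 0:
        return list(wav), target_sr
    # Number of output samples covering the same duration.
    n_out = max(1, round(len(wav) * target_sr / sr))
    out = []
    for i in range(n_out):
        # Position of output sample i on the input sample axis.
        pos = i * sr / target_sr
        lo = min(int(pos), len(wav) - 1)
        hi = min(lo + 1, len(wav) - 1)
        frac = pos - lo
        out.append(wav[lo] * (1.0 - frac) + wav[hi] * frac)
    return out, target_sr


if __name__ == "__main__":
    # A toy 4-sample ramp "recorded" at 4 Hz, brought to 8 Hz.
    print(match_sample_rate([0.0, 1.0, 2.0, 3.0], sr=4, target_sr=8))
```

Once a step like this exists, the original `assert sr == self.sample_rate` can only hold if it runs after the resampling, which is the point raised in the reply above.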

ftshijt · 2024-08-21T07:23:05Z

egs2/TEMPLATE/tts2/tts2.sh

+        # (en hubert)
+        # s3prl_conf="{upstream=${s3prl_upstream_name}}"
+        # kmeans_feature_type=s3prl
+        # kmeans_feature_conf="{type=${kmeans_feature_type},conf={s3prl_conf=${s3prl_conf},download_dir=ckpt,multilayer_feature=False,layer=${feature_layer}}}"
+        # (zh hubert)
+        s3prl_conf="{upstream=${s3prl_upstream_name},path_or_url=TencentGameMate/chinese-hubert-large}"


Changing the default behavior is not recommended. Please keep the English HuBERT option and add arguments for the newly introduced setting.

Tsukasane · 2024-08-21T12:41:05Z

> The CI failure might come from the additional lines handling the audio sampling rate. You may consider running the mini_an4 test locally (located at https://github.com/espnet/espnet/blob/master/ci/test_integration_espnet2.sh#L223-L232) to identify the issue.

Okay, I ran it. There seemed to be a type mismatch. The mini_an4 test now finishes successfully locally. I will update the code to see whether it passes CI.

codecov · 2024-08-21T15:11:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.93%. Comparing base (e0dd1cf) to head (b8d56cf).
Report is 375 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (e0dd1cf) and HEAD (b8d56cf).

HEAD has 6 uploads less than BASE

Flag	BASE (e0dd1cf)	HEAD (b8d56cf)
test_utils	4	0
test_python_espnet2	2	0

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5849      +/-   ##
==========================================
- Coverage   54.88%   47.93%   -6.95%     
==========================================
  Files         776      500     -276     
  Lines       71428    44688   -26740     
==========================================
- Hits        39203    21423   -17780     
+ Misses      32225    23265    -8960

Flag	Coverage Δ
test_integration_espnet2	`47.93% <ø> (?)`
test_python_espnet2	`?`
test_utils	`?`

ftshijt · 2024-08-22T14:46:27Z

Many thanks for the updates! LGTM!

sw005320 · 2024-08-28T00:57:00Z

@Tsukasane, it is not a critical bug, but we have the following issue, which seems to be caused by your PR:
https://github.com/espnet/espnet/actions/runs/10573141689/job/29339792526?pr=5862#step:8:812
Can you fix it?

I checked https://github.com/espnet/espnet/blob/master/egs2/aishell3/tts2/tts.sh and found that you specified an absolute path.
This should be fixed.
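
A minimal sketch of the usual fix, assuming the absolute path was the target of the symlink: compute the target relative to the link's own directory, so the link still resolves in any checkout of the repository. `link_relative` and the demo layout are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def link_relative(target: Path, link: Path) -> None:
    """Create `link` as a symlink whose stored target is relative to link's directory."""
    rel_target = os.path.relpath(target, start=link.parent)
    if link.is_symlink() or link.exists():
        link.unlink()  # replace an existing (e.g. absolute) link
    os.symlink(rel_target, link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Mimic the egs2 layout: a template script and a recipe directory.
        template = root / "egs2" / "TEMPLATE" / "tts2" / "tts2.sh"
        template.parent.mkdir(parents=True)
        template.write_text("#!/usr/bin/env bash\n")
        recipe = root / "egs2" / "aishell3" / "tts2"
        recipe.mkdir(parents=True)
        link_relative(template, recipe / "tts2.sh")
        print(os.readlink(recipe / "tts2.sh"))
```

Because the stored target contains no machine-specific prefix, the link keeps working after the repository is cloned elsewhere.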

@Tsukasane

See #5849 (comment) @Tsukasane

Yiwen Zhao added 2 commits July 25, 2024 20:47

add aishell3_tts2 recipe

27d0633

modify model functions

0ab1ea1

mergify bot added ESPnet2 README labels Jul 26, 2024

[pre-commit.ci] auto fixes from pre-commit.com hooks

ef0138f

for more information, see https://pre-commit.ci

ftshijt added Recipe TTS Text-to-speech labels Jul 26, 2024

ftshijt added this to the v.202405 milestone Jul 26, 2024

ftshijt reviewed Jul 26, 2024

Yiwen Zhao and others added 6 commits August 7, 2024 09:37

modified files according to comments-v1

c403dec

modified functions according to comments-v1

a331f28

resolve merge conflict

4894a11

[pre-commit.ci] auto fixes from pre-commit.com hooks

ed107ad

for more information, see https://pre-commit.ci

update eval results

1448097

changes in merge

0608305

ftshijt reviewed Aug 15, 2024

Yiwen Zhao and others added 4 commits August 20, 2024 07:21

refined according to comments and CI

ee4054e

[pre-commit.ci] auto fixes from pre-commit.com hooks

dd970e3

for more information, see https://pre-commit.ci

fix CI

79cd8e9

fix CI

bc5d5da

ftshijt reviewed Aug 21, 2024

Yiwen Zhao and others added 3 commits August 21, 2024 08:55

fix type mismatch on ssl feature utils

b0425e9

Merge branch 'tts2_aishell3' of https://github.com/Tsukasane/espnet_fork

7888873

into tts2_aishell3

[pre-commit.ci] auto fixes from pre-commit.com hooks

b8d56cf

for more information, see https://pre-commit.ci

ftshijt merged commit 38cc9e8 into espnet:master Aug 22, 2024

sw005320 added a commit that referenced this pull request Aug 28, 2024

Fix tts.sh path in aishell3 tts2

26da61c

See #5849 (comment) @Tsukasane

sw005320 mentioned this pull request Aug 28, 2024

Fix tts.sh path in aishell3 tts2 #5879

Merged

Tsukasane mentioned this pull request Aug 29, 2024

Fix absolute paths in aishell3_tts2 #5884

Merged
