Add script to use speaker averaged xvectors in TTS training #5244

Merged
kan-bayashi merged 4 commits into espnet:master from G-Thor:spk_avg_xvectors
Jul 1, 2023

Conversation

@G-Thor
Contributor

G-Thor commented Jun 20, 2023

Using speaker averaged xvectors in TTS training may generalise better to inference tasks, where the utterance-specific xvector is unknown.
I added a small script to modify xvector.scp to refer to spk_xvector.ark entries instead of utterance-specific ones. It works well for my task so I figured it may be of use for others.
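The remapping described above can be sketched in a few lines of shell. This is a minimal illustration, not the script merged in this PR: it assumes Kaldi-style text files, namely `utt2spk` (`<utt_id> <spk_id>`) and a `spk_xvector.scp` (`<spk_id> <ark_path:offset>`) that indexes `spk_xvector.ark`. The function name `remap_to_spk_xvector` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the merged script): build a new xvector.scp in which
# every utterance points at its speaker's averaged entry in spk_xvector.ark.
# Assumed inputs (Kaldi-style, space-separated):
#   utt2spk         : <utt_id> <spk_id>
#   spk_xvector.scp : <spk_id> <ark_path:offset>
# Output on stdout  : <utt_id> <ark_path:offset>
remap_to_spk_xvector() {
    local utt2spk=$1 spk_scp=$2
    # First file (spk_xvector.scp): remember each speaker's ark location.
    # Second file (utt2spk): print the utterance ID with its speaker's location,
    # and stop with an error if a speaker has no averaged xvector.
    awk 'FILENAME == ARGV[1] { loc[$1] = $2; next }
         !($2 in loc) { print "missing averaged xvector for speaker " $2 > "/dev/stderr"; exit 1 }
         { print $1, loc[$2] }' "$spk_scp" "$utt2spk"
}

# Usage (paths are illustrative only):
#   remap_to_spk_xvector data/train/utt2spk \
#       dump/xvector/train/spk_xvector.scp > xvector.scp.new
```

Only the scp index changes; the ark files are left untouched, and all utterances of a speaker end up pointing at the same offset in `spk_xvector.ark`.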

mergify bot added the ESPnet2 label Jun 20, 2023
sw005320 added the Enhancement and TTS (Text-to-speech) labels Jun 20, 2023
sw005320 added this to the v.202307 milestone Jun 20, 2023
sw005320 requested a review from kan-bayashi June 20, 2023 14:36
@codecov

codecov bot commented Jun 20, 2023

Codecov Report

Merging #5244 (9115a09) into master (096e2bb) will increase coverage by 0.55%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5244      +/-   ##
==========================================
+ Coverage   74.43%   74.99%   +0.55%     
==========================================
  Files         642      655      +13     
  Lines       57611    58553     +942     
==========================================
+ Hits        42885    43909    +1024     
+ Misses      14726    14644      -82     
Flag                        Coverage Δ
test_integration_espnet1    66.24% <ø> (-0.05%) ⬇️
test_integration_espnet2    47.64% <ø> (+0.12%) ⬆️
test_python                 65.27% <ø> (+0.12%) ⬆️
test_utils                  23.27% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

see 48 files with indirect coverage changes

@sw005320
Contributor

Can you also add an example (config file?) to use this averaged xvector?

@G-Thor
Contributor Author

G-Thor commented Jun 22, 2023

The script modifies the xvector.scp file so that it references the corresponding spk_xvector.ark locations of the desired speaker(s). No further modifications are needed, because the (now modified) xvector.scp is what is used during training.
See e.g. https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/tts1/tts.sh#L858

The original xvector.scp is backed up so it is possible to manually revert the changes.
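Because the original file is kept, reverting amounts to copying the backup back into place. The sketch below illustrates that backup/revert pattern; the backup name `xvector.scp.bak` and the helper names are assumptions for illustration, not necessarily what the script in this PR uses.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating a backup/revert pattern for xvector.scp.
# The ".bak" suffix is an assumption, not taken from the PR.

backup_xvector_scp() {
    local dir=$1
    # Only create the backup once, so running the remapping twice never
    # overwrites the original utterance-level index.
    if [ ! -e "${dir}/xvector.scp.bak" ]; then
        cp "${dir}/xvector.scp" "${dir}/xvector.scp.bak"
    fi
}

revert_xvector_scp() {
    local dir=$1
    # Restore the utterance-specific index; the backup itself is kept.
    cp "${dir}/xvector.scp.bak" "${dir}/xvector.scp"
}
```

Guarding the backup with an existence check is the design choice that matters here: without it, a second run would back up the already-modified file and the original index would be lost.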

@sw005320
Contributor

OK, where will it be used then?
Also, do you have some results of using this averaged xvector?

@G-Thor
Contributor Author

G-Thor commented Jun 22, 2023

It is applied after xvector extraction (stage 2) and before model training (stage 6).

I have models trained on averaged xvectors for Icelandic (the talromur and talromur2 datasets), but I have not completed a proper evaluation yet.
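Since the script runs between xvector extraction (stage 2) and training (stage 6), that is also a natural point for a quick consistency check. The sketch below is not part of this PR; it assumes each split's `wav.scp` and the modified `xvector.scp` are Kaldi-style files whose first field is the utterance ID, and `check_xvector_coverage` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: every utterance listed in wav.scp should have an entry
# in xvector.scp. Extra entries in xvector.scp are ignored.
check_xvector_coverage() {
    local wav_scp=$1 xvector_scp=$2
    local missing
    # comm needs both inputs sorted under the same collation, so force the C locale.
    # -23 keeps only keys that appear in wav.scp but not in xvector.scp.
    missing=$(LC_ALL=C comm -23 \
        <(cut -d' ' -f1 "$wav_scp" | LC_ALL=C sort -u) \
        <(cut -d' ' -f1 "$xvector_scp" | LC_ALL=C sort -u))
    if [ -n "$missing" ]; then
        echo "utterances without an xvector entry:" >&2
        echo "$missing" >&2
        return 1
    fi
}
```

Running it on each split before stage 6 would flag utterances that ended up without an xvector entry, rather than discovering the mismatch during training.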

Member

kan-bayashi left a comment

Very cool!
Could you add a brief description of your new function here?
https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE/tts1#multi-speaker-model-with-x-vector-training
(e.g., an example command to replace xvector with spk-xvector)

mergify bot added the README label Jun 23, 2023
@Fhrozen
Member

Fhrozen commented Jun 29, 2023

I solved this issue in a different way, by modifying only the following lines of tts.sh:

log "Stage 2+: Extract X-vector: data/ -> ${dumpdir}/xvector using python toolkits"
for dset in "${train_set}" "${valid_set}" ${test_sets}; do
    if [ "${dset}" = "${train_set}" ] || [ "${dset}" = "${valid_set}" ]; then
        _suf="/org"
    else
        _suf=""
    fi
    if [ "${xvector_tool}" = "rawnet" ]; then
        xvector_model="RawNet"
    fi
    pyscripts/utils/extract_xvectors.py \
        --pretrained_model ${xvector_model} \
        --toolkit ${xvector_tool} \
        ${data_feats}${_suf}/${dset} \
        ${dumpdir}/xvector/${dset}
done

and changing them to the following, controlled by a --use_ave_xvector flag:

                # Assume that other toolkits are python-based
                log "Stage 2+: Extract X-vector: data/ -> ${dumpdir}/xvector using python toolkits"
                if "${use_ave_xvector}"; then
                    use_dsets=""
                    for dset in "${train_set}" "${valid_set}" ${test_sets}; do
                        if [ "${dset}" = "${train_set}" ] || [ "${dset}" = "${valid_set}" ]; then
                            _suf="/org"
                        else
                            _suf=""
                        fi
                        use_dsets+=" ${data_feats}${_suf}/${dset}"
                    done
                    utils/combine_data.sh ${data_feats}/allsplits ${use_dsets}
                    pyscripts/utils/extract_xvectors.py \
                        --pretrained_model ${xvector_model} \
                        --toolkit ${xvector_tool} \
                        ${data_feats}/allsplits \
                        ${dumpdir}/xvector/averaged
                    for dset in "${train_set}" "${valid_set}" ${test_sets}; do
                        mkdir -p ${dumpdir}/xvector/${dset}
                        if [ "${dset}" = "${train_set}" ] || [ "${dset}" = "${valid_set}" ]; then
                            _suf="/org"
                        else
                            _suf=""
                        fi
                        <"${dumpdir}/xvector/averaged/ave_xvector.scp" \
                            utils/filter_scp.pl "${data_feats}${_suf}/${dset}/wav.scp"  \
                            >"${dumpdir}/xvector/${dset}/xvector.scp"
                    done
                else
                    for dset in "${train_set}" "${valid_set}" ${test_sets}; do
                        if [ "${dset}" = "${train_set}" ] || [ "${dset}" = "${valid_set}" ]; then
                            _suf="/org"
                        else
                            _suf=""
                        fi
                        pyscripts/utils/extract_xvectors.py \
                            --pretrained_model ${xvector_model} \
                            --toolkit ${xvector_tool} \
                            ${data_feats}${_suf}/${dset} \
                            ${dumpdir}/xvector/${dset}
                    done
                fi

Of course, I only implemented this for the Python-based toolkits in my local setup, because I do not use Kaldi for xvector extraction.
Just for reference.
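The last loop in that version relies on Kaldi's `utils/filter_scp.pl` to keep, for each split, only the lines of `ave_xvector.scp` whose keys appear in that split's `wav.scp`. For readers without a Kaldi checkout, the same key filtering can be sketched in plain awk. This illustrates the idea only and is not a drop-in replacement for every option of `filter_scp.pl`; `filter_by_keys` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical awk equivalent of the filtering step above:
# keep only the scp lines (read from stdin) whose first field appears as the
# first field of the ID list file. Input order is preserved.
filter_by_keys() {
    local id_list=$1
    # First file: collect the keys to keep. Then stream stdin ("-") and print
    # each line whose key was collected.
    awk 'FILENAME == ARGV[1] { keep[$1] = 1; next } ($1 in keep)' "$id_list" -
}

# Usage mirroring the snippet above (paths as in tts.sh):
#   <"${dumpdir}/xvector/averaged/ave_xvector.scp" \
#       filter_by_keys "${data_feats}${_suf}/${dset}/wav.scp" \
#       >"${dumpdir}/xvector/${dset}/xvector.scp"
```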

@Fhrozen
Member

Fhrozen commented Jun 29, 2023

You can find some samples at: https://1drv.ms/f/s!AliZ3I0uDW8HgTxDtaGlsH8FmYcA
They were generated by a single model trained with averaged x_vectors.

@kan-bayashi
Member

Thanks @Fhrozen for the reference!
I think @G-Thor's version is also a good reference, so I will merge it.

kan-bayashi merged commit 3651c2e into espnet:master Jul 1, 2023

Labels

Enhancement, ESPnet2, README, TTS (Text-to-speech)

4 participants