G-Thor · 2023-06-20T14:34:04Z

Using speaker-averaged xvectors in TTS training may generalise better to inference, where the utterance-specific xvector is unknown.
I added a small script that modifies xvector.scp to refer to spk_xvector.ark entries instead of utterance-specific ones. It works well for my task, so I figured it might be of use to others.

codecov · 2023-06-20T15:07:55Z

Codecov Report

Merging #5244 (9115a09) into master (096e2bb) will increase coverage by 0.55%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5244      +/-   ##
==========================================
+ Coverage   74.43%   74.99%   +0.55%     
==========================================
  Files         642      655      +13     
  Lines       57611    58553     +942     
==========================================
+ Hits        42885    43909    +1024     
+ Misses      14726    14644      -82

| Flag | Coverage Δ | |
|---|---|---|
| test_integration_espnet1 | `66.24% <ø> (-0.05%)` | ⬇️ |
| test_integration_espnet2 | `47.64% <ø> (+0.12%)` | ⬆️ |
| test_python | `65.27% <ø> (+0.12%)` | ⬆️ |
| test_utils | `23.27% <ø> (-0.01%)` | ⬇️ |

see 48 files with indirect coverage changes

sw005320 · 2023-06-22T11:19:12Z

Can you also add an example (config file?) to use this averaged xvector?

G-Thor · 2023-06-22T11:38:02Z

The script modifies the xvector.scp file to reference the corresponding spk_xvector.ark locations for the desired speaker(s). No further modifications are needed, as the (now modified) xvector.scp is used during training.
See e.g. https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/tts1/tts.sh#L858

The original xvector.scp is backed up so it is possible to manually revert the changes.
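
For readers who want to see the idea concretely, here is a minimal sketch of such a rewrite. It is not the script from this PR: the function name `convert_to_spk_avg`, the `.bak` backup suffix, and the use of Kaldi-style `utt2spk` and `spk_xvector.scp` files as inputs are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def read_table(path):
    """Read a Kaldi-style two-column file into a dict (key -> rest of line)."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                key, value = line.split(maxsplit=1)
                table[key] = value
    return table


def convert_to_spk_avg(xvector_scp, utt2spk, spk_xvector_scp):
    """Point every utterance in xvector.scp at its speaker's averaged xvector.

    Hypothetical helper: the original file is copied to <name>.bak first,
    so the change can be reverted by hand.
    """
    xvector_scp = Path(xvector_scp)
    utt_to_spk = read_table(utt2spk)          # utterance id -> speaker id
    spk_to_loc = read_table(spk_xvector_scp)  # speaker id -> ark location

    backup = xvector_scp.with_name(xvector_scp.name + ".bak")
    shutil.copyfile(xvector_scp, backup)

    utt_ids = list(read_table(backup))  # dict keys keep the file order
    with open(xvector_scp, "w", encoding="utf-8") as f:
        for utt_id in utt_ids:
            f.write(f"{utt_id} {spk_to_loc[utt_to_spk[utt_id]]}\n")
    return backup


# Tiny demonstration on made-up data (ids and offsets are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    original = "utt1 xvector.ark:10\nutt2 xvector.ark:500\n"
    (tmp / "xvector.scp").write_text(original)
    (tmp / "utt2spk").write_text("utt1 spkA\nutt2 spkA\n")
    (tmp / "spk_xvector.scp").write_text("spkA spk_xvector.ark:5\n")

    backup = convert_to_spk_avg(
        tmp / "xvector.scp", tmp / "utt2spk", tmp / "spk_xvector.scp"
    )
    converted = (tmp / "xvector.scp").read_text()
    backup_text = backup.read_text()
```

After the call, both utterances of `spkA` resolve to the same `spk_xvector.ark` entry, and the backup still holds the utterance-specific locations.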

sw005320 · 2023-06-22T11:43:13Z

OK, where will it be used then?
Also, do you have some results of using this averaged xvector?

G-Thor · 2023-06-22T12:12:00Z

It is applied after xvector extraction (stage 2) and before model training (stage 6).

I have models trained on averaged xvectors for Icelandic (talromur and talromur2 datasets), but I have not completed a proper evaluation.

kan-bayashi

Very cool!
Could you add a brief description of your new function here?
https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE/tts1#multi-speaker-model-with-x-vector-training
(e.g., example command to replace xvector with spk-xvector)

Fhrozen · 2023-06-29T05:20:56Z

I solved this issue in a different way, modifying only the tts.sh file at these lines:

espnet/egs2/TEMPLATE/tts1/tts.sh

Lines 402 to 417 in abd3aa7

    
    log "Stage 2+: Extract X-vector: data/ -> ${dumpdir}/xvector using python toolkits"
    for dset in "${train_set}" "${valid_set}" ${test_sets}; do
        if [ "${dset}" = "${train_set}" ] || [ "${dset}" = "${valid_set}" ]; then
            _suf="/org"
        else
            _suf=""
        fi
        if [ "${xvector_tool}" = "rawnet" ]; then
            xvector_model="RawNet"
        fi
        pyscripts/utils/extract_xvectors.py \
            --pretrained_model ${xvector_model} \
            --toolkit ${xvector_tool} \
            ${data_feats}${_suf}/${dset} \
            ${dumpdir}/xvector/${dset}
    done

and using a flag --use_ave_xvector:

                # Assume that others toolkits are python-based
                log "Stage 2+: Extract X-vector: data/ -> ${dumpdir}/xvector using python toolkits"
                if "${use_ave_xvector}"; then
                    use_dsets=""
                    for dset in "${train_set}" "${valid_set}" ${test_sets}; do
                        if [ "${dset}" = "${train_set}" ] || [ "${dset}" = "${valid_set}" ]; then
                            _suf="/org"
                        else
                            _suf=""
                        fi
                        use_dsets+=" ${data_feats}${_suf}/${dset}"
                    done
                    utils/combine_data.sh ${data_feats}/allsplits ${use_dsets}
                    pyscripts/utils/extract_xvectors.py \
                        --pretrained_model ${xvector_model} \
                        --toolkit ${xvector_tool} \
                        ${data_feats}/allsplits \
                        ${dumpdir}/xvector/averaged
                    
                    for dset in "${train_set}" "${valid_set}" ${test_sets}; do
                        mkdir -p ${dumpdir}/xvector/${dset}
                        if [ "${dset}" = "${train_set}" ] || [ "${dset}" = "${valid_set}" ]; then
                            _suf="/org"
                        else
                            _suf=""
                        fi
                        <"${dumpdir}/xvector/averaged/ave_xvector.scp" \
                            utils/filter_scp.pl "${data_feats}${_suf}/${dset}/wav.scp"  \
                            >"${dumpdir}/xvector/${dset}/xvector.scp"
                    done
                else
                    for dset in "${train_set}" "${valid_set}" ${test_sets}; do
                        if [ "${dset}" = "${train_set}" ] || [ "${dset}" = "${valid_set}" ]; then
                            _suf="/org"
                        else
                            _suf=""
                        fi
                        pyscripts/utils/extract_xvectors.py \
                            --pretrained_model ${xvector_model} \
                            --toolkit ${xvector_tool} \
                            ${data_feats}${_suf}/${dset} \
                            ${dumpdir}/xvector/${dset}
                    done
                fi

Of course, I only implemented this for the Python-based toolkits on my end, because I do not use Kaldi for xvector extraction.
This is just for reference.

Fhrozen · 2023-06-29T05:35:10Z

You may find some samples at: https://1drv.ms/f/s!AliZ3I0uDW8HgTxDtaGlsH8FmYcA.
They come from a single model trained with averaged xvectors.

kan-bayashi · 2023-07-01T14:46:30Z

Thanks, @Fhrozen, for the reference!
I think @G-Thor's script is also a good reference, so I will merge it.

G-Thor added 2 commits June 20, 2023 14:19

Add script to use spk avg xvectors in TTS training

94f9199

Fix stderr messages and formatting.

4d21b6a

mergify bot added the ESPnet2 label Jun 20, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

1e43ed0

for more information, see https://pre-commit.ci

sw005320 added the Enhancement and TTS labels Jun 20, 2023

sw005320 added this to the v.202307 milestone Jun 20, 2023

sw005320 requested a review from kan-bayashi June 20, 2023 14:36

kan-bayashi requested changes Jun 23, 2023

Add info about spk-avg X-vectors to tts1 README.md

9115a09

mergify bot added the README label Jun 23, 2023

kan-bayashi approved these changes Jul 1, 2023

kan-bayashi merged commit 3651c2e into espnet:master Jul 1, 2023

Add script to use speaker averaged xvectors in TTS training #5244
kan-bayashi merged 4 commits into espnet:master from G-Thor:spk_avg_xvectors
