
ftshijt · 2023-12-05T10:13:38Z

What?

- Add an ESPnet speaker embedding extractor (inference script)
- Add an ESPnet speaker embedding extractor for TTS purposes
- Separate the speaker embedding extraction and speaker ID conversion stages in TTS
  - For flexibility (e.g., so that a different speaker embedding can be used after the waveforms are formatted)
- Rename xvector to spk_embed, as suggested by @Jungjee

TODO

- Replace the current spk.sh template with the new inference
- Upload a pre-trained VCTK model that uses the ESPnet pre-trained speaker model
- Add a test function for spk_inference.py

ftshijt · 2023-12-05T10:21:34Z

@Jungjee Please feel free to have a check for the implementation~

codecov · 2023-12-05T10:38:56Z

Codecov Report

Attention: 29 lines in your changes are missing coverage. Please review.

Comparison is base (4771515) 76.53% compared to head (d0740d1) 76.49%.
Report is 2 commits behind head on master.

Files	Patch %	Lines
espnet2/bin/spk_inference.py	51.66%	29 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5579      +/-   ##
==========================================
- Coverage   76.53%   76.49%   -0.04%     
==========================================
  Files         720      720              
  Lines       66639    66607      -32     
==========================================
- Hits        51001    50951      -50     
- Misses      15638    15656      +18

Flag	Coverage Δ
test_configuration_espnet2	`∅ <ø> (∅)`
test_integration_espnet1	`62.92% <ø> (+0.14%)`	⬆️
test_integration_espnet2	`49.47% <100.00%> (-0.63%)`	⬇️
test_python_espnet1	`19.09% <0.00%> (+<0.01%)`	⬆️
test_python_espnet2	`52.55% <53.22%> (+0.15%)`	⬆️
test_utils	`22.15% <ø> (ø)`
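
The headline figures in the coverage diff can be checked from the hit and line counts. The sketch below assumes percentages are truncated (not rounded) to two decimals, which is consistent with the numbers shown here but is an assumption about how the report was generated. The patch total of 60 lines is likewise inferred from "29 missing" and "51.66%"; it is not stated in the report.

```python
from math import floor


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage percentage, truncated (not rounded) to two decimals."""
    return floor(hits / lines * 10000) / 100


# Hit and line counts taken from the coverage diff above.
base = coverage_pct(51001, 66639)  # master
head = coverage_pct(50951, 66607)  # this PR

# Patch coverage: 29 lines missing out of an inferred 60 changed lines.
patch = coverage_pct(60 - 29, 60)

print(base, head, round(head - base, 2), patch)
```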

egs2/TEMPLATE/tts1/tts.sh

Jungjee

Thanks for your effort @ftshijt !! I left some comments.
Mostly looks good to me.

egs2/TEMPLATE/asr1/pyscripts/utils/extract_xvectors.py

egs2/TEMPLATE/spk1/setup.sh

egs2/TEMPLATE/tts1/tts.sh

sw005320 · 2023-12-06T22:40:00Z

@Jungjee Hi, I applied the name change and verified most of the process (I still need to double-check some earlier TTS checkpoints), but I believe it is ready for review.

By the way, here is an example usage of the API:
from espnet2.bin.spk_inference import Speech2SpkEmbedding
import numpy as np

# for huggingface
speech2spk_embed = Speech2SpkEmbedding.from_pretrained(model_tag="espnet/voxcelebs12_rawnet3")
speech2spk_embed(np.zeros(16500))

# for local ckpt
speech2spk_embed = Speech2SpkEmbedding(model_file="model.pth", train_config="config.yaml")
speech2spk_embed(np.zeros(32000))
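
A common next step is comparing two extracted embeddings. The sketch below is not part of this PR: it assumes each embedding can be flattened into a plain sequence of floats (the return type of the extractor is not stated in this thread), and it uses hard-coded stand-in vectors instead of real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors (hypothetical); real embeddings would come from the extractor.
emb_a = [1.0, 0.0, 2.0]
emb_b = [2.0, 0.0, 4.0]  # same direction as emb_a
emb_c = [0.0, 3.0, 0.0]  # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

Note that an all-zero vector raises an error here, since cosine similarity is undefined in that case.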

One naming-level comment.
How about changing the class name from Speech2SpkEmbedding to Speech2Embedding?
We may also provide other embedding vectors (e.g., language embeddings) under the same API name.
Speech2Text follows this policy (it can be ASR or OWSM S2T).

Jungjee · 2023-12-06T23:31:53Z

> One naming-level comment.
> How about changing the class name from Speech2SpkEmbedding to Speech2Embedding?
> We may also provide other embedding vectors (e.g., language embeddings) under the same API name.
> Speech2Text follows this policy (it can be ASR or OWSM S2T).

I see, I didn't think about that.
I think it's a good suggestion! @ftshijt, sorry, let's go with your first choice!
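
The naming policy discussed here (one task-neutral class name, several possible embedding types behind it) can be illustrated with a toy example. Everything below is hypothetical: `ToySpeech2Embedding` and the two fake backends are stand-ins written for this illustration and are not the ESPnet class or its models.

```python
from typing import Callable, List, Sequence

Waveform = Sequence[float]
Embedding = List[float]


class ToySpeech2Embedding:
    """Hypothetical illustration only; not the ESPnet implementation."""

    def __init__(self, backend: Callable[[Waveform], Embedding], kind: str):
        self.backend = backend
        self.kind = kind  # e.g. "spk" or "lang"

    def __call__(self, speech: Waveform) -> Embedding:
        # The call signature stays the same whatever the embedding type is.
        return self.backend(speech)


def fake_spk_backend(speech: Waveform) -> Embedding:
    """Fake 'speaker' embedding: mean sample value and length."""
    return [sum(speech) / max(len(speech), 1), float(len(speech))]


def fake_lang_backend(speech: Waveform) -> Embedding:
    """Fake 'language' embedding: parity of the length."""
    return [float(len(speech) % 2)]


spk_extractor = ToySpeech2Embedding(fake_spk_backend, kind="spk")
lang_extractor = ToySpeech2Embedding(fake_lang_backend, kind="lang")

print(spk_extractor([0.0, 2.0]))
print(lang_extractor([0.0, 2.0, 4.0]))
```

The point of the sketch is only that callers use the same name and call pattern whichever embedding type sits behind it.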

sw005320 · 2024-01-04T09:31:01Z

LGTM.
Is it ready to merge?

The TODO item "replace the current spk.sh template with the new inference" is not checked yet.

ftshijt · 2024-01-04T09:32:57Z

> LGTM. Is it ready to merge?
>
> The TODO item "replace the current spk.sh template with the new inference" is not checked yet.

Sorry, it is not done yet. Recently I have mostly focused on checking the TTS performance (which is good). I will get back to that later this week.

sw005320 · 2024-01-04T09:43:00Z

Sounds good.
Please ping me if you finish it.

Jungjee · 2024-01-06T03:51:59Z

> Sorry, it is not done yet. Recently I have mostly focused on checking the TTS performance (which is good). I will get back to that later this week.

FYI, in my view, replacing the existing stage 6 with this extraction could be done in another PR, since it can affect the speed of current model inference and also needs several tests.
(I'm a bit worried about losing the easy multi-GPU extraction that I've currently made. To keep the speed while also using the new HF-based extraction, I think quite a lot of code would need to be changed.)

Also (maybe not a good reason, but) several users are already trying to use the models we uploaded, e.g., @Emrys365 for the SE challenge and @underdogliu for ASVspoof5, which is another reason for me to split the PR.
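
The existing multi-GPU extraction is not shown in this thread. As a generic illustration of the kind of parallelism at stake, one common pattern is to shard the utterance list so that each worker (for example, one per GPU) handles a disjoint subset. The function and utterance IDs below are hypothetical and are not taken from the recipe.

```python
from typing import List, Sequence


def shard_round_robin(utt_ids: Sequence[str], num_workers: int) -> List[List[str]]:
    """Split utterance IDs into num_workers disjoint shards, round-robin."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    shards: List[List[str]] = [[] for _ in range(num_workers)]
    for i, utt in enumerate(utt_ids):
        shards[i % num_workers].append(utt)
    return shards


# Hypothetical utterance IDs; each shard would be processed by one worker.
utts = [f"utt{i}" for i in range(5)]
for rank, shard in enumerate(shard_round_robin(utts, 2)):
    print(rank, shard)
```

Round-robin assignment keeps shard sizes within one utterance of each other, although it does not balance by audio duration.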

sw005320 · 2024-01-10T20:01:46Z

@ftshijt
Given the discussion with @Jungjee, it would be good to split off the "replace the current spk.sh template with the new inference" item into a separate PR.
So, I just merged this PR.
Thanks for your great PR, and please continue the work in the other PR!

After an extra stage was added to tts.sh in espnet#5579, the following stage numbers were updated. A few were missed in that update, and this PR covers those that remained.
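
The renumbering problem described above can be sketched as a small text transformation. This is only an illustration: the regular expression, the example command, and the choice of `inserted_at=3` are assumptions made for the sketch, not the actual stage number or fix from either PR.

```python
import re

# Matches "--stage N" and "--stop_stage N", capturing the option and the number.
STAGE_REF = re.compile(r"(--(?:stop_)?stage\s+)(\d+)")


def shift_stage_refs(text: str, inserted_at: int) -> str:
    """Add 1 to every stage reference whose number is >= inserted_at."""

    def bump(match: re.Match) -> str:
        stage = int(match.group(2))
        new_stage = stage + 1 if stage >= inserted_at else stage
        return match.group(1) + str(new_stage)

    return STAGE_REF.sub(bump, text)


# Hypothetical command line; stages before the inserted one are left alone.
print(shift_stage_refs("./run.sh --stage 2 --stop_stage 3", inserted_at=3))
```

References written in other forms (for example, prose such as "stage 6" in a README) would not be caught by this pattern, which is one way a few references can be missed.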

ftshijt added 7 commits December 5, 2023 04:33

fix setup

f4b172f

fix readme with new stage

9dc73d7

add espnet spk in tts, add spk embedding as a separate stage

824121c

fix espnet model

17a033b

minor fix

626ae9e

change type

2b1c8ac

apply black

849021c

ftshijt requested review from Fhrozen and kan-bayashi and removed request for Fhrozen December 5, 2023 10:13

mergify bot added ESPnet2 README labels Dec 5, 2023

ftshijt added Documentation TTS Text-to-speech SID Speaker identification/embedding and removed README labels Dec 5, 2023

ftshijt added this to the v.202312 milestone Dec 5, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

0dfefab

for more information, see https://pre-commit.ci

mergify bot added the README label Dec 5, 2023

ftshijt added 2 commits December 5, 2023 05:17

update missing code for espnet type

4760f0e

Merge branch 'spk_inference' of https://github.com/ftshijt/espnet int…

0bd9fd3

…o spk_inference

wrong name update

bf84668

switch to train_config and model_file to align with huggingface pre-t…

a911df4

…rained models

Fhrozen reviewed Dec 5, 2023

egs2/TEMPLATE/tts1/tts.sh

ftshijt added 2 commits December 5, 2023 05:59

fix debug info and update huggingface compatibale inference setting

3ace83c

fix Uhion typo

2b59b27

Jungjee reviewed Dec 5, 2023

ftshijt and others added 2 commits December 6, 2023 17:32

Merge branch 'spk_inference' of https://github.com/ftshijt/espnet int…

06494ea

…o spk_inference

[pre-commit.ci] auto fixes from pre-commit.com hooks

bcbfedd

for more information, see https://pre-commit.ci

ftshijt added 2 commits December 6, 2023 18:21

add ci test for spk inference

12292c8

Merge branch 'spk_inference' of https://github.com/ftshijt/espnet int…

cdacc6f

…o spk_inference

ftshijt and others added 14 commits December 7, 2023 07:30

revert back the name

85840b4

Merge branch 'espnet:master' into spk_inference

882e6d6

Update README.md

ebaf811

[pre-commit.ci] auto fixes from pre-commit.com hooks

71780b5

for more information, see https://pre-commit.ci

minor fix to non-reference issue

b3e5db8

update spk embed setting for talromur2

50f708c

revert run.sh change

3771b28

fix config xvector statement

c5d0984

update other recipes for aligned naming

a9fed0a

[pre-commit.ci] auto fixes from pre-commit.com hooks

733fbc4

for more information, see https://pre-commit.ci

bug fix

b45f86e

bug fix

f842e45

bug fix

b4c5c13

Merge branch 'spk_inference' of https://github.com/ftshijt/espnet int…

d0740d1

…o spk_inference

sw005320 merged commit 3b2e0d3 into espnet:master Jan 10, 2024

G-Thor mentioned this pull request Mar 21, 2024

X-vector based TTS model packaging broken in tts.sh #5713

Closed

G-Thor mentioned this pull request Mar 21, 2024

Fix stage references in generated run.sh in TTS recipes #5714

Merged

ftshijt deleted the spk_inference branch May 19, 2025 07:53
