Masao-Someki · 2023-07-25T15:56:12Z

What?

This PR adds the espnetez package to make ESPnet easier to use!

Why?

ESPnet recipes are driven primarily by shell scripts, and running all the stages this way can be difficult for beginners. The espnetez tool provides a Pythonic frontend that makes ESPnet more user-friendly.

Masao-Someki · 2023-07-25T16:05:13Z

This is a sample training script for the ASR task with espnetez.
All you need to do is build the dump files and pass them to the trainer.
You can easily combine several datasets by concatenating their dump files.

import espnetez as ez

# dataset information
# the format of wav.scp is: <audio_tag><space><file_path>\n
# and the format of text is: <audio_tag><space><text>\n
# Example for wav.scp: audio_1 /database/libri100/train/first.flac
# Example for text: audio_1 HELLO WORLD
data_inputs = {
    "speech": {"file": "wav.scp", "type": "kaldi_ark"},
    "text": {"file": "text", "type": "text"},
}
train_dump_path = "dump/raw/train_clean_100_sp"
test_dump_path = "dump/raw/test_clean"
output_path = "exp"

# You can use configuration from the ESPnet recipes.
training_config = ez.config.from_yaml("asr", "train_asr_branchformer_e24_amp.yaml")

# and you can update with your config.
preprocessor_config = ez.utils.load_yaml("preprocess.yaml")
training_config.update(preprocessor_config)

# Define trainer
trainer = ez.trainer.Trainer(
    "asr", train_dump_path, test_dump_path, output_path,
    data_inputs, training_config,
    ngpu=1,  # you can also update configuration here
)

# If you don't have a stats file yet, you need to run this first.
trainer.collect_stats()

# finally run train()
trainer.train()
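
The remark above about combining datasets can be sketched as follows. This is a minimal illustration, assuming each dump directory holds `wav.scp` and `text` files in the `<audio_tag><space><value>` format described in the comments. `merge_dump_dirs` is a hypothetical helper and not part of espnetez; the duplicate-tag check and the sorting by tag are added assumptions rather than documented requirements.

```python
from pathlib import Path


def merge_dump_dirs(src_dirs, dst_dir, file_names=("wav.scp", "text")):
    """Concatenate '<audio_tag> <value>' files from several dump directories.

    Hypothetical helper for illustration only; not part of espnetez.
    Raises ValueError if the same audio tag appears twice in one file type.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for name in file_names:
        seen = set()
        merged = []
        for src in src_dirs:
            with open(Path(src) / name, encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\n")
                    if not line.strip():
                        continue  # skip blank lines
                    tag = line.split(maxsplit=1)[0]
                    if tag in seen:
                        raise ValueError(f"duplicate audio tag {tag!r} in {name}")
                    seen.add(tag)
                    merged.append(line)
        # Sort by tag so that every merged file lists utterances in the same order.
        merged.sort(key=lambda entry: entry.split(maxsplit=1)[0])
        (dst / name).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return dst
```

The returned directory could then be used as `train_dump_path` in the script above, provided the audio tags are unique across the source datasets.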

Masao-Someki · 2023-07-25T16:11:27Z

ToDo

- Add more tasks. (Currently, I added asr/transducer/tts for debugging.)
- Add sentencepiece training
- Add frontend for creating dump files
- Refactor trainer class. Especially the _update_config() function.

codecov · 2023-07-25T16:16:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (35b8f01) 76.54% compared to head (c788444) 76.54%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5372   +/-   ##
=======================================
  Coverage   76.54%   76.54%           
=======================================
  Files         720      720           
  Lines       66602    66602           
=======================================
  Hits        50978    50978           
  Misses      15624    15624

| Flag | Coverage Δ |
|---|---|
| test_configuration_espnet2 | `∅ <ø> (∅)` |
| test_integration_espnet1 | `62.92% <ø> (ø)` |
| test_integration_espnet2 | `50.10% <ø> (ø)` |
| test_python_espnet1 | `19.08% <ø> (ø)` |
| test_python_espnet2 | `52.38% <ø> (ø)` |
| test_utils | `22.15% <ø> (ø)` |

sw005320 · 2023-07-25T19:32:32Z

@pyf98, can you help this project by reviewing and testing PRs?

sw005320 · 2023-08-03T21:09:30Z

@Masao-Someki, can you let me know the progress?
Can you fix the CI and finish refactoring?
I think we can make an LM training step a lower priority.

Masao-Someki · 2023-08-04T06:28:32Z

@sw005320
I'm using espnetez to train a single ASR model, but I'm encountering some issues during training.
For example, the uid variable in the dataloader/dataset becomes a float, which triggers an assertion error.
I'm not sure why this happens, so I'm checking whether the dump files are generated properly.
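
A check of the dump files along these lines could look like the sketch below. The thread does not say what the root cause was, so this is only an assumption-laden illustration: `read_kaldi_file`, `looks_numeric`, and `check_dump_dir` are hypothetical helpers (not espnetez or ESPnet APIs), and flagging purely numeric audio tags is a guess at one way a string ID might end up parsed as a float.

```python
from pathlib import Path


def read_kaldi_file(path):
    """Read '<audio_tag> <value>' lines into a dict, keeping tags as strings."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            parts = line.strip().split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(f"{path}:{line_no}: expected '<tag> <value>'")
            tag, value = parts
            entries[tag] = value
    return entries


def looks_numeric(tag):
    """Return True if the tag would parse as a float (e.g. '100' or '1e5')."""
    try:
        float(tag)
    except ValueError:
        return False
    return True


def check_dump_dir(dump_dir, file_names=("wav.scp", "text")):
    """Return a list of human-readable problems found in a dump directory."""
    dump_dir = Path(dump_dir)
    tables = {name: read_kaldi_file(dump_dir / name) for name in file_names}
    reference = set(tables[file_names[0]])
    problems = []
    for name, table in tables.items():
        # Tags present in only one of the two files.
        for tag in sorted(reference ^ set(table)):
            problems.append(f"{name}: tag {tag!r} does not match {file_names[0]}")
        # Tags that a loader could mistake for numbers (assumed failure mode).
        for tag in sorted(table):
            if looks_numeric(tag):
                problems.append(f"{name}: tag {tag!r} looks numeric")
    return problems
```

Calling `check_dump_dir("dump/raw/train_clean_100_sp")` would return an empty list when both files agree on their tags and no tag looks like a number.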

Masao-Someki · 2023-08-05T07:55:36Z

I added a demo notebook that trains an E-Branchformer model on the LibriSpeech-100 dataset. (link)
In my environment, the final train() call does not execute successfully in the Jupyter notebook. We may need to run train() from the command line instead.

Masao-Someki · 2023-11-09T16:41:25Z

I included a demo notebook on fine-tuning the pre-trained model using LoRA.
ESPnet-Easy simplifies fine-tuning the pretrained model from the Hugging Face hub with a custom dataset.
(Currently, it seems that there is a bug in the training process with the pretrained model.)

sw005320 · 2023-11-09T19:09:56Z

Very cool!

@juice500ml, @ftshijt, @simpleoier, can you check this?

pyf98 · 2023-11-09T19:43:49Z

Can we provide an example for fine-tuning OWSM (e.g., https://huggingface.co/espnet/owsm_v2_ebranchformer)? It will attract more users.

juice500ml · 2023-11-10T05:06:35Z

This PR is awesome!! Can we also consider packaging it for PyPI, so that people can easily pip install espnetez and use it directly?

Masao-Someki · 2023-11-11T14:59:53Z

Thank you @sw005320, @pyf98, and @juice500ml,

I apologize for the delay in development, but the bug in the fine-tuning process has been successfully fixed.
This PR is now ready for review!

Since I currently have only one GPU on my local machine, I kindly request the reviewer's assistance in checking whether the training process runs successfully with multiple GPUs.

pyf98 · 2023-12-02T03:39:01Z

espnet2/tasks/abs_task.py

    @classmethod
    def main(cls, args: argparse.Namespace = None, cmd: Sequence[str] = None):
        assert check_argument_types()
-        print(get_commandline_args(), file=sys.stderr)


Why was this removed?

@pyf98
I'm sorry, I accidentally deleted this line...
I just reverted this modification.
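
For context on the line discussed in this review thread: printing the command line to stderr at the start of `main` records how a run was invoked, which helps when reading job logs later. The sketch below is a stand-in under stated assumptions. `format_commandline` and `log_invocation` are hypothetical helpers built on the standard-library `shlex.join`; they are not ESPnet's `get_commandline_args`, whose exact output format is not shown in this thread.

```python
import shlex
import sys


def format_commandline(argv=None):
    """Return argv as one shell-quoted string (illustrative stand-in)."""
    argv = sys.argv if argv is None else argv
    return shlex.join(argv)


def log_invocation(argv=None):
    """Write the invocation to stderr so it lands in the job log, not stdout."""
    print(format_commandline(argv), file=sys.stderr)
```

Arguments containing spaces are quoted, so the logged line can be pasted back into a shell to repeat the run.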

pyf98 · 2023-12-02T03:41:21Z

LGTM!

Masao-Someki · 2023-12-02T12:28:10Z

I will fix the CI and add an inference guide to the notebooks to finish this PR!

- Fix CI
- Add inference instructions to the notebooks

sw005320 · 2023-12-09T01:59:47Z

Thanks a lot, @Masao-Someki!
This is a great first step!

Masao-Someki and others added 2 commits July 26, 2023 00:51

Add espnetez

d0c527b

- ESPnet becomes super simple!

[pre-commit.ci] auto fixes from pre-commit.com hooks

dcd8485

for more information, see https://pre-commit.ci

sw005320 added the Need review label Jul 25, 2023

sw005320 added this to the v.202307 milestone Jul 25, 2023

Masao-Someki changed the title ~~Add espnetez~~ [WIP] Add espnetez Jul 25, 2023

sw005320 added the TTS (Text-to-speech) and ASR (Automatic speech recognition) labels Jul 25, 2023

sw005320 requested a review from pyf98 July 25, 2023 19:32

sw005320 added New Features and removed Need review labels Jul 25, 2023

Masao-Someki and others added 6 commits July 27, 2023 23:49

Add sentencepiece

797a8c3

rename file

726e47e

Add tasks

4da9832

Refactored Trainer class

8470963

Merge branch 'feature/espnetez' of github.com:Masao-Someki/espnet int…

8cb1245

…o feature/espnetez

[pre-commit.ci] auto fixes from pre-commit.com hooks

84139df

for more information, see https://pre-commit.ci

kan-bayashi modified the milestones: v.202307, v.202312 Aug 3, 2023

Masao-Someki and others added 4 commits August 5, 2023 16:46

Bug fix and add demo notebook

2c3ee59

Applied black

3060f1f

- Applied black to both `.py` and `.ipynb`

Merge branch 'feature/espnetez' of github.com:Masao-Someki/espnet int…

b2cc971

…o feature/espnetez

[pre-commit.ci] auto fixes from pre-commit.com hooks

98e7e75

for more information, see https://pre-commit.ci

Removed log block

e8c6d58

kan-bayashi added this to the v.202312 milestone Oct 25, 2023

Masao-Someki and others added 4 commits November 9, 2023 02:19

Refactored Trainer class and add Task class

ea0952b

- Easy task class will be used as the wrapper of AbsTask.
- It is to enable finetuning the pretrained model defined by user.

Add finetune demo with LoRA

549be9c

Merge branch 'feature/espnetez' of github.com:Masao-Someki/espnet int…

8065d1a

…o feature/espnetez

[pre-commit.ci] auto fixes from pre-commit.com hooks

34e33bd

for more information, see https://pre-commit.ci

Masao-Someki added 2 commits November 11, 2023 23:51

Bugfix and refactoring

8cf36c5

Merge branch 'feature/espnetez' of github.com:Masao-Someki/espnet int…

a150921

…o feature/espnetez

mergify bot added the ESPnet2 label Nov 11, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

ee12b48

for more information, see https://pre-commit.ci

Masao-Someki changed the title ~~[WIP] Add espnetez~~ Add espnetez Nov 11, 2023

pyf98 reviewed Dec 2, 2023

Masao-Someki added 2 commits December 2, 2023 21:26

Revert change in abs_task

a9239b5

Merge branch 'feature/espnetez' of github.com:Masao-Someki/espnet int…

668fdf9

…o feature/espnetez

Masao-Someki and others added 5 commits December 3, 2023 13:04

Add inference instruction in notebook

53f2800

Add TTS demo

e0e0a5a

Add wrapper function for espnet2.bin.tokenize_text

defe770

[pre-commit.ci] auto fixes from pre-commit.com hooks

0e5ae1c

for more information, see https://pre-commit.ci

Merge branch 'master' into feature/espnetez

c788444

sw005320 mentioned this pull request Dec 5, 2023

Fine-Tuning ESPnet Models: A Request for Information and Tutorials reazon-research/ReazonSpeech#20

Closed

sw005320 merged commit a4c5547 into espnet:master Dec 9, 2023

Masao-Someki mentioned this pull request Dec 10, 2023

Support external dataset library for ESPnetEasy #5584

Merged

4 tasks

Masao-Someki deleted the feature/espnetez branch March 26, 2024 12:54
