AutoSing

Samples

https://streichgeorg.github.io/autosing_samples/

Setup

The code was developed under Python 3.11 and expects PyTorch 2.1.2 (including torchaudio and torchvision) to be installed. To get started, clone the repository and run pip install . from the repository root.
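
A quick way to confirm that the expected stack is importable (a minimal sanity check; the version pin is taken from the note above):

# Check that PyTorch 2.1.2 and its companion packages are installed.
import torch
import torchaudio
import torchvision

print(torch.__version__)       # expected: 2.1.2
print(torchaudio.__version__)  # should match the installed PyTorch build
print(torchvision.__version__)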

Data Processing

Most of the processing is implemented in processing.py. Data and intermediate results are stored in Parquet files. The initial data is expected to be split across multiple Parquet partitions using the following directory structure:

<dataset>/
    0/index.parquet
    1/index.parquet
    2/index.parquet
    ...

Each index.parquet file should contain the following columns:

  • id: Unique identifier for each song.
  • audio: Byte string of audio data encoded in some format supported by PyTorch.

Optional fields:

  • lrc_lyrics: Lyrics in LRC format. The code will still compute a word-level alignment but will base it on the LRC timestamps.
  • raw_lyrics: Text-only lyrics.
  • artist: Artist name, used to train the embedding model.
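
To illustrate the schema, here is a minimal sketch that writes one partition of such an index. It assumes pandas with pyarrow installed; song.mp3, the id, and the metadata values are placeholders:

import os
import pandas as pd

# Read the raw audio; any encoding supported by PyTorch/torchaudio works.
with open("song.mp3", "rb") as f:
    audio_bytes = f.read()

rows = [{
    "id": "song-0001",     # unique identifier for the song
    "audio": audio_bytes,  # encoded audio as a byte string
    # Optional columns:
    "raw_lyrics": "i saw you standing under moonlight ...",
    "artist": "Example Artist",
}]

# Follow the <dataset>/<partition>/index.parquet layout described above.
os.makedirs("dataset/0", exist_ok=True)
pd.DataFrame(rows).to_parquet("dataset/0/index.parquet")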

After the necessary processing steps have been run, use the construct_dataset.py script to shuffle and chunk the dataset.

Training

Training runs can be started as follows:

Text-to-Semantic

python3 autosing/train.py --task_name t2s \
--task_args '{"size": "<model size: small, medium, large>"}' \
--dataset-config '{"path": "<path to your dataset>"}' \
--tunables '{"lr0": 4e-3}'

Semantic-to-Audio

python3 autosing/train.py --task_name sm2a \
--task_args '{"size": "<model size: small, medium, large>"}' \
--dataset-config '{"path": "<path to your dataset>", "multiscale": <whether to enable multiscale training>}' \
--tunables '{"lr0": 3e-3}'
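
Because the JSON arguments are easy to mis-quote in a shell, it can help to build them programmatically. Here is a sketch for a semantic-to-audio run; the dataset path and the size/multiscale choices are placeholders, and the text-to-semantic run is analogous with task_name t2s and lr0 4e-3:

import json
import subprocess

task_args = {"size": "small"}  # small, medium, or large
dataset_config = {"path": "dataset", "multiscale": True}
tunables = {"lr0": 3e-3}

subprocess.run([
    "python3", "autosing/train.py",
    "--task_name", "sm2a",
    "--task_args", json.dumps(task_args),
    "--dataset-config", json.dumps(dataset_config),
    "--tunables", json.dumps(tunables),
], check=True)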

Inference

Samples can be generated with the sing.py script as shown below. The script expects two lines of lyrics: the first line controls the first 15 seconds of output, and the second line controls the last 15 seconds.

python3 autosing/sing.py <audiofile containing reference song excerpt> \
--lyrics $'i saw you standing under moonlight your eyes like diamonds in the sky i felt a spark ignite oh couldn\'t help but catch your smile\n we were strangers in a crowded room but something pulled me close to you a whisper in the wind oh a heartbeat racing to the truth'
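
The two-line lyrics argument can also be assembled in Python, which sidesteps shell quoting of the embedded newline (a sketch; reference.mp3 and the lyric text are placeholders):

import subprocess

first_15s = "i saw you standing under moonlight your eyes like diamonds in the sky"
last_15s = "we were strangers in a crowded room but something pulled me close to you"

# sing.py expects the two lyric sections separated by a newline.
subprocess.run([
    "python3", "autosing/sing.py", "reference.mp3",
    "--lyrics", first_15s + "\n" + last_15s,
], check=True)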

Acknowledgments

Our architecture and training code are based on the wonderful WhisperSpeech codebase.
