Audio samples are available at https://streichgeorg.github.io/autosing_samples/.
The code was developed under Python 3.11 and expects PyTorch 2.1.2 (including torchaudio and torchvision) to be installed. To get started, clone the repository and run `pip install .` in its root directory.
Most of the processing is implemented in processing.py. Data and intermediate results are stored in Parquet files. The initial data is expected to be split across multiple Parquet partitions using the following directory structure:
```
<dataset>/
    0/index.parquet
    1/index.parquet
    2/index.parquet
    ...
```
Each index.parquet file should contain the following columns:
- `id`: Unique identifier for each song.
- `audio`: Byte string of audio data encoded in some format supported by PyTorch.
Optional fields:
- `lrc_lyrics`: Lyrics in LRC format. The code will still compute a word-level alignment but will base it on the LRC timestamps.
- `raw_lyrics`: Text-only lyrics.
- `artist`: Artist name, used to train the embedding model.
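For illustration, the snippet below writes a single partition with pandas. The file names and the example row are hypothetical; any tool that produces Parquet files with the columns above works equally well.

```python
from pathlib import Path

import pandas as pd

# Hypothetical example: write one partition, <dataset>/0/index.parquet,
# with the required columns plus the optional raw_lyrics field.
partition_dir = Path("my_dataset/0")
partition_dir.mkdir(parents=True, exist_ok=True)

rows = [{
    "id": "song-0001",                            # unique song identifier
    "audio": Path("song-0001.mp3").read_bytes(),  # encoded audio as a byte string
    "raw_lyrics": "some text-only lyrics",        # optional
}]
pd.DataFrame(rows).to_parquet(partition_dir / "index.parquet")
```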
After running the necessary processing steps, the construct_dataset.py script is used to shuffle and chunk the dataset.
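The actual logic lives in construct_dataset.py; purely as a sketch of the shuffle-and-chunk idea (not that script's real interface or output layout), a minimal version could look like this:

```python
from pathlib import Path

import pandas as pd

def shuffle_and_chunk(dataset_dir: Path, out_dir: Path, chunk_size: int = 10_000) -> None:
    # Gather all partitions, shuffle rows globally, and rewrite fixed-size chunks.
    frames = [pd.read_parquet(p) for p in sorted(dataset_dir.glob("*/index.parquet"))]
    df = pd.concat(frames, ignore_index=True).sample(frac=1.0, random_state=0)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(0, len(df), chunk_size):
        df.iloc[i:i + chunk_size].to_parquet(out_dir / f"chunk_{i // chunk_size:05d}.parquet")
```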
Training runs can be started as follows:
```
python3 autosing/train.py --task_name t2s \
    --task_args '{"size": "<model size: small, medium, large>"}' \
    --dataset-config '{"path": "<path to your dataset>"}' \
    --tunables '{"lr0": 4e-3}'
```
```
python3 autosing/train.py --task_name sm2a \
    --task_args '{"size": "<model size: small, medium, large>"}' \
    --dataset-config '{"path": "<path to your dataset>", "multiscale": <whether to enable multiscale training>}' \
    --tunables '{"lr0": 3e-3}'
```
Samples can be generated with the sing.py script as shown below. The script expects two lines of lyrics: the first line controls the first 15 seconds of the output and the second line controls the last 15 seconds.
```
python3 autosing/sing.py <audiofile containing reference song excerpt> \
    --lyrics $'i saw you standing under moonlight your eyes like diamonds in the sky i felt a spark ignite oh couldn\'t help but catch your smile\n we were strangers in a crowded room but something pulled me close to you a whisper in the wind oh a heartbeat racing to the truth'
```
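If you still need to cut a reference excerpt out of a full song, torchaudio (already a dependency) can do it; the file names and the 30-second length below are arbitrary choices, not requirements of sing.py.

```python
# Cut an excerpt from a full song with torchaudio. File names and the
# 30-second length are illustrative assumptions, not sing.py requirements.
import torchaudio

waveform, sample_rate = torchaudio.load("full_song.mp3")
excerpt = waveform[:, : 30 * sample_rate]  # keep the first 30 seconds
torchaudio.save("reference_excerpt.wav", excerpt, sample_rate)
```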
Our architecture and training code are based on the wonderful WhisperSpeech codebase.