Implementation of SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing
This is the implementation code for the paper: SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing.
Our demonstration page is available at Demo.
This code was tested with Python 3.8.10 and PyTorch 2.2.0+cu121. SteerMusic relies on a pretrained AudioLDM2.
You can install the Python dependencies with:
pip3 install -r requirements.txt
If you encounter issues such as
ImportError: cannot import name 'cached_download' from 'huggingface_hub'
please manually replace cached_download with hf_hub_download in diffusers/utils/dynamic_modules_utils.py.
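If you prefer to script the fix, a small helper like the following can apply the same substitution. This sketch is our own illustration, not part of the SteerMusic code; the exact location of `dynamic_modules_utils.py` depends on your environment.

```python
from pathlib import Path

def patch_cached_download(module_file: str) -> bool:
    """Replace the removed `cached_download` name with `hf_hub_download`
    in the given diffusers source file. Returns True if a change was made."""
    path = Path(module_file)
    text = path.read_text()
    if "cached_download" not in text:
        return False  # already patched, or a diffusers version without the issue
    path.write_text(text.replace("cached_download", "hf_hub_download"))
    return True

# Example usage (the resolved path depends on your installation):
# import importlib.util
# target = Path(importlib.util.find_spec("diffusers").origin).parent \
#     / "utils" / "dynamic_modules_utils.py"
# patch_cached_download(str(target))
```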
To perform coarse-grained text-to-music editing, run
python SteerMusic_edit.py --audio_path '/path/to/source/music/' --prompt 'target prompt' --prompt_ref 'source prompt' --output_dir '/output/path/' --guidance_scale 30
Example
python SteerMusic_edit.py --audio_path "./audios/bach_anh114.wav" --prompt "Energetic harp cover with a groovy, reverberant melody." --prompt_ref "Energetic piano cover with a groovy, reverberant melody." --guidance_scale 30 --weight_aug 3
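To edit several source files with the same prompt pair, the command above can be generated per file. The helper below is a hypothetical convenience wrapper of our own (the prompts and directories in the usage comment are illustrative); it only builds the command lines, which you can then execute with `subprocess`.

```python
from pathlib import Path

def batch_edit(audio_dir, prompt, prompt_ref, output_dir, guidance_scale=30):
    """Build one SteerMusic_edit.py command per .wav file in `audio_dir`.
    This wrapper is our own illustration, not part of the released code."""
    cmds = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        cmds.append([
            "python", "SteerMusic_edit.py",
            "--audio_path", str(wav),
            "--prompt", prompt,
            "--prompt_ref", prompt_ref,
            "--output_dir", output_dir,
            "--guidance_scale", str(guidance_scale),
        ])
    return cmds

# Each command can then be executed with, e.g.:
# import subprocess
# for cmd in batch_edit("./audios", "harp cover", "piano cover", "./out"):
#     subprocess.run(cmd, check=True)
```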
SteerMusic+ relies on a fine-tuned personalized diffusion model. To fine-tune a personalized diffusion model, please refer to DreamSound. In this SteerMusic+ implementation, we plug SteerMusic+ into a DreamSound model fine-tuned from an AudioLDM2 checkpoint. To personalize your music editing, please follow the fine-tuning instructions provided in DreamSound and obtain a checkpoint that captures the desired musical concept token.
To perform fine-grained personalized music editing, run
python SteerMusic_personalized.py --audio_path '/path/to/source/music/' --prompt_ref 'source prompt with [emphasized] edit area, e.g., a recording of [piano] music' --concept 'target concept' --personalized_ckpt '/path/to/personalized/diffusion/ckpt/' --guidance_scale 15
This is an example command. We provide a DreamSound checkpoint fine-tuned on the [bouzouki] concept, which can be downloaded via the link. Please unzip the downloaded checkpoint, place it at ./DreamSound/outputs_bouzouki/, then run:
python SteerMusic_personalized.py --audio_path "./audios/bach_anh114.wav" --prompt_ref "Energetic [piano] cover with a groovy, reverberant melody." --concept 'bouzouki' --personalized_ckpt './Dreamsound/outputs_bouzouki/pipeline_step_100' --guidance_scale 20
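Note that `--prompt_ref` marks the edit area with square brackets (e.g., `[piano]`). To sanity-check a prompt before launching a run, a small helper like the following (our own sketch, not part of the repo) extracts the bracketed tokens:

```python
import re

def emphasized_tokens(prompt_ref):
    """Return the [bracketed] edit-area tokens from a reference prompt."""
    return re.findall(r"\[([^\]]+)\]", prompt_ref)

# emphasized_tokens("Energetic [piano] cover with a groovy, reverberant melody.")
# returns ["piano"]
```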
- For CLAP and LPAPS scores, please refer to CLAP and LPAPS. These codes are adapted from AudioEditingCode.
- For FAD scores, please refer to fadtk.
- For the CDPAM score, please refer to CDPAM, which is adapted from CDPAM_repo.
- For the CQT-1 PCC score, please refer to CQT-1.
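For intuition, the CLAP score reported by such toolkits reduces to the cosine similarity between a CLAP audio embedding and a CLAP text embedding. The sketch below shows only that final step; the embeddings themselves must come from a CLAP model, and this helper is purely illustrative.

```python
import math

def cosine_similarity(audio_emb, text_emb):
    """Cosine similarity between two embedding vectors, the quantity
    typically reported as the CLAP score once both embeddings are computed."""
    dot = sum(a * t for a, t in zip(audio_emb, text_emb))
    norm_a = math.sqrt(sum(a * a for a in audio_emb))
    norm_t = math.sqrt(sum(t * t for t in text_emb))
    return dot / (norm_a * norm_t)

# cosine_similarity([1.0, 0.0], [1.0, 0.0])  -> 1.0
```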
If the editing effects are not noticeable, consider increasing the guidance scale. A higher guidance scale strengthens the influence of the editing instructions, leading to more pronounced changes in the output. See Section 5 of our paper and our supplementary materials for a discussion of the trade-off between editing strength and preservation of the original music content.
One possible reason for unsatisfactory personalized editing results is that the personalized diffusion model is overfitted. To mitigate this, try adjusting the number of fine-tuning steps used during personalization. For more details, please refer to Figure 11 in our paper.
We acknowledge the following works for sharing their implementation code:
Delta_denoising_score; Constrative_denoising_score; DreamSound; AudioEditingCodes; AudioLDM2.