Implementation of SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing
This is the implementation code for the paper: SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing.
Our demonstration page is available at Demo.
This code was tested with Python 3.8.10 and PyTorch 2.2.0+cu121. SteerMusic relies on a pretrained AudioLDM2.
You can install the Python dependencies with:
pip3 install -r requirements.txt
If you encounter issues such as
ImportError: cannot import name 'cached_download' from 'huggingface_hub'
please manually replace cached_download with hf_hub_download in diffusers/utils/dynamic_modules_utils.py.
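If you prefer to script the fix, a small helper like the following can apply the same substitution. This sketch is our own illustration, not part of the SteerMusic code; the exact location of `dynamic_modules_utils.py` depends on your environment.

```python
from pathlib import Path

def patch_cached_download(module_file: str) -> bool:
    """Replace the removed `cached_download` name with `hf_hub_download`
    in the given diffusers source file. Returns True if a change was made."""
    path = Path(module_file)
    text = path.read_text()
    if "cached_download" not in text:
        return False  # already patched, or a diffusers version without the issue
    path.write_text(text.replace("cached_download", "hf_hub_download"))
    return True

# Example usage (the resolved path depends on your installation):
# import importlib.util
# target = Path(importlib.util.find_spec("diffusers").origin).parent \
#     / "utils" / "dynamic_modules_utils.py"
# patch_cached_download(str(target))
```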
To perform coarse-grained text-to-music editing, run
python SteerMusic_edit.py --audio_path '/path/to/source/music/' --prompt 'target prompt' --prompt_ref 'source prompt' --output_dir '/output/path/' --guidance_scale 30
Example
python SteerMusic_edit.py --audio_path "./audios/bach_anh114.wav" --prompt "Energetic harp cover with a groovy, reverberant melody." --prompt_ref "Energetic piano cover with a groovy, reverberant melody." --guidance_scale 30 --weight_aug 3
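To edit several source files with the same prompt pair, the command above can be generated per file. The helper below is a hypothetical convenience wrapper of our own (the prompts and directories in the usage comment are illustrative); it only builds the command lines, which you can then execute with `subprocess`.

```python
from pathlib import Path

def batch_edit(audio_dir, prompt, prompt_ref, output_dir, guidance_scale=30):
    """Build one SteerMusic_edit.py command per .wav file in `audio_dir`.
    This wrapper is our own illustration, not part of the released code."""
    cmds = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        cmds.append([
            "python", "SteerMusic_edit.py",
            "--audio_path", str(wav),
            "--prompt", prompt,
            "--prompt_ref", prompt_ref,
            "--output_dir", output_dir,
            "--guidance_scale", str(guidance_scale),
        ])
    return cmds

# Each command can then be executed with, e.g.:
# import subprocess
# for cmd in batch_edit("./audios", "harp cover", "piano cover", "./out"):
#     subprocess.run(cmd, check=True)
```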
SteerMusic+ relies on a fine-tuned personalized diffusion model. To fine-tune a personalized diffusion model, please refer to DreamSound. In this SteerMusic+ implementation, we plug SteerMusic+ into a DreamSound model fine-tuned from an AudioLDM2 checkpoint. To personalize your music editing, please follow the fine-tuning instructions provided in DreamSound and obtain a checkpoint that captures the desired musical concept token.
To perform fine-grained personalized music editing, run
python SteerMusic_personalized.py --audio_path '/path/to/source/music/' --prompt_ref 'source prompt with [emphasized] edit area, e.g., a recording of [piano] music' --concept 'target concept' --personalized_ckpt '/path/to/personalized/diffusion/ckpt/' --guidance_scale 15
This is an example command. We provide a DreamSound checkpoint fine-tuned on the [bouzouki] concept, which can be downloaded via the link. Please unzip the downloaded checkpoint, place it at ./DreamSound/outputs_bouzouki/, then run:
python SteerMusic_personalized.py --audio_path "./audios/bach_anh114.wav" --prompt_ref "Energetic [piano] cover with a groovy, reverberant melody." --concept 'bouzouki' --personalized_ckpt './Dreamsound/outputs_bouzouki/pipeline_step_100' --guidance_scale 20
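Note that `--prompt_ref` marks the edit area with square brackets (e.g., `[piano]`). To sanity-check a prompt before launching a run, a small helper like the following (our own sketch, not part of the repo) extracts the bracketed tokens:

```python
import re

def emphasized_tokens(prompt_ref):
    """Return the [bracketed] edit-area tokens from a reference prompt."""
    return re.findall(r"\[([^\]]+)\]", prompt_ref)

# emphasized_tokens("Energetic [piano] cover with a groovy, reverberant melody.")
# returns ["piano"]
```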
- For CLAP and LPAPS scores, please refer to CLAP and LPAPS. These codes are adapted from AudioEditingCode.
- For FAD scores, please refer to fadtk.
- For the CDPAM score, please refer to CDPAM, which is adapted from CDPAM_repo.
- For the CQT-1 PCC score, please refer to CQT-1.
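For intuition, the CLAP score reported by such toolkits reduces to the cosine similarity between a CLAP audio embedding and a CLAP text embedding. The sketch below shows only that final step; the embeddings themselves must come from a CLAP model, and this helper is purely illustrative.

```python
import math

def cosine_similarity(audio_emb, text_emb):
    """Cosine similarity between two embedding vectors, the quantity
    typically reported as the CLAP score once both embeddings are computed."""
    dot = sum(a * t for a, t in zip(audio_emb, text_emb))
    norm_a = math.sqrt(sum(a * a for a in audio_emb))
    norm_t = math.sqrt(sum(t * t for t in text_emb))
    return dot / (norm_a * norm_t)

# cosine_similarity([1.0, 0.0], [1.0, 0.0])  -> 1.0
```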
If the editing effects are not noticeable, consider increasing the guidance scale. A higher guidance scale strengthens the influence of the editing instructions, leading to more pronounced changes in the output. See Section 5 of our paper and our supplementary materials for a discussion of the trade-off between editing strength and preservation of the original music content.
One possible reason for unsatisfactory personalized editing results is that the personalized diffusion model is overfitted. To mitigate this, try adjusting the number of fine-tuning steps used during personalization. For more details, please refer to Figure 11 in our paper.
We acknowledge the following works for sharing their implementation code:
Delta_denoising_score; Constrative_denoising_score; DreamSound; AudioEditingCodes; AudioLDM2.