Take a look at main.ipynb!
The objective of our project was to extract vocals from one song and accompaniment from another, and then merge the two. Future improvements might include pitch tuning and denoising. The project primarily leverages a pre-trained Spleeter model by Deezer, with additional operations performed using the powerful audio manipulation library, Librosa.
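As a rough sketch of the separation step (assuming the `spleeter` package is installed; the file names are placeholders), the pre-trained 2-stems model splits a track into vocals and accompaniment:

```python
from spleeter.separator import Separator

# Pre-trained 2-stems model: separates a mix into vocals + accompaniment.
separator = Separator("spleeter:2stems")

# Writes output/<track_name>/vocals.wav and accompaniment.wav for each input.
separator.separate_to_file("song_with_vocals.mp3", "output/")
separator.separate_to_file("song_with_accompaniment.mp3", "output/")
```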
Each sound at a given instant can be represented as a sum of various frequencies, each with its own intensity. The sample rate is the number of amplitude samples captured per second of audio, so loading a song as a numerical array first gives a waveform of length (sample rate × duration in seconds). Applying a Short-Time Fourier Transform to that waveform yields an array of shape (number of frequency bins × number of time frames), where each value represents the intensity of a specific frequency at a given moment. The concept can be visualized through a spectrogram, as shown below:
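For intuition, here is a minimal sketch (assuming librosa and matplotlib are installed; `song.wav` is a placeholder path) of loading a track and computing its spectrogram:

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load the waveform: y has shape (sample_rate * duration,), sr is samples per second.
y, sr = librosa.load("song.wav", sr=None)

# Short-Time Fourier Transform: rows are frequency bins, columns are time frames.
stft = librosa.stft(y, n_fft=2048, hop_length=512)
spectrogram_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

# Plot the spectrogram (frequency vs. time, color = intensity in dB).
librosa.display.specshow(spectrogram_db, sr=sr, hop_length=512,
                         x_axis="time", y_axis="log")
plt.colorbar(format="%+2.0f dB")
plt.title("Spectrogram")
plt.show()
```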
Spleeter is powered by a U-Net, an encoder-decoder architecture built from convolutional layers rather than fully connected ones. Transposed convolutions are used for upsampling, and the model incorporates several other strategies: skip connections from encoder outputs to the corresponding decoder layers, batch normalization in the encoder, 0.5 dropout in the decoder, and LeakyReLU and ELU activations instead of plain ReLUs. Spleeter's U-Net has 12 convolutional layers in total (6 in the encoder, 6 in the decoder). Source code: https://github.com/deezer/spleeter/blob/master/spleeter/model/functions/unet.py The sketch below shows one encoder and one decoder block; a diagram giving an intuition of the full U-Net follows it:
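As a minimal sketch (not Spleeter's actual code; filter counts and kernel sizes are illustrative), encoder and decoder blocks with these ingredients could look like:

```python
from tensorflow.keras import layers

def encoder_block(x, filters):
    # Strided 5x5 convolution halves the spatial resolution,
    # followed by batch normalization and LeakyReLU as described above.
    x = layers.Conv2D(filters, (5, 5), strides=(2, 2), padding="same")(x)
    return layers.LeakyReLU(0.2)(layers.BatchNormalization()(x))

def decoder_block(x, skip, filters):
    # Transposed convolution upsamples; ELU activation, batch norm and
    # 0.5 dropout, then concatenation with the matching encoder output.
    x = layers.Conv2DTranspose(filters, (5, 5), strides=(2, 2),
                               padding="same", activation="elu")(x)
    x = layers.Dropout(0.5)(layers.BatchNormalization()(x))
    return layers.Concatenate()([x, skip])
```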
To ensure that the merged tracks have a balanced volume, we split both songs into frames and compute the average intensity for each frequency in each frame. We experimented with two approaches: scaling the vocal intensity with a single global factor, or scaling it frame by frame. Scaling everything at once turned out to sound better.
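A minimal sketch of the global variant (assuming numpy arrays `vocals` and `accompaniment` loaded with librosa at the same sample rate, and using per-frame RMS energy as a stand-in for the average intensity; this is an illustration, not the exact notebook code):

```python
import numpy as np
import librosa

def match_volume(vocals: np.ndarray, accompaniment: np.ndarray) -> np.ndarray:
    # Average intensity (RMS) over short frames for each track.
    vocal_rms = librosa.feature.rms(y=vocals).mean()
    accomp_rms = librosa.feature.rms(y=accompaniment).mean()
    # One global gain applied to the whole vocal track.
    gain = accomp_rms / (vocal_rms + 1e-8)
    return vocals * gain
```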
Denoising was implemented using the Short-Time Fourier Transform (STFT). The audio signal is converted into the frequency domain, and a binary mask marks whether the intensity at a given frequency exceeds a chosen threshold: bins above the threshold are kept, and the rest are set to zero, thereby reducing noise. Denoising is not used in the final version, though, as it produced some pretty strange-sounding artifacts.
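A minimal sketch of this masking idea (assuming librosa; the threshold value is illustrative, not the one used in the notebook):

```python
import numpy as np
import librosa

def denoise(y: np.ndarray, threshold_db: float = -40.0) -> np.ndarray:
    # Move to the frequency domain.
    stft = librosa.stft(y, n_fft=2048, hop_length=512)
    magnitude_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)
    # Binary mask: keep bins whose intensity exceeds the threshold, zero the rest.
    mask = magnitude_db > threshold_db
    # Back to the time domain.
    return librosa.istft(stft * mask, hop_length=512)
```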
Morphing is the process of blending two audio clips so that one transitions seamlessly into the other. We attempted to use a Deep Convolutional Generative Adversarial Network (DCGAN) for this task, but found that it took prohibitively long to train.
Beat alignment ensures that the beats of the vocals and accompaniment are synchronized. The align_beat function detects the beat pattern in each track and calculates the necessary adjustments to align them. It then modifies the audio clips accordingly, and the final outputs are trimmed to a common length.
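As an illustration of the idea (a sketch of one possible approach, not necessarily how `align_beat` is implemented): estimate each track's tempo, time-stretch the vocals to match the accompaniment, then trim both to a common length.

```python
import librosa
import numpy as np

def align_beat(vocals: np.ndarray, accompaniment: np.ndarray, sr: int):
    # Estimate the tempo (beats per minute) of each track.
    vocal_tempo, _ = librosa.beat.beat_track(y=vocals, sr=sr)
    accomp_tempo, _ = librosa.beat.beat_track(y=accompaniment, sr=sr)
    # Time-stretch the vocals so their tempo matches the accompaniment.
    stretched = librosa.effects.time_stretch(
        vocals, rate=float(accomp_tempo) / float(vocal_tempo))
    # Trim both tracks to a common length before mixing.
    length = min(len(stretched), len(accompaniment))
    return stretched[:length], accompaniment[:length]
```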