Musiplexity: Classifying Music by Genre Using Data Analysis and Machine Learning

This is a project I conducted as part of an unpaid internship with the Patriot Machine Learning Research Group at Francis Marion University. The research analyzes how well machines can sort musical selections into predetermined genres using persistent homology and k-means clustering.

Inspiration

When I was completing my undergraduate degree in Computer Science, I wanted an opportunity to explore areas of interest outside of pure software engineering. Having a musical background from high school, I wanted to learn more about audio signal analysis and how it can be combined with artificial intelligence. The university had recently offered an artificial intelligence course, so I decided to discuss the project with the lecturer, Dr. Ivan Dungan. After workshopping the project and scaling back its scope, we landed on our research question: Can a computer sort musical selections into predetermined genres with little to no human interaction?

Background

Persistent Homology

Though the two main topics for this research were signal analysis and artificial intelligence, this work also introduced me to more intricate concepts of data science. For this project, we used topological data analysis, which uses simplicial complexes from topology to describe the structure of large data sets. Within topological data analysis is a method named persistent homology that studies the topological features of a point cloud. Specifically, it tracks which features persist across different spatial scales in order to separate important structure from noise.

To study these features, I used the Vietoris-Rips complex, a construction that exposes them as "holes." Given a set of points, we choose a distance and connect every pair of points that lie within that distance of each other. Any group of points that are all pairwise connected forms a simplex (a simple n-dimensional shape such as an edge, a triangle, or a tetrahedron).

Below is a simple example of the Vietoris-Rips complex. Here we have two sets of points that form squares of different sizes. When our distance or radius is zero, there are no relationships between the points. However, as we increase the radius, we begin to see simplices form. With a radius of two, line segments or edges form among the points of the smaller square, whereas the points of the larger square still have no perceived relationship. With a radius of three, the smaller square is completely filled in, and what was once a "hole" is now fully realized as a simplex.
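The two-square example can be reproduced with a minimal sketch. The coordinates, the helper name `rips_edges`, and the convention that two points are connected when their distance is at most `epsilon` are all assumptions made for illustration; they are not taken from the project code.

```python
import itertools
import math


def rips_edges(points, epsilon):
    """Return index pairs (i, j) of points whose distance is at most epsilon.

    These pairs are the edges (1-simplices) of the Vietoris-Rips complex
    at scale epsilon, under the convention assumed in this sketch.
    """
    return [
        (i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(points), 2)
        if math.dist(p, q) <= epsilon
    ]


# Hypothetical coordinates: a square with side 2 and a square with side 5.
small_square = [(0, 0), (2, 0), (2, 2), (0, 2)]
large_square = [(10, 0), (15, 0), (15, 5), (10, 5)]
points = small_square + large_square

for epsilon in (0, 2, 3):
    print(epsilon, rips_edges(points, epsilon))
```

At scale 0 there are no edges. At scale 2 only the four sides of the small square appear, which leaves a hole in its middle. At scale 3 the two diagonals join them, so all four corners are pairwise connected and the hole is filled, while the large square is still untouched.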

K-Means Clustering

K-means clustering is a machine learning algorithm that takes a finite set of data points, separates the data into clusters, and finds a centroid that best represents each cluster. The algorithm first generates N centroids by randomly selecting data points from the set. Each data point is then assigned to the cluster whose centroid is nearest, measured by Euclidean distance. Once all points have been assigned, a new centroid is determined for each cluster by taking the average of all data points within it. If the distance between each new centroid and its previous centroid is zero, the centroids have stabilized and the algorithm is finished. Otherwise, the method is called recursively with the newly defined centroids until this distance equals zero. Below is a flowchart of this algorithm.
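The steps above can be sketched in plain Python. This is an illustrative version, not the project's `k_means.py`: the function name, the `seed` and `max_iter` parameters, and the use of a loop instead of recursion are assumptions made for the example.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Cluster a list of coordinate tuples into k groups.

    Returns (centroids, clusters). Stops when the centroids no longer move
    or after max_iter passes (a safety limit added for this sketch).
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 4: stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Two hypothetical, well-separated groups of points.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = k_means(data, k=2)
print(sorted(centroids))
```

K-means only guarantees a local optimum, so on less cleanly separated data the result can depend on which points are chosen as the initial centroids.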

Methodology

The initial method was to take an audio signal, convert it into a time series, and filter it with a Butterworth low-pass filter to remove the higher frequencies and leave a smooth, low-frequency signal for analysis. Our hypothesis was that by isolating the lower frequencies, we would retain the most important features for identifying the genre. The filtered samples were then processed through a window function to produce different views of the dataset and predict future trends. This function also defined the point cloud from which the persistence diagram was computed to determine features.
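One common way to turn a time series into a point cloud is a sliding-window (delay) embedding. The sketch below assumes that this is what the window function does, which the text does not spell out; the function name and the `dim` and `tau` parameters are hypothetical and are not taken from `window_function.py`.

```python
import math


def sliding_window(signal, dim, tau):
    """Map a 1-D signal to points (x[i], x[i + tau], ..., x[i + (dim - 1) * tau])."""
    n_points = len(signal) - (dim - 1) * tau
    return [
        tuple(signal[i + j * tau] for j in range(dim))
        for i in range(n_points)
    ]


# A pure sine wave sampled 40 times per period (hypothetical test input).
samples_per_period = 40
sine = [math.sin(2 * math.pi * i / samples_per_period) for i in range(200)]

# With a delay of a quarter period, each point is (sin t, cos t).
cloud = sliding_window(sine, dim=2, tau=samples_per_period // 4)
radii = [math.hypot(x, y) for x, y in cloud]
print(len(cloud), min(radii), max(radii))
```

Every point of this cloud lies on the unit circle, which is why a periodic signal shows up as a loop that persistent homology can detect.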

Though this approach was functional, it was almost impossible to decipher any unique features from either the point cloud or the persistence diagram, as seen below. We needed to rethink our methodology and reduce the number of samples taken from the audio signal while still obtaining the most meaningful pieces of data from the input.

Figures: Dense Point Cloud (left), Dense Persistence Diagram (right)

Instead of looking at the lowest frequencies, we decided to look at the highest frequencies and take the maximum values within a fixed time interval. This idea surfaced after a brief discussion with musical experts, who highlighted how higher frequencies can carry more distinctive characteristics than lower frequencies. To verify this new hypothesis, we tested the idea on a simple sine curve, taking the maximum value in every two-second interval. The resulting point cloud from the window function is a perfect circle, a unique and identifiable shape that describes the basic audio signal. With this new method, we obtain more distinctive and decipherable point clouds to analyze.
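Taking one maximum per fixed time interval can be sketched as follows. This assumes the "maximum value" is the largest sample amplitude in each interval; the function name and its parameters are hypothetical.

```python
def interval_maxima(samples, sample_rate, interval_seconds):
    """Return the largest sample in each consecutive block of interval_seconds."""
    step = int(sample_rate * interval_seconds)
    if step <= 0:
        raise ValueError("interval must cover at least one sample")
    # The final block may be shorter than the others; it is still included.
    return [max(samples[i:i + step]) for i in range(0, len(samples), step)]


# Toy input: 1 sample per second, one maximum per two-second interval.
reduced = interval_maxima([1, 5, 2, 7, 3, 0, 9], sample_rate=1, interval_seconds=2)
print(reduced)
```

This shrinks the signal by a factor of `sample_rate * interval_seconds`, which is what keeps the resulting point cloud sparse enough to read.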

When analyzing the persistence diagram, there are two sets of data being visualized: $H_0$ and $H_1$. We are more concerned with $H_1$, where there is more variation between the birth and death of the points (birth and death mark when a feature appears and disappears, so the difference between them is how long the feature persists). After separating the two sets, we rotate the data points $45^\circ$ clockwise to align the points along the x-axis for better visualization.
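The rotation can be written out directly. A clockwise rotation by $45^\circ$ sends a (birth, death) pair $(b, d)$ to $((b + d)/\sqrt{2},\ (d - b)/\sqrt{2})$, so the second coordinate becomes proportional to the persistence $d - b$. The function name below is hypothetical.

```python
import math


def rotate_clockwise_45(diagram):
    """Rotate (birth, death) pairs 45 degrees clockwise about the origin."""
    s = math.sqrt(2) / 2  # cos(45 deg) = sin(45 deg)
    return [((b + d) * s, (d - b) * s) for b, d in diagram]


# A point on the diagonal (zero persistence) and a more persistent point.
rotated = rotate_clockwise_45([(1.0, 1.0), (0.0, 2.0)])
print(rotated)
```

Points on the diagonal, which have zero persistence, land on the x-axis, and longer-lived features sit higher above it.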

Now, we discretize the data points within a specific range to fit within a two-dimensional matrix. This matrix has an eight-by-eight resolution, chosen to produce distinctive and identifiable structures. One matrix is created for each song, giving an identifiable vector for each selection. For testing, we wanted to choose two distinct genres that are (mostly) easy for the human ear to tell apart. We ended up choosing Rock and Country as our two genre categories, selecting 50 songs from each genre to create two distinct centroids.
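A minimal sketch of the discretization step, assuming each cell of the matrix counts the points that fall inside it (the text does not say whether cells hold counts or simple presence flags). The function names, the range arguments, and the handling of out-of-range points are hypothetical.

```python
def discretize(points, x_range, y_range, resolution=8):
    """Count 2-D points per cell of a resolution-by-resolution grid.

    Points outside the given ranges are ignored; points on the upper
    boundary are placed in the last cell.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * resolution for _ in range(resolution)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        col = min(int((x - x_min) / (x_max - x_min) * resolution), resolution - 1)
        row = min(int((y - y_min) / (y_max - y_min) * resolution), resolution - 1)
        grid[row][col] += 1
    return grid


def flatten(grid):
    """Turn the grid into a single feature vector, row by row."""
    return [value for row in grid for value in row]


# Toy diagram points in the unit square; (2.0, 0.5) is out of range.
toy_points = [(0.0, 0.0), (0.99, 0.99), (1.0, 1.0), (2.0, 0.5)]
grid = discretize(toy_points, x_range=(0.0, 1.0), y_range=(0.0, 1.0))
vector = flatten(grid)
print(len(vector), sum(vector))
```

With an eight-by-eight grid every song becomes a 64-element vector, so songs can be compared and averaged into centroids with ordinary Euclidean distance.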

Results

Using K-Fold Cross-Validation, we measured the performance of our method by randomly selecting ten songs for each of ten tests and averaging a binary score per song: 1 if the genre was identified correctly, 0 if it was not. After the ten tests, the overall accuracy of the method was 61% with a standard deviation of 0.94. Our goal was for the method to score higher than 50% to be deemed successful. Though it does reach that standard, there are many variables that can be altered in pursuit of higher accuracy. These include the parameters of the window function that creates the point clouds, and the way the data is discretized into a more meaningful vector.
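The evaluation loop can be sketched as below. This is an assumption-laden illustration and not the project's code: it builds one centroid per genre from the training songs, labels each held-out song by its nearest centroid, and reports the mean and standard deviation of the per-fold accuracy. The function names and the toy data are hypothetical.

```python
import math
import random
import statistics


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(axis) / len(vectors) for axis in zip(*vectors)]


def k_fold_accuracy(samples, k=10, seed=0):
    """samples is a list of (vector, label) pairs; returns (mean, stdev).

    Assumes len(samples) >= k and k >= 2, so every fold is non-empty.
    """
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the held-out one.
        by_label = {}
        for j, fold in enumerate(folds):
            if j != i:
                for vector, label in fold:
                    by_label.setdefault(label, []).append(vector)
        centroids = {label: centroid(vs) for label, vs in by_label.items()}
        # Score 1 for a correct nearest-centroid prediction, 0 otherwise.
        hits = [
            1 if min(centroids, key=lambda name: math.dist(vector, centroids[name])) == label else 0
            for vector, label in test_fold
        ]
        scores.append(sum(hits) / len(hits))
    return statistics.mean(scores), statistics.stdev(scores)


# Toy 2-D "songs": two clearly separated groups, not real feature vectors.
toy = [((i * 0.1, i * 0.1), "rock") for i in range(10)]
toy += [((10 + i * 0.1, 10 + i * 0.1), "country") for i in range(10)]
mean_acc, std_acc = k_fold_accuracy(toy, k=5)
print(mean_acc, std_acc)
```

On real song vectors the per-fold scores would vary, and their spread is what a reported standard deviation summarizes.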
