Musical artists cross genre lines more often than ever. This project builds a model that predicts a song's genre based solely on its lyrics.
A full description of the project can be found at saisenberg.com.
- Python:
  - bs4, numpy, pandas, re, requests, sklearn, string, warnings (all installed with Anaconda)
  - json (part of the Python standard library; no installation needed)
  - nltk (`!pip install nltk`)
  - xgboost (`!pip install xgboost`)
- R:
lib <- c('dplyr', 'geniusR', 'jsonlite', 'lubridate', 'stringr')
install.packages(lib)
This code scrapes Billboard, Ranker, and TheTopTens for artists of different genres. Any duplicate artists are removed as appropriate.
The output of /python/artist_collection.ipynb can also be found at /data/json_genres.json.
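The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `merge_artists`, the input layout, and the rule that an artist listed under more than one genre is dropped are all assumptions, and the artist names are placeholders.

```python
from collections import Counter


def merge_artists(scraped):
    """Merge per-source artist lists into one de-duplicated list per genre.

    scraped maps a genre to a list of artist lists, one per scraped site
    (hypothetical layout, chosen for illustration).
    """
    merged = {}
    for genre, sources in scraped.items():
        seen = {}
        for artists in sources:
            for name in artists:
                # Normalise case and whitespace so "artist a " matches "Artist A".
                key = " ".join(name.lower().split())
                seen.setdefault(key, name.strip())
        merged[genre] = seen

    # Assumption: an artist appearing under several genres is ambiguous and dropped.
    genre_hits = Counter(key for seen in merged.values() for key in seen)
    return {
        genre: sorted(name for key, name in seen.items() if genre_hits[key] == 1)
        for genre, seen in merged.items()
    }


# Placeholder input: two sources for country, one for pop.
scraped = {
    "country": [["Artist A", "Artist B"], ["artist a ", "Artist C"]],
    "pop": [["Artist B", "Artist D"]],
}
artists_by_genre = merge_artists(scraped)
print(artists_by_genre)
```

Here "Artist A" is kept once despite appearing on two sites, while "Artist B" is removed because it shows up under both genres.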
This program scrapes and cleans lyrics from Genius, categorizing results by genre. Visit Genius to view or obtain a Genius client access token.
The output of /r/genius_scraper.R can also be found at /data/lyrics.csv.
This code preprocesses all lyrics for modeling, and runs Naïve Bayes, support vector machine, and gradient boosting models to predict a song's genre from its lyrics.
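To give a feel for the preprocessing and Naïve Bayes steps, here is a small pure-Python sketch. It is not the project's code, which presumably relies on sklearn, nltk, and xgboost; the stop-word list, the `preprocess` function, the `NaiveBayes` class, and the toy lyrics are all hypothetical and chosen only for illustration.

```python
import math
import re
import string
from collections import Counter, defaultdict

# Hypothetical stop-word list; a fuller list (e.g. from nltk) could be used instead.
STOP_WORDS = {"a", "an", "and", "the", "i", "you", "my", "in", "on", "to", "of", "is", "it"}


def preprocess(lyrics):
    """Lowercase, drop section tags like [Chorus], strip punctuation and stop words."""
    text = re.sub(r"\[.*?\]", " ", lyrics.lower())
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # genre -> token counts
        self.doc_counts = Counter(labels)        # genre -> number of songs
        for tokens, genre in zip(docs, labels):
            self.word_counts[genre].update(tokens)
        self.vocab = {tok for counts in self.word_counts.values() for tok in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_genre, best_score = None, -math.inf
        for genre, n_docs in self.doc_counts.items():
            counts = self.word_counts[genre]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre


# Toy training data, invented for this example.
songs = [
    ("[Chorus] Whiskey on the porch, my truck and a dirt road", "country"),
    ("Dirt road, old truck, and a cold beer", "country"),
    ("Money and chains, I run the streets", "rap"),
    ("Streets talk, money on my mind", "rap"),
]
model = NaiveBayes().fit([preprocess(text) for text, _ in songs], [genre for _, genre in songs])
print(model.predict(preprocess("Driving my truck down a dirt road")))  # country
```

Support vector machine and gradient boosting models would consume the same preprocessed tokens, typically after converting them to a numeric bag-of-words or TF-IDF matrix.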
- Sam Isenberg - saisenberg.com | github.com/saisenberg
This project is licensed under the MIT License - see the LICENSE.md file for details.