Popularity Index Analysis

Marc Schmieder

2/28/2022

1. Exploration

1.1 Dataset attributes

This document summarizes a first draft of the Spotify Popularity Index (PI) analysis. At present, the community has provided 331 data points, where each data point represents one song at a given point in time. The data were anonymized by assigning an artist/song ID. Overall, 31 variables (columns) were either provided directly or engineered from the existing data. The column names are:

## [1] "ArtistSongId"
## [2] "PopularityIndex"
## [3] "Timestamp"
## [4] "DaysSinceRelease"
## [5] "ReleaseDate"
## [6] "StreamsLast28Days"
## [7] "ListenersLast28Days"
## [8] "SavesLast28Days"
## [9] "StreamsAllTime"
## [10] "ListenersAllTime"
## [11] "NumberOfBlogsThatCoveredTheSong"
## [12] "NumberOfPlaylistsAllTime"
## [13] "EmailAddress"
## [14] "CurrentSpotifyFollowers"
## [15] "PopularityIndexSource"
## [16] "StreamsLast7Days"
## [17] "ListenersLast7Days"
## [18] "SavesLast7Days"
## [19] "DiscoverWeeklyStreamsLast28Days"
## [20] "DiscoverWeeklyStreamsLast7Days"
## [21] "ReleaseRadarStreamsLast28Days"
## [22] "ReleaseRadarStreamsLast7Days"
## [23] "NumberOfBlogsThatCoveredTheSong_num"
## [24] "StreamsLast28Days_PerListener"
## [25] "SavesLast28Days_PerListener"
## [26] "StreamsAllTime_PerListener"
## [27] "NumberOfPlaylistsAllTime_PerListener"
## [28] "StreamsLast7Days_PerListener"
## [29] "SavesLast7Days_PerListener"
## [30] "PopularityIndexSource_Bin"
## [31] "ReleasePassed21Days"
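Several of these columns (for example the `_PerListener` ratios and `ReleasePassed21Days`) were engineered from the others. This draft does not spell out their definitions, so the following R sketch rests on assumptions: each `_PerListener` column is taken to be a count divided by the listener count of the same time window, and `ReleasePassed21Days` is taken to flag `DaysSinceRelease >= 21`. The function name `engineer_features` and the two example rows are hypothetical.

```r
# Hypothetical reconstruction of some engineered columns. The denominators and
# the 21-day cutoff are assumptions, not confirmed definitions from the dataset.
engineer_features <- function(df) {
  # Ratios yield Inf or NaN when the listener count is zero
  df$StreamsLast28Days_PerListener <- df$StreamsLast28Days / df$ListenersLast28Days
  df$SavesLast28Days_PerListener   <- df$SavesLast28Days / df$ListenersLast28Days
  df$StreamsLast7Days_PerListener  <- df$StreamsLast7Days / df$ListenersLast7Days
  df$SavesLast7Days_PerListener    <- df$SavesLast7Days / df$ListenersLast7Days
  df$StreamsAllTime_PerListener    <- df$StreamsAllTime / df$ListenersAllTime
  # TRUE for songs released at least 21 days ago
  df$ReleasePassed21Days <- df$DaysSinceRelease >= 21
  df
}

# Two made-up songs, for illustration only
songs <- data.frame(
  StreamsLast28Days = c(2500, 9200),  ListenersLast28Days = c(1000, 4000),
  SavesLast28Days   = c(400, 460),
  StreamsLast7Days  = c(800, 2400),   ListenersLast7Days  = c(400, 1200),
  SavesLast7Days    = c(60, 90),
  StreamsAllTime    = c(5000, 30000), ListenersAllTime    = c(2000, 12000),
  DaysSinceRelease  = c(10, 45)
)
songs <- engineer_features(songs)
print(songs[, c("StreamsLast28Days_PerListener", "ReleasePassed21Days")])
```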

As shown later in the analysis, the variables

c("StreamsLast28Days", "ListenersLast28Days",
"StreamsLast7Days", "ListenersLast7Days")

## [1] "StreamsLast28Days" "ListenersLast28Days" "StreamsLast7Days"
## [4] "ListenersLast7Days"

are among the most important predictors in the model built later to predict the Popularity Index.

1.2 Mean values for different PI ranges

As certain PI values are criteria for getting into algorithmic playlists, the following table shows the mean values in the data for PI ranges of 20-22, 30-32, and 40-42. With this information, an artist can estimate how many streams, listeners, and saves they need to reach those PIs.

##    Lower Upper StreamsLast28Days ListenersLast28Days SavesLast28Days
## 1:    20    22          2503.057            993.8529        375.5143
## 2:    30    32          9217.862           4097.1724        447.2759
## 3:    40    42         32185.889          13334.5556       1020.4444
##    StreamsLast7Days ListenersLast7Days SavesLast7Days
## 1:          776.500            384.750       67.15000
## 2:         2348.966           1334.759       85.89655
## 3:        10506.778           5801.778      343.00000
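A table like the one above can be produced by filtering on a PI range and averaging each column. The sketch below uses base R only, treats the range bounds as inclusive (an assumption), and runs on made-up rows rather than the community data; `mean_by_pi_range` is a hypothetical helper.

```r
# Mean of selected columns for songs whose PI lies in [lower, upper] (inclusive)
mean_by_pi_range <- function(df, lower, upper, cols) {
  rows  <- which(df$PopularityIndex >= lower & df$PopularityIndex <= upper)
  means <- colMeans(df[rows, cols, drop = FALSE], na.rm = TRUE)
  data.frame(Lower = lower, Upper = upper, as.list(means))
}

# Made-up example rows, NOT the community dataset
toy <- data.frame(
  PopularityIndex     = c(20, 21, 22, 25, 30, 31, 40, 42),
  StreamsLast28Days   = c(2000, 2500, 3000, 5000, 9000, 9500, 30000, 34000),
  ListenersLast28Days = c(900, 1000, 1100, 2000, 4000, 4200, 13000, 13600)
)

ranges <- list(c(20, 22), c(30, 32), c(40, 42))
cols   <- c("StreamsLast28Days", "ListenersLast28Days")
result <- do.call(rbind, lapply(ranges, function(r) {
  mean_by_pi_range(toy, r[1], r[2], cols)
}))
print(result)
```

The row with PI 25 falls outside every range and is therefore ignored.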

1.3 Variables plotted vs. PI

## Warning: Removed 126 rows containing missing values (geom_point).

Figure: Scaled variable vs PI (variables: StreamsLast28Days, ListenersLast28Days, StreamsLast7Days, ListenersLast7Days; axis: values scaled between 0 and 1)

The graphic shows the streams and listeners over the last 28 and 7 days, each scaled to values between 0 and 1. It can be seen that the Popularity Index depends on these variables through an approximately (but not perfectly) quadratic function.
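As a minimal sketch of the two steps described above, the following base R code min-max scales one variable to [0, 1] and fits a quadratic curve of the PI against it with `lm()`. The helper `scale01` and the six data points are hypothetical; with the real data, the same fit could be repeated for each of the four variables.

```r
# Min-max scaling: maps a numeric vector onto [0, 1], ignoring NAs
# (assumes the vector is not constant, otherwise the denominator is 0)
scale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up example points, NOT the community dataset
toy <- data.frame(
  StreamsLast28Days = c(500, 2500, 9000, 15000, 32000, 40000),
  PopularityIndex   = c(10, 21, 31, 35, 41, 43)
)
toy$StreamsScaled <- scale01(toy$StreamsLast28Days)

# Quadratic model on the scaled variable: PI = b0 + b1 * x + b2 * x^2
fit <- lm(PopularityIndex ~ StreamsScaled + I(StreamsScaled^2), data = toy)
print(coef(fit))
print(summary(fit)$r.squared)
```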

2. Model
This is a regression problem: the model predicts continuous numbers, whereas the true PI values can only be integers. The XGBoost algorithm with regression trees was chosen; it is a (still) state-of-the-art machine learning algorithm based on tree boosting.
The model was trained on 263 observations (about 79 percent) and evaluated on 68 observations (about 21 percent) that the model had never seen. The test set of 68 data points was sampled at random, while ensuring an equal distribution between new songs (fewer than 21 days since release) and older songs.
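The draft does not state exactly how the balance between new and older songs was enforced. One common approach, assumed here, is proportional stratified sampling on the `ReleasePassed21Days` flag: test rows are drawn separately within each group. `stratified_split` is a hypothetical helper, and the 231/100 composition of the toy data is made up.

```r
# Stratified random split: draw test rows within each stratum, in proportion
# to the stratum's share of the data, so both groups appear in the test set.
stratified_split <- function(df, strata_col, test_n, seed = 42) {
  set.seed(seed)
  strata <- split(seq_len(nrow(df)), df[[strata_col]])
  # Per-stratum test sizes; rounding can make the total differ slightly from test_n
  n_per <- round(lengths(strata) / nrow(df) * test_n)
  test_idx <- unlist(lapply(names(strata), function(s) {
    idx <- strata[[s]]
    idx[sample.int(length(idx), n_per[[s]])]
  }))
  list(train = df[-test_idx, , drop = FALSE],
       test  = df[test_idx, , drop = FALSE])
}

# Made-up dataset with 331 rows: 231 older songs, 100 new songs
songs <- data.frame(
  ArtistSongId        = seq_len(331),
  ReleasePassed21Days = rep(c(TRUE, FALSE), times = c(231, 100))
)
parts <- stratified_split(songs, "ReleasePassed21Days", test_n = 68)
cat(nrow(parts$train), "training rows,", nrow(parts$test), "test rows\n")
```

With these numbers the split yields 263 training and 68 test rows, with 47 older and 21 new songs in the test set.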

2.1 Model performance

## [1] 1.524063

The mean absolute deviation (MAD) of 1.52 means that the model's predictions deviate from the true PI by 1.52 on average. For example, if the true PI of a song is 32, we can expect the model to predict a value around 30.5 or 33.5. So the model works well, but not perfectly. There are several possible reasons why the model's predictions are not perfect:

1. Data quality: incorrect numbers or an incorrect PI may have been entered.
2. Time delay of the PI: it is updated only every few days, and even in Spotify for Artists it is not updated in real time.
3. Rounding of the PI: Spotify's internal formula probably yields a continuous number (e.g., 24.32) that is then rounded for public display. This imperfect information can bias the model.
4. Some factors/variables that contribute to the formula are not publicly accessible or are not included in the community-driven dataset.
5. The algorithm itself has limited potential (model tuning, or choosing another model such as a random forest or a neural network, could improve performance).

I personally suspect that it is a combination of 1-3 and think that 4 is rather unlikely.
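The error measure in Section 2.1 is the average absolute difference between prediction and true PI (often called the mean absolute error). The sketch below computes it in base R on made-up values; note that R's built-in `stats::mad()` is a different statistic, the scaled median absolute deviation of a single vector. The second part illustrates reason 3 under the assumption that the internal PI is continuous with evenly spread fractional parts: rounding alone then puts even a perfect model of the internal value up to 0.5, and about 0.25 on average, away from the displayed PI. `mean_abs_error` is a hypothetical helper.

```r
# Mean absolute error between predictions and true PI values
mean_abs_error <- function(pred, actual) mean(abs(pred - actual))

pred   <- c(31.2, 33.4, 20.1, 41.9)  # made-up predictions
actual <- c(32,   32,   21,   40)    # made-up true PI values
print(mean_abs_error(pred, actual))

# Rounding alone: assume a continuous internal value with evenly spread
# fractional parts, of which only the rounded value is displayed
internal  <- 24 + (0:999) / 1000
displayed <- round(internal)
print(mean_abs_error(internal, displayed))  # gap a perfect model would still show
print(max(abs(internal - displayed)))
```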

2.2 Variable importance

Figure: Variable Importance in XGBoost model (x-axis: Gain in XGBoost model, 0.0 to 0.8; y-axis: Variable). Variables in the order shown, highest gain first: StreamsLast28Days, ListenersLast28Days, ListenersAllTime, StreamsAllTime, SavesLast28Days_PerListener, StreamsLast7Days, DaysSinceRelease, StreamsAllTime_PerListener, CurrentSpotifyFollowers, ListenersLast7Days, StreamsLast7Days_PerListener, NumberOfPlaylistsAllTime_PerListener, SavesLast28Days, DiscoverWeeklyStreamsLast28Days, ReleaseRadarStreamsLast28Days, NumberOfPlaylistsAllTime, SavesLast7Days, StreamsLast28Days_PerListener, SavesLast7Days_PerListener, NumberOfBlogsThatCoveredTheSong_num, DiscoverWeeklyStreamsLast7Days, ReleaseRadarStreamsLast7Days, PopularityIndexSource_Bin.
The graphic shows the variable importance of the fitted model, that is, how much each variable contributes to the model (measured as gain). StreamsLast28Days and ListenersLast28Days are by far the most important variables. Note that other variables that are highly correlated with StreamsLast28Days and ListenersLast28Days are important too, but the XGBoost model needs only one representative from each set of such variables. The absolute number of saves (SavesLast28Days, SavesLast7Days) appears very late in the variable importance ranking.
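The claim that correlated variables need only one representative can be checked with a simple correlation screen. The sketch below is not part of the XGBoost model; it computes pairwise Pearson correlations with base R's `cor()` and lists predictor pairs above a cutoff. The helper `correlated_pairs`, the 0.9 cutoff, and the simulated columns are all hypothetical.

```r
# List pairs of columns whose absolute Pearson correlation exceeds a cutoff
correlated_pairs <- function(df, cutoff = 0.9) {
  cm  <- cor(df, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and drops the diagonal
  idx <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx]
  )
}

# Simulated data: listeners track streams closely, saves are independent
set.seed(7)
streams <- runif(50, 1000, 40000)
toy <- data.frame(
  StreamsLast28Days   = streams,
  ListenersLast28Days = streams / 2.3 + rnorm(50, sd = 300),
  SavesLast28Days     = runif(50, 50, 1500)
)
res <- correlated_pairs(toy)
print(res)
```

Only the streams/listeners pair is reported, which mirrors why a tree model can rely on one of the two and rank the other lower.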

2.3 Graph of predictions

Let's take a look at the predictions and true values for the test set.

Figure: Evaluation Plot: Comparison of predictions and true values (Model: XGBoost; point types: actuals, pred_xg; x-axis: data point in unseen data; y-axis: Popularity Index)

The graphic shows the predictions of the XGBoost model versus the actual values. Most observations were predicted with decent accuracy; only a few predictions deviate seriously from the actual values. It may be worth examining those data points to see which of the reasons 1-5 from Section 2.1 apply.
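To follow up on the points with a serious deviation, one option is to list every test point whose absolute error exceeds a cutoff. In the sketch below, `flag_large_errors` is a hypothetical helper, the cutoff of 3 PI points (roughly twice the reported deviation of 1.52) is an arbitrary choice, and the five values are made up.

```r
# Return the test points whose absolute prediction error exceeds a threshold
flag_large_errors <- function(actual, pred, threshold) {
  err  <- pred - actual
  keep <- which(abs(err) > threshold)
  data.frame(point = keep, actual = actual[keep], pred = pred[keep], error = err[keep])
}

actual <- c(32, 21, 40, 15, 28)            # made-up true PI values
pred   <- c(31.5, 21.8, 34.0, 15.2, 32.5)  # made-up predictions
flagged <- flag_large_errors(actual, pred, threshold = 3)
print(flagged)
```

Here points 3 and 5 are flagged; each could then be checked against reasons 1-5.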
