Harsh Veer Python Project

This document outlines a data analysis project focused on extracting insights from music datasets using Python and various libraries such as Pandas, Numpy, Matplotlib, and Seaborn. It details the process of data exploration, including identifying null values, analyzing song popularity, and visualizing relationships through various plots. The project emphasizes the importance of data analysis in understanding trends in the music industry, particularly through the lens of Spotify's data.

Uploaded by

Harsh Veer

Spotify Data Analysis Python Project 🎼🎧
Harsh Veer / Data Analyst
INTRODUCTION:
In today's changing world, data analysis has become crucial in fields such as
business, research and meteorology.

This project shows that potential by extracting insights from music-related
datasets using Python. Its subject is Spotify, an audio streaming giant with
features such as seamless song sharing and synchronized lyrics display.

The project covers the data-processing workflow from analysis to visualization.
The interactive environment of Jupyter Notebook helped by letting me engage
with the data directly and discover patterns.

Tools Used:

• Programming Language: Python
• Libraries: Pandas, NumPy, Matplotlib, Seaborn
• IDE: Jupyter Notebook

IMPORT REQUIRED LIBRARIES

• import numpy as np: This imports the NumPy library and aliases it as 'np'.
NumPy is used for numerical computations and provides support for arrays
and matrices.
• import pandas as pd: This imports the Pandas library and aliases it as 'pd'.
Pandas is used for data manipulation and analysis, providing data structures
like DataFrames for tabular data.
• import matplotlib.pyplot as plt: This imports the Pyplot module from the
Matplotlib library and aliases it as 'plt'. Matplotlib is a popular plotting library
in Python, and Pyplot provides a convenient interface to create
visualizations.
• import seaborn as sns: This imports the Seaborn library and aliases it as 'sns'.
Seaborn is built on top of Matplotlib and offers a higher-level interface for
creating attractive statistical visualizations.
EXPLORING THE DATASET

sp_tracks = pd.read_csv('D:/spotifydata/tracks.csv')
sp_feature = pd.read_csv('D:/spotifydata/SpotifyFeatures.csv')

• sp_tracks = pd.read_csv('D:/spotifydata/tracks.csv'): This line reads the CSV
file 'tracks.csv' from the 'D:/spotifydata/' directory and loads its data into a
Pandas DataFrame called sp_tracks. The later sections use its 'name',
'popularity', 'artists', 'release_date' and 'duration_ms' columns.
• sp_feature = pd.read_csv('D:/spotifydata/SpotifyFeatures.csv'): This line
reads a second CSV file, 'SpotifyFeatures.csv', from the same directory
and loads its data into a separate Pandas DataFrame called sp_feature.
The later sections use its 'genre' and 'popularity' columns.
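
The path 'D:/spotifydata/' exists only on the author's machine, so the following is a minimal sketch of the same read_csv() call on an in-memory CSV. The three rows and the variable names csv_text and tracks are made up for illustration; only the column names come from the later sections of this project.

```python
import io

import pandas as pd

# Hypothetical stand-in for tracks.csv; the real file has far more rows and columns.
csv_text = """name,popularity,artists,release_date,duration_ms
Song A,12,Artist 1,1999-05-01,215400
Song B,95,Artist 2,2020-11-20,180600
Song C,0,Artist 3,1985-01-15,242000
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
tracks = pd.read_csv(io.StringIO(csv_text))

print(tracks.shape)             # (rows, columns)
print(tracks.columns.tolist())  # column names taken from the header row
```

Replacing io.StringIO(csv_text) with a file path gives the form used in the project.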
#viewing the tracks data
sp_tracks.head()

• NOTE: The screenshot shows only part of the output table, because a
screenshot cannot capture its full width. For the complete table, refer to the
Jupyter notebook folder within this repository.
• sp_tracks.head(): This line of code calls the head() method on the
'sp_tracks' DataFrame. The head() method displays the first five rows of the
DataFrame by default, which is useful for quickly getting an overview of the
data.

#viewing the feature data
sp_feature.head()

• #viewing the feature data: This is a comment indicating that the following
line of code displays the data in the 'sp_feature' DataFrame.
• sp_feature.head(): This line of code calls the head() method on the
'sp_feature' DataFrame, so you can quickly inspect the initial records
and get a sense of the data.
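
As a small illustration of head(), the sketch below builds a made-up eight-row table (the name df and its values are placeholders, not project data) and shows that the method returns five rows unless a different count is passed.

```python
import pandas as pd

# Made-up table with eight rows, used only to show how head() behaves.
df = pd.DataFrame({
    "name": [f"Song {i}" for i in range(8)],
    "popularity": list(range(8)),
})

first_five = df.head()    # default: the first 5 rows
first_three = df.head(3)  # an explicit row count

print(first_five)
print(first_three)
```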
IDENTIFYING NULL VALUES IN THE DATASET

#checking null in tracks data
pd.isnull(sp_tracks).sum()

• pd.isnull(sp_tracks).sum(): This line of code uses the pd.isnull() function
on the 'sp_tracks' DataFrame to create a boolean DataFrame in which each cell
is True if the corresponding cell in the original DataFrame is null and False
otherwise. The .sum() method then counts the True values in each column, giving
the number of missing values per column.

• The same check was run on the feature data.
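
The null count can be tried on a tiny table. This is only a sketch: the DataFrame df and its 'tempo' column are invented for the example and are not taken from the Spotify files.

```python
import numpy as np
import pandas as pd

# Invented data with a few missing cells.
df = pd.DataFrame({
    "name": ["Song A", None, "Song C"],
    "popularity": [12, 95, 0],
    "tempo": [120.0, np.nan, np.nan],
})

# True marks a missing cell; summing a boolean column counts its True values.
null_counts = pd.isnull(df).sum()

# The method form gives the same result.
same_counts = df.isnull().sum()

print(null_counts)
```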

Dataset Overview: Rows, Columns, Data Types, and Memory Usage

#checking info in tracks data
sp_tracks.info()

• sp_tracks.info(): This line of code calls the info() method on the
'sp_tracks' DataFrame. The info() method prints a concise summary of the
DataFrame: the number of rows, each column's name, non-null count and data
type, and the memory usage. It is a quick way to get an overview of the data
and its structure.

• The same was done for the feature data.
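
info() prints its summary rather than returning it. The sketch below, on an invented three-row table, captures that printout in a buffer and also reads the same facts through separate attributes; the variable names are placeholders.

```python
import io

import pandas as pd

# Invented table used only to demonstrate the summary.
df = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "popularity": [12, 95, 0],
})

# info() writes to stdout by default; buf redirects the text so it can be kept.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# The individual pieces are also available directly.
n_rows, n_cols = df.shape
dtypes = df.dtypes
memory_bytes = df.memory_usage(deep=True).sum()
```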

Extracting Insights from the Dataset through Analysis📊
1. Exploring the 10 Least Popular Songs in the Spotify Dataset

a = sp_tracks.sort_values('popularity', ascending=True)[0:10]
a[['name', 'popularity']]

• a = sp_tracks.sort_values('popularity', ascending=True)[0:10]: This line of
code creates a new DataFrame a by sorting the 'sp_tracks' DataFrame on the
'popularity' column in ascending order. The [0:10] slice selects the first 10
rows of the sorted DataFrame, which are the 10 least popular tracks.

• a[['name', 'popularity']]: This line of code selects the 'name' and
'popularity' columns from the DataFrame a created in the previous line. It
shows the names of the 10 least popular tracks along with their popularity
values.
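
A minimal sketch of this step on invented data follows. It also shows nsmallest(), a pandas method that returns the same rows without sorting the whole table first; using it here is a suggestion, not part of the original notebook.

```python
import pandas as pd

# Twelve invented tracks with distinct popularity scores.
df = pd.DataFrame({
    "name": [f"Song {i}" for i in range(12)],
    "popularity": [50, 3, 77, 0, 21, 64, 8, 92, 15, 39, 1, 86],
})

# Approach used in the project: sort ascending, keep the first 10 rows.
a = df.sort_values("popularity", ascending=True)[0:10]

# Alternative: ask directly for the 10 smallest values.
a_alt = df.nsmallest(10, "popularity")

print(a[["name", "popularity"]])
```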
2. Discovering the Top 10 Popular Songs in the Spotify Dataset

a = sp_tracks
b = a[a['popularity'] > 90].sort_values('popularity', ascending=False)[:10]
b[['name', 'popularity', 'artists']]

• a = sp_tracks: This line of code binds the name a to the same DataFrame
object as 'sp_tracks'. No copy is made, so changes made through a also affect
sp_tracks.

• b = a[a['popularity'] > 90].sort_values('popularity', ascending=False)[:10]:
This line of code creates a new DataFrame b by keeping only the rows of a whose
'popularity' is greater than 90, sorting them in descending order of
'popularity', and selecting the first 10 rows. This gives the 10 most popular
tracks, provided at least 10 tracks score above 90; otherwise fewer rows are
returned.

• b[['name', 'popularity', 'artists']]: This line of code selects the 'name',
'popularity' and 'artists' columns from the DataFrame b created in the previous
line. It displays the names, popularity values and artist information of the
most popular tracks.
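
The filter-then-sort pattern can be checked on a few invented rows. The sketch also shows that the threshold, not the [:10] slice, decides the row count when few tracks pass it, and uses nlargest() as an alternative that needs no threshold; both the data and the alternative are illustrative assumptions.

```python
import pandas as pd

# Six invented tracks; only four score above 90.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "popularity": [95, 88, 91, 100, 42, 93],
    "artists": ["X", "Y", "Z", "X", "Y", "Z"],
})

# Approach used in the project: filter, sort descending, keep up to 10 rows.
b = df[df["popularity"] > 90].sort_values("popularity", ascending=False)[:10]

# Alternative without a fixed threshold: the 3 highest scores.
top3 = df.nlargest(3, "popularity")

print(b[["name", "popularity", "artists"]])
```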
3. Setting Release Date as the Index Column

sp_tracks.set_index('release_date', inplace=True)
sp_tracks.index = pd.to_datetime(sp_tracks.index)
sp_tracks.head()

• sp_tracks.set_index('release_date', inplace=True): This line of code sets the 'release_date' column as
the index of the 'sp_tracks' DataFrame. The inplace=True argument modifies the DataFrame in place,
meaning the change is applied directly to the original DataFrame.
• sp_tracks.index = pd.to_datetime(sp_tracks.index): This line of code converts the index of the
'sp_tracks' DataFrame to a datetime format using the pd.to_datetime() function. This ensures that the
index holds real dates rather than strings, which allows time-based operations.
• sp_tracks.head(): This line of code calls the head() method on the 'sp_tracks' DataFrame, which
displays the first few rows of the DataFrame with the updated index.
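
Below is a small sketch of what the datetime index makes possible, using three invented rows. It uses the non-inplace form of set_index(), which returns a new DataFrame; that choice and the variable names are assumptions made for the example.

```python
import pandas as pd

# Invented rows; dates are strings, as they would be after read_csv.
df = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "release_date": ["1999-05-01", "2020-11-20", "1985-01-15"],
})

# Move release_date into the index, then parse the strings into timestamps.
df = df.set_index("release_date")
df.index = pd.to_datetime(df.index)

# A DatetimeIndex exposes date parts such as the year of every row.
years = df.index.year

# Date parts can be used for filtering.
since_1999 = df[df.index.year >= 1999]

print(df.head())
```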
4. Converting Song Duration from Milliseconds to Seconds

sp_tracks['duration'] = sp_tracks['duration_ms'].apply(lambda x: round(x/1000))
sp_tracks.drop('duration_ms', inplace=True, axis=1)
sp_tracks.duration.head()

• sp_tracks['duration'] = sp_tracks['duration_ms'].apply(lambda x: round(x/1000)): This line of code creates a new column
called 'duration' in the 'sp_tracks' DataFrame. It applies a lambda function to each value of the 'duration_ms' column;
the function divides the value by 1000 and rounds the result to the nearest whole second.

• sp_tracks.drop('duration_ms', inplace=True, axis=1): This line of code removes the original 'duration_ms' column from the
'sp_tracks' DataFrame. The axis=1 argument says that a column, not a row, is being dropped, and inplace=True applies the
change directly to the DataFrame.

• sp_tracks.duration.head(): This line of code displays the first few values of the newly created 'duration' column in the
'sp_tracks' DataFrame.
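
The conversion can be verified on three invented durations. The sketch also shows a vectorized form that operates on the whole column at once instead of calling a Python function per row; this alternative is a suggestion and is not in the original notebook.

```python
import pandas as pd

# Invented durations in milliseconds.
df = pd.DataFrame({"duration_ms": [215400, 180600, 242000]})

# Approach used in the project: call round() on each value.
by_apply = df["duration_ms"].apply(lambda x: round(x / 1000))

# Vectorized alternative: divide and round the whole column, then cast to int.
vectorized = (df["duration_ms"] / 1000).round().astype(int)

df["duration"] = vectorized
df = df.drop("duration_ms", axis=1)  # axis=1 drops a column

print(df["duration"].head())
```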
Visualization: Pearson Correlation Heatmap Between Variables

td = sp_tracks.drop(['key', 'mode', 'explicit'], axis=1).corr(method='pearson')
plt.figure(figsize=(9, 5))
hmap = sns.heatmap(td, annot=True, fmt='.1g', vmin=-1, vmax=1, center=0, cmap='Greens', linewidths=0.1, linecolor='black')
hmap.set_title('Correlation HeatMap')
hmap.set_xticklabels(hmap.get_xticklabels(), rotation=90)

• td = sp_tracks.drop(['key', 'mode', 'explicit'], axis=1).corr(method='pearson'): This line
of code creates a correlation matrix by calculating the Pearson correlation coefficient
for every pair of numeric columns in the 'sp_tracks' DataFrame. It drops the columns 'key',
'mode', and 'explicit' before calculating the correlations.

• plt.figure(figsize=(9, 5)): This line of code sets the figure size for the upcoming
heatmap visualization using Matplotlib.

• hmap = sns.heatmap(td, annot=True, fmt='.1g', vmin=-1, vmax=1, center=0,
cmap='Greens', linewidths=0.1, linecolor='black'): This line of code uses
Seaborn's heatmap() function to draw the correlation matrix as a heatmap.
annot=True writes each correlation value in its cell, fmt='.1g' formats
those values to one significant digit, cmap='Greens' is the color map that
represents the correlation strength, and vmin=-1 and vmax=1 fix the color
range to the possible range of a correlation coefficient.

• hmap.set_title('Correlation HeatMap'): This line of code sets the title for the heatmap
visualization.

• hmap.set_xticklabels(hmap.get_xticklabels(), rotation=90): This line of
code rotates the x-axis labels of the heatmap for better readability.
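
The matrix that the heatmap draws can be inspected on its own. This is a sketch on invented values, built so that the correlations are exactly +1 and -1 and easy to check. Depending on the pandas version, corr() may raise an error when text columns such as 'name' are present, so the sketch selects the numeric columns explicitly first; that step is an assumption added here, not part of the original code.

```python
import pandas as pd

# Invented data: loudness rises linearly with energy, acousticness falls linearly.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "energy": [0.1, 0.4, 0.6, 0.9],
    "loudness": [-28.0, -22.0, -18.0, -12.0],
    "acousticness": [0.9, 0.6, 0.4, 0.1],
})

# Keep numeric columns only, then compute pairwise Pearson coefficients.
td = df.select_dtypes(include="number").corr(method="pearson")

# td is square and symmetric, with 1.0 on the diagonal.
# It is the table that sns.heatmap(td, ...) would color.
print(td.round(2))
```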
Regression Plot of Loudness vs. Energy with Regression Line

plt.figure(figsize=(8, 4))
sns.regplot(data=sample_sp, y='loudness', x='energy', color='#054907').set(title='Regression Plot - Loudness vs Energy Correlation')

• plt.figure(figsize=(8, 4)): This line of code sets the figure size for the
upcoming visualization using Matplotlib.
• sns.regplot(data=sample_sp, y='loudness', x='energy', color='#054907'):
This line of code uses Seaborn's regplot() function to draw a scatter plot
with a fitted linear regression line. It plots 'energy' on the x-axis against
'loudness' on the y-axis, taking both columns from the sample_sp DataFrame.
sample_sp is not defined in the code shown here; it is presumably a subset of
the track data. The color='#054907' argument sets the color of the plot.
• .set(title='Regression Plot - Loudness vs Energy Correlation'): This part of
the line sets the title for the regression plot.
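
The straight line that regplot() draws by default is an ordinary least-squares fit, which can be reproduced numerically. The sketch below assumes, hypothetically, that sample_sp is a random sample of the track table; the data is synthetic (loudness generated as -30 + 25 × energy plus noise) and the sampling fraction is arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic tracks: loudness grows with energy, plus random noise.
rng = np.random.default_rng(0)
energy = np.linspace(0.0, 1.0, 200)
loudness = -30.0 + 25.0 * energy + rng.normal(scale=0.5, size=energy.size)
tracks = pd.DataFrame({"energy": energy, "loudness": loudness})

# Assumed definition of sample_sp: a reproducible 25% random sample.
sample_sp = tracks.sample(frac=0.25, random_state=0)

# Degree-1 polynomial fit = least-squares line through the sampled points.
slope, intercept = np.polyfit(
    sample_sp["energy"].to_numpy(), sample_sp["loudness"].to_numpy(), deg=1
)

print(f"loudness ~ {slope:.2f} * energy + {intercept:.2f}")
```

A positive slope corresponds to the upward-sloping line in the regression plot.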
Bar Plot: Duration of Songs Over Each Year

total_dr = sp_tracks.duration
fig_dims = (15, 5)
fig, ax = plt.subplots(figsize=fig_dims)
fig = sns.barplot(x=years, y=total_dr, ax=ax, errwidth=False).set(title='Years vs Duration')
plt.xticks(rotation=90)

• total_dr = sp_tracks.duration: This line of code creates a new
variable total_dr and assigns the 'duration' column of the
'sp_tracks' DataFrame to it.
• fig_dims = (15, 5): This line of code sets the dimensions of the figure for the
upcoming visualization.
• fig, ax = plt.subplots(figsize=fig_dims): This line of code uses Matplotlib to create
a figure with the specified dimensions. It returns two objects: fig (the
figure) and ax (the axes).
• fig = sns.barplot(x=years, y=total_dr, ax=ax, errwidth=False): This line of code uses
Seaborn's barplot() function to create a bar plot on the provided axes ax, with
years on the x-axis and total_dr (duration) on the y-axis. By default each bar
shows the mean duration for that year. years is not defined in the code shown
here; since 'release_date' became a datetime index in step 3, it is presumably
the release year of each track taken from that index. The errwidth=False
argument sets the error-bar line width to zero, which hides the error bars.
• .set(title='Years vs Duration'): This part of the line sets the title for the bar plot.
Because its result is assigned to fig, the name fig no longer refers to the
Figure created on the previous line.
• plt.xticks(rotation=90): This line of code rotates the x-axis tick labels for better
readability.
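
The bar heights in this chart are per-year means, which can be computed directly. The sketch below uses five invented tracks and assumes, hypothetically, that years is the year part of the datetime index.

```python
import pandas as pd

# Invented tracks indexed by release date, with duration in seconds.
df = pd.DataFrame(
    {"duration": [200, 220, 180, 190, 240]},
    index=pd.to_datetime(
        ["1999-05-01", "1999-08-10", "2020-11-20", "2020-01-02", "1985-01-15"]
    ),
)

# Assumed definition of years: one release year per row.
years = df.index.year

# Mean duration per year, which is what barplot shows by default.
mean_by_year = df.groupby(years)["duration"].mean()

print(mean_by_year)
```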
Bar Plot: Top Five Genres by Popularity

sns.set_style(style='darkgrid')
plt.figure(figsize=(8, 4))
Top = sp_feature.sort_values('popularity', ascending=False)[:10]
sns.barplot(y='genre', x='popularity', data=Top).set(title='Genres by Popularity-Top 5')

• sns.set_style(style='darkgrid'): This line of code sets the style of the Seaborn plots to
'darkgrid', which draws grid lines on a shaded plot background.

• plt.figure(figsize=(8, 4)): This line of code sets the figure size for the upcoming
visualization using Matplotlib.

• Top = sp_feature.sort_values('popularity', ascending=False)[:10]: This line of code
creates a new DataFrame Top by sorting the 'sp_feature' DataFrame in descending order
based on the 'popularity' column and selecting the top 10 rows. These are the 10
highest-scoring tracks, so the number of distinct genres among them depends on the data
and need not be the five named in the title.

• sns.barplot(y='genre', x='popularity', data=Top).set(title='Genres by Popularity-Top 5'):
This line of code uses Seaborn's barplot() function to create a horizontal bar plot, with
'genre' on the y-axis and 'popularity' on the x-axis, from the Top DataFrame. The .set()
function sets the title for the plot.
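
Taking the top rows ranks tracks, not genres. If the goal is exactly five genres, one option is to average popularity per genre first and then take the five largest means. The sketch below shows both on invented data; the groupby approach is an alternative offered for comparison, not the method used in the original notebook.

```python
import pandas as pd

# Invented feature table: several tracks per genre.
df = pd.DataFrame({
    "genre": ["Pop", "Pop", "Rock", "Rock", "Jazz", "Rap", "Rap", "Folk", "Opera"],
    "popularity": [90, 80, 70, 60, 40, 75, 85, 30, 10],
})

# Row-based selection, as in the project: the 3 top tracks cover only 2 genres.
top_rows = df.sort_values("popularity", ascending=False)[:3]
genres_in_top_rows = top_rows["genre"].nunique()

# Genre-based alternative: mean popularity per genre, then the 5 highest means.
top5_genres = df.groupby("genre")["popularity"].mean().nlargest(5)

print(top5_genres)
```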
THANK YOU!
