Analysis Report
All content following this page was uploaded by W.M Anushka Sudeera Bandara on 29 December 2024.
Cyclistic
Bike-Share Analysis
As Cyclistic continues to expand, its marketing strategy has primarily focused on creating general
awareness and appealing to broad consumer segments. However, the company’s leadership
recognizes that future growth lies in converting casual riders, who opt for single-ride or full-day
passes, into annual members. Casual riders are already familiar with Cyclistic’s offerings, and their
conversion to membership presents an opportunity to enhance revenue and user retention. To
support this strategic goal, Cyclistic’s marketing analytics team has been tasked with analyzing
historical trip data to uncover insights about the usage patterns of casual riders versus annual
members. These insights will inform the development of a targeted marketing strategy aimed at
increasing annual memberships.
Problem Statement
The success of Cyclistic’s growth strategy hinges on understanding the differences and
similarities between casual riders and annual members. Casual riders tend to have longer ride
durations and favor weekend and seasonal use, while annual members exhibit consistent
weekday usage patterns. Despite casual riders accounting for 36.6% of all rides, their longer
average ride times indicate a high potential for revenue if converted into members.
To design an effective marketing strategy, it is crucial to identify key behavioral trends, usage
preferences, and opportunities to influence casual riders to adopt annual memberships. By
leveraging Cyclistic’s historical trip data, this analysis aims to answer the following question:
1. How do annual members and casual riders use Cyclistic bikes differently?
The ultimate goal is to provide actionable recommendations, backed by data, to help Cyclistic
increase its annual membership base and drive long-term profitability.
Ask :
How do annual members and casual riders use Cyclistic bikes differently?
The goal is to identify how annual members differ from casual riders and what the two groups
have in common, and then to use those differences and similarities to convert casual riders
into annual members.
With these insights, the company will create a marketing plan to influence casual riders to
become annual members, which the stakeholders (especially Manor) assume will increase
profits.
Prepare :
The data has been provided by Motivate International Inc. We use the last 11 months of data
(January to November 2024). All the data are publicly available.
Each month of data is stored in a separate CSV file, and each file contains the ride ID,
bike type, start and end date and time, start and end station name and ID, start and end
latitude and longitude, and finally the membership type.
The data contain no private information about the users, so we can assume there won’t be any
privacy issues. All the data will be downloaded to our work computers from the public source,
and a backup of that data will be stored in our system.
There are some data type problems and missing values, and those will be resolved.
Because the number of rows with missing values is very low, we will remove those rows
from our data, which won’t have a large effect on the data.
Process :
Because of the size of the data, I was not able to use Excel or Google Sheets, so I used the
Python pandas library in a Jupyter notebook.
First I downloaded all the data and combined it into one file. Before changing anything, I
created a backup and exported it as a CSV file. Then I started the pre-cleaning.
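The combining step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper name `combine_monthly_csvs` and the folder and file paths in the usage comment are hypothetical.

```python
from pathlib import Path

import pandas as pd


def combine_monthly_csvs(folder, pattern="*.csv"):
    """Read every CSV matching `pattern` in `folder` and stack them into one frame."""
    files = sorted(Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    frames = [pd.read_csv(f) for f in files]
    # ignore_index gives the combined frame a fresh 0..n-1 row index.
    return pd.concat(frames, ignore_index=True)


# Usage (placeholder paths, not taken from the report):
# df_data = combine_monthly_csvs("data/raw")
# df_data.to_csv("data/combined_backup.csv", index=False)
```

Sorting the file list keeps the months in a predictable order, provided the file names sort chronologically.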
There were 5,682,196 rows of data and 13 columns of features. Here are the initial data types of
those features.
ride_id object
rideable_type object
started_at object
ended_at object
start_station_name object
start_station_id object
end_station_name object
end_station_id object
start_lat float64
start_lng float64
end_lat float64
end_lng float64
member_casual object
First, I changed the data type of the ‘started_at’ and ‘ended_at’ columns to datetime.
Then I checked for any missing values in the data.
ride_id 0
rideable_type 0
started_at 0
ended_at 0
start_station_name 1044760
start_station_id 1044760
end_station_name 1073877
end_station_id 1073877
start_lat 0
start_lng 0
end_lat 7101
end_lng 7101
member_casual 0
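A small sketch of how such a missing-value check and the later row removal might look in pandas. The helper name `drop_incomplete_rows` and the three sample rows are made up for illustration; only the column names come from the dataset.

```python
import numpy as np
import pandas as pd


def drop_incomplete_rows(df):
    """Return (missing-value count per column, frame without incomplete rows)."""
    missing = df.isnull().sum()
    cleaned = df.dropna().reset_index(drop=True)
    return missing, cleaned


# Made-up rows: one with a missing station name, one with a missing coordinate.
sample = pd.DataFrame({
    "ride_id": ["a", "b", "c"],
    "start_station_name": ["Wells St & Elm St", None, "Clark St & Elm St"],
    "end_lat": [41.9, 41.8, np.nan],
})

missing, cleaned = drop_incomplete_rows(sample)
print(missing)
print(cleaned.shape)
```

`dropna()` with default arguments removes a row if any column is missing, which matches removing every incomplete ride.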
Next, I computed the duration of each ride and also removed any ride shorter than 1 minute or
longer than 24 hours as an outlier.
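The duration filter can be sketched like this, assuming `started_at` and `ended_at` are already datetime columns and that ride length is measured in minutes, as in the `ride_length` column of the notebook. The helper names and the sample timestamps are hypothetical.

```python
import pandas as pd


def add_ride_length(df):
    """Add ride_length (minutes) computed from started_at and ended_at."""
    out = df.copy()
    out["ride_length"] = (out["ended_at"] - out["started_at"]).dt.total_seconds() / 60
    return out


def drop_duration_outliers(df, min_minutes=1, max_minutes=60 * 24):
    """Keep rides lasting between min_minutes and max_minutes (both inclusive)."""
    keep = df["ride_length"].between(min_minutes, max_minutes)
    return df[keep].reset_index(drop=True)


# Made-up rides: 30 seconds, 7.5 minutes, and 25 hours.
sample = pd.DataFrame({
    "started_at": pd.to_datetime(
        ["2024-01-12 15:30:00", "2024-01-12 15:30:27", "2024-01-12 08:00:00"]),
    "ended_at": pd.to_datetime(
        ["2024-01-12 15:30:30", "2024-01-12 15:37:57", "2024-01-13 09:00:00"]),
})

filtered = drop_duration_outliers(add_ride_length(sample))
print(filtered)
```

Only the 7.5-minute ride survives, because the other two fall outside the 1 minute to 24 hour window.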
There were only 3 bike types, and each has a significant number of rides. So I assume there are
no outliers in this column.
Finally, I created 3 more columns (day of week, hour, month) and filled in the values.
Now we have a data set of 4,077,579 rows and 17 columns. As a precautionary measure, I
created a backup of the cleaned dataset and saved it as a CSV file.
(All the code for preprocessing can be found in the notebook file.)
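The three derived columns (day of week, hour, month) can be produced with the pandas `.dt` accessor. This sketch assumes they are derived from `started_at`, which the report does not state explicitly; the helper name `add_time_features` is hypothetical.

```python
import pandas as pd


def add_time_features(df):
    """Add day (weekday name), hour (0-23) and month (1-12) from started_at."""
    out = df.copy()
    out["day"] = out["started_at"].dt.day_name()
    out["hour"] = out["started_at"].dt.hour
    out["month"] = out["started_at"].dt.month
    return out


# Made-up timestamp used only to show the resulting columns.
sample = pd.DataFrame({"started_at": pd.to_datetime(["2024-01-12 15:30:27"])})
features = add_time_features(sample)
print(features)
```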
Analyze
First I analyzed the ‘member_casual’ feature and found that there are only 2 types of riders,
and that there are more member rides than casual rides.
member_casual
member 2584181
casual 1493398
Before doing anything else, I created 2 different data frames for member riders and casual
riders. Then I ran some basic statistical analysis using the pandas ‘describe’ function.
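A minimal sketch of the split described above, using the frame names `df_casual` and `df_member` from the notebook. The five rows are made up and stand in for the cleaned dataset.

```python
import pandas as pd

# Toy rows standing in for the cleaned dataset; the values are made up.
rides = pd.DataFrame({
    "member_casual": ["member", "casual", "member", "casual", "member"],
    "ride_length": [10.0, 30.0, 12.0, 20.0, 8.0],
})

# One frame per rider type, selected with a boolean mask.
df_casual = rides[rides["member_casual"] == "casual"]
df_member = rides[rides["member_casual"] == "member"]

# describe() reports count, mean, std, min, quartiles and max.
print(df_casual["ride_length"].describe())
print(df_member["ride_length"].describe())
```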
On average, a casual rider rides for longer than a member rider. Casual riders also ride for
more time than members in total, even though the number of member rides is higher than the
number of casual rides. So there is potential to gain profits by converting those casual riders
to members.
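The comparison of average and total ride time can also be computed in one step with `groupby`. This is an alternative sketch rather than the notebook's approach (which takes sums on separate frames), and the numbers are made up to show the shape of the result.

```python
import pandas as pd

# Made-up rides: fewer casual rides, but each one is longer.
rides = pd.DataFrame({
    "member_casual": ["casual", "casual", "member", "member", "member"],
    "ride_length": [30.0, 20.0, 10.0, 10.0, 10.0],
})

# Count, mean and total ride length (minutes) per rider type.
summary = rides.groupby("member_casual")["ride_length"].agg(["count", "mean", "sum"])
# Each group's share of all minutes ridden.
summary["share_of_time"] = summary["sum"] / summary["sum"].sum()
print(summary)
```

In this toy data the casual group has fewer rides but a larger share of total ride time, which is the same pattern the report describes.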
Number of rides per day (Member) :
Wednesday 431,950
Tuesday 402,386
Thursday 401,161
Monday 378,374
Friday 364,577
Saturday 321,949
Sunday 283,784
There is a trend: more casual riders ride on weekends, and more member riders ride on
weekdays. Overall, the most popular day to ride is Saturday.
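One way to tabulate rides per weekday for both rider types in calendar order is `pd.crosstab` followed by `reindex`. This is a sketch under the assumption that the `day` column holds English weekday names; the helper name `rides_per_day` and the sample rows are hypothetical.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def rides_per_day(df):
    """Count rides per weekday and rider type, with rows in calendar order."""
    counts = pd.crosstab(df["day"], df["member_casual"])
    # reindex fixes the row order and fills weekdays without rides with 0.
    return counts.reindex(DAY_ORDER, fill_value=0)


# Made-up rides used only to show the table layout.
rides = pd.DataFrame({
    "day": ["Saturday", "Monday", "Saturday", "Wednesday", "Monday"],
    "member_casual": ["casual", "member", "casual", "member", "member"],
})
per_day = rides_per_day(rides)
print(per_day)
```

Reindexing by name avoids relying on the order in which the weekdays first appear in the data.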
Number of rides per Month (General) :
8 541,323
7 540,941
9 536,997
6 494,342
10 449,116
5 442,289
4 297,798
11 245,951
3 230,278
2 184,736
1 113,808
----------------------
Number of rides per Month (Casual) :
7 231,970
8 228,518
9 216,143
6 208,397
5 167,552
10 159,354
4 93,944
11 68,816
3 62,821
2 38,170
1 17,713
----------------------
Number of rides per Month (Member) :
9 320,854
8 312,805
7 308,971
10 289,762
6 285,945
5 274,737
4 203,854
11 177,135
3 167,457
2 146,566
1 96,095
Overall, the fewest rides take place in the winter months (January, February, and November;
December is outside the data range), and the most rides take place in summer and early autumn
(July, August, September).
Number of rides per bike type (General) :
classic_bike 2,657,760
electric_bike 1,371,992
electric_scooter 47,827
----------------------
Number of rides per bike type (Casual) :
classic_bike 955,850
electric_bike 511,808
electric_scooter 25,740
----------------------
Number of rides per bike type (Member) :
classic_bike 1,701,910
electric_bike 860,184
electric_scooter 22,087
Top 5 start stations for all riders :
Streeter Dr & Grand Ave 61,666
DuSable Lake Shore Dr & Monroe St 40,971
DuSable Lake Shore Dr & North Blvd 36,427
Michigan Ave & Oak St 35,923
Kingsbury St & Kinzie St 34,164
----------------------
Top 5 end stations for all riders :
Streeter Dr & Grand Ave 63,193
DuSable Lake Shore Dr & North Blvd 39,997
DuSable Lake Shore Dr & Monroe St 39,754
Michigan Ave & Oak St 36,164
Kingsbury St & Kinzie St 33,754
----------------------
Top 5 start stations for Casual riders :
Streeter Dr & Grand Ave 47,916
DuSable Lake Shore Dr & Monroe St 31,782
Michigan Ave & Oak St 23,156
DuSable Lake Shore Dr & North Blvd 21,274
Millennium Park 20,574
----------------------
Top 5 end stations for Casual riders :
Streeter Dr & Grand Ave 51,945
DuSable Lake Shore Dr & Monroe St 29,778
DuSable Lake Shore Dr & North Blvd 25,037
Michigan Ave & Oak St 24,035
Millennium Park 22,548
----------------------
Top 5 start stations for Member riders :
Top 5 end stations for Member riders :
As we can see, the most visited stations among casual riders are ‘Streeter Dr & Grand Ave’ and
‘DuSable Lake Shore Dr & Monroe St’. Among member riders, however, ‘Kingsbury St & Kinzie St’
and ‘Clinton St & Washington Blvd’ are the most visited.
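Station rankings like the ones above can be obtained with `value_counts`. The sketch below uses a hypothetical helper `top_stations` and a handful of made-up rides; the station names are real, the counts are not.

```python
import pandas as pd


def top_stations(df, column, n=5):
    """Return the n most frequent values of `column`, most frequent first."""
    return df[column].value_counts().head(n)


# Made-up casual rides used only to show the ranking.
df_casual = pd.DataFrame({
    "start_station_name": [
        "Streeter Dr & Grand Ave",
        "Millennium Park",
        "Streeter Dr & Grand Ave",
        "Michigan Ave & Oak St",
        "Streeter Dr & Grand Ave",
        "Millennium Park",
    ],
})

top_casual = top_stations(df_casual, "start_station_name", n=2)
print(top_casual)
```

The same helper can be called with `end_station_name`, or on the member frame, to build the other rankings.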
Share :
We can also see that the average ride time of a casual rider is much
higher than that of a member rider.
Here we can see that on weekdays the number of member rides is
much higher than on weekends.
On weekends, however, the number of casual rides is much higher.
In summary,
Casual riders tend to ride on weekends in summer and early autumn, at noon and in the evening.
They take longer rides on average, and they mostly prefer classic bikes.
Member riders, who are the majority of users, tend to ride on weekdays in the morning and
evening. The month of the year affects member riders, but not as much as it affects casual
riders. They also prefer classic bikes.
The most visited stations among casual riders are ‘Streeter Dr & Grand Ave’ and ‘DuSable Lake
Shore Dr & Monroe St’. Among member riders, however, ‘Kingsbury St & Kinzie St’ and
‘Clinton St & Washington Blvd’ are the most visited.
Act :
After analyzing all the information, here are my top 3 recommendations :
1. Advertise the benefits of membership at the stations most visited by casual riders, such as
‘Streeter Dr & Grand Ave’ and ‘DuSable Lake Shore Dr & Monroe St’.
2. Organize campaigns on weekends to inform riders about the membership program. Perhaps offer
some discounts to early adopters.
3. Show casual riders how members use the service in high-traffic time periods, such as the
morning, to make their lives simpler.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import datetime as dt
plt.rcParams.update({'font.size': 20})
   start_station_name          start_station_id  end_station_name
0  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St
1  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St
2  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St
3  Wells St & Randolph St      TA1305000030      Larrabee St & Webster Ave
4  Lincoln Ave & Waveland Ave  13253             Kingsbury St & Kinzie St
df_data.shape
(5682196, 13)
df_data.dtypes
ride_id object
rideable_type object
started_at object
ended_at object
start_station_name object
start_station_id object
end_station_name object
end_station_id object
start_lat float64
start_lng float64
end_lat float64
end_lng float64
member_casual object
dtype: object
# Parse the timestamp columns; format='mixed' handles rows with and without fractional seconds.
df_data['started_at'] = pd.to_datetime(df_data['started_at'], format='mixed')
df_data['ended_at'] = pd.to_datetime(df_data['ended_at'], format='mixed')
df_data.head()
   start_station_name          start_station_id  end_station_name
0  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St
1  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St
2  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St
3  Wells St & Randolph St      TA1305000030      Larrabee St & Webster Ave
4  Lincoln Ave & Waveland Ave  13253             Kingsbury St & Kinzie St
df_data.dtypes
ride_id object
rideable_type object
started_at datetime64[ns]
ended_at datetime64[ns]
start_station_name object
start_station_id object
end_station_name object
end_station_id object
start_lat float64
start_lng float64
end_lat float64
end_lng float64
member_casual object
dtype: object
df_data.isnull().sum()
ride_id 0
rideable_type 0
started_at 0
ended_at 0
start_station_name 1044760
start_station_id 1044760
end_station_name 1073877
end_station_id 1073877
start_lat 0
start_lng 0
end_lat 7101
end_lng 7101
member_casual 0
dtype: int64
(4077579, 13)
df_data.isnull().sum()
ride_id 0
rideable_type 0
started_at 0
ended_at 0
start_station_name 0
start_station_id 0
end_station_name 0
end_station_id 0
start_lat 0
start_lng 0
end_lat 0
end_lng 0
member_casual 0
dtype: int64
df_data['rideable_type'].value_counts()
rideable_type
classic_bike 2657760
electric_bike 1371992
electric_scooter 47827
Name: count, dtype: int64
   start_station_name          start_station_id  end_station_name           ride_length
0  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St   7.533333
1  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St   7.216667
2  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St   8.000000
3  Wells St & Randolph St      TA1305000030      Larrabee St & Webster Ave  29.816667
4  Lincoln Ave & Waveland Ave  13253             Kingsbury St & Kinzie St   26.200000
# Number of rides shorter than 1 minute or longer than 24 hours.
df_data[(df_data['ride_length'] < 1) | (df_data['ride_length'] > (60*24))].shape
(39146, 14)
# The rides shorter than 1 minute or longer than 24 hours.
df_data[(df_data['ride_length'] < 1) | (df_data['ride_length'] > (60*24))].head(5)
start_station_name start_station_id \
79 Lincoln Ave & Waveland Ave 13253
812 Canal St & Madison St 13341
2108 Bissell St & Armitage Ave* chargingstx1
2109 Bissell St & Armitage Ave* chargingstx1
2647 Clark St & Elm St TA1307000039
(4077579, 14)
df_data2.head()
   start_station_name          start_station_id  end_station_name           ride_length
0  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St   7.533333
1  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St   7.216667
2  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St   8.000000
3  Wells St & Randolph St      TA1305000030      Larrabee St & Webster Ave  29.816667
4  Lincoln Ave & Waveland Ave  13253             Kingsbury St & Kinzie St   26.200000
ride_length day
0 7.533333 Friday
1 7.216667 Monday
2 8.000000 Saturday
3 29.816667 Monday
4 26.200000 Wednesday
(4077579, 17)
Analysis
Analyze the member_casual column
df_data2['member_casual'].unique()
array(['member', 'casual'], dtype=object)
member_types = pd.DataFrame(df_data2['member_casual'].value_counts())
member_types
count
member_casual
member 2584181
casual 1493398
import pandas as pd
import matplotlib.pyplot as plt
    start_station_name            start_station_id  end_station_name
34  Stockton Dr & Wrightwood Ave  13276             Kingsbury St & Kinzie St
35  Stockton Dr & Wrightwood Ave  13276             Kingsbury St & Kinzie St
36  Stockton Dr & Wrightwood Ave  13276             Kingsbury St & Kinzie St
93  Clark St & Chicago Ave        13303             Ogden Ave & Race Ave
98  Indiana Ave & 26th St         TA1307000005      Halsted St & 18th St
   start_station_name          start_station_id  end_station_name
0  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St
1  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St
2  Wells St & Elm St           KA1504000135      Kingsbury St & Kinzie St
3  Wells St & Randolph St      TA1305000030      Larrabee St & Webster Ave
4  Lincoln Ave & Waveland Ave  13253             Kingsbury St & Kinzie St
Basic statistics
df_data2.describe()
started_at ended_at \
count 4077579 4077579
mean 2024-07-10 09:44:49.608702208 2024-07-10 10:01:39.079223808
min 2024-01-01 00:01:01 2024-01-01 00:07:01
25% 2024-05-16 01:13:34.500000 2024-05-16 01:40:53
50% 2024-07-17 17:06:27.833999872 2024-07-17 17:21:42.572999936
75% 2024-09-12 15:30:57.483500032 2024-09-12 15:45:41.916999936
max 2024-11-30 23:50:53.449000 2024-11-30 23:57:43.002000
std NaN NaN
month hour
count 4.077579e+06 4.077579e+06
mean 6.790688e+00 1.399652e+01
min 1.000000e+00 0.000000e+00
25% 5.000000e+00 1.100000e+01
50% 7.000000e+00 1.500000e+01
75% 9.000000e+00 1.700000e+01
max 1.100000e+01 2.300000e+01
std 2.612567e+00 4.780118e+00
df_casual.describe()
started_at ended_at \
count 1493398 1493398
mean 2024-07-17 13:12:33.980971520 2024-07-17 13:36:47.391314944
min 2024-01-01 00:02:15 2024-01-01 00:07:01
25% 2024-05-31 08:14:38.500000 2024-05-31 08:31:13
50% 2024-07-21 18:53:38.862999808 2024-07-21 19:21:07.451500032
75% 2024-09-09 22:38:52.508750080 2024-09-09 22:58:43.360000
max 2024-11-30 23:47:31.938000 2024-11-30 23:56:38.164000
std NaN NaN
month hour
count 1.493398e+06 1.493398e+06
mean 7.027042e+00 1.430723e+01
min 1.000000e+00 0.000000e+00
25% 5.000000e+00 1.100000e+01
50% 7.000000e+00 1.500000e+01
75% 9.000000e+00 1.800000e+01
max 1.100000e+01 2.300000e+01
std 2.307024e+00 4.771942e+00
df_member.describe()
started_at ended_at \
count 2584181 2584181
mean 2024-07-06 06:39:32.575712256 2024-07-06 06:52:05.493449472
min 2024-01-01 00:01:01 2024-01-01 00:12:38
25% 2024-05-04 16:11:04 2024-05-04 16:25:07
50% 2024-07-13 15:03:47.819000064 2024-07-13 15:18:56.516999936
75% 2024-09-13 19:07:43.577999872 2024-09-13 19:19:42.275000064
max 2024-11-30 23:50:53.449000 2024-11-30 23:57:43.002000
std NaN NaN
month hour
count 2.584181e+06 2.584181e+06
mean 6.654099e+00 1.381696e+01
min 1.000000e+00 0.000000e+00
25% 5.000000e+00 1.000000e+01
50% 7.000000e+00 1.500000e+01
75% 9.000000e+00 1.700000e+01
max 1.100000e+01 2.300000e+01
std 2.764641e+00 4.775630e+00
max_len_all = df_data2['ride_length'].max()
max_len_casual = df_casual['ride_length'].max()
max_len_member = df_member['ride_length'].max()
sum_len_all = df_data2['ride_length'].sum()
sum_len_casual = df_casual['ride_length'].sum()
sum_len_member = df_member['ride_length'].sum()
print("\n----------------------\n")
print("\n----------------------\n")
----------------------
Max length of all rides : 1509.3666666666666
Max length of casual rides : 1509.3666666666666
Max length of member rides : 1497.65
----------------------
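# Share of total ride time by rider type.
# Assumption: arr and lables are not defined in the preserved cells; rebuilt from the sums above.
arr = [sum_len_casual, sum_len_member]
lables = ['Casual', 'Member']
plt.figure(figsize=(10, 10))
plt.pie(arr, labels=lables, autopct='%1.1f%%')
plt.title("Total Ride Lengths\n")
plt.legend()
plt.show()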
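def addlabels(x, y):
    # Write each bar's value (rounded to 3 decimals) at the top of the bar.
    for i in range(len(x)):
        plt.text(i, y[i], round(y[i], 3))

# Assumption: the means are not defined in the preserved cells; computed like the sums above.
mean_len_casual = df_casual['ride_length'].mean()
mean_len_member = df_member['ride_length'].mean()
x = ['Casual', 'Member']
y = [mean_len_casual, mean_len_member]
plt.figure(figsize=(15, 10))
bars = plt.bar(x, y)
addlabels(x, y)
bars[1].set_color('C1')
plt.xlabel('Membership types')
plt.ylabel('Average ride length (minutes)')
plt.title('Average Ride Times\n')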
print("\n----------------------\n")
print("\n----------------------\n")
----------------------
----------------------
# Ride counts per day of week, casual vs member, in calendar order.
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
             'Saturday', 'Sunday']
b1 = df_casual['day'].value_counts()
b2 = df_member['day'].value_counts()
b = pd.concat([b1, b2], axis=1)
b.columns = ['Casual', 'Member']
# Reorder rows by weekday name instead of relying on order of appearance.
b = b.reindex(day_order)
b.index.name = 'days'
# Assumption: the call that created `ax` is not preserved; a grouped bar chart is used here.
ax = b.plot(kind='bar', figsize=(15, 10))
# Add axis labels
ax.set_ylabel('Ride Count')
ax.set_xlabel('Days of week')
# Add title
ax.set_title('Ride Counts Of Each Day In a Week\n')
plt.show()
There are more casual rides on weekends, but more member rides on weekdays.
Month analysis
print("Number of rides per Month (General) : ")
print(df_data2['month'].value_counts())
print("\n----------------------\n")
print("Number of rides per Month (Casual) : ")
print(df_casual['month'].value_counts())
print("\n----------------------\n")
print("Number of rides per Month (Member) : ")
print(df_member['month'].value_counts())
----------------------
----------------------
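# Ride counts per month, casual vs member.
b1 = df_casual['month'].value_counts()
b2 = df_member['month'].value_counts()
b = pd.concat([b1, b2], axis=1)
b.sort_index(inplace=True)
b.columns = ['Casual', 'Member']
# Assumption: the call that created `ax` is not preserved; a grouped bar chart is used here.
ax = b.plot(kind='bar', figsize=(15, 10))
# Add axis labels
ax.set_ylabel('Ride Count')
ax.set_xlabel('Months Of Year')
# Add title
ax.set_title('Ride Counts Of Each Month\n')
plt.show()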
In the colder months (January, February, March, November) there are fewer rides by all
riders.
print("\n----------------------\n")
print("\n----------------------\n")
----------------------
----------------------
Number of rides per Hour (Member) :
hour
17 283589
16 244246
18 212062
8 185248
15 174836
7 149993
19 146882
12 142198
14 142120
13 141464
11 124581
9 121868
10 107058
20 101534
6 77031
21 75436
22 52381
23 31850
5 24716
0 18865
1 11045
2 5934
4 5180
3 4064
Name: count, dtype: int64
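# Ride counts per hour of day, casual vs member.
b1 = df_casual['hour'].value_counts()
b2 = df_member['hour'].value_counts()
b = pd.concat([b1, b2], axis=1)
b.sort_index(inplace=True)
b.columns = ['Casual', 'Member']
# Assumption: the call that created `ax` is not preserved; a grouped bar chart is used here.
ax = b.plot(kind='bar', figsize=(15, 10))
# Add axis labels
ax.set_ylabel('Ride Count')
ax.set_xlabel('Hours of Day')
# Add title
ax.set_title('Ride Counts Of Each Hour In a Day\n')
plt.show()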
Most riders tend to ride in the late afternoon (3-5 pm). Among members, we can also see an
increase in the morning (7-8 am).
print("\n----------------------\n")
print("\n----------------------\n")
----------------------
----------------------
plt.figure(figsize=(10,10))
b = df_data2['rideable_type'].value_counts()
b.plot(kind='pie', autopct='%1.1f%%', colors=['C3','C9','C4'])
plt.legend(loc=1)
plt.title("Bike Types of General Users\n")
# Add y-label
ax.set_ylabel('Ride Count')
ax.set_xlabel('Bike Type')
# Add title
ax.set_title('Ride Counts Of Each Bike Type\n')
plt.show()
print("\n----------------------\n")
print("Top 5 start stations for Casual riders :")
print(df_casual['start_station_name'].value_counts()[:5])
print("\n----------------------\n")
----------------------
----------------------
print("\n----------------------\n")
print("\n----------------------\n")
print("Top 5 end stations for Member riders :")
print(df_member['end_station_name'].value_counts()[:5])
----------------------
----------------------
caual_lst = [
[cx1,cy1],
[cx2,cy2],
[cx3,cy3],
[cx4,cy4],
[cx5,cy5]
]
import folium
locations = folium.map.FeatureGroup()
lables = ['Streeter Dr & Grand Ave', 'DuSable Lake Shore Dr & Monroe St',
          'DuSable Lake Shore Dr & North Blvd', 'Michigan Ave & Oak St',
          'Millennium Park']
casual_map.add_child(locations)
<folium.folium.Map at 0x3f084c690>
member_list = [
[mx1,my1],
[mx2,my2],
[mx3,my3],
[mx4,my4],
[mx5,my5]
]
locations = folium.map.FeatureGroup()
member_map.add_child(locations)
<folium.folium.Map at 0x3f073ed90>