Analysis Report

The technical report analyzes Cyclistic, a bike-share company in Chicago, focusing on converting casual riders into annual members to enhance revenue and user retention. It examines historical trip data to identify usage patterns and behavioral trends between casual and annual riders, revealing that casual riders have longer ride durations but lower overall membership. The analysis aims to inform a targeted marketing strategy to increase annual memberships and drive long-term profitability.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/387498935

Cyclistic Bike-Share Analysis for Converting Casual Riders into Loyal Members

Technical Report · December 2024


1 author:

W.M Anushka Sudeera Bandara


University of Sri Jayewardenepura

All content following this page was uploaded by W.M Anushka Sudeera Bandara on 29 December 2024.



Converting Casual Riders into Loyal Members

Cyclistic
Bike-Share Analysis

By Anushka Sudeera Bandara


For Google Data Analytics Capstone Project
Introduction
Cyclistic, a prominent bike-share company in Chicago, has established itself as a leader in urban
mobility with a fleet of over 5,800 bicycles and 600 docking stations. The company offers various
bike options, including traditional bicycles, tricycles, and cargo bikes, making its services
accessible to a wide range of riders, including those with disabilities. Cyclistic’s bikes are used
primarily for leisure, though approximately 30% of its riders use them for daily commutes.

As Cyclistic continues to expand, its marketing strategy has primarily focused on creating general
awareness and appealing to broad consumer segments. However, the company’s leadership
recognizes that future growth lies in converting casual riders, who opt for single-ride or full-day
passes, into annual members. Casual riders are already familiar with Cyclistic’s offerings, and their
conversion to membership presents an opportunity to enhance revenue and user retention. To
support this strategic goal, Cyclistic’s marketing analytics team has been tasked with analyzing
historical trip data to uncover insights about the usage patterns of casual riders versus annual
members. These insights will inform the development of a targeted marketing strategy aimed at
increasing annual memberships.

Problem Statement
The success of Cyclistic’s growth strategy hinges on understanding the differences and
similarities between casual riders and annual members. Casual riders tend to have longer ride
durations and favor weekend and seasonal use, while annual members exhibit consistent
weekday usage patterns. Despite casual riders accounting for 36.6% of all rides, their longer
average ride times indicate a high potential for revenue if converted into members.

To design an effective marketing strategy, it is crucial to identify key behavioral trends, usage
preferences, and opportunities to influence casual riders to adopt annual memberships. By
leveraging Cyclistic’s historical trip data, this analysis aims to answer the following question:

1. How do annual members and casual riders use Cyclistic bikes differently?

The ultimate goal is to provide actionable recommendations, backed by data, to help Cyclistic
increase its annual membership base and drive long-term profitability.

Ask :
How do annual members and casual riders use Cyclistic bikes differently?

The goal is to identify what makes annual members differ from casual riders and what
similarities they share, and how those differences and similarities can be used to convert
casual riders into annual members.

From these insights, the company will create a marketing plan to influence casual riders
to become annual members, which the stakeholders (especially Manor) assume will increase
profits.
Prepare :
The data has been provided by Motivate International Inc. We use 11 months of data
(January to November 2024). All the data are public and can be accessed by clicking here.

Each month of data is stored in a separate CSV file, and each file contains details about ride_id,
bike type, start and end date and time, start and end station name and id, start and end
longitudes and latitudes, and finally the membership type.

There is no private data about the users, so we can assume there won’t be any privacy
issues. All the data will be downloaded to our work computers from the public source, and a
backup of that data will be stored in our system.

There are some data type problems and missing values, and those will be solved. Rows with
missing values will be removed from the data. Most of the missing values are station names and
ids, so this step reduces the data from 5,682,196 to 4,077,579 rows (about 28% removed), and the
analysis covers only rides with complete station information.

Process :
Because of the size of the data, I wasn’t able to use Excel or Google Sheets. So I used the Python
pandas library and Jupyter Notebook.

First I downloaded all the data and combined it into one file. Before doing anything to it, I created a
backup and exported it as a CSV file. Then I started the pre-cleaning.
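
The combining step can also be sketched with a file pattern instead of one read per month. This is only a minimal sketch, not the code used for this report: the helper name `load_monthly_trips` and the two tiny demonstration files are hypothetical, and only the file naming style (`YYYYMM-divvy-tripdata.csv`) follows the notebook.

```python
import glob
import os
import tempfile

import pandas as pd


def load_monthly_trips(pattern):
    """Read every CSV matching `pattern` and stack them into one DataFrame."""
    paths = sorted(glob.glob(pattern))  # sorted so months stay in order
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Demonstration with two made-up monthly files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for month, ride_id in [("202401", "A1"), ("202402", "B2")]:
        path = os.path.join(folder, f"{month}-divvy-tripdata.csv")
        sample = pd.DataFrame({"ride_id": [ride_id], "member_casual": ["member"]})
        sample.to_csv(path, index=False)
    combined = load_monthly_trips(os.path.join(folder, "*-divvy-tripdata.csv"))

print(combined.shape)
```

With the real monthly files in the working folder, the same helper could be called with the pattern `'2024*-divvy-tripdata.csv'`.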

There were 5,682,196 rows of data and 13 columns of features. Here are the initial data types of
those features.

ride_id object
rideable_type object
started_at object
ended_at object
start_station_name object
start_station_id object
end_station_name object
end_station_id object
start_lat float64
start_lng float64
end_lat float64
end_lng float64
member_casual object

First, I changed the data type of the ‘started_at’ and ‘ended_at’ columns to datetime.
Then I checked for any missing values in the data.

ride_id 0
rideable_type 0
started_at 0
ended_at 0
start_station_name 1044760
start_station_id 1044760
end_station_name 1073877
end_station_id 1073877
start_lat 0
start_lng 0
end_lat 7101
end_lng 7101
member_casual 0

So then, I removed those rows from the dataset.
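
As a rough check on how much data this step removes, the row counts reported in this report can be compared directly. This is a minimal sketch in plain Python. The variable names are illustrative, and the two row counts are the ones recorded before and after dropping rows with missing values.

```python
# Row counts reported before and after dropping rows with missing values.
rows_before = 5_682_196
rows_after = 4_077_579

rows_dropped = rows_before - rows_after
share_dropped = rows_dropped / rows_before

print(f"Rows dropped: {rows_dropped:,}")
print(f"Share of rows dropped: {share_dropped:.1%}")
```

Most of the missing values are in the station name and id columns, so the rows that remain are rides with a recorded start and end station.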

Next, I checked the duration of each ride, and if a ride was shorter than 1 minute or longer than
24 hours, I removed it as an outlier.
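
The duration filter needs both conditions to hold at once, so in pandas the two comparisons are combined with `&`. The sketch below assumes a small hypothetical frame built from three rides that appear in the notebook output (one normal ride, one under a minute, one over 24 hours). It also shows what happens if `|` is used instead: every row passes. The maximum ride length reported later in this analysis (about 1,509 minutes) is above 1,440, which suggests the recorded run kept such rows.

```python
import pandas as pd

# Three rides taken from the notebook output; the frame itself is illustrative.
rides = pd.DataFrame({
    "started_at": pd.to_datetime(
        ["2024-01-12 15:30:27", "2024-01-01 15:50:04", "2024-01-21 09:01:01"]),
    "ended_at": pd.to_datetime(
        ["2024-01-12 15:37:59", "2024-01-01 15:51:02", "2024-01-22 09:33:24"]),
})

# Ride length in minutes.
rides["ride_length"] = (rides["ended_at"] - rides["started_at"]) / pd.Timedelta(minutes=1)

at_least_one_minute = rides["ride_length"] >= 1
at_most_one_day = rides["ride_length"] <= 60 * 24

kept_and = rides[at_least_one_minute & at_most_one_day]  # both conditions must hold
kept_or = rides[at_least_one_minute | at_most_one_day]   # true for every possible length

print(len(kept_and), "row kept with &")
print(len(kept_or), "rows kept with |")
```

Any ride length is either at least 1 minute or at most 1,440 minutes, so the `|` version can never remove a row.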

Then I checked rideable types.


classic_bike 2657760
electric_bike 1371992
electric_scooter 47827

There were only 3 types, and each has a significant number of rides. So, I assume there are no
outliers here.

Finally I created 3 more columns (day of week, hour, month) and filled in the values.
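
These three columns can be derived from the start timestamp with the pandas `.dt` accessor. The following is a small sketch on three start times taken from the notebook output. The names `started` and `features` are illustrative only.

```python
import pandas as pd

# Three start times that appear in the notebook output.
started = pd.Series(pd.to_datetime(
    ["2024-01-12 15:30:27", "2024-01-27 12:27:19", "2024-01-31 05:43:23"]))

features = pd.DataFrame({
    "day": started.dt.day_name(),  # e.g. 'Friday'
    "month": started.dt.month,     # 1 to 12
    "hour": started.dt.hour,       # 0 to 23
})

print(features)
```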

Now we have a dataset of 4,077,579 rows and 17 columns. As a precautionary measure I
created a backup of the cleaned dataset and saved it as a CSV file.

(All the code for preprocessing can be found in the Notebook file.)
Analyze
First I analyzed the ‘member_casual’ feature. There are only 2 types of riders,
and there are more member rides than casual rides.

member_casual
member 2584181
casual 1493398

Before doing anything else, I created 2 different data frames for member riders and casual riders.
Then I ran some basic statistical analysis using the pandas ‘describe’ function.

After that I started to analyze each feature.


First I analyzed the ride length feature. I calculated the average, maximum, and total ride length
(in minutes) for all rides, member rides, and casual rides.

Average length of all rides : 16.824508744494384


Average length of casual rides : 24.223505704998495
Average length of member rides : 12.54862897337042

----------------------

Max length of all rides : 1509.3666666666666


Max length of casual rides : 1509.3666666666666
Max length of member rides : 1497.65

----------------------

Total length of all rides : 68,603,263.54186666


Total length of casual rides : 36,175,334.97283334
Total length of member rides : 32,427,928.569033343

On average, a casual ride lasts longer than a member ride. Casual riders also
ride for more time than members in total, even though the number of member rides is
higher than the number of casual rides. So there is potential for gaining profits by converting
those casual riders to members.
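
The averages and shares can be cross-checked from the reported totals: the average ride length is the total ride minutes divided by the number of rides. Below is a minimal plain-Python sketch using the figures from this report. The dictionary names are illustrative.

```python
# Ride counts and total ride minutes as reported in this analysis.
rides = {"casual": 1_493_398, "member": 2_584_181}
total_minutes = {"casual": 36_175_334.97283334, "member": 32_427_928.569033343}

all_rides = sum(rides.values())
all_minutes = sum(total_minutes.values())

average_minutes = {k: total_minutes[k] / rides[k] for k in rides}
share_of_rides = {k: rides[k] / all_rides for k in rides}
share_of_time = {k: total_minutes[k] / all_minutes for k in rides}

for k in rides:
    print(f"{k}: {average_minutes[k]:.2f} min per ride, "
          f"{share_of_rides[k]:.1%} of rides, {share_of_time[k]:.1%} of ride time")
```

Casual riders take a minority of the rides but account for a little more than half of the total ride time.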

Then I analyzed the day of week for each type of rider.

Number of rides per day (General) :


Saturday 635,144
Wednesday 616,112
Friday 584,153
Thursday 581,077
Tuesday 561,066
Monday 554,699
Sunday 545,328

----------------------

Number of rides per day (Casual) :


Saturday 313,195
Sunday 261,544
Friday 219,576
Wednesday 184,162
Thursday 179,916
Monday 176,325
Tuesday 158,680

----------------------
Number of rides per day (Member) :
Wednesday 431,950
Tuesday 402,386
Thursday 401,161
Monday 378,374
Friday 364,577
Saturday 321,949
Sunday 283,784

There is a trend: casual riders ride more on weekends, and member riders ride more on
weekdays. Overall, the most popular day to ride is Saturday.
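
The weekend trend can be made concrete by computing the share of rides that fall on Saturday or Sunday for each group. If rides were spread evenly over the week, that share would be 2/7 (about 28.6%). This is a minimal plain-Python sketch using the counts listed above. The function name `weekend_share` is illustrative.

```python
# Rides per day of week, as reported above.
casual = {"Saturday": 313_195, "Sunday": 261_544, "Friday": 219_576,
          "Wednesday": 184_162, "Thursday": 179_916, "Monday": 176_325,
          "Tuesday": 158_680}
member = {"Wednesday": 431_950, "Tuesday": 402_386, "Thursday": 401_161,
          "Monday": 378_374, "Friday": 364_577, "Saturday": 321_949,
          "Sunday": 283_784}


def weekend_share(counts):
    """Fraction of rides that start on a Saturday or Sunday."""
    weekend = counts["Saturday"] + counts["Sunday"]
    return weekend / sum(counts.values())


even_spread = 2 / 7  # weekend share if every day had the same number of rides

print(f"Casual weekend share: {weekend_share(casual):.1%}")
print(f"Member weekend share: {weekend_share(member):.1%}")
print(f"Even spread:          {even_spread:.1%}")
```

Casual riders are above the even-spread baseline and members are below it, which matches the trend described above.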

Then I analyzed the Month feature.

Number of rides per Month (General) :

8 541,323
7 540,941
9 536,997
6 494,342
10 449,116
5 442,289
4 297,798
11 245,951
3 230,278
2 184,736
1 113,808

----------------------

Number of rides per Month (Casual) :

7 231,970
8 228,518
9 216,143
6 208,397
5 167,552
10 159,354
4 93,944
11 68,816
3 62,821
2 38,170
1 17,713

----------------------
Number of rides per Month (Member) :
9 320,854
8 312,805
7 308,971
10 289,762
6 285,945
5 274,737
4 203,854
11 177,135
3 167,457
2 146,566
1 96,095

Overall, the fewest rides occur in the winter months (November, January, February; December
is not in the dataset), and the most rides occur in summer and early autumn
(July, August, September).
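
One simple way to compare how strongly each group reacts to the season is the ratio between the busiest and the quietest month. This is a rough sketch in plain Python using the monthly counts above. The ratio is only one possible measure, and the function name `peak_to_trough` is illustrative.

```python
# Rides per month (1 = January ... 11 = November), as reported above.
casual_by_month = {1: 17_713, 2: 38_170, 3: 62_821, 4: 93_944, 5: 167_552,
                   6: 208_397, 7: 231_970, 8: 228_518, 9: 216_143,
                   10: 159_354, 11: 68_816}
member_by_month = {1: 96_095, 2: 146_566, 3: 167_457, 4: 203_854, 5: 274_737,
                   6: 285_945, 7: 308_971, 8: 312_805, 9: 320_854,
                   10: 289_762, 11: 177_135}


def peak_to_trough(counts):
    """Rides in the busiest month divided by rides in the quietest month."""
    return max(counts.values()) / min(counts.values())


casual_ratio = peak_to_trough(casual_by_month)
member_ratio = peak_to_trough(member_by_month)

print(f"Casual: busiest month has {casual_ratio:.1f}x the rides of the quietest")
print(f"Member: busiest month has {member_ratio:.1f}x the rides of the quietest")
```

The much larger ratio for casual riders indicates that their usage depends more on the season than member usage does.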

Then I analyzed the Hour feature.

Number of rides per Hour (General) :    Number of rides per Hour (Casual) :    Number of rides per Hour (Member) :

17 428,025 17 144,436 17 283,589


16 380,644 16 136,398 16 244,246
18 331,247 15 120,239 18 212,062
15 295,075 18 119,185 8 185,248
14 252,545 14 110,425 15 174,836
13 248,141 13 106,677 7 149,993
12 245,128 12 102,930 19 146,882
8 238,158 11 87,985 12 142,198
19 232,654 19 85,772 14 142,120
11 212,566 10 69,059 13 141,464
7 187,669 20 61,819 11 124,581
10 176,117 9 54,124 9 121,868
9 175,992 8 52,910 10 107,058
20 163,353 21 49,825 20 101,534
21 125,261 22 43,573 6 77,031
6 96,560 7 37,676 21 75,436
22 95,954 23 30,065 22 52,381
23 61,915 0 21,744 23 31,850
0 40,609 6 19,529 5 24,716
5 32,780 1 14,379 0 18,865
1 25,424 2 8,442 1 11,045
2 14,376 5 8,064 2 5,934
4 8,754 3 4,568 4 5,180
3 8,632 4 3,574 3 4,064
Here we can see that the most popular hour among riders is the 5-6 p.m. period. Most casual riders
prefer the afternoon or evening for a ride. But among member riders we can also see a somewhat
high demand in the 7-8 a.m. period.

Then I analyzed the Rideable type feature.

Number of rides per bike type (General) :

classic_bike 2,657,760
electric_bike 1,371,992
electric_scooter 47,827

----------------------

Number of rides per bike type (Casual) :

classic_bike 955,850
electric_bike 511,808
electric_scooter 25,740

----------------------

Number of rides per bike type (Member) :

classic_bike 1,701,910
electric_bike 860,184
electric_scooter 22,087

We can see that the most popular bike type is the classic bike.


Then I calculated the top 5 start and end stations among riders.

Top 5 start stations for all riders : Top 5 end stations for all riders :

Streeter Dr & Grand Ave 61,666 Streeter Dr & Grand Ave 63193
DuSable Lake Shore Dr & Monroe St 40,971 DuSable Lake Shore Dr & North Blvd 39997
DuSable Lake Shore Dr & North Blvd 36,427 DuSable Lake Shore Dr & Monroe St 39754
Michigan Ave & Oak St 35,923 Michigan Ave & Oak St 36164
Kingsbury St & Kinzie St 34,164 Kingsbury St & Kinzie St 33754

---------------------- ----------------------

Top 5 start stations for Casual riders : Top 5 end stations for Casual riders :

Streeter Dr & Grand Ave 47,916 Streeter Dr & Grand Ave 51945
DuSable Lake Shore Dr & Monroe St 31,782 DuSable Lake Shore Dr & Monroe St 29778
Michigan Ave & Oak St 23,156 DuSable Lake Shore Dr & North Blvd 25037
DuSable Lake Shore Dr & North Blvd 21,274 Michigan Ave & Oak St 24035
Millennium Park 20,574 Millennium Park 22548

---------------------- ----------------------

Top 5 start stations for Member riders : Top 5 end stations for Member riders :

Kingsbury St & Kinzie St 25,362 Kingsbury St & Kinzie St 25486


Clinton St & Washington Blvd 23,882 Clinton St & Washington Blvd 24214
Clinton St & Madison St 21,545 Clinton St & Madison St 22200
Clark St & Elm St 21,412 Clark St & Elm St 21265
Clinton St & Jackson Blvd 17,648 Clinton St & Jackson Blvd 17556

As we can see, the most visited stations among casual riders are ‘Streeter Dr & Grand Ave’ and
‘DuSable Lake Shore Dr & Monroe St’. But among member riders, the ‘Kingsbury St & Kinzie St’ and
‘Clinton St & Washington Blvd’ stations are the most visited.

Share :

Here we can see that most of the rides on our service are taken by members. 36.6
percent of rides are taken by casual riders, who are not registered as
members.
But according to the data, casual riders have
the longest ride time in total. There is a margin of roughly 5.4
percentage points.

Also we can see that the average ride time of a casual rider is much
higher than that of a member rider.
Here we can see that on weekdays the number of member rides is
much higher than on weekends.

But on weekends the number of casual rides is much higher.

Overall, the number of rides decreases in winter and increases in summer and early autumn.
But the variation among casual riders is much higher than among member riders.
We can see a peak in casual rides in the afternoon and evening. There is a
peak in member rides at the same time. But there is another spike
in member rides in the morning too (7-8 a.m.).

Here we can see that most of our rides use
classic bikes, and a very small
number of electric scooters have been used
too.
Here we can see how casual and member riders prefer different bike
types.

These are the most popular stations among
Casual Riders :

Streeter Dr & Grand Ave 47,916


DuSable Lake Shore Dr & Monroe St 31,782
Michigan Ave & Oak St 23,156
DuSable Lake Shore Dr & North Blvd 21,274
Millennium Park 20,574
These are the most popular stations among
Member Riders :

Kingsbury St & Kinzie St 25,362


Clinton St & Washington Blvd 23,882
Clinton St & Madison St 21,545
Clark St & Elm St 21,412
Clinton St & Jackson Blvd 17,648

As a summary,

Casual riders tend to ride on weekends in summer and early autumn, in the afternoon and evening.
They take longer rides on average, and they mostly prefer classic bikes.

Member riders, who are the majority of users, tend to ride on weekdays in the morning and
evening. The month of the year affects member riders too, but less than it affects casual riders.
They also prefer classic bikes.

The most visited stations among casual riders are ‘Streeter Dr & Grand Ave’ and ‘DuSable Lake
Shore Dr & Monroe St’. But among member riders, the ‘Kingsbury St & Kinzie St’ and ‘Clinton St &
Washington Blvd’ stations are the most visited.
Act :
After analyzing all the information, here are my top 3 recommendations :

1. Advertise the benefits of membership at the stations most visited by casual riders, such as
‘Streeter Dr & Grand Ave’ and ‘DuSable Lake Shore Dr & Monroe St’.

2. Organize campaigns on weekends to inform riders about the membership program. Maybe give
some discounts to early adopters.

3. Show casual riders how members use the service in high-traffic time periods like the morning
to make their lives simpler.
import datetime as dt

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

plt.rcParams.update({'font.size': 20})

Importing all the data


dfJan = pd.read_csv('202401-divvy-tripdata.csv')
dfFeb = pd.read_csv('202402-divvy-tripdata.csv')
dfMar = pd.read_csv('202403-divvy-tripdata.csv')
dfApr = pd.read_csv('202404-divvy-tripdata.csv')
dfMay = pd.read_csv('202405-divvy-tripdata.csv')
dfJun = pd.read_csv('202406-divvy-tripdata.csv')
dfJul = pd.read_csv('202407-divvy-tripdata.csv')
dfAug = pd.read_csv('202408-divvy-tripdata.csv')
dfSep = pd.read_csv('202409-divvy-tripdata.csv')
dfOct = pd.read_csv('202410-divvy-tripdata.csv')
dfNov = pd.read_csv('202411-divvy-tripdata.csv')

#Combining all the CSV files together.


df_data = pd.concat([dfJan, dfFeb, dfMar, dfApr, dfMay, dfJun, dfJul,
dfAug, dfSep, dfOct,dfNov], ignore_index=True)
df_data.head()

            ride_id  rideable_type           started_at             ended_at  \
0  C1D650626C8C899A  electric_bike  2024-01-12 15:30:27  2024-01-12 15:37:59
1  EECD38BDB25BFCB0  electric_bike  2024-01-08 15:45:46  2024-01-08 15:52:59
2  F4A9CE78061F17F7  electric_bike  2024-01-27 12:27:19  2024-01-27 12:35:19
3  0A0D9E15EE50B171   classic_bike  2024-01-29 16:26:17  2024-01-29 16:56:06
4  33FFC9805E3EFF9A   classic_bike  2024-01-31 05:43:23  2024-01-31 06:09:35

           start_station_name start_station_id           end_station_name  \
0           Wells St & Elm St     KA1504000135   Kingsbury St & Kinzie St
1           Wells St & Elm St     KA1504000135   Kingsbury St & Kinzie St
2           Wells St & Elm St     KA1504000135   Kingsbury St & Kinzie St
3      Wells St & Randolph St     TA1305000030  Larrabee St & Webster Ave
4  Lincoln Ave & Waveland Ave            13253   Kingsbury St & Kinzie St

  end_station_id  start_lat  start_lng    end_lat    end_lng member_casual
0   KA1503000043  41.903267 -87.634737  41.889177 -87.638506        member
1   KA1503000043  41.902937 -87.634440  41.889177 -87.638506        member
2   KA1503000043  41.902951 -87.634470  41.889177 -87.638506        member
3          13193  41.884295 -87.633963  41.921822 -87.644140        member
4   KA1503000043  41.948797 -87.675278  41.889177 -87.638506        member

#Creating a backup of joined files.


df_data.to_csv('2024_Jan-Nov_tripdata.csv')

print("the data has been stored.")

the data has been stored.

df_data.shape

(5682196, 13)

Adding date time data type


df_data.dtypes

ride_id object
rideable_type object
started_at object
ended_at object
start_station_name object
start_station_id object
end_station_name object
end_station_id object
start_lat float64
start_lng float64
end_lat float64
end_lng float64
member_casual object
dtype: object
df_data['started_at'] = pd.to_datetime(df_data['started_at'],
format='mixed')
df_data['ended_at'] = pd.to_datetime(df_data['ended_at'],
format='mixed')

df_data.head()

ride_id rideable_type started_at


ended_at \
0 C1D650626C8C899A electric_bike 2024-01-12 15:30:27 2024-01-12
15:37:59
1 EECD38BDB25BFCB0 electric_bike 2024-01-08 15:45:46 2024-01-08
15:52:59
2 F4A9CE78061F17F7 electric_bike 2024-01-27 12:27:19 2024-01-27
12:35:19
3 0A0D9E15EE50B171 classic_bike 2024-01-29 16:26:17 2024-01-29
16:56:06
4 33FFC9805E3EFF9A classic_bike 2024-01-31 05:43:23 2024-01-31
06:09:35

start_station_name start_station_id
end_station_name \
0 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
1 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
2 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
3 Wells St & Randolph St TA1305000030 Larrabee St & Webster
Ave
4 Lincoln Ave & Waveland Ave 13253 Kingsbury St & Kinzie
St

end_station_id start_lat start_lng end_lat end_lng


member_casual
0 KA1503000043 41.903267 -87.634737 41.889177 -87.638506
member
1 KA1503000043 41.902937 -87.634440 41.889177 -87.638506
member
2 KA1503000043 41.902951 -87.634470 41.889177 -87.638506
member
3 13193 41.884295 -87.633963 41.921822 -87.644140
member
4 KA1503000043 41.948797 -87.675278 41.889177 -87.638506
member

df_data.dtypes

ride_id object
rideable_type object
started_at datetime64[ns]
ended_at datetime64[ns]
start_station_name object
start_station_id object
end_station_name object
end_station_id object
start_lat float64
start_lng float64
end_lat float64
end_lng float64
member_casual object
dtype: object

Remove any missing values.


df_data.isnull().sum()

ride_id 0
rideable_type 0
started_at 0
ended_at 0
start_station_name 1044760
start_station_id 1044760
end_station_name 1073877
end_station_id 1073877
start_lat 0
start_lng 0
end_lat 7101
end_lng 7101
member_casual 0
dtype: int64

#Remove rows with null values


df_data.dropna(inplace=True)
df_data.shape

(4077579, 13)

df_data.isnull().sum()

ride_id 0
rideable_type 0
started_at 0
ended_at 0
start_station_name 0
start_station_id 0
end_station_name 0
end_station_id 0
start_lat 0
start_lng 0
end_lat 0
end_lng 0
member_casual 0
dtype: int64

Check rideable types


print(df_data['rideable_type'].unique())

['electric_bike' 'classic_bike' 'electric_scooter']

df_data['rideable_type'].value_counts()

rideable_type
classic_bike 2657760
electric_bike 1371992
electric_scooter 47827
Name: count, dtype: int64

Remove any trip which is less than 1 minute or more than 24 hours.
#Creating the ride length column.
df_data['ride_length'] = (df_data['ended_at'] -
df_data['started_at'])/ pd.Timedelta(minutes=1)
df_data.head()

ride_id rideable_type started_at


ended_at \
0 C1D650626C8C899A electric_bike 2024-01-12 15:30:27 2024-01-12
15:37:59
1 EECD38BDB25BFCB0 electric_bike 2024-01-08 15:45:46 2024-01-08
15:52:59
2 F4A9CE78061F17F7 electric_bike 2024-01-27 12:27:19 2024-01-27
12:35:19
3 0A0D9E15EE50B171 classic_bike 2024-01-29 16:26:17 2024-01-29
16:56:06
4 33FFC9805E3EFF9A classic_bike 2024-01-31 05:43:23 2024-01-31
06:09:35

start_station_name start_station_id
end_station_name \
0 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
1 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
2 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
3 Wells St & Randolph St TA1305000030 Larrabee St & Webster
Ave
4 Lincoln Ave & Waveland Ave 13253 Kingsbury St & Kinzie
St

end_station_id start_lat start_lng end_lat end_lng


member_casual \
0 KA1503000043 41.903267 -87.634737 41.889177 -87.638506
member
1 KA1503000043 41.902937 -87.634440 41.889177 -87.638506
member
2 KA1503000043 41.902951 -87.634470 41.889177 -87.638506
member
3 13193 41.884295 -87.633963 41.921822 -87.644140
member
4 KA1503000043 41.948797 -87.675278 41.889177 -87.638506
member

ride_length
0 7.533333
1 7.216667
2 8.000000
3 29.816667
4 26.200000

#Number of rows with a ride length of less than 1 minute or more than 24 hours.
df_data[(df_data['ride_length'] < 1) | (df_data['ride_length'] >
(60*24))].shape

(39146, 14)

#The rows with a ride length of less than 1 minute or more than 24 hours.
df_data[(df_data['ride_length'] < 1) | (df_data['ride_length'] >
(60*24))].head(5)

               ride_id  rideable_type           started_at             ended_at  \
79    C24AD33E4203FE44   classic_bike  2024-01-21 09:01:01  2024-01-22 09:33:24
812   61A9D9377D5839AB  electric_bike  2024-01-01 15:50:04  2024-01-01 15:51:02
2108  D58F22AA9EBA2636  electric_bike  2024-01-07 21:15:20  2024-01-07 21:16:01
2109  0071753021E244ED  electric_bike  2024-01-26 15:38:23  2024-01-26 15:39:04
2647  AC8ED872B83346CF  electric_bike  2024-01-08 04:11:57  2024-01-08 04:12:32

              start_station_name start_station_id  \
79    Lincoln Ave & Waveland Ave            13253
812        Canal St & Madison St            13341
2108  Bissell St & Armitage Ave*     chargingstx1
2109  Bissell St & Armitage Ave*     chargingstx1
2647           Clark St & Elm St     TA1307000039

                      end_station_name end_station_id  start_lat  start_lng  \
79            Lincoln Ave & Roscoe St*   chargingstx5  41.948797 -87.675278
812               Canal St & Monroe St          13056  41.882078 -87.640069
2108  W Armitage Ave & N Sheffield Ave        20254.0  41.918442 -87.652293
2109  W Armitage Ave & N Sheffield Ave        20254.0  41.918446 -87.652090
2647             N Clark St & W Elm St        20249.0  41.902701 -87.631433

        end_lat    end_lng member_casual  ride_length
79    41.943350 -87.670668        member  1472.383333
812   41.881690 -87.639530        member     0.966667
2108  41.917805 -87.653449        member     0.683333
2109  41.917805 -87.653449        member     0.683333
2647  41.902901 -87.631282        member     0.583333

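#After removing outliers from the data set.
#Both conditions must hold, so they are combined with & (and).
#The recorded run used | (or), which keeps every row; that is why the
#shape printed below still shows 4077579 rows.
df_data2 = df_data[(df_data['ride_length'] >= 1) &
(df_data['ride_length'] <= (60*24))].copy()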
df_data2.shape

(4077579, 14)

df_data2.head()

ride_id rideable_type started_at


ended_at \
0 C1D650626C8C899A electric_bike 2024-01-12 15:30:27 2024-01-12
15:37:59
1 EECD38BDB25BFCB0 electric_bike 2024-01-08 15:45:46 2024-01-08
15:52:59
2 F4A9CE78061F17F7 electric_bike 2024-01-27 12:27:19 2024-01-27
12:35:19
3 0A0D9E15EE50B171 classic_bike 2024-01-29 16:26:17 2024-01-29
16:56:06
4 33FFC9805E3EFF9A classic_bike 2024-01-31 05:43:23 2024-01-31
06:09:35

start_station_name start_station_id
end_station_name \
0 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
1 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
2 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
3 Wells St & Randolph St TA1305000030 Larrabee St & Webster
Ave
4 Lincoln Ave & Waveland Ave 13253 Kingsbury St & Kinzie
St

end_station_id start_lat start_lng end_lat end_lng


member_casual \
0 KA1503000043 41.903267 -87.634737 41.889177 -87.638506
member
1 KA1503000043 41.902937 -87.634440 41.889177 -87.638506
member
2 KA1503000043 41.902951 -87.634470 41.889177 -87.638506
member
3 13193 41.884295 -87.633963 41.921822 -87.644140
member
4 KA1503000043 41.948797 -87.675278 41.889177 -87.638506
member

ride_length
0 7.533333
1 7.216667
2 8.000000
3 29.816667
4 26.200000

Create Week day and hour of each trip


#Creating the day of the week column
df_data2['day'] = df_data2['started_at'].dt.day_name()
df_data2.head()

ride_id rideable_type started_at


ended_at \
0 C1D650626C8C899A electric_bike 2024-01-12 15:30:27 2024-01-12
15:37:59
1 EECD38BDB25BFCB0 electric_bike 2024-01-08 15:45:46 2024-01-08
15:52:59
2 F4A9CE78061F17F7 electric_bike 2024-01-27 12:27:19 2024-01-27
12:35:19
3 0A0D9E15EE50B171 classic_bike 2024-01-29 16:26:17 2024-01-29
16:56:06
4 33FFC9805E3EFF9A classic_bike 2024-01-31 05:43:23 2024-01-31
06:09:35
start_station_name start_station_id
end_station_name \
0 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
1 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
2 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
3 Wells St & Randolph St TA1305000030 Larrabee St & Webster
Ave
4 Lincoln Ave & Waveland Ave 13253 Kingsbury St & Kinzie
St

end_station_id start_lat start_lng end_lat end_lng


member_casual \
0 KA1503000043 41.903267 -87.634737 41.889177 -87.638506
member
1 KA1503000043 41.902937 -87.634440 41.889177 -87.638506
member
2 KA1503000043 41.902951 -87.634470 41.889177 -87.638506
member
3 13193 41.884295 -87.633963 41.921822 -87.644140
member
4 KA1503000043 41.948797 -87.675278 41.889177 -87.638506
member

ride_length day
0 7.533333 Friday
1 7.216667 Monday
2 8.000000 Saturday
3 29.816667 Monday
4 26.200000 Wednesday

#Creating the month of year column


df_data2['month'] = df_data2['started_at'].dt.month
df_data2.head()

ride_id rideable_type started_at


ended_at \
0 C1D650626C8C899A electric_bike 2024-01-12 15:30:27 2024-01-12
15:37:59
1 EECD38BDB25BFCB0 electric_bike 2024-01-08 15:45:46 2024-01-08
15:52:59
2 F4A9CE78061F17F7 electric_bike 2024-01-27 12:27:19 2024-01-27
12:35:19
3 0A0D9E15EE50B171 classic_bike 2024-01-29 16:26:17 2024-01-29
16:56:06
4 33FFC9805E3EFF9A classic_bike 2024-01-31 05:43:23 2024-01-31
06:09:35
start_station_name start_station_id
end_station_name \
0 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
1 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
2 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
3 Wells St & Randolph St TA1305000030 Larrabee St & Webster
Ave
4 Lincoln Ave & Waveland Ave 13253 Kingsbury St & Kinzie
St

end_station_id start_lat start_lng end_lat end_lng


member_casual \
0 KA1503000043 41.903267 -87.634737 41.889177 -87.638506
member
1 KA1503000043 41.902937 -87.634440 41.889177 -87.638506
member
2 KA1503000043 41.902951 -87.634470 41.889177 -87.638506
member
3 13193 41.884295 -87.633963 41.921822 -87.644140
member
4 KA1503000043 41.948797 -87.675278 41.889177 -87.638506
member

ride_length day month


0 7.533333 Friday 1
1 7.216667 Monday 1
2 8.000000 Saturday 1
3 29.816667 Monday 1
4 26.200000 Wednesday 1

#Creating the hour of the day column


df_data2['hour'] = df_data2['started_at'].dt.hour
df_data2.head()

ride_id rideable_type started_at


ended_at \
0 C1D650626C8C899A electric_bike 2024-01-12 15:30:27 2024-01-12
15:37:59
1 EECD38BDB25BFCB0 electric_bike 2024-01-08 15:45:46 2024-01-08
15:52:59
2 F4A9CE78061F17F7 electric_bike 2024-01-27 12:27:19 2024-01-27
12:35:19
3 0A0D9E15EE50B171 classic_bike 2024-01-29 16:26:17 2024-01-29
16:56:06
4 33FFC9805E3EFF9A classic_bike 2024-01-31 05:43:23 2024-01-31
06:09:35
start_station_name start_station_id
end_station_name \
0 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
1 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
2 Wells St & Elm St KA1504000135 Kingsbury St & Kinzie
St
3 Wells St & Randolph St TA1305000030 Larrabee St & Webster
Ave
4 Lincoln Ave & Waveland Ave 13253 Kingsbury St & Kinzie
St

end_station_id start_lat start_lng end_lat end_lng


member_casual \
0 KA1503000043 41.903267 -87.634737 41.889177 -87.638506
member
1 KA1503000043 41.902937 -87.634440 41.889177 -87.638506
member
2 KA1503000043 41.902951 -87.634470 41.889177 -87.638506
member
3 13193 41.884295 -87.633963 41.921822 -87.644140
member
4 KA1503000043 41.948797 -87.675278 41.889177 -87.638506
member

ride_length day month hour


0 7.533333 Friday 1 15
1 7.216667 Monday 1 15
2 8.000000 Saturday 1 12
3 29.816667 Monday 1 16
4 26.200000 Wednesday 1 5

Save the cleaned dataframe


df_data2.shape

(4077579, 17)

#Creating a backup of cleaned and pre processed data


df_data2.to_csv('cleaned_data.csv')

Analysis
Analyze the member_casual column
df_data2['member_casual'].unique()
array(['member', 'casual'], dtype=object)

member_types = pd.DataFrame(df_data2['member_casual'].value_counts())
member_types

count
member_casual
member 2584181
casual 1493398

# Get member and casual ride counts
counts = df_data2['member_casual'].value_counts()
member_count = counts.get('member', 0)
casual_count = counts.get('casual', 0)

# Colors for the pie chart slices (entries from the default Matplotlib color cycle)
colors = ['C1', 'C0']

# Create the pie chart
plt.figure(figsize=(10, 10))
plt.pie([member_count, casual_count], labels=['Member Riders', 'Casual Riders'],
        autopct='%1.1f%%', startangle=140, colors=colors)
plt.title("Member Riders vs Casual Riders")
plt.legend()
plt.show()
There are more member rides than casual rides.

df_casual = df_data2[df_data2['member_casual'] == 'casual']


df_casual.head()

             ride_id  rideable_type           started_at             ended_at  \
34  6EAAD9E1649F7CA0   classic_bike  2024-01-29 19:38:44  2024-01-29 20:02:18
35  147CF5271DCCE46E   classic_bike  2024-01-30 11:39:20  2024-01-30 11:59:57
36  434F9696B1DB3EBD   classic_bike  2024-01-10 16:40:05  2024-01-10 17:04:00
93  FB53908A4480763D   classic_bike  2024-01-20 19:31:20  2024-01-20 19:46:05
98  04D460EFDFA6CFCE   classic_bike  2024-01-19 15:44:55  2024-01-19 16:01:38

              start_station_name  start_station_id          end_station_name  \
34  Stockton Dr & Wrightwood Ave             13276  Kingsbury St & Kinzie St
35  Stockton Dr & Wrightwood Ave             13276  Kingsbury St & Kinzie St
36  Stockton Dr & Wrightwood Ave             13276  Kingsbury St & Kinzie St
93        Clark St & Chicago Ave             13303      Ogden Ave & Race Ave
98         Indiana Ave & 26th St      TA1307000005      Halsted St & 18th St

    end_station_id  start_lat   start_lng    end_lat     end_lng  member_casual  \
34    KA1503000043  41.931320  -87.638742  41.889177  -87.638506         casual
35    KA1503000043  41.931320  -87.638742  41.889177  -87.638506         casual
36    KA1503000043  41.931320  -87.638742  41.889177  -87.638506         casual
93           13194  41.896750  -87.630890  41.891795  -87.658751         casual
98           13099  41.845687  -87.622481  41.857506  -87.645991         casual

    ride_length        day  month  hour
34    23.566667     Monday      1    19
35    20.616667    Tuesday      1    11
36    23.916667  Wednesday      1    16
93    14.750000   Saturday      1    19
98    16.716667     Friday      1    15

df_member = df_data2[df_data2['member_casual'] == 'member']


df_member.head()

            ride_id  rideable_type           started_at             ended_at  \
0  C1D650626C8C899A  electric_bike  2024-01-12 15:30:27  2024-01-12 15:37:59
1  EECD38BDB25BFCB0  electric_bike  2024-01-08 15:45:46  2024-01-08 15:52:59
2  F4A9CE78061F17F7  electric_bike  2024-01-27 12:27:19  2024-01-27 12:35:19
3  0A0D9E15EE50B171   classic_bike  2024-01-29 16:26:17  2024-01-29 16:56:06
4  33FFC9805E3EFF9A   classic_bike  2024-01-31 05:43:23  2024-01-31 06:09:35

           start_station_name  start_station_id           end_station_name  \
0           Wells St & Elm St      KA1504000135   Kingsbury St & Kinzie St
1           Wells St & Elm St      KA1504000135   Kingsbury St & Kinzie St
2           Wells St & Elm St      KA1504000135   Kingsbury St & Kinzie St
3      Wells St & Randolph St      TA1305000030  Larrabee St & Webster Ave
4  Lincoln Ave & Waveland Ave             13253   Kingsbury St & Kinzie St

   end_station_id  start_lat   start_lng    end_lat     end_lng  member_casual  \
0    KA1503000043  41.903267  -87.634737  41.889177  -87.638506         member
1    KA1503000043  41.902937  -87.634440  41.889177  -87.638506         member
2    KA1503000043  41.902951  -87.634470  41.889177  -87.638506         member
3           13193  41.884295  -87.633963  41.921822  -87.644140         member
4    KA1503000043  41.948797  -87.675278  41.889177  -87.638506         member

   ride_length        day  month  hour
0     7.533333     Friday      1    15
1     7.216667     Monday      1    15
2     8.000000   Saturday      1    12
3    29.816667     Monday      1    16
4    26.200000  Wednesday      1     5

Basic statistics
df_data2.describe()

started_at ended_at \
count 4077579 4077579
mean 2024-07-10 09:44:49.608702208 2024-07-10 10:01:39.079223808
min 2024-01-01 00:01:01 2024-01-01 00:07:01
25% 2024-05-16 01:13:34.500000 2024-05-16 01:40:53
50% 2024-07-17 17:06:27.833999872 2024-07-17 17:21:42.572999936
75% 2024-09-12 15:30:57.483500032 2024-09-12 15:45:41.916999936
max 2024-11-30 23:50:53.449000 2024-11-30 23:57:43.002000
std NaN NaN

          start_lat     start_lng       end_lat       end_lng   ride_length  \
count  4.077579e+06  4.077579e+06  4.077579e+06  4.077579e+06  4.077579e+06
mean   4.189925e+01 -8.764386e+01  4.189977e+01 -8.764416e+01  1.682451e+01
min    4.164850e+01 -8.785797e+01  4.164850e+01 -8.784396e+01 -5.542570e+01
25%    4.188032e+01 -8.765641e+01  4.188042e+01 -8.765694e+01  5.883333e+00
50%    4.189465e+01 -8.764079e+01  4.189472e+01 -8.764098e+01  1.023592e+01
75%    4.192549e+01 -8.762680e+01  4.192560e+01 -8.762680e+01  1.841549e+01
max    4.206487e+01 -8.752748e+01  4.206485e+01 -8.752823e+01  1.509367e+03
std    4.329797e-02  2.587108e-02  4.347192e-02  2.597788e-02  3.580514e+01

month hour
count 4.077579e+06 4.077579e+06
mean 6.790688e+00 1.399652e+01
min 1.000000e+00 0.000000e+00
25% 5.000000e+00 1.100000e+01
50% 7.000000e+00 1.500000e+01
75% 9.000000e+00 1.700000e+01
max 1.100000e+01 2.300000e+01
std 2.612567e+00 4.780118e+00

df_casual.describe()

started_at ended_at \
count 1493398 1493398
mean 2024-07-17 13:12:33.980971520 2024-07-17 13:36:47.391314944
min 2024-01-01 00:02:15 2024-01-01 00:07:01
25% 2024-05-31 08:14:38.500000 2024-05-31 08:31:13
50% 2024-07-21 18:53:38.862999808 2024-07-21 19:21:07.451500032
75% 2024-09-09 22:38:52.508750080 2024-09-09 22:58:43.360000
max 2024-11-30 23:47:31.938000 2024-11-30 23:56:38.164000
std NaN NaN

          start_lat     start_lng       end_lat       end_lng   ride_length  \
count  1.493398e+06  1.493398e+06  1.493398e+06  1.493398e+06  1.493398e+06
mean   4.189967e+01 -8.764150e+01  4.190038e+01 -8.764193e+01  2.422351e+01
min    4.164850e+01 -8.784405e+01  4.164850e+01 -8.784396e+01 -5.478065e+01
25%    4.188096e+01 -8.765424e+01  4.188096e+01 -8.765453e+01  7.583333e+00
50%    4.189399e+01 -8.763628e+01  4.189467e+01 -8.763639e+01  1.353333e+01
75%    4.192416e+01 -8.762408e+01  4.192533e+01 -8.762408e+01  2.586447e+01
max    4.206485e+01 -8.752823e+01  4.206485e+01 -8.752823e+01  1.509367e+03
std    4.266687e-02  2.699502e-02  4.277049e-02  2.723981e-02  5.034885e+01

month hour
count 1.493398e+06 1.493398e+06
mean 7.027042e+00 1.430723e+01
min 1.000000e+00 0.000000e+00
25% 5.000000e+00 1.100000e+01
50% 7.000000e+00 1.500000e+01
75% 9.000000e+00 1.800000e+01
max 1.100000e+01 2.300000e+01
std 2.307024e+00 4.771942e+00

df_member.describe()

started_at ended_at \
count 2584181 2584181
mean 2024-07-06 06:39:32.575712256 2024-07-06 06:52:05.493449472
min 2024-01-01 00:01:01 2024-01-01 00:12:38
25% 2024-05-04 16:11:04 2024-05-04 16:25:07
50% 2024-07-13 15:03:47.819000064 2024-07-13 15:18:56.516999936
75% 2024-09-13 19:07:43.577999872 2024-09-13 19:19:42.275000064
max 2024-11-30 23:50:53.449000 2024-11-30 23:57:43.002000
std NaN NaN

          start_lat     start_lng       end_lat       end_lng   ride_length  \
count  2.584181e+06  2.584181e+06  2.584181e+06  2.584181e+06  2.584181e+06
mean   4.189901e+01 -8.764523e+01  4.189942e+01 -8.764545e+01  1.254863e+01
min    4.164850e+01 -8.785797e+01  4.164850e+01 -8.784396e+01 -5.542570e+01
25%    4.187936e+01 -8.765842e+01  4.187943e+01 -8.765862e+01  5.250000e+00
50%    4.189472e+01 -8.764275e+01  4.189488e+01 -8.764288e+01  8.883333e+00
75%    4.192560e+01 -8.762963e+01  4.192586e+01 -8.762963e+01  1.513830e+01
max    4.206487e+01 -8.752748e+01  4.206485e+01 -8.752823e+01  1.497650e+03
std    4.365672e-02  2.509752e-02  4.386831e-02  2.512959e-02  2.253841e+01

month hour
count 2.584181e+06 2.584181e+06
mean 6.654099e+00 1.381696e+01
min 1.000000e+00 0.000000e+00
25% 5.000000e+00 1.000000e+01
50% 7.000000e+00 1.500000e+01
75% 9.000000e+00 1.700000e+01
max 1.100000e+01 2.300000e+01
std 2.764641e+00 4.775630e+00
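
The `min` rows of the summaries above show a negative `ride_length` (about -55 minutes), so some rides in the cleaned data end before they start. One possible follow-up, sketched here on hypothetical values and not part of the original notebook, is to count and drop non-positive durations before computing averages:

```python
import pandas as pd

# Hypothetical ride lengths in minutes, including one negative value.
rides = pd.DataFrame({
    "member_casual": ["member", "casual", "member", "casual"],
    "ride_length": [7.5, -55.4, 12.0, 24.2],
})

# Flag rides whose recorded duration is zero or negative.
invalid = rides["ride_length"] <= 0
n_invalid = int(invalid.sum())

# Keep only rides with a positive duration.
rides_valid = rides.loc[~invalid].copy()
print(n_invalid, len(rides_valid))
```

Whether such rows are clock errors or test rides cannot be told from the summary alone, so counting them first is a reasonable step before deciding to drop them.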

Ride length analysis


mean_len_all = df_data2['ride_length'].mean()
mean_len_casual = df_casual['ride_length'].mean()
mean_len_member = df_member['ride_length'].mean()

max_len_all = df_data2['ride_length'].max()
max_len_casual = df_casual['ride_length'].max()
max_len_member = df_member['ride_length'].max()

sum_len_all = df_data2['ride_length'].sum()
sum_len_casual = df_casual['ride_length'].sum()
sum_len_member = df_member['ride_length'].sum()

print("Average length of all rides : ", mean_len_all)


print("Average length of casual rides : ", mean_len_casual)
print("Average length of member rides : ", mean_len_member)

print("\n----------------------\n")

print("Max length of all rides : ", max_len_all)


print("Max length of casual rides : ", max_len_casual)
print("Max length of member rides : ", max_len_member)

print("\n----------------------\n")

print("Total length of all rides : ", sum_len_all)


print("Total length of casual rides : ", sum_len_casual)
print("Total length of member rides : ", sum_len_member)

Average length of all rides : 16.824508744494384


Average length of casual rides : 24.223505704998495
Average length of member rides : 12.54862897337042

----------------------
Max length of all rides : 1509.3666666666666
Max length of casual rides : 1509.3666666666666
Max length of member rides : 1497.65

----------------------

Total length of all rides : 68603263.54186666


Total length of casual rides : 36175334.97283334
Total length of member rides : 32427928.569033343
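
The per-group mean, maximum, and total can also be produced in a single call by grouping on `member_casual`. A minimal sketch on a hypothetical four-row frame (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical rides; ride_length is in minutes.
rides = pd.DataFrame({
    "member_casual": ["casual", "casual", "member", "member"],
    "ride_length": [20.0, 30.0, 10.0, 14.0],
})

# One row per rider type, one column per statistic.
summary = rides.groupby("member_casual")["ride_length"].agg(["mean", "max", "sum"])
print(summary)
```

Applied to `df_data2`, this would avoid keeping the separate `df_casual` and `df_member` computations in sync by hand.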

import numpy as np

arr = np.array([sum_len_casual, sum_len_member])
labels = ['Casual', 'Member']

plt.figure(figsize=(10, 10))
plt.pie(arr, labels=labels, autopct='%1.1f%%')
plt.title("Total Ride Lengths\n")
plt.legend()
plt.show()

def addlabels(x, y):
    # Write each bar's value (rounded to 3 decimals) at the top of the bar
    for i in range(len(x)):
        plt.text(i, y[i], round(y[i], 3))

x = ['Casual', 'Member']
y = [mean_len_casual, mean_len_member]

plt.figure(figsize=(15,10),)
bars = plt.bar(x, y)
addlabels(x, y)
bars[1].set_color('C1')
plt.xlabel('Membership types')
plt.ylabel('Average ride length (minutes)')
plt.title('Average Ride Times\n')

Text(0.5, 1.0, 'Average Ride Times\n')

Day of Week analysis


print("Number of rides per day (General) : ")
print(df_data2['day'].value_counts())

print("\n----------------------\n")

print("Number of rides per day (Casual) : ")


print(df_casual['day'].value_counts())

print("\n----------------------\n")

print("Number of rides per day (Member) : ")


print(df_member['day'].value_counts())
Number of rides per day (General) :
day
Saturday 635144
Wednesday 616112
Friday 584153
Thursday 581077
Tuesday 561066
Monday 554699
Sunday 545328
Name: count, dtype: int64

----------------------

Number of rides per day (Casual) :


day
Saturday 313195
Sunday 261544
Friday 219576
Wednesday 184162
Thursday 179916
Monday 176325
Tuesday 158680
Name: count, dtype: int64

----------------------

Number of rides per day (Member) :


day
Wednesday 431950
Tuesday 402386
Thursday 401161
Monday 378374
Friday 364577
Saturday 321949
Sunday 283784
Name: count, dtype: int64

b1 = df_casual['day'].value_counts()
b2 = df_member['day'].value_counts()
b = pd.concat([b1, b2], axis=1)
b.columns = ['Casual', 'Member']

# Order the rows Monday to Sunday by name rather than by a hand-written position list
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
b = b.reindex(days)
b.index.name = 'days'

# Create the figure with specified size


fig, ax = plt.subplots(figsize=(20, 10))

# Plot the bar chart on the created axes


b.plot.bar(ax=ax)

# Add y-label
ax.set_ylabel('Ride Count')
ax.set_xlabel('Days of week')

# Add title
ax.set_title('Ride Counts Of Each Day In a Week\n')

plt.show()

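Casual riders take most of their rides on weekends (Saturday and Sunday are their two busiest days), whereas member rides peak on weekdays, with Wednesday the busiest. Members still record more rides than casual riders on every day of the week.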
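
Because the two groups differ in size, the weekend pattern is easier to compare as a share of each group's own rides. A minimal sketch, using the per-day counts printed above (the helper `weekend_share` is a hypothetical name introduced here):

```python
# Ride counts per weekday, copied from the output above.
casual = {"Monday": 176325, "Tuesday": 158680, "Wednesday": 184162,
          "Thursday": 179916, "Friday": 219576, "Saturday": 313195,
          "Sunday": 261544}
member = {"Monday": 378374, "Tuesday": 402386, "Wednesday": 431950,
          "Thursday": 401161, "Friday": 364577, "Saturday": 321949,
          "Sunday": 283784}

WEEKEND = ("Saturday", "Sunday")

def weekend_share(counts):
    """Fraction of a group's rides that fall on Saturday or Sunday."""
    return sum(counts[day] for day in WEEKEND) / sum(counts.values())

shares = {"casual": weekend_share(casual), "member": weekend_share(member)}
for group, share in shares.items():
    print(f"{group}: {share:.1%} of rides on weekends")
```

If rides were spread evenly over the week, two days out of seven (about 29%) would fall on the weekend, which gives a baseline for reading the two shares.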

Month analysis
print("Number of rides per Month (General) : ")
print(df_data2['month'].value_counts())

print("\n----------------------\n")

print("Number of rides per Month (Casual) : ")


print(df_casual['month'].value_counts())

print("\n----------------------\n")
print("Number of rides per Month (Member) : ")
print(df_member['month'].value_counts())

Number of rides per Month (General) :


month
8 541323
7 540941
9 536997
6 494342
10 449116
5 442289
4 297798
11 245951
3 230278
2 184736
1 113808
Name: count, dtype: int64

----------------------

Number of rides per Month (Casual) :


month
7 231970
8 228518
9 216143
6 208397
5 167552
10 159354
4 93944
11 68816
3 62821
2 38170
1 17713
Name: count, dtype: int64

----------------------

Number of rides per Month (Member) :


month
9 320854
8 312805
7 308971
10 289762
6 285945
5 274737
4 203854
11 177135
3 167457
2 146566
1 96095
Name: count, dtype: int64

b1 = df_casual['month'].value_counts()
b2 = df_member['month'].value_counts()
b = pd.concat([b1,b2], axis=1)

b.columns = ['Casual', 'Member']


b.sort_index(inplace=True)

# Create the figure with specified size


fig, ax = plt.subplots(figsize=(20, 10))

# Plot the bar chart on the created axes


b.plot.bar(ax=ax)

# Add y-label
ax.set_ylabel('Ride Count')
ax.set_xlabel('Months Of Year')

# Add title
ax.set_title('Ride Counts Of Each Month\n')

# Get the current x-tick positions


x_ticks = ax.get_xticks()

# Define new x-tick labels (replace with your desired labels)


new_x_tick_labels = ['January', 'February', 'March', 'April', 'May',
'June', 'July', 'August', 'September', 'October', 'November']

# Set the x-ticks and their labels


ax.set_xticks(x_ticks)
ax.set_xticklabels(new_x_tick_labels)

plt.show()
In the colder months (January, February, March, and November) both groups take far fewer rides, and ride counts peak from July to September. The data covers January to November 2024.
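
The seasonal drop can be quantified as the share of each group's rides that fall in the colder months. A minimal sketch, using the monthly counts printed above; treating January, February, March, and November as the cold months follows the text and is otherwise an assumption:

```python
import pandas as pd

# Monthly ride counts (index = month number), copied from the output above.
casual = pd.Series({1: 17713, 2: 38170, 3: 62821, 4: 93944, 5: 167552,
                    6: 208397, 7: 231970, 8: 228518, 9: 216143,
                    10: 159354, 11: 68816})
member = pd.Series({1: 96095, 2: 146566, 3: 167457, 4: 203854, 5: 274737,
                    6: 285945, 7: 308971, 8: 312805, 9: 320854,
                    10: 289762, 11: 177135})

cold_months = [1, 2, 3, 11]

# Fraction of each group's rides taken in the cold months.
cold_share = {
    "casual": casual.loc[cold_months].sum() / casual.sum(),
    "member": member.loc[cold_months].sum() / member.sum(),
}
print(cold_share)
```

A lower cold-month share for one group would suggest that its ridership is more seasonal than the other's.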

Hour of Day analysis


print("Number of rides per Hour (General) : ")
print(df_data2['hour'].value_counts())

print("\n----------------------\n")

print("Number of rides per Hour (Casual) : ")


print(df_casual['hour'].value_counts())

print("\n----------------------\n")

print("Number of rides per Hour (Member) : ")


print(df_member['hour'].value_counts())

Number of rides per Hour (General) :


hour
17 428025
16 380644
18 331247
15 295075
14 252545
13 248141
12 245128
8 238158
19 232654
11 212566
7 187669
10 176117
9 175992
20 163353
21 125261
6 96560
22 95954
23 61915
0 40609
5 32780
1 25424
2 14376
4 8754
3 8632
Name: count, dtype: int64

----------------------

Number of rides per Hour (Casual) :


hour
17 144436
16 136398
15 120239
18 119185
14 110425
13 106677
12 102930
11 87985
19 85772
10 69059
20 61819
9 54124
8 52910
21 49825
22 43573
7 37676
23 30065
0 21744
6 19529
1 14379
2 8442
5 8064
3 4568
4 3574
Name: count, dtype: int64

----------------------
Number of rides per Hour (Member) :
hour
17 283589
16 244246
18 212062
8 185248
15 174836
7 149993
19 146882
12 142198
14 142120
13 141464
11 124581
9 121868
10 107058
20 101534
6 77031
21 75436
22 52381
23 31850
5 24716
0 18865
1 11045
2 5934
4 5180
3 4064
Name: count, dtype: int64

b1 = df_casual['hour'].value_counts()
b2 = df_member['hour'].value_counts()
b = pd.concat([b1,b2], axis=1)
b.sort_index(inplace=True)
b.columns = ['Casual', 'Member']

# Create the figure with specified size


fig, ax = plt.subplots(figsize=(20, 10))

# Plot the bar chart on the created axes


b.plot.bar(ax=ax)

# Add y-label
ax.set_ylabel('Ride Count')
ax.set_xlabel('Hours of Day')

# Add title
ax.set_title('Ride Counts Of Each Hour In a Day\n')

plt.show()
Both groups ride most in the late afternoon, with 5 pm the busiest hour and 4 pm the second busiest. Members also show a morning increase around 7-8 am, which casual riders do not.
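
Raw hourly counts favour members simply because they take more rides overall, so dividing by each group's total makes the hourly profiles comparable. A minimal sketch for three hours, using counts and totals from the outputs above (the frame layout is an illustration, not the notebook's):

```python
import pandas as pd

# Rides starting in selected hours, copied from the output above.
counts = pd.DataFrame(
    {"Casual": [37676, 52910, 144436], "Member": [149993, 185248, 283589]},
    index=pd.Index([7, 8, 17], name="hour"),
)

# Total rides per group, from the member_casual value counts.
totals = pd.Series({"Casual": 1493398, "Member": 2584181})

# Share of each group's rides that start in the given hour.
share = counts.div(totals, axis="columns")
print(share.round(3))
```

On the full 24-hour table, `b / b.sum()` would give the same normalisation without needing the totals separately.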

Rideable type analysis


print('Number of rides per bike type (General) : ')
print(df_data2['rideable_type'].value_counts())

print("\n----------------------\n")

print('Number of rides per bike type (Casual) : ')
print(df_casual['rideable_type'].value_counts())

print("\n----------------------\n")

print('Number of rides per bike type (Member) : ')
print(df_member['rideable_type'].value_counts())

Number of rides per bike type (General) :
rideable_type
classic_bike        2657760
electric_bike       1371992
electric_scooter      47827
Name: count, dtype: int64

----------------------

Number of rides per bike type (Casual) :
rideable_type
classic_bike        955850
electric_bike       511808
electric_scooter     25740
Name: count, dtype: int64

----------------------

Number of rides per bike type (Member) :
rideable_type
classic_bike        1701910
electric_bike        860184
electric_scooter      22087
Name: count, dtype: int64

plt.figure(figsize=(10,10))
b = df_data2['rideable_type'].value_counts()
b.plot(kind='pie', autopct='%1.1f%%', colors=['C3','C9','C4'])
plt.legend(loc=1)
plt.title("Bike Types of General Users\n")

Text(0.5, 1.0, 'Bike Types of General Users\n')


b1 = df_casual['rideable_type'].value_counts()
b2 = df_member['rideable_type'].value_counts()
b = pd.concat([b1,b2], axis=1)
b.sort_index(inplace=True)
b.columns = ['Casual', 'Member']

# Create the figure with specified size


fig, ax = plt.subplots(figsize=(20, 10))

# Plot the bar chart on the created axes


b.plot.bar(ax=ax)

# Add y-label
ax.set_ylabel('Ride Count')
ax.set_xlabel('Bike Type')

# Add title
ax.set_title('Ride Counts Of Each Bike Type\n')

# Get the current x-tick positions


x_ticks = ax.get_xticks()

# Define new x-tick labels (replace with your desired labels)


new_x_tick_labels = ['Classic Bike', 'Electric Bike', 'Electric Scooter']

# Set the x-ticks and their labels


ax.set_xticks(x_ticks)
ax.set_xticklabels(new_x_tick_labels)

plt.show()

The classic bike is the most popular rideable type for both casual riders and members.
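
To compare the bike-type mix between groups rather than the raw counts, the table can be normalised per rider type. A minimal sketch with `pd.crosstab` on a hypothetical eight-row frame (the rows are made up for illustration):

```python
import pandas as pd

# Hypothetical rides: four casual, four member.
rides = pd.DataFrame({
    "member_casual": ["casual"] * 4 + ["member"] * 4,
    "rideable_type": [
        "classic_bike", "classic_bike", "electric_bike", "electric_scooter",
        "classic_bike", "classic_bike", "classic_bike", "electric_bike",
    ],
})

# normalize="columns" makes each rider type's column sum to 1.
mix = pd.crosstab(rides["rideable_type"], rides["member_casual"],
                  normalize="columns")
print(mix)
```

The same call on `df_data2` would show, for each rider type, what fraction of its rides use each vehicle.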

Start and End station analysis


print("Top 5 start stations for all riders :")
print(df_data2['start_station_name'].value_counts()[:5])

print("\n----------------------\n")

print("Top 5 start stations for Casual riders :")
print(df_casual['start_station_name'].value_counts()[:5])

print("\n----------------------\n")

print("Top 5 start stations for Member riders :")
print(df_member['start_station_name'].value_counts()[:5])

Top 5 start stations for all riders :
start_station_name
Streeter Dr & Grand Ave               61666
DuSable Lake Shore Dr & Monroe St     40971
DuSable Lake Shore Dr & North Blvd    36427
Michigan Ave & Oak St                 35923
Kingsbury St & Kinzie St              34164
Name: count, dtype: int64

----------------------

Top 5 start stations for Casual riders :
start_station_name
Streeter Dr & Grand Ave               47916
DuSable Lake Shore Dr & Monroe St     31782
Michigan Ave & Oak St                 23156
DuSable Lake Shore Dr & North Blvd    21274
Millennium Park                       20574
Name: count, dtype: int64

----------------------

Top 5 start stations for Member riders :
start_station_name
Kingsbury St & Kinzie St        25362
Clinton St & Washington Blvd    23882
Clinton St & Madison St         21545
Clark St & Elm St               21412
Clinton St & Jackson Blvd       17648
Name: count, dtype: int64

print("Top 5 end stations for all riders :")
print(df_data2['end_station_name'].value_counts()[:5])

print("\n----------------------\n")

print("Top 5 end stations for Casual riders :")
print(df_casual['end_station_name'].value_counts()[:5])

print("\n----------------------\n")

print("Top 5 end stations for Member riders :")
print(df_member['end_station_name'].value_counts()[:5])

Top 5 end stations for all riders :
end_station_name
Streeter Dr & Grand Ave               63193
DuSable Lake Shore Dr & North Blvd    39997
DuSable Lake Shore Dr & Monroe St     39754
Michigan Ave & Oak St                 36164
Kingsbury St & Kinzie St              33754
Name: count, dtype: int64

----------------------

Top 5 end stations for Casual riders :
end_station_name
Streeter Dr & Grand Ave               51945
DuSable Lake Shore Dr & Monroe St     29778
DuSable Lake Shore Dr & North Blvd    25037
Michigan Ave & Oak St                 24035
Millennium Park                       22548
Name: count, dtype: int64

----------------------

Top 5 end stations for Member riders :
end_station_name
Kingsbury St & Kinzie St        25486
Clinton St & Washington Blvd    24214
Clinton St & Madison St         22200
Clark St & Elm St               21265
Clinton St & Jackson Blvd       17556
Name: count, dtype: int64
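
The station lists above suggest that the two groups favour different places. A short sketch that makes the comparison explicit, using the five start-station names per group copied from the output (the set names are illustrative):

```python
# Busiest start stations per group, copied from the output above.
casual_top5 = {
    "Streeter Dr & Grand Ave",
    "DuSable Lake Shore Dr & Monroe St",
    "Michigan Ave & Oak St",
    "DuSable Lake Shore Dr & North Blvd",
    "Millennium Park",
}
member_top5 = {
    "Kingsbury St & Kinzie St",
    "Clinton St & Washington Blvd",
    "Clinton St & Madison St",
    "Clark St & Elm St",
    "Clinton St & Jackson Blvd",
}

# Stations that appear in both lists.
shared = casual_top5 & member_top5
print(shared)  # prints set(): the two lists have no station in common
```

The end-station lists contain the same ten names, so the same check gives an empty overlap there as well.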

# Mean start coordinates (longitude, latitude) of the five busiest casual start stations
cx1, cy1 = df_casual[df_casual['start_station_name'] == 'Streeter Dr & Grand Ave'][['start_lng', 'start_lat']].mean()
cx2, cy2 = df_casual[df_casual['start_station_name'] == 'DuSable Lake Shore Dr & Monroe St'][['start_lng', 'start_lat']].mean()
cx3, cy3 = df_casual[df_casual['start_station_name'] == 'DuSable Lake Shore Dr & North Blvd'][['start_lng', 'start_lat']].mean()
cx4, cy4 = df_casual[df_casual['start_station_name'] == 'Michigan Ave & Oak St'][['start_lng', 'start_lat']].mean()
cx5, cy5 = df_casual[df_casual['start_station_name'] == 'Millennium Park'][['start_lng', 'start_lat']].mean()

caual_lst = [
    [cx1, cy1],
    [cx2, cy2],
    [cx3, cy3],
    [cx4, cy4],
    [cx5, cy5]
]
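
The five near-identical lookups could be collapsed into one grouped computation. A minimal sketch, assuming a hypothetical helper `station_coords` and a toy frame with the notebook's column names (neither is part of the original code):

```python
import pandas as pd

def station_coords(df, stations):
    """Return [lng, lat] mean start coordinates per station, in the given order."""
    means = (
        df[df["start_station_name"].isin(stations)]
        .groupby("start_station_name")[["start_lng", "start_lat"]]
        .mean()
    )
    # .loc with a list keeps the caller's station order.
    return means.loc[stations].values.tolist()

# Hypothetical rides at three made-up stations.
toy = pd.DataFrame({
    "start_station_name": ["A", "A", "B", "C"],
    "start_lng": [-87.5, -88.5, -87.0, -86.0],
    "start_lat": [41.0, 42.0, 41.5, 40.0],
})

coords = station_coords(toy, ["B", "A"])
print(coords)
```

The result has the same `[longitude, latitude]` layout as the hand-built list, so it could be passed to the map code unchanged. A station name missing from the frame would raise a `KeyError`.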

import folium

Most visited stations of Casual riders


# Center the map on the average of the station coordinates
avg_lat = sum(location[1] for location in caual_lst) / len(caual_lst)
avg_lon = sum(location[0] for location in caual_lst) / len(caual_lst)
map_center = [avg_lat, avg_lon]

# Initialize the map
casual_map = folium.Map(location=map_center, zoom_start=14)

locations = folium.map.FeatureGroup()

# Add a circle marker for each station
for lon, lat in caual_lst:
    locations.add_child(
        folium.vector_layers.CircleMarker(
            [lat, lon],
            radius=10,
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.7
        )
    )

labels = ['Streeter Dr & Grand Ave', 'DuSable Lake Shore Dr & Monroe St',
          'DuSable Lake Shore Dr & North Blvd', 'Michigan Ave & Oak St',
          'Millennium Park']

# Add a popup marker with the station name
for location, label in zip(caual_lst, labels):
    lon, lat = location  # Unpack longitude and latitude
    folium.Marker([lat, lon], popup=label).add_to(casual_map)

casual_map.add_child(locations)

<folium.folium.Map at 0x3f084c690>

# Mean start coordinates (longitude, latitude) of the five busiest member start stations
mx1, my1 = df_member[df_member['start_station_name'] == 'Kingsbury St & Kinzie St'][['start_lng', 'start_lat']].mean()
mx2, my2 = df_member[df_member['start_station_name'] == 'Clinton St & Washington Blvd'][['start_lng', 'start_lat']].mean()
mx3, my3 = df_member[df_member['start_station_name'] == 'Clinton St & Madison St'][['start_lng', 'start_lat']].mean()
mx4, my4 = df_member[df_member['start_station_name'] == 'Clark St & Elm St'][['start_lng', 'start_lat']].mean()
mx5, my5 = df_member[df_member['start_station_name'] == 'Clinton St & Jackson Blvd'][['start_lng', 'start_lat']].mean()

member_list = [
    [mx1, my1],
    [mx2, my2],
    [mx3, my3],
    [mx4, my4],
    [mx5, my5]
]

Most visited stations of Member riders


# Center the map on the average of the station coordinates
avg_lat = sum(location[1] for location in member_list) / len(member_list)
avg_lon = sum(location[0] for location in member_list) / len(member_list)
map_center = [avg_lat, avg_lon]

# Initialize the map
member_map = folium.Map(location=map_center, zoom_start=14)

locations = folium.map.FeatureGroup()

# Add a circle marker for each station
for lon, lat in member_list:
    locations.add_child(
        folium.vector_layers.CircleMarker(
            [lat, lon],
            radius=10,
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.7
        )
    )

labels = ['Kingsbury St & Kinzie St', 'Clinton St & Washington Blvd',
          'Clinton St & Madison St', 'Clark St & Elm St',
          'Clinton St & Jackson Blvd']

# Add a popup marker with the station name
for location, label in zip(member_list, labels):
    lon, lat = location  # Unpack longitude and latitude
    folium.Marker([lat, lon], popup=label).add_to(member_map)

member_map.add_child(locations)
<folium.folium.Map at 0x3f073ed90>
