Data Analysis for Bike Sharing

The document provides instructions to analyze a bike sharing dataset using Python and Pandas. It includes tasks like reading multiple data sheets, identifying attribute types, plotting charts to analyze average hourly demand across seasons and weather, comparing hourly totals using box plots, exploring relationships between demand and temperature/weather attributes using scatter plots and binning, and generating a correlation heatmap. Insights from the analyses can provide recommendations to the bike sharing business.

Uploaded by

nahar570

For the bike sharing dataset, please conduct the following analysis.

1. Read the bike sharing dataset in Python using the pandas library. Read both sheets of
data into separate dataframes and concatenate the two dataframes into a single
dataframe containing the records from both sheets.

Hint: Use the read_excel() function in pandas and look up its usage online.
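One possible approach is sketched below. It assumes both sheets share the same columns; the file name bike_sharing.xlsx and the helper names are hypothetical. Passing sheet_name=None to read_excel() returns a dict of dataframes keyed by sheet name, which can then be stacked with pd.concat().

```python
import pandas as pd

def combine_sheets(sheets):
    """Stack a dict of {sheet_name: DataFrame} into a single dataframe."""
    # ignore_index=True renumbers the rows 0..n-1 instead of
    # repeating each sheet's own row index.
    return pd.concat(list(sheets.values()), ignore_index=True)

def load_bike_data(path):
    """Read every sheet of the workbook and return one combined dataframe."""
    # sheet_name=None reads all sheets and returns a dict of dataframes.
    sheets = pd.read_excel(path, sheet_name=None)
    return combine_sheets(sheets)

# Hypothetical usage; replace the file name with the actual workbook:
# data = load_bike_data("bike_sharing.xlsx")
# print(data.shape)
```

Comparing data.shape with the row counts of the individual sheets is a quick check that no records were lost.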

2. Identify nominal, ordinal, binary, and numeric attributes in the bike sharing dataset.
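A small summary table can support this classification. The sketch below is only an aid, and the helper name summarize_attributes is hypothetical: it lists each column's dtype and number of distinct values and flags columns with exactly two values as likely binary. Whether a coded attribute is nominal or ordinal still has to be judged from its meaning.

```python
import pandas as pd

def summarize_attributes(data):
    """Return the dtype and number of distinct values for each column."""
    summary = pd.DataFrame({
        "dtype": data.dtypes.astype(str),
        "n_unique": data.nunique(),
    })
    # Exactly two distinct values suggests a binary attribute.
    summary["looks_binary"] = summary["n_unique"] == 2
    return summary

# Usage with the combined dataframe:
# print(summarize_attributes(data))
```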

3. Plot bar charts of the average hourly number of bikes rented for each value of the
season, weather situation, month, hour, holiday, workingday, and weekday attributes.
Highlight and discuss relevant findings from the charts.

Hint: You may use the groupby function in pandas to compute the average of the cnt
attribute for each value of the season attribute and then plot the results as a bar chart.
import seaborn as sns
Season_Average_Demand = data.groupby(['season'])['cnt'].mean()
sns.barplot(x=Season_Average_Demand.index, y=Season_Average_Demand.values, alpha=0.5)
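The same groupby idea can be repeated for every attribute in the task. The sketch below is one way to do that with matplotlib. The column names in ATTRIBUTES are assumptions (in particular mnth and hr) and should be adjusted to the actual dataset; the function names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column names; adjust them to match the actual dataset.
ATTRIBUTES = ["season", "weathersit", "mnth", "hr",
              "holiday", "workingday", "weekday"]

def average_demand_by(data, attribute, target="cnt"):
    """Mean of target for each distinct value of attribute."""
    return data.groupby(attribute)[target].mean()

def plot_average_demand(data, attributes=ATTRIBUTES, target="cnt"):
    """Draw one bar chart per attribute, stacked vertically in one figure."""
    fig, axes = plt.subplots(len(attributes), 1,
                             figsize=(8, 3 * len(attributes)), squeeze=False)
    for ax, attribute in zip(axes[:, 0], attributes):
        average_demand_by(data, attribute, target).plot(kind="bar", ax=ax)
        ax.set_xlabel(attribute)
        ax.set_ylabel(f"average {target}")
    fig.tight_layout()
    return fig

# Usage with the combined dataframe:
# plot_average_demand(data)
# plt.show()
```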

4. Plot a bar chart to compare the average hourly demand on holidays and non-holidays
across different seasons. Highlight and discuss any interesting insights
observed in the chart.

Hint: Use the crosstab function and specify which attribute's values should be
aggregated instead of counts. You also need to specify the aggregation function
to apply to the target attribute. For example, the following command computes the
average value of the registered column for each combination of weathersit and holiday.
pd.crosstab(data.weathersit, data.holiday, values=data.registered,
            aggfunc='mean').round(0).plot(kind="bar")
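Adapting that hint to the question asked here might look like the sketch below, which aggregates cnt by season and holiday. It assumes the columns are named season, holiday, and cnt; the function name is hypothetical.

```python
import pandas as pd

def demand_by_season_and_holiday(data):
    """Average cnt for every combination of season (rows) and holiday (columns)."""
    table = pd.crosstab(data["season"], data["holiday"],
                        values=data["cnt"], aggfunc="mean")
    return table.round(0)

# Usage with the combined dataframe; each season gets one bar per holiday value:
# demand_by_season_and_holiday(data).plot(kind="bar")
```

A combination that never occurs in the data appears as NaN in the table and is simply left out of the chart.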

5. Compare the hourly total of bikes rented across seasons and weather situations
using box plots in Python. Highlight and discuss relevant insights from these charts.
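As a rough sketch, pandas can draw grouped box plots directly with DataFrame.boxplot(). The helper below (hypothetical name) returns the quartiles per group, which are the numbers each box represents and can help when discussing the charts. Column names season, weathersit, and cnt are assumed.

```python
import pandas as pd

def demand_quartiles(data, attribute, target="cnt"):
    """Quartiles of target within each group of attribute."""
    stats = data.groupby(attribute)[target].describe()
    return stats[["25%", "50%", "75%"]]

# Usage with the combined dataframe:
# data.boxplot(column="cnt", by="season")
# data.boxplot(column="cnt", by="weathersit")
# print(demand_quartiles(data, "season"))
```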

6. Find the relationship between the temp attribute and the total bikes rented (cnt)
attribute using the following methods:

A: Plot a scatter plot of the temp attribute against the cnt variable. Is there a linear
trend between the two?

B: Create 10 bins for the temp attribute in a new column of the dataframe. Plot a bar
chart of the average cnt across the temperature bins.

Hint: Create bins for temperature using the cut function in pandas. For example, to
create 10 equal-width bins for the temp attribute, the following code saves the bin
label for each temp value in a new column 'temp_bin'. Bin labels are assigned as
1, 2, 3, ..., 10.
num_bins=10
data['temp_bin'] = pd.cut(data['temp'],bins=num_bins,labels=range(1,num_bins+1))

Do A and B provide similar insights? Depending upon the insights, what would you
recommend to the business?
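To compare methods A and B side by side, a sketch along the following lines may help. It computes the Pearson correlation as a numeric check on the scatter plot and the binned averages for the bar chart. The function names are hypothetical, and the bins are equal-width, as produced by pd.cut() with an integer bins argument.

```python
import pandas as pd

def linear_correlation(data, column, target="cnt"):
    """Pearson correlation; values near +1 or -1 suggest a linear trend."""
    return data[column].corr(data[target])

def binned_average(data, column, target="cnt", num_bins=10):
    """Average target within equal-width bins of column."""
    bins = pd.cut(data[column], bins=num_bins, labels=range(1, num_bins + 1))
    # observed=False keeps empty bins in the result (their average is NaN).
    return data.groupby(bins, observed=False)[target].mean()

# Usage with the combined dataframe:
# data.plot.scatter(x="temp", y="cnt", alpha=0.3)
# print(linear_correlation(data, "temp"))
# binned_average(data, "temp").to_frame().plot(kind="bar", legend=False)
```

If the binned averages rise and then fall, the relationship is not purely linear, which the correlation value alone would not reveal.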

7. Conduct the same analysis for the windspeed and cnt attributes. Similarly, find the
relationship between humidity and total bikes rented. Provide your suggestions
for the business.

8. Generate a heatmap of the correlations among the cnt, temp, atemp, hum, and windspeed
variables. Discuss the correlation of cnt with the other attributes.
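A minimal sketch of this step, using plain matplotlib, is shown below. The function names are hypothetical. With seaborn installed, sns.heatmap(corr, annot=True) gives a comparable annotated chart in one call.

```python
import matplotlib.pyplot as plt
import pandas as pd

COLUMNS = ["cnt", "temp", "atemp", "hum", "windspeed"]

def correlation_matrix(data, columns=COLUMNS):
    """Pairwise Pearson correlations among the selected columns."""
    return data[columns].corr()

def plot_heatmap(corr):
    """Draw the correlation matrix as an annotated heatmap."""
    fig, ax = plt.subplots()
    # Fix the colour scale to [-1, 1] so that zero correlation is mid-scale.
    image = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    # Write each correlation value inside its cell.
    for i in range(corr.shape[0]):
        for j in range(corr.shape[1]):
            ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(image, ax=ax)
    return fig

# Usage with the combined dataframe:
# corr = correlation_matrix(data)
# plot_heatmap(corr)
# plt.show()
# print(corr["cnt"].sort_values())
```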
