09 Lineplot

The document provides a detailed guide on using Seaborn's lineplot function to visualize parking occupancy data from a UCI dataset. It includes steps for data acquisition, cleaning, and visualization, along with various customization options for the plots. The document also covers advanced features like bootstrapping and visual semantics such as hue, style, and size in line plots.

Uploaded by

kart238
09-lineplot

August 13, 2024

1 Seaborn: lineplot
[1]: import seaborn as sns
from matplotlib import pyplot as plt

import pandas as pd

Grab parking data from UCI resource with Bash commands and read in as pandas DataFrame.
[2]: !wget https://archive.ics.uci.edu/ml/machine-learning-databases/00482/dataset.zip

--2020-08-12 14:49:33-- https://archive.ics.uci.edu/ml/machine-learning-databases/00482/dataset.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)… 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 240539 (235K) [application/x-httpd-php]
Saving to: ‘dataset.zip’

dataset.zip 100%[===================>] 234.90K 545KB/s in 0.4s

2020-08-12 14:49:34 (545 KB/s) - ‘dataset.zip’ saved [240539/240539]

[3]: !unzip dataset.zip

Archive: dataset.zip
inflating: dataset.csv

[4]: df = pd.read_csv('dataset.csv', parse_dates=[3])

[5]: df.head()

[5]:   SystemCodeNumber  Capacity  Occupancy          LastUpdated
0           BHMBCCMKT01       577         61  2016-10-04 07:59:42
1           BHMBCCMKT01       577         64  2016-10-04 08:25:42
2           BHMBCCMKT01       577         80  2016-10-04 08:59:42
3           BHMBCCMKT01       577        107  2016-10-04 09:32:46
4           BHMBCCMKT01       577        150  2016-10-04 09:59:48
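
`parse_dates=[3]` selects the column to parse by position, so it refers to the fourth column, `LastUpdated`. As a minimal sketch (the inline CSV reuses the first three rows shown above, and `sample_csv` is a name introduced here purely for illustration), the same column can also be selected by name, which is arguably easier to read:

```python
import io

import pandas as pd

# First rows of dataset.csv, copied from the df.head() output for illustration.
sample_csv = """SystemCodeNumber,Capacity,Occupancy,LastUpdated
BHMBCCMKT01,577,61,2016-10-04 07:59:42
BHMBCCMKT01,577,64,2016-10-04 08:25:42
BHMBCCMKT01,577,80,2016-10-04 08:59:42
"""

# Select the date column by position (as in the notebook) or by name.
by_position = pd.read_csv(io.StringIO(sample_csv), parse_dates=[3])
by_name = pd.read_csv(io.StringIO(sample_csv), parse_dates=['LastUpdated'])

# Both calls yield a datetime column, so the .dt accessor is available.
print(by_position['LastUpdated'].dt.hour.tolist())
print(by_position.equals(by_name))
```

Either form makes the `.dt` accessor available, which the cleaning step relies on.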

Do a bit of data cleaning:
- Rename some columns
- Create three new date/time columns for later aggregations
- Filter down to only two garage locations
[6]: df.rename(columns={'SystemCodeNumber': 'Location', 'LastUpdated': 'Timestamp'}, inplace=True)

df['Day'] = df.Timestamp.dt.date
df['Month'] = df.Timestamp.dt.month
df['Hour'] = df.Timestamp.dt.hour

[7]: park = df[df.Location.isin(['Broad Street', 'NIA South'])]

[8]: park.head()

[8]:       Location      Capacity  Occupancy  Timestamp            Day         Month  Hour
20171      Broad Street  690       178        2016-10-04 07:59:42  2016-10-04  10     7
20172      Broad Street  690       269        2016-10-04 08:25:42  2016-10-04  10     8
20173      Broad Street  690       415        2016-10-04 08:59:42  2016-10-04  10     8
20174      Broad Street  690       530        2016-10-04 09:32:46  2016-10-04  10     9
20175      Broad Street  690       600        2016-10-04 09:59:48  2016-10-04  10     9

1.1 Intro Visuals


[9]: blue, orange, green, red = sns.color_palette()[:4]

[10]: sns.set_style('white')
plt.rc('xtick', labelsize=14)
plt.rc('ytick', labelsize=14)
plt.rc('date.autoformatter', day='%b %Y')

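[11]: # pd.datetime is no longer available in recent pandas releases; pd.Timestamp is a datetime subclass
months = [pd.Timestamp(2016, 10, 1), pd.Timestamp(2016, 11, 1), pd.Timestamp(2016, 12, 1)];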

[16]: plt.figure(figsize=(10,6))
sns.lineplot(x=park.Day, y=park.Occupancy, ci=None)
plt.xticks(months)
plt.yticks([])
plt.xlim(None, pd.Timestamp(2016, 12, 1))
plt.ylim(0, 570)
sns.despine(left=True)
plt.xlabel('')
plt.ylabel('')
plt.tight_layout();

[17]: plt.figure(figsize=(10,6))
sns.lineplot(x=park.Day, y=park.Occupancy)
plt.xticks(months)
plt.yticks([])
plt.xlim(None, pd.Timestamp(2016, 12, 1))
plt.ylim(0, 570)
sns.despine(left=True)
plt.xlabel('')
plt.ylabel('')
plt.tight_layout();

[18]: plt.figure(figsize=(10,6))
sns.lineplot(x=park.Day, y=park.Occupancy, hue=park.Location, palette=['gray', 'purple'])
plt.xticks(months)
plt.yticks([])
plt.xlim(None, pd.Timestamp(2016, 12, 1))
sns.despine(left=True)
plt.xlabel('')
plt.ylabel('')
plt.legend([], frameon=False)
plt.tight_layout();

[19]: plt.rc('date.autoformatter', day='%b 1st')
plt.figure(figsize=(6,4))
sns.lineplot(x=park.Day, y=park.Occupancy)
plt.xticks(months)
plt.yticks([])
plt.xlim(pd.Timestamp(2016, 10, 30), pd.Timestamp(2016, 11, 6))
sns.despine(left=True)
plt.ylim(0, 600)
plt.xlabel('')
plt.ylabel('')
plt.tight_layout();

[20]: plt.rc('date.autoformatter', day='%b 1st')
plt.figure(figsize=(6,4))
sns.lineplot(x=park.Day, y=park.Occupancy, ci=None)
plt.xticks(months)
plt.yticks([])
plt.xlim(pd.Timestamp(2016, 10, 30), pd.Timestamp(2016, 11, 6))
plt.ylim(0, 600)
sns.despine(left=True)
plt.xlabel('')
plt.ylabel('')
plt.tight_layout();

[21]: plt.rc('xtick', labelsize=10)
plt.rc('ytick', labelsize=10)

1.2 Basics
[22]: sns.set_style('dark')

[23]: months = [pd.Timestamp(2016, 10, 1),
          pd.Timestamp(2016, 11, 1),
          pd.Timestamp(2016, 12, 1)]
plt.rc('date.autoformatter', day='%b %Y');

[24]: sns.lineplot(x=park.Day, y=park.Occupancy)
plt.xticks(months);

[25]: sns.lineplot(x=park.Hour, y=park.Occupancy);

[26]: sns.lineplot(x='Hour', y='Occupancy', data=park);

1.3 Bootstrapping
[27]: sns.lineplot(x='Hour', y='Occupancy', data=park,
n_boot=1000
);

Decreasing the number of bootstrap samples increases the variance of the confidence-interval estimates, so the shaded band changes more from run to run.
[28]: sns.lineplot(x='Hour', y='Occupancy', data=park,
n_boot=10
);
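
To make the effect of `n_boot` concrete, here is a minimal NumPy sketch of a percentile bootstrap for the mean. It is an illustration of the general technique, not Seaborn's internal code; `bootstrap_ci`, `upper_bound_spread`, and the synthetic `occupancy` array are all assumptions introduced for this example. It repeats the bootstrap many times and measures how much the upper interval bound moves between runs.

```python
import numpy as np


def bootstrap_ci(values, n_boot, ci=95, rng=None):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    # Resample with replacement n_boot times and record each resample's mean.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # The interval spans the central `ci` percent of the bootstrap means.
    lo, hi = np.percentile(boot_means, [50 - ci / 2, 50 + ci / 2])
    return lo, hi


rng = np.random.default_rng(0)
# Synthetic stand-in for the occupancy readings at a single hour.
occupancy = rng.normal(loc=400, scale=80, size=60)


def upper_bound_spread(n_boot, repeats=200):
    """Standard deviation of the upper CI bound over repeated bootstrap runs."""
    uppers = [bootstrap_ci(occupancy, n_boot, rng=rng)[1] for _ in range(repeats)]
    return float(np.std(uppers))


spread_10 = upper_bound_spread(10)
spread_1000 = upper_bound_spread(1000)
print(f"spread of upper bound, n_boot=10:   {spread_10:.2f}")
print(f"spread of upper bound, n_boot=1000: {spread_1000:.2f}")
```

The run-to-run spread should come out noticeably larger for `n_boot=10`, which is the extra noise the statement above refers to.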

[29]: sns.lineplot(x='Hour', y='Occupancy', data=park,
ci=95
);

[30]: sns.lineplot(x='Hour', y='Occupancy', data=park,
             ci=68
             );

To turn off the bootstrapped confidence intervals, set ci=None to trigger early exit within Seaborn
code. (A conditional checks for this case and completely bypasses the bootstrapping procedure if
ci is set to None. This saves time if confidence intervals are not needed!)

[31]: sns.lineplot(x='Hour', y='Occupancy', data=park,
             ci=None
             );
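
The early-exit pattern described above can be sketched in a few lines. This is a hypothetical stand-in written for illustration (the `aggregate` function and its parameters are assumptions, not Seaborn's actual implementation): when `ci` is `None`, the function returns before any resampling takes place.

```python
import numpy as np


def aggregate(values, ci=95, n_boot=1000, seed=None):
    """Return (mean, interval); interval is None when ci is None."""
    values = np.asarray(values, dtype=float)
    estimate = values.mean()
    if ci is None:
        # Early exit: no resampling is performed at all.
        return estimate, None
    rng = np.random.default_rng(seed)
    # Resample with replacement and take the central `ci` percent of the means.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    interval = tuple(np.percentile(boot_means, [50 - ci / 2, 50 + ci / 2]))
    return estimate, interval


# Broad Street occupancy values taken from the park.head() output.
readings = [178, 269, 415, 530, 600]
print(aggregate(readings, ci=None))
print(aggregate(readings, ci=95, seed=0))
```

With `ci=None` only the point estimate is computed, so the cost of drawing `n_boot` resamples for every x value is avoided.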

[32]: sns.lineplot(x='Hour', y='Occupancy', data=park,
estimator='mean'
);

[33]: sns.lineplot(x='Hour', y='Occupancy', data=park,
estimator='sum'
);

[34]: sns.lineplot(x='Hour', y='Occupancy', data=park,
             estimator='std'
             );
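
When several observations share an x value, `estimator` decides how they are collapsed into the single y value the line passes through. A rough pandas equivalent is a group-by on the x column. This approximates the idea and is not Seaborn's internal code; the `toy` frame is made up for illustration.

```python
import pandas as pd

# Tiny made-up frame: several occupancy readings per hour.
toy = pd.DataFrame({
    'Hour':      [7, 7, 8, 8, 8],
    'Occupancy': [10, 20, 30, 40, 50],
})

# One aggregated value per x position, for each estimator used in the cells above.
# Note that pandas' 'std' is the sample standard deviation (ddof=1).
per_hour = toy.groupby('Hour')['Occupancy'].agg(['mean', 'sum', 'std'])
print(per_hour)
```

Each column of `per_hour` corresponds to the line that would be drawn for that estimator, with one point per hour.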

1.4 Visual Semantics
1.4.1 hue
[35]: sns.lineplot(x='Day', y='Occupancy', data=park, hue='Location')

plt.xticks(months);

[36]: sns.lineplot(x='Day', y='Occupancy', data=park,
hue='Location',
palette = ['gray', 'xkcd:brick red']
)

plt.xticks(months);

1.4.2 style

[37]: sns.lineplot(x='Day', y='Occupancy', data=park, style='Location')

plt.xticks(months);

[38]: sns.lineplot(x='Day', y='Occupancy', data=park,
hue="Location",
style='Location'
)

plt.xticks(months);

1.4.3 size
[39]: sns.lineplot(x='Hour', y='Occupancy', data=park, size='Location',
);

[40]: sns.lineplot(x='Hour', y='Occupancy', data=park, ci=None,
size='Month'
);

1.5 Style
Most of matplotlib’s line styling works within the Seaborn lineplot. (The main exception is that
linestyle does not work.)

[41]: sns.lineplot(x='Day', y='Occupancy', data=park, ci=None,
             lw=4,
             color='#aa00aa',
             alpha=0.5
             )

plt.xticks(months);
