Introduction to Seaborn
Topics Covered
• Seaborn Intro
• Distribution Plots
• Categorical Plots
• Matrix Plots
• Grid Plots
• Regression Plots
Seaborn
• Uses Matplotlib underneath to plot graphs
• Statistical data visualization plotting library
• Designed to work with pandas DataFrames
• Comes with built-in data sets
• Installation: conda install seaborn, or pip install seaborn
• Example gallery: https://seaborn.pydata.org/examples/index.html
Distribution Plots
• distplot
• jointplot
• pairplot
• rugplot
• kdeplot
Setup:
• import seaborn as sns
• sns.__version__
• %matplotlib inline # notebook
• built-in data sets!
• tips = sns.load_dataset('tips')
• tips.head()
distplot
• Shows the distribution of a univariate set of observations
• distplot is deprecated and will be removed in seaborn v0.14.0; use histplot() (axes-level) or displot() (figure-level) instead
• https://seaborn.pydata.org/generated/seaborn.distplot.html
• sns.distplot(tips['total_bill'])
• To remove the KDE layer and keep only the histogram:
• sns.distplot(tips['total_bill'], kde=False, bins=30)
• Kernel Density Estimation (KDE)
• A way to estimate the probability density function of a continuous random variable
• Non-parametric: it does not assume the data follow a particular distribution
jointplot
• Combines a bivariate plot of two variables with the univariate distribution of each on the margins (used to examine the relationship between two variables)
• sns.jointplot(x='total_bill', y='tip', data=tips, kind='scatter')
• sns.jointplot(x='total_bill', y='tip', data=tips, kind='hex')
• sns.jointplot(x='total_bill', y='tip', data=tips, kind='reg') # other kinds: 'resid', 'kde'
pairplot
• A pairplot visualizes pairwise
relationships between numerical
columns in a dataframe and can
use the hue argument to color
points based on a categorical
column
• sns.pairplot(tips) # normal
• sns.pairplot(tips, hue='sex', palette='coolwarm') # with hue
• hue visually encodes an additional dimension of information
rugplot
• A rugplot draws dash marks for each point in
a univariate distribution and is a building
block for a KDE plot
• sns.rugplot(tips['total_bill'])
kdeplot
• KDE plots replace every single observation with a Gaussian (Normal) distribution centered on that value, then sum these curves to estimate the density
• sns.kdeplot(tips['total_bill'])
• sns.rugplot(tips['total_bill'])
• import numpy as np
• x = np.random.randn(200)
• sns.kdeplot(x, fill=True)
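The "one Gaussian per observation" idea can be written out by hand with NumPy. This is a sketch of the general technique, not seaborn's implementation; the bandwidth below follows Scott's rule, which is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Bandwidth by Scott's rule for 1-D data (an assumption for this sketch)
h = x.std(ddof=1) * len(x) ** (-1 / 5)

# Evaluation grid, padded so the tails of the outermost bumps are covered
grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, 500)

# One Gaussian bump per observation, evaluated on the grid: shape (200, 500)
bumps = np.exp(-0.5 * ((grid[None, :] - x[:, None]) / h) ** 2) / (h * np.sqrt(2 * np.pi))

# Averaging the bumps gives the density estimate
density = bumps.mean(axis=0)

# Trapezoid-rule area under the curve; a density should integrate to about 1
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
```

Plotting grid against density with plt.plot should give a curve close to the one kdeplot draws for the same x.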
Categorical Data Plots
• Main plots:
• barplot
• countplot
• boxplot
• violinplot
• stripplot
• swarmplot
• catplot
Setup:
• import seaborn as sns
• sns.__version__
• %matplotlib inline # notebook
• built-in data sets!
• tips = sns.load_dataset('tips')
• tips.head()
barplot
• These plots provide a concise summary of
aggregate data based on a categorical
feature in the dataset
• sns.barplot(x='sex', y='total_bill', data=tips)
• By default, each bar shows the mean of total_bill within a category of sex
• import numpy as np
• sns.barplot(x='sex', y='total_bill', data=tips, estimator=np.std)
• estimator is a statistical function that reduces the vector of values in each categorical bin to a scalar
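To make the role of estimator concrete, the sketch below computes the default bar heights by hand with a pandas groupby on a tiny made-up table (the numbers are illustrative, not from tips), then draws the same data with barplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Tiny illustrative table (made-up values, not the tips dataset)
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# What the default estimator (the mean) computes for each category
means = df.groupby("sex")["total_bill"].mean()

# Default: bar height is the per-category mean
ax = sns.barplot(x="sex", y="total_bill", data=df)

# Any function mapping a vector to a scalar can be used instead
ax_std = sns.barplot(x="sex", y="total_bill", data=df, estimator=np.std)
```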
countplot
• Same as barplot, except the estimator explicitly counts the number of occurrences
• Only the categorical column is passed (as x or y); there is no value column
• sns.countplot(x='sex', data=tips)
boxplot
• sns.boxplot(x="day", y="total_bill", data=tips, palette='rainbow')
• Reading a box, from bottom to top:
• Lower whisker cap: the smallest observation within 1.5 × IQR below the first quartile (the minimum when there are no low outliers)
• Bottom edge of the box: first quartile (25%)
• Line inside the box: second quartile (50%), the median
• Top edge of the box: third quartile (75%)
• Upper whisker cap: the largest observation within 1.5 × IQR above the third quartile (the maximum when there are no high outliers)
• Small markers beyond the whiskers (diamonds in the figure): outliers, which may be erroneous data
• sns.boxplot(x="day", y="total_bill", hue="smoker", data=tips, palette="coolwarm")
• Adding hue draws one box per hue level for each day (here, two per day)
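The quantities a box plot draws can be computed directly with NumPy. The sketch below assumes the common 1.5 × IQR whisker rule (Matplotlib's default) and uses a small made-up sample.

```python
import numpy as np

# Small made-up sample with one unusually large value
values = np.array([3.0, 10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 20.0, 22.0, 48.0])

# Box edges and median line
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences: points outside them are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations that are still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

outliers = values[(values < lower_fence) | (values > upper_fence)]
```

In this sample the lower whisker coincides with the minimum (3.0), but the upper whisker stops at 22.0 because 48.0 lies beyond the upper fence and is shown as an outlier.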
violinplot
• Similar to a box plot, but
with a mirrored, rotated
kernel density estimate on
both sides
• Used for comparing probability distributions across one or more categorical variables
violinplot
• https://www.labxchange.org/library/items/lb:LabXchange:46f64d7a:html:1
violinplot (Compare with boxplot)
• sns.violinplot(x="day", y="total_bill", data=tips, hue='sex', split=True, palette='Set1')
stripplot
• An effective complement to a boxplot or violin plot
when displaying all observations alongside a
summarized distribution representation
• Used to draw a scatter plot based on the category
• sns.stripplot(x="day", y="total_bill", data=tips)
• sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, hue='sex', palette='Set1')
• jitter adds small random displacements along the categorical (here horizontal) axis so that overlapping points separate
swarmplot
• Similar to stripplot, but the points are adjusted so that they do not overlap
• sns.swarmplot(x="day", y="total_bill", data=tips)
• sns.swarmplot(x="day", y="total_bill", hue='sex', data=tips, palette="Set1", dodge=True) # dodge replaces the older split argument
catplot
• Most general form of a
categorical plot
• It can take in a kind
parameter to adjust the plot
type
• sns.catplot(x='sex', y='total_bill', data=tips, kind='box')
• kind options: "strip", "swarm", "box", "violin", "boxen", "point", "bar", or "count"
Combining Categorical Plots
• sns.violinplot(x='day', y='total_bill', data=tips)
• sns.swarmplot(x='day', y='total_bill', data=tips, color='black')
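When the two calls run outside a single notebook cell, passing the same Axes to both makes the overlay explicit. A minimal sketch, assuming synthetic data in place of tips; the size=3 marker size is an arbitrary choice to keep the swarm narrow.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for tips (an assumption, not the real dataset)
rng = np.random.default_rng(3)
n = 80
df = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n),
    "total_bill": rng.gamma(4.0, 5.0, size=n),
})

fig, ax = plt.subplots()

# Summary of each distribution first, then every observation on top of it
sns.violinplot(x="day", y="total_bill", data=df, ax=ax)
sns.swarmplot(x="day", y="total_bill", data=df, color="black", size=3, ax=ax)

n_layers = len(ax.collections)  # violin bodies plus swarm point collections
```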
Matrix Plots
• Matrix plots display data as color-coded grids,
highlighting patterns or clusters.
• Heatmap
• clustermap
Setup:
• import seaborn as sns
• sns.__version__
• %matplotlib inline # notebook
• built-in data sets!
• flights = sns.load_dataset('flights')
• tips = sns.load_dataset('tips')
• tips.head()
• flights.head()
Heatmap
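• Data should already be in matrix form
• sns.heatmap(tips.corr(numeric_only=True))
• sns.heatmap(tips.corr(numeric_only=True), cmap='coolwarm', annot=True)
• numeric_only=True is needed with pandas 2.0 and later, where corr() no longer drops the non-numeric columns of tips automatically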
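The "matrix form" requirement can be seen by inspecting the correlation matrix before plotting it. The sketch below uses a synthetic frame shaped like tips (an assumption) with one non-numeric column, which corr(numeric_only=True) leaves out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for tips: three numerical columns plus a text column
rng = np.random.default_rng(4)
n = 100
total_bill = rng.gamma(4.0, 5.0, size=n)
df = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0.0, 0.8, size=n),
    "size": rng.integers(1, 6, size=n),
    "sex": rng.choice(["Male", "Female"], size=n),
})

# Square matrix: rows and columns are both the numerical column names
corr = df.corr(numeric_only=True)

# Each cell is colored by its value; annot=True writes the number in the cell
ax = sns.heatmap(corr, cmap="coolwarm", annot=True)
```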
Heatmap (Flight Dataset)
• pvflights = flights.pivot_table(values='passengers', index='month', columns='year')
• sns.heatmap(pvflights)
• sns.heatmap(pvflights, cmap='magma', linecolor='white', linewidths=1)
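The pivot step above turns long-format rows (one row per month and year) into the matrix that heatmap expects. A small sketch with a six-row illustrative table laid out like the flights data; long_df and pv are hypothetical names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Long format: one row per (year, month) pair; values are illustrative
long_df = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [112, 118, 132, 115, 126, 141],
})

# An ordered categorical keeps months in calendar order instead of alphabetical
long_df["month"] = pd.Categorical(
    long_df["month"], categories=["Jan", "Feb", "Mar"], ordered=True
)

# Matrix form: months as rows, years as columns, passengers as cell values
pv = long_df.pivot_table(
    values="passengers", index="month", columns="year", observed=False
)

ax = sns.heatmap(pv, cmap="magma", linecolor="white", linewidths=1)
```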
clustermap
• A clustermap applies hierarchical clustering
to create a grouped version of a heatmap.
• sns.clustermap(pvflights)
• sns.clustermap(pvflights, cmap='coolwarm', standard_scale=1)
• Years and months are reordered by
similar passenger counts.
Grids
• Grids map plot types onto the rows and columns of a grid, which helps create similar plots separated by features
• PairGrid
• pairplot
• FacetGrid
• JointGrid
Setup:
• built-in data sets!
• tips = sns.load_dataset('tips')
• iris = sns.load_dataset('iris')
• iris.head()
PairGrid and pairplot
• PairGrid is a subplot grid for plotting pairwise relationships in a dataset.
• import matplotlib.pyplot as plt
• g = sns.PairGrid(iris) # Just the grid
• g.map(plt.scatter) # same plot type in every cell
• Or map a different plot type to each region:
• g.map_diag(plt.hist)
• g.map_upper(plt.scatter)
• g.map_lower(sns.kdeplot)
pairplot (similar to PairGrid, with less setup)
• sns.pairplot(iris)
• sns.pairplot(iris, hue='species', palette='rainbow')
Facet Grid
• FacetGrid is a versatile tool for creating plot grids
based on a feature.
• g = sns.FacetGrid(tips, col="time", row="smoker") # just the empty grid
• g = g.map(plt.hist, "total_bill")
• g = sns.FacetGrid(tips, col="time", row="smoker", hue='sex')
• # Notice how the column names come after plt.scatter in the map call
• g = g.map(plt.scatter, "total_bill", "tip").add_legend()
JointGrid
• JointGrid is the general version for jointplot()-type grids
• g = sns.JointGrid(x="total_bill", y="tip", data=tips)
• g = g.plot(sns.regplot, sns.histplot) # histplot replaces the deprecated distplot
Regression Plots
• lmplot() visualizes linear models; col and row split the plot into facets by a feature, and hue colors by a feature
• sns.lmplot(x='total_bill', y='tip', data=tips)
• sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex')
• sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex', palette='coolwarm')
• Working with markers
• lmplot kwargs get passed through to regplot
• regplot has a scatter_kws parameter that gets passed to plt.scatter
• sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex', palette='coolwarm', markers=['o','v'], scatter_kws={'s':100})
• Using a grid (col or row argument)
• sns.lmplot(x='total_bill', y='tip', data=tips, col='sex')
• Aspect and height
• sns.lmplot(x='total_bill', y='tip', data=tips, col='day', hue='sex', palette='coolwarm', aspect=0.6, height=8)
• The size argument was renamed to height in seaborn 0.9
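The faceting and sizing arguments can be combined as in the sketch below, which again assumes synthetic data in place of tips. Each facet is height inches tall and height × aspect inches wide.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for tips (an assumption, not the real dataset)
rng = np.random.default_rng(7)
n = 100
total_bill = rng.gamma(4.0, 5.0, size=n)
df = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0.0, 0.8, size=n),
    "sex": rng.choice(["Male", "Female"], size=n),
})

# One regression panel per level of 'sex'; scatter_kws is forwarded to the scatter call
g = sns.lmplot(
    x="total_bill", y="tip", data=df,
    col="sex", height=4, aspect=0.6,
    scatter_kws={"s": 20},
)

grid_shape = g.axes.shape  # one row, one column per level of 'sex'
```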