Unit-5 Data Visualization using dataframe 93
UNIT- 5
Data Visualization
using data frame
Data Visualization
Data visualization is the technique of presenting data in a pictorial or graphical
format. It enables stakeholders and decision makers to analyze data visually. Data
in a graphical format allows them to identify new trends and patterns easily.
Matplotlib Python Libraries
Matplotlib is a Python two-dimensional plotting library for data visualization and
creating interactive graphics or plots. With Python's Matplotlib, the visualization
of large and complex data becomes easy.
matplotlib.pyplot is a plotting module used for 2D graphics in the Python programming
language. It can be used in Python scripts, the shell, web application servers and
graphical user interface toolkits.
matplotlib.pyplot is a collection of command style functions that make Matplotlib
work like MATLAB. Each Pyplot function makes some change to a figure. For example,
a function creates a figure, a plotting area in a figure, plots some lines in a plotting
area, decorates the plot with labels, etc.
Install matplotlib
To use matplotlib, we need to install it.
Step 1 - Make sure Python and pip are preinstalled on your system
Type the following commands in the command prompt to check whether Python and pip are
installed on your system.
To check Python
python --version
If python is successfully installed, the version of python installed on your system will
be displayed.
To check pip
pip -V
The version of pip will be displayed, if it is successfully installed on your system.
Step 2 - Install Matplotlib
Matplotlib can be installed using pip. The following command is run in the command
prompt to install Matplotlib.
pip install matplotlib
This command will start downloading and installing packages related to the matplotlib
library. Once done, the message of successful installation will be displayed.
Step 3 - Check if it is installed successfully
To verify that matplotlib is successfully installed on your system, execute the following
statements in the Python interpreter. If matplotlib is successfully installed, the version
of matplotlib installed will be displayed.
import matplotlib
print(matplotlib.__version__)
Importing matplotlib.pyplot
Most of the Matplotlib utilities lie under the pyplot submodule, and are usually
imported under the plt alias:
import matplotlib.pyplot as plt
Now the pyplot package can be referred to as plt.
Example
Draw a line in a diagram from position (0,0) to position (6,250):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()
Output:
[Figure: a straight line from (0, 0) to (6, 250)]
Example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 1000)
fig, ax = plt.subplots()
# dashed blue line for sine, red line for cosine
ax.plot(x, np.sin(x), '--b', label='Sine')
ax.plot(x, np.cos(x), c='r', label='Cosine')
ax.axis('equal')
# place the legend in the lower-left corner
leg = ax.legend(loc="lower left")
plt.show()
Output:
[Figure: sine (dashed blue) and cosine (red) curves, with a legend at the lower left]
Subplots
With the subplot() function you can draw multiple plots in one figure:
Example:
Draw 2 plots:
import matplotlib.pyplot as plt
import numpy as np
# plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)  # 1 row, 2 columns, first plot
plt.plot(x, y)
# plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)  # 1 row, 2 columns, second plot
plt.plot(x, y)
plt.show()
Output:
[Figure: two line plots drawn side by side in one figure]
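Pyplot also provides a subplots() function (note the plural), which creates the figure and all of its plotting areas in one call and returns them as objects. The following is a minimal sketch of the same two plots drawn that way. The Agg backend line is an assumption made so that the sketch runs without a display, and the names y1, y2 and axes are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])
y2 = np.array([10, 20, 30, 40])

# one row, two columns: axes is an array holding both plotting areas
fig, axes = plt.subplots(1, 2)
axes[0].plot(x, y1)
axes[0].set_title("Plot 1")
axes[1].plot(x, y2)
axes[1].set_title("Plot 2")
fig.tight_layout()
# in an interactive session, plt.show() would display the figure
```

Working with the returned axes objects keeps each plot's settings separate, which tends to be easier to follow once a figure has more than two plots.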
Scatter Plot in Matplotlib
Scatter plots are great for visualizing data points in two dimensions. They're
particularly useful for showing correlations and groupings in data.
In matplotlib, you can create a scatter plot using pyplot's scatter() function. The
following is the syntax:
plt.scatter(x_values, y_values)
Here, x_values are the values to be plotted on the x-axis and y_values are the values
to be plotted on the y-axis.
Examples:
We have the heights and weights of 10 students at a university and want to
draw a scatter plot of height against weight. The data is held in two lists:
one with the heights and the other with the corresponding weights of each
student.
import matplotlib.pyplot as plt
# height and weight data
height = [167, 175, 170, 186, 190, 188, 158, 169, 183, 180]
weight = [65, 70, 72, 80, 86, 94, 50, 58, 78, 85]
# plot a scatter plot with star markers
plt.scatter(weight, height, marker='*', s=80)
# set axis labels
plt.xlabel("Weight (Kg)")
plt.ylabel("Height (cm)")
# set chart title
plt.title("Height v/s Weight")
plt.show()
Output:
[Figure: scatter plot titled "Height v/s Weight", with Weight (Kg) on the x-axis and Height (cm) on the y-axis, drawn with star markers]
You can alter the shape of the marker with the marker parameter and the size of the
marker with the s parameter of the scatter() function. Matplotlib's pyplot has handy
functions to add axis labels and a title to your chart.
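The s parameter also accepts one size per point, and the c parameter can colour each point by a value. The following is a minimal sketch using the same height and weight data. The size rule (twice the weight), the viridis colour map and the Agg backend are illustrative assumptions, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

height = [167, 175, 170, 186, 190, 188, 158, 169, 183, 180]
weight = [65, 70, 72, 80, 86, 94, 50, 58, 78, 85]

# illustrative choice: marker area grows with the student's weight
sizes = [w * 2 for w in weight]

fig, ax = plt.subplots()
# s takes one size per point, c colours each point by its height
points = ax.scatter(weight, height, s=sizes, c=height,
                    cmap="viridis", marker="o", alpha=0.7)
fig.colorbar(points, ax=ax, label="Height (cm)")
ax.set_xlabel("Weight (Kg)")
ax.set_ylabel("Height (cm)")
ax.set_title("Height v/s Weight")
# in an interactive session, plt.show() would display the figure
```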
Line chart in Matplotlib
A line chart or line graph is a type of chart which displays information as a series of
data points called 'markers' connected by straight line segments.
Line charts are usually used to show the relationship between two data sets, X and Y,
plotted on different axes. In matplotlib, you can plot a line chart using pyplot's
plot() function. The following is the syntax to plot a line chart:
plt.plot(x_values, y_values)
Here, x_values are the values to be plotted on the x-axis and y_values are the values
to be plotted on the y-axis.
Example:
import matplotlib.pyplot as plt
# number of employees of A
emp_count = [3, 20, 50, 200, 350, 400]
year = [2014, 2015, 2016, 2017, 2018, 2019]
# plot a line chart
plt.plot(year, emp_count)
plt.show()
Output:
[Figure: line chart with the years 2014 to 2019 on the x-axis and the employee count on the y-axis]
You can see in the above chart that we have the year on the x-axis and the employee
count on the y-axis. The chart shows an upward trend in the employee count at
company A year on year.
Matplotlib's pyplot comes with handy functions to set the axis labels and chart title.
You can use pyplot's xlabel() and ylabel() functions to set axis labels and use pyplot's
title() function to set the title for your chart.
Plot multiple lines in a single chart
Matplotlib also allows you to plot multiple lines in the same chart. This is generally
used to show lines that share the same axis, for example, lines sharing the x-axis. The
y-axis can also be shared if the second series has the same scale; if the scales are
different, the lines can be plotted on two different y-axes. The example below shows the
case where both axes are shared.
Example:
import matplotlib.pyplot as plt
# number of employees
emp_countA = [3, 20, 50, 200, 350, 400]
emp_countB = [250, 300, 325, 380, 320, 350]
year = [2014, 2015, 2016, 2017, 2018, 2019]
# plot two lines
plt.plot(year, emp_countA, 'o-g')
plt.plot(year, emp_countB, 'o-b')
# set axis titles
plt.xlabel("Year")
plt.ylabel("Employees")
# set chart title
plt.title("Employee Growth")
# legend
plt.legend(['A', 'B'])
plt.show()
Output:
[Figure: line chart titled "Employee Growth", with Year on the x-axis and Employees on the y-axis, one line each for companies A and B]
Both lines share the same axes. Also note that we added a legend to easily distinguish
between the two companies.
With pyplot, you can use the xlabel() and ylabel() functions to set a label for the x- and
y-axis.
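When two series have very different scales, a second y-axis can be added with the twinx() method of an axes object. The following is a minimal sketch under stated assumptions: the revenue figures are hypothetical values made up for illustration, and the Agg backend is used only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

year = [2014, 2015, 2016, 2017, 2018, 2019]
emp_countA = [3, 20, 50, 200, 350, 400]
# hypothetical revenue figures (in millions), invented for this sketch only
revenueA = [0.1, 0.8, 2.5, 12.0, 25.0, 31.0]

fig, ax_left = plt.subplots()
ax_left.plot(year, emp_countA, 'o-g')
ax_left.set_xlabel("Year")
ax_left.set_ylabel("Employees", color='g')

# twinx() creates a second y-axis that shares the same x-axis
ax_right = ax_left.twinx()
ax_right.plot(year, revenueA, 's--b')
ax_right.set_ylabel("Revenue (millions)", color='b')

ax_left.set_title("Company A: employees and revenue")
# in an interactive session, plt.show() would display the figure
```

Colouring each axis label like its line is one simple way to show which scale belongs to which series.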
Plot Histogram in Matplotlib
Histograms show the frequency distribution of the values of a variable across different
buckets. They are great for visualizing the distribution of a variable.
A histogram is used to represent data that has been grouped into ranges. It is an
accurate method for the graphical representation of a numerical data distribution. It is a
type of bar plot where the X-axis represents the bin ranges while the Y-axis gives
information about frequency.
Creating a Histogram
To create a histogram, the first step is to create bins of the ranges: divide the
whole range of the values into a series of intervals, and then count the values which fall
into each of the intervals. Bins are consecutive, non-overlapping intervals of a
variable.
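The binning steps above can be sketched in plain Python, without any plotting. The values and bin edges below are illustrative. The rule that every bin is half-open except the last, which also includes its right edge, is the convention used by Matplotlib's hist() function.

```python
# illustrative values and bin edges for this sketch
values = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]
bin_edges = [0, 25, 50, 75, 100]

# one counter per interval between consecutive edges
counts = [0] * (len(bin_edges) - 1)
last = len(counts) - 1

for v in values:
    for i in range(len(counts)):
        lo, hi = bin_edges[i], bin_edges[i + 1]
        # each bin is [lo, hi); the last bin also includes its right edge
        if lo <= v < hi or (i == last and v == hi):
            counts[i] += 1
            break

for i, c in enumerate(counts):
    print(f"{bin_edges[i]:>3} to {bin_edges[i + 1]:>3}: {c}")
# counts is [5, 3, 5, 2]
```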
The matplotlib.pyplot.hist() function is used to compute and draw a histogram of x.
The following table shows the parameters accepted by the matplotlib.pyplot.hist()
function:
Attribute   Parameter
x           array or sequence of arrays
bins        optional parameter; an integer, a sequence or a string
density     optional parameter; a boolean value
range       optional parameter; the lower and upper range of the bins
histtype    optional parameter; the type of histogram to create [bar, barstacked,
            step, stepfilled], default is "bar"
align       optional parameter; controls the plotting of the histogram [left, right, mid]
weights     optional parameter; an array of weights having the same dimensions as x
bottom      location of the baseline of each bin
rwidth      optional parameter; the relative width of the bars with respect to the
            bin width
color       optional parameter; a color or sequence of color specs
label       optional parameter; a string or sequence of strings to match with multiple
            datasets
log         optional parameter; used to set the histogram axis on a log scale
Let's create a basic histogram. The code below creates a simple histogram of some
sample values:
from matplotlib import pyplot as plt
import numpy as np
# Creating dataset
a = np.array([22, 87, 5, 43, 56,
              73, 55, 54, 11,
              20, 51, 5, 79, 31,
              27])
# Creating histogram
fig, ax = plt.subplots(figsize=(10, 7))
ax.hist(a, bins=[0, 25, 50, 75, 100])
# Show plot
plt.show()
Output:
You can also specify your own bin edges, which can be unequally spaced. For this,
instead of passing an integer to the bins parameter, pass a sequence with the bin
edges. For example, if you want to have bins 0 to 20, 20 to 50, 50 to 70, 70 to 90, and
90 to 100:
Example:
import matplotlib.pyplot as plt
# scores in the Math class
math_scores = [72, 41, 65, 63, 82, 63, 51, 57, 39, 63, 62, 68, 52, 76, 62, 73, 72,
73, 71, 62, 76, 53, 71, 79, 77, 35, 65, 59, 58, 70, 73, 69, 59, 75, 73, 63, 65, 81, 46,
59, 53, 71, 79, 80, 60, 60, 64, 40, 73, 75, 68, 58, 81, 65, 55, 62, 82, 47, 85, 62,
39, 77, 82, 78, 57, 58, 72, 75, 65, 68, 86, 49, 39, 64, 54, 68, 85, 77, 62, 53, 52,
76, 80, 84, 69, 61, 69, 65, 89, 97, 71, 61, 77, 40, 83, 52, 78, 54, 64, 58]
# specify the bin edges
bin_edges = [0, 20, 50, 70, 90, 100]
# plot histogram
plt.hist(math_scores, bins=bin_edges)
# add formatting
plt.xlabel("Marks in Math")
plt.ylabel("Students")
plt.title("Histogram of scores in the Math class")
plt.show()
Output:
[Figure: histogram titled "Histogram of scores in the Math class", with Marks in Math on the x-axis and Students on the y-axis]
Here, the bins are unequally spaced because of the bin edges specified. Matplotlib's
hist() function also has a number of other parameters to customize your plots even
further.
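A few of those parameters are combined in the following minimal sketch. The scores are illustrative values rather than real class data, and the Agg backend is an assumption made so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# illustrative scores, made up for this sketch
scores = [35, 41, 47, 52, 55, 58, 60, 62, 63, 65,
          68, 70, 72, 75, 77, 80, 82, 85, 89, 97]

fig, ax = plt.subplots()
heights, edges, patches = ax.hist(
    scores,
    bins=5,            # an integer asks for five equal-width bins
    range=(0, 100),    # lower and upper limit of the bins
    density=True,      # normalize so the total bar area is 1
    histtype="bar",
    rwidth=0.9,        # bars take 90% of the bin width
    color="steelblue",
    label="Scores",
)
ax.legend()

# with density=True the bar areas (height * bin width) sum to 1
area = sum(h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights))
print(round(area, 6))
```

hist() returns the bar heights, the bin edges and the drawn bars, so the computed values can be inspected as well as plotted.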
Bar Plot in Matplotlib
A bar plot or bar chart is a graph that represents categories of data with
rectangular bars whose lengths or heights are proportional to the values
they represent. Bar plots can be drawn horizontally or vertically. A bar chart
shows comparisons between discrete categories. One axis of the
plot represents the specific categories being compared, while the other axis
represents the measured values corresponding to those categories.
The syntax of pyplot's bar() function is as follows:
plt.bar(x, height, width, bottom, align)
The function draws one rectangular bar for each value, positioned and sized according to
the given parameters. Following is a simple example of a bar plot, which represents the
number of students enrolled in different courses of an institute.
Example:
import numpy as np
import matplotlib.pyplot as plt
# creating the dataset
data = {'C': 20, 'C++': 15, 'Java': 30,
        'Python': 35}
courses = list(data.keys())
values = list(data.values())
fig = plt.figure(figsize=(10, 5))
# creating the bar plot
plt.bar(courses, values, color='maroon',
        width=0.4)
plt.xlabel("Courses offered")
plt.ylabel("No. of students enrolled")
plt.title("Students enrolled in different courses")
plt.show()
Output:
Here plt.bar(courses, values, color='maroon') specifies that the bar chart is
plotted with the courses on the X-axis and the values on the Y-axis.
The color attribute is used to set the color of the bars (maroon in this case).
plt.xlabel("Courses offered") and plt.ylabel("No. of students enrolled") are used to label
the corresponding axes.
plt.title() is used to set a title for the graph.
plt.show() is used to display the graph built by the previous commands.
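For the horizontal orientation mentioned earlier, Matplotlib provides barh(). The following is a minimal sketch using the same course data. The Agg backend is an assumption made so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

data = {'C': 20, 'C++': 15, 'Java': 30, 'Python': 35}
courses = list(data.keys())
values = list(data.values())

fig, ax = plt.subplots(figsize=(10, 5))
# barh() takes the categories first, then the bar lengths;
# height sets the thickness of each horizontal bar
bars = ax.barh(courses, values, color='maroon', height=0.4)
ax.set_xlabel("No. of students enrolled")
ax.set_ylabel("Courses offered")
ax.set_title("Students enrolled in different courses")
# in an interactive session, plt.show() would display the figure
```

Note that the axis labels swap places compared with the vertical version, because the categories now run along the y-axis.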
Pie chart in Matplotlib
A pie chart can only display one series of data. Pie charts show the size of items
(called wedges) in one data series, proportional to the sum of the items. The data
points in a pie chart are shown as a percentage of the whole pie.
The Matplotlib API has a pie() function that generates a pie diagram representing data in
an array. The fractional area of each wedge is given by x/sum(x). In recent Matplotlib
versions the values are normalized in this way by default; if normalize=False is passed
and sum(x) < 1, the values of x give the fractional area directly and the
resulting pie will have an empty wedge of size 1 - sum(x).
Example:
Pie chart showing the percentage of employees in each department of a company.
import matplotlib.pyplot as plt
# Data to plot
labels = 'Account', 'Technical', 'Sales', 'Purchase'
noOfEmp = [7, 22, 20, 15]
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']
explode = (0.1, 0, 0, 0)  # explode 1st slice
# Plot
plt.pie(noOfEmp, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=140)
plt.axis('equal')
plt.show()
Output:
[Figure: pie chart with wedges labelled Account, Technical, Sales and Purchase]
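The wedge sizes follow directly from the x/sum(x) rule. The following is a minimal sketch in plain Python that works out the fraction and the angle of each wedge for the department data. The variable names total and fractions are illustrative.

```python
labels = ['Account', 'Technical', 'Sales', 'Purchase']
noOfEmp = [7, 22, 20, 15]

total = sum(noOfEmp)
# fraction of the pie covered by each wedge: x / sum(x)
fractions = [n / total for n in noOfEmp]

for label, frac in zip(labels, fractions):
    # a full circle is 360 degrees, so each wedge spans frac * 360 degrees
    print(label, round(frac * 100, 2), "%", round(frac * 360, 2), "degrees")
```

The percentages printed here are the same quantities that autopct formats onto the wedges.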
Save Plot as a File
Matplotlib is a library in Python that offers a number of plotting options to display
your data. The plots created get displayed when you use plt.show(), but you cannot
access them later since they're not saved on disk.
To save a figure created with matplotlib, you can use pyplot's savefig() function. This
way, you'll have the plots saved on disk for further use instead of having to plot them
all over again.
Syntax:
import matplotlib.pyplot as plt
plt.savefig("filename.png")
Pass the path where you want the image to be saved. The savefig() function also
comes with a number of additional parameters to further customize how your image
gets saved.
Save a plot to an image file:
Examples:
import matplotlib.pyplot as plt
# NBA championship counts
players= ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird']
titles= [5,4,6,3]
# plot a bar chart
plt.bar(players, titles)
# add y-axis label
plt.ylabel("Rings")
# add chart title
plt.title("Championship Victories of NBA greats")
# save the plot as a PNG image
plt.savefig("NBA_bar_chart.png")
This saves the bar chart as a PNG file with the name NBA_bar_chart.png to the current
directory. You can specify the path and name of your image as per your needs. This is
how the saved plot looks on opening it with an image viewer application:
[Figure: the saved bar chart opened in an image viewer]
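Two of the additional savefig() parameters that are often useful are dpi, which sets the resolution, and bbox_inches, which controls how much surrounding whitespace is kept. The following is a minimal sketch. The file name, the temporary output directory and the Agg backend are assumptions made for illustration.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

players = ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird']
titles = [5, 4, 6, 3]

fig, ax = plt.subplots()
ax.bar(players, titles)
ax.set_ylabel("Rings")
ax.set_title("Championship Victories of NBA greats")

# assumption: write into a temporary directory instead of the current one
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "NBA_bar_chart_hires.png")

# dpi raises the resolution; bbox_inches='tight' trims surrounding whitespace
fig.savefig(out_path, dpi=200, bbox_inches="tight")
print(os.path.getsize(out_path), "bytes written")
```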
Save plot as a PDF:
Depending on the filename provided, plt.savefig() infers the format of the output file.
For instance, if you want to save the above image as a PDF file, just use the
appropriate file name:
Example:
import matplotlib.pyplot as plt
# NBA championship counts
players = ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird']
titles = [5,4,6,3]
# plot a bar chart
plt.bar(players, titles)
# add y-axis label
plt.ylabel("Rings")
# add chart title
plt.title("Championship Victories of NBA greats")
# save the plot as a PDF
plt.savefig("NBA_bar_chart_doc.pdf")
The above code saves the plot as a PDF file with the name NBA_bar_chart_doc.pdf to
the current directory. This is how the saved image looks on opening it in the Google
Chrome web browser:
[Figure: the bar chart in a browser window, with bars for Kobe Bryant, LeBron James, Michael Jordan and Larry Bird]
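When there is no file name to infer the format from, for example when writing into an in-memory buffer, the format can be given explicitly with the format parameter of savefig(). The following is a minimal sketch. The buffer and the Agg backend are assumptions made for illustration.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

players = ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird']
titles = [5, 4, 6, 3]

fig, ax = plt.subplots()
ax.bar(players, titles)
ax.set_ylabel("Rings")

# a BytesIO buffer has no file name, so the format is stated explicitly
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf")
pdf_bytes = buffer.getvalue()

# PDF files begin with the marker %PDF
print(pdf_bytes[:4])
```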