Unit 5

Unit 5 focuses on data visualization using data frames, emphasizing the importance of presenting data graphically for better analysis. It introduces the Matplotlib library in Python for creating various types of plots, including line charts, scatter plots, histograms, and bar charts, along with installation instructions and examples. The document provides detailed syntax and examples for each plot type, demonstrating how to visualize data effectively.

Uploaded by

netrak1707
Copyright: © All Rights Reserved
Unit-5 Data Visualization using dataframe 93

UNIT- 5
Data Visualization
using data frame

Data Visualization

Data visualization is the technique of presenting data in a pictorial or graphical
format. It enables stakeholders and decision makers to analyze data visually. Data
in a graphical format allows them to identify new trends and patterns easily.

Matplotlib Python Libraries


Matplotlib is a Python two-dimensional plotting library for data visualization and
creating interactive graphics or plots. Using Python's Matplotlib, the visualization
of large and complex data becomes easy.
matplotlib.pyplot is the Matplotlib module used for 2D graphics in the Python programming
language. It can be used in Python scripts, the interactive shell, web application servers
and graphical user interface toolkits.
matplotlib.pyplot is a collection of command-style functions that make Matplotlib
work like MATLAB. Each pyplot function makes some change to a figure. For example,
a function creates a figure, creates a plotting area in a figure, plots some lines in a
plotting area, decorates the plot with labels, etc.
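The idea that each pyplot function changes the current figure can be sketched as follows. This is only an illustration: the data values and label texts are made up, and the Agg backend is selected as an assumption so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt

fig = plt.figure()                       # creates a new, empty figure
ax = plt.axes()                          # adds a plotting area (axes) to that figure
lines = plt.plot([1, 2, 3], [2, 4, 6])   # plots one line in the plotting area
plt.xlabel("x")                          # decorates the plot with axis labels
plt.ylabel("y")
plt.title("Each call modifies the current figure")
```

Every call after plt.figure() acts on the same figure, which is why no figure object has to be passed around. In an interactive session, plt.show() would display the result.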
Install Matplotlib
To use Matplotlib, we need to install it.

Step 1 - Make sure Python and pip are preinstalled on your system

Type the following commands in the command prompt to check whether Python and pip are
installed on your system.
To check Python
python --version
If Python is successfully installed, the version of Python installed on your system will
be displayed.
To check pip
pip -V
The version of pip will be displayed if it is successfully installed on your system.

Step 2 - Install Matplotlib


Matplotlib can be installed using pip. The following command is run in the command
prompt to install Matplotlib.
pip install matplotlib
This command will start downloading and installing packages related to the matplotlib
library. Once done, the message of successful installation will be displayed.

Step 3 - Check if it is installed successfully


To verify that Matplotlib is successfully installed on your system, execute the following
statements in the Python interpreter. If Matplotlib is successfully installed, the version
of Matplotlib installed will be displayed.

import matplotlib
print(matplotlib.__version__)
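
The same check can also be done without importing the package itself. The sketch below is one possible way, using only the standard library; the helper name installed_version is hypothetical, and it assumes the distribution name equals the import name, which holds for matplotlib but not for every package.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    # find_spec returns None when the module cannot be located
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("matplotlib:", installed_version("matplotlib"))
```

A result of None suggests that the pip install step has to be repeated.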

Importing matplotlib.pyplot

Most of the Matplotlib utilities lie under the pyplot submodule, and are usually
imported under the plt alias:

import matplotlib.pyplot as plt

Now the pyplot package can be referred to as plt.

Example

Draw a line in a diagram from position (0,0) to position (6,250):

import matplotlib.pyplot as plt
import numpy as np

# end points of the line: (0, 0) and (6, 250)
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()
Output:

[Figure: a straight line rising from (0, 0) to (6, 250).]

Example:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)

fig, ax = plt.subplots()

# dashed blue sine curve and red cosine curve, each with a legend label
ax.plot(x, np.sin(x), '--b', label='Sine')
ax.plot(x, np.cos(x), c='r', label='Cosine')
ax.axis('equal')

# place the legend in the lower-left corner
leg = ax.legend(loc="lower left")
plt.show()

Output:

[Figure: sine (dashed blue) and cosine (red) curves, with a legend in the lower left.]

Subplots
With the subplot() function you can draw multiple plots in one figure:

Example:

Draw 2 plots:

import matplotlib.pyplot as plt
import numpy as np

# plot 1: grid of 1 row and 2 columns, first position
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x, y)

# plot 2: same grid, second position
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x, y)
plt.show()
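
pyplot also provides a subplots() function (plural) that creates the figure and all of its plotting areas in one call. The sketch below redraws the two plots from the example above in that style; the panel titles are illustrative, and the Agg backend is an assumption made so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])
y2 = np.array([10, 20, 30, 40])

# one row, two columns: returns the figure and an array holding two Axes
fig, axes = plt.subplots(1, 2)
axes[0].plot(x, y1)
axes[0].set_title("plot 1")
axes[1].plot(x, y2)
axes[1].set_title("plot 2")
fig.tight_layout()  # keep the two panels from overlapping
```

Here each plotting area is addressed explicitly through axes[0] and axes[1], so the order of the calls no longer decides which plot receives the data.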

Output:

[Figure: two line plots side by side in one figure.]

Scatter Plot in Matplotlib

Scatter plots are great for visualizing data points in two dimensions. They're
particularly useful for showing correlations and groupings in data.

In matplotlib, you can create a scatter plot using pyplot's scatter() function. The
following is the syntax:

plt.scatter(x_values, y_values)

Here, x_values are the values to be plotted on the x-axis and y_values are the values
to be plotted on the y-axis.

Example:

We have the heights and weights of 10 students at a university and want to
draw a scatter plot of the relationship between them. The data is present in two lists,
one holding the heights and the other holding the corresponding weights of each
student.

import matplotlib.pyplot as plt

# height and weight data
height = [167, 175, 170, 186, 190, 188, 158, 169, 183, 180]
weight = [65, 70, 72, 80, 86, 94, 50, 58, 78, 85]

# plot a scatter plot with star markers
plt.scatter(weight, height, marker='*', s=80)

# set axis labels
plt.xlabel("Weight (Kg)")
plt.ylabel("Height (cm)")

# set chart title
plt.title("Height v/s Weight")

plt.show()

Output:

[Figure: scatter plot titled "Height v/s Weight", drawn with star markers; x-axis "Weight (Kg)", y-axis "Height (cm)".]

You can alter the shape of the marker with the marker parameter and the size of the
marker with the s parameter of the scatter() function. Matplotlib's pyplot has handy
functions to add axis labels and a title to your chart.
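
To make the effect of marker and s concrete, the sketch below draws the same student data in two groups with different marker shapes and sizes. The split into two groups of five and the chosen sizes are arbitrary, and the Agg backend is an assumption made so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

height = [167, 175, 170, 186, 190, 188, 158, 169, 183, 180]
weight = [65, 70, 72, 80, 86, 94, 50, 58, 78, 85]

fig, ax = plt.subplots()
# first five students: small circles; last five students: large triangles
small = ax.scatter(weight[:5], height[:5], marker="o", s=20, label="marker='o', s=20")
large = ax.scatter(weight[5:], height[5:], marker="^", s=120, label="marker='^', s=120")
ax.set_xlabel("Weight (Kg)")
ax.set_ylabel("Height (cm)")
ax.legend()
```

The value of s is the marker area in points squared, so s=120 gives a marker with six times the area of s=20.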

Line chart in Matplotlib


A line chart or line graph is a type of chart which displays information as a series of
data points called 'markers' connected by straight line segments.

Line charts are usually used to show the relationship between two data sets, X and Y,
plotted on different axes. In matplotlib, you can plot a line chart using pyplot's plot()
function. The following is the syntax to plot a line chart:

plt.plot(x_values, y_values)

Here, x_values are the values to be plotted on the x-axis and y_values are the values
to be plotted on the y-axis.

Example:
import matplotlib.pyplot as plt

# number of employees of A
emp_count = [3, 20, 50, 200, 350, 400]
year = [2014, 2015, 2016, 2017, 2018, 2019]

# plot a line chart
plt.plot(year, emp_count)
plt.show()

Output:

[Figure: line chart of employee count against year, 2014 to 2019.]

You can see in the above chart that we have the year on the x-axis and the employee
count on the y-axis. The chart shows an upward trend in the employee count at
company A year on year.
Matplotlib's pyplot comes with handy functions to set the axis labels and chart title.
You can use pyplot's xlabel() and ylabel() functions to set axis labels and pyplot's
title() function to set the title for your chart.

Plot multiple lines in a single chart

Matplotlib also allows you to plot multiple lines in the same chart. This is generally used
to show lines that share an axis, for example, lines sharing the x-axis. The y-axis
can also be shared if the second series has the same scale; if the scales are different,
you can plot on two different y-axes. The example below shows two lines sharing both axes.

Example:
import matplotlib.pyplot as plt

# number of employees
emp_countA = [3, 20, 50, 200, 350, 400]
emp_countB = [250, 300, 325, 380, 320, 350]
year = [2014, 2015, 2016, 2017, 2018, 2019]

# plot two lines
plt.plot(year, emp_countA, 'o-g')
plt.plot(year, emp_countB, 'o-b')
# set axis titles
plt.xlabel("Year")
plt.ylabel("Employees")
# set chart title
plt.title("Employee Growth")
# legend
plt.legend(['A', 'B'])
plt.show()

Output:

[Figure: line chart titled "Employee Growth" with two lines, A and B; x-axis "Year" (2014 to 2019), y-axis "Employees".]

Both lines share the same axes. Also note that we added a legend to easily distinguish
between the two companies. As before, pyplot's xlabel() and ylabel() functions set the
labels for the x- and y-axis.
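
For the other case mentioned earlier, where the two series have different scales, one possible approach is a second y-axis created with the twinx() method of an Axes. The sketch below assumes a hypothetical revenue series with made-up values purely for illustration; it is not data from this unit. The Agg backend is likewise an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

year = [2014, 2015, 2016, 2017, 2018, 2019]
emp_countA = [3, 20, 50, 200, 350, 400]
# hypothetical second series on a very different scale (made-up values)
revenue = [0.1, 0.4, 1.2, 5.0, 9.5, 12.0]

fig, ax_left = plt.subplots()
ax_left.plot(year, emp_countA, "o-g")
ax_left.set_xlabel("Year")
ax_left.set_ylabel("Employees", color="g")

# twinx() creates a second y-axis that shares the same x-axis
ax_right = ax_left.twinx()
ax_right.plot(year, revenue, "s--b")
ax_right.set_ylabel("Revenue (illustrative units)", color="b")
```

Colouring each y-axis label like its line is a design choice that helps the reader match a line to its scale.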

Plot Histogram in Matplotlib

Histograms show the frequency distribution of the values of a variable across different
buckets. They are great for visualizing the distribution of a variable.
A histogram represents data that has been divided into groups. It is an
accurate method for the graphical representation of a numerical data distribution. It is a
type of bar plot where the X-axis represents the bin ranges while the Y-axis gives
information about frequency.

Creating a Histogram
To create a histogram, the first step is to create the bins: divide the whole range of
the values into a series of intervals, and then count the values which fall
into each of the intervals. Bins are consecutive, non-overlapping
intervals of a variable.
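
The binning and counting steps described above can be sketched with NumPy's histogram() function, which returns the counts without drawing anything. The values and bin edges used here are only illustrative.

```python
import numpy as np

values = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
bin_edges = [0, 25, 50, 75, 100]  # four consecutive, non-overlapping bins

# counts[i] is the number of values falling between edges[i] and edges[i + 1]
counts, edges = np.histogram(values, bins=bin_edges)

for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{low} to {high}: {n} values")
```

For this data the four counts are 5, 3, 5 and 2. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.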
The matplotlib.pyplot.hist() function is used to compute and create a histogram of x.
The following table shows the parameters accepted by the matplotlib.pyplot.hist()
function:

Attribute   Description
x           array or sequence of arrays
bins        optional parameter; an integer, a sequence, or a string
density     optional parameter; a boolean value
range       optional parameter; the lower and upper range of the bins
histtype    optional parameter; the type of histogram to draw (bar, barstacked,
            step, stepfilled); default is "bar"
align       optional parameter; controls how the histogram is plotted (left, mid, right)
weights     optional parameter; an array of weights with the same dimensions as x
bottom      location of the baseline of each bin
rwidth      optional parameter; the relative width of the bars with respect to
            the bin width
color       optional parameter; a color or sequence of color specs
label       optional parameter; a string or sequence of strings to match with
            multiple datasets
log         optional parameter; sets the histogram axis to a log scale

Let's create a basic histogram. The code below creates a simple
histogram of a small set of values:

from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56,
              73, 55, 54, 11,
              20, 51, 5, 79, 31,
              27])

# Creating histogram
fig, ax = plt.subplots(figsize=(10, 7))
ax.hist(a, bins=[0, 25, 50, 75, 100])
# Show plot
plt.show()

Output:

[Figure: histogram of the dataset with four bins of width 25 between 0 and 100.]

You can also specify your own bin edges, which can be unequally spaced. For this,
instead of passing an integer to the bins parameter, pass a sequence with the bin
edges. For example, if you want to have bins 0 to 20, 20 to 50, 50 to 70, 70 to 90, and
90 to 100:

Example:

import matplotlib.pyplot as plt

# scores in the Math class
math_scores = [72, 41, 65, 63, 82, 63, 51, 57, 39, 63, 62, 68, 52, 76, 62, 73, 72,
               73, 71, 62, 76, 53, 71, 79, 77, 35, 65, 59, 58, 70, 73, 69, 59, 75, 73, 63, 65, 81, 46,
               59, 53, 71, 79, 80, 60, 60, 64, 40, 73, 75, 68, 58, 81, 65, 55, 62, 82, 47, 85, 62,
               39, 77, 82, 78, 57, 58, 72, 75, 65, 68, 86, 49, 39, 64, 54, 68, 85, 77, 62, 53, 52,
               76, 80, 84, 69, 61, 69, 65, 89, 97, 71, 61, 77, 40, 83, 52, 78, 54, 64, 58]

# specify the bin edges
bin_edges = [0, 20, 50, 70, 90, 100]

# plot histogram
plt.hist(math_scores, bins=bin_edges)
# add formatting
plt.xlabel("Marks in Math")
plt.ylabel("Students")
plt.title("Histogram of scores in the Math class")
plt.show()

Output:

[Figure: histogram titled "Histogram of scores in the Math class"; x-axis "Marks in Math", y-axis "Students".]

Here, the bins are unequally spaced because of the bin edges specified. Matplotlib's
hist() function also has a number of other parameters to customize your plots even
further.
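
As a sketch of a few of those other parameters, the code below sets histtype, rwidth, color and label, and also keeps the values that hist() returns. The scores are a short made-up list, not the class data above, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# illustrative scores only
scores = [35, 41, 46, 52, 57, 58, 62, 63, 65, 68, 71, 72, 73, 76, 79, 82, 85, 97]

fig, ax = plt.subplots()
# hist() returns the bin counts, the bin edges and the drawn bar patches
counts, edges, patches = ax.hist(
    scores,
    bins=[0, 20, 50, 70, 90, 100],
    histtype="bar",   # the default type, written out for clarity
    rwidth=0.9,       # bars take 90% of the bin width, leaving small gaps
    color="teal",
    label="Math",
)
ax.legend()
```

The returned counts make it possible to check the plot numerically; for this list they are 0, 3, 7, 7 and 1.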

Bar Plot in Matplotlib

A bar plot or bar chart is a graph that represents categories of data with
rectangular bars whose lengths or heights are proportional to the values
they represent. Bar plots can be plotted horizontally or vertically. A bar chart
describes comparisons between discrete categories. One of the axes of the
plot represents the specific categories being compared, while the other axis
represents the measured values corresponding to those categories.

The syntax of the bar() function is as follows:
plt.bar(x, height, width=0.8, bottom=None, align='center')
The function draws one rectangular bar for each entry of x, with its size and position
set by the given parameters. Following is a simple example of the bar plot, which
represents the number of students enrolled in different courses of an institute.
Example:
import numpy as np
import matplotlib.pyplot as plt

# creating the dataset
data = {'C': 20, 'C++': 15, 'Java': 30,
        'Python': 35}
courses = list(data.keys())
values = list(data.values())

fig = plt.figure(figsize=(10, 5))

# creating the bar plot
plt.bar(courses, values, color='maroon',
        width=0.4)

plt.xlabel("Courses offered")
plt.ylabel("No. of students enrolled")
plt.title("Students enrolled in different courses")
plt.show()

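Output:

[Figure: bar chart titled "Students enrolled in different courses" with one maroon bar per course.]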

Here plt.bar(courses, values, color='maroon', width=0.4) specifies that the bar chart is
to be plotted with the courses on the X-axis and the values on the Y-axis.
The color attribute sets the color of the bars (maroon in this case) and width sets the bar width.
plt.xlabel("Courses offered") and plt.ylabel("No. of students enrolled") are used to label
the corresponding axes.
plt.title() is used to set a title for the graph.
plt.show() is used to display the graph built by the previous commands.
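
Since bar plots can also be drawn horizontally, the same data can be sketched with pyplot's barh() counterpart, where the categories go on the y-axis and the bar thickness is set with height instead of width. The Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

data = {'C': 20, 'C++': 15, 'Java': 30, 'Python': 35}
courses = list(data.keys())
values = list(data.values())

fig, ax = plt.subplots(figsize=(10, 5))
# horizontal bars: categories on the y-axis, measured values along the x-axis
bars = ax.barh(courses, values, color='maroon', height=0.4)
ax.set_xlabel("No. of students enrolled")
ax.set_ylabel("Courses offered")
ax.set_title("Students enrolled in different courses")
```

Note that the axis labels swap places compared with the vertical version, because the measured values now run along the x-axis.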

Pie chart in Matplotlib

A pie chart can display only one series of data. Pie charts show the size of items
(called wedges) in one data series, proportional to the sum of the items. The data
points in a pie chart are shown as a percentage of the whole pie.

The Matplotlib API has a pie() function that generates a pie diagram representing data in
an array. The fractional area of each wedge is given by x/sum(x). In older Matplotlib
releases, if sum(x) < 1 the values of x gave the fractional area directly, the array was
not normalized, and the resulting pie had an empty wedge of size 1 - sum(x). Recent
releases normalize x by default; passing normalize=False to pie() requests the partial pie.
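
The x/sum(x) rule can be checked with a short sketch that computes the fractions by hand and compares them with the angles of the wedges that pie() draws. The employee counts and labels are illustrative, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

noOfEmp = [7, 22, 20, 15]
total = sum(noOfEmp)
fractions = [n / total for n in noOfEmp]  # x / sum(x) for each wedge

fig, ax = plt.subplots()
ax.pie(noOfEmp, labels=['Account', 'Technical', 'Sales', 'Purchase'])

# each wedge drawn by pie() is stored as a patch on the axes
wedges = ax.patches
# the angle spanned by a wedge is its fraction of the full 360 degrees
angles = [w.theta2 - w.theta1 for w in wedges]

for frac, angle in zip(fractions, angles):
    print(f"fraction {frac:.3f} -> {angle:.1f} degrees")
```

The fractions always add up to 1, so the wedge angles add up to a full circle.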
Example:
Pie chart showing the percentage of employees in each department of a company.
import matplotlib.pyplot as plt

# Data to plot
labels = 'Account', 'Technical', 'Sales', 'Purchase'
noOfEmp = [7, 22, 20, 15]
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']
explode = (0.1, 0, 0, 0)  # explode 1st slice
# Plot
plt.pie(noOfEmp, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=140)
plt.axis('equal')
plt.show()

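Output:

[Figure: pie chart with wedges labelled Account, Technical, Sales and Purchase, each showing its percentage; the first wedge is pulled out.]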

Save Plot as a File

Matplotlib is a Python library that offers a number of plotting options to display
your data. The plots you create are displayed when you call plt.show(), but you cannot
access them later since they are not saved on disk.
To save a figure created with Matplotlib, you can use pyplot's savefig() function. This
way, you'll have the plots saved on disk for further use instead of having to plot them
all over again.
Syntax:
import matplotlib.pyplot as plt
plt.savefig("filename.png")

Pass the path where you want the image to be saved. The savefig() function also
comes with a number of additional parameters to further customize how your image
gets saved.
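
Two of those additional parameters are sketched below: dpi, which sets the resolution, and bbox_inches, which controls how much surrounding whitespace is kept. The plotted data and the file name example_plot.png are hypothetical, the file is written to a temporary directory to avoid cluttering the working directory, and the Agg backend is an assumption so that the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# assumption: a temporary directory and a hypothetical file name
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "example_plot.png")

# dpi sets the resolution; bbox_inches="tight" trims the surrounding whitespace
fig.savefig(out_path, dpi=150, bbox_inches="tight")
print("saved:", out_path, os.path.getsize(out_path), "bytes")
```

Calling savefig() on the figure object, as done here, saves that specific figure, whereas plt.savefig() saves whichever figure is current.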

Save a plot to an image file:

Example:

import matplotlib.pyplot as plt

# NBA championship counts
players = ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird']
titles = [5, 4, 6, 3]

# plot a bar chart
plt.bar(players, titles)
# add y-axis label
plt.ylabel("Rings")
# add chart title
plt.title("Championship Victories of NBA greats")

# save the plot as a PNG image
plt.savefig("NBA_bar_chart.png")

This saves the bar chart as a PNG file with the name NBA_bar_chart.png to the current
directory. You can specify the path and name of your image as per your needs. This is
how the saved plot looks on opening it with an image viewer application:

[Figure: the saved bar chart opened in an image viewer.]

Save plot as a PDF:

plt.savefig() infers the format of the output file from the filename provided.
For instance, if you want to save the above image as a PDF file, just use the
appropriate file name:

Example:

import matplotlib.pyplot as plt

# NBA championship counts
players = ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird']
titles = [5, 4, 6, 3]

# plot a bar chart
plt.bar(players, titles)
# add y-axis label
plt.ylabel("Rings")
# add chart title
plt.title("Championship Victories of NBA greats")

# save the plot as a PDF
plt.savefig("NBA_bar_chart_doc.pdf")

The above code saves the plot as a PDF file with the name NBA_bar_chart_doc.pdf to
the current directory. This is how the saved image looks on opening it in the Google
Chrome web browser:

[Figure: the saved bar chart, with bars for Kobe Bryant, LeBron James, Michael Jordan and Larry Bird, opened as a PDF.]
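
One way to confirm that the format really follows the file extension is to save the same figure under two names and read back the first bytes of each file, since PNG and PDF files start with different signatures. The sketch below writes to a temporary directory, which is an assumption made to keep the working directory clean, and selects the Agg backend so that it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

players = ['Kobe Bryant', 'LeBron James', 'Michael Jordan', 'Larry Bird']
titles = [5, 4, 6, 3]

fig, ax = plt.subplots()
ax.bar(players, titles)

out_dir = tempfile.mkdtemp()  # assumption: temporary output directory
signatures = {}
for ext in ("png", "pdf"):
    path = os.path.join(out_dir, f"NBA_bar_chart.{ext}")
    fig.savefig(path)                  # format inferred from the extension
    with open(path, "rb") as f:
        signatures[ext] = f.read(4)    # leading bytes identify the file type

print(signatures)
```

The PNG file begins with the bytes \x89PNG and the PDF file with %PDF, even though the savefig() call is identical apart from the file name.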
