UNIT – 1
DATA HANDLING USING PANDAS – I
DATA VISUALIZATION
What is Data Visualization?
"A picture is worth a thousand words"
Data visualization refers to the graphical or visual representation of information
and data using visual elements like charts, graphs, maps, etc.
Data visualization plays an essential role in the representation of both small
and large-scale data.
It especially applies when trying to explain the analysis of increasingly large
datasets.
Data visualization is the discipline of trying to expose the data to understand it
by placing it in a visual context.
Its main goal is to distil large datasets into visual graphics to allow for easy
understanding of complex relationships within the data.
Several data visualization libraries are available in Python, such as Matplotlib,
Seaborn and Folium.
Purpose of Data Visualization
Better analysis
Quick action
Identifying patterns
Finding errors
Understanding the story
Exploring business insights
Grasping the latest trends
Plotting library
Matplotlib is a Python package/library used to create 2D graphs and plots
using Python scripts.
Matplotlib is the most popular plotting library for Python.
Pyplot is a module/interface in Matplotlib that supports a wide variety of
graphs and plots, such as histograms, bar charts, line charts and pie charts.
Pyplot is a collection of functions within the Matplotlib library that allows the user to
construct 2D plots easily.
Matplotlib – Pyplot features
The Matplotlib library provides the following features for data visualization.
Drawing – plots can be drawn from the data passed to specific
functions.
Customization – plots can be customized as required through the arguments
of the functions: colour, style (dashed, dotted) and width can be set, and
labels, a title and a legend can be added.
Saving – after drawing and customization, plots can be saved for future use.
How to plot in Matplotlib?
Steps to plot in matplotlib:
Install matplotlib using pip by running pip install matplotlib at the command prompt.
Create a .py file and import the matplotlib library in it using the following statement:
import matplotlib.pyplot as plt
Pass the data points to the plot() function of plt.
Customize the plot by changing different parameters.
Call the show() function to display the plot.
Save the plot/graph if required.
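The steps above can be sketched as follows. This is only a minimal illustration: the data values, the title and the axis labels are made up for the example and are not from the notes.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data (assumed for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Pass the data points to plot(); it returns a list of line objects
lines = plt.plot(x, y)

# Customize the plot
plt.title("Sample line plot")
plt.xlabel("x values")
plt.ylabel("y values")

# Optional: a plt.savefig("sample_plot.png") call would go here, before show()

# Display the plot
plt.show()
```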
Types of plot using Matplotlib
LINE PLOT
BAR GRAPH
HISTOGRAM
1) LINE PLOT
Line plot/chart/graph is a type of plot which displays information as a series of
data points called “markers” connected by straight lines.
A line plot/chart is a graph that shows the frequency of data occurring along
a number line.
This type of plot is often used to visualize a trend in data over intervals of time –
a time series.
The line plot is represented by a series of datapoints connected with a straight
line.
Generally line plots are used to display trends over time. A line plot or line
graph can be created using the plot() function available in the pyplot module.
Besides plotting the line, we can also define the grid, the x- and y-axis
scales and labels, the title and other display options.
In order to draw a line plot, the steps to be followed are as under:
Steps:
1. Importing Matplotlib.
2. plt.plot(x,y,color,others) --- Plot y versus x as lines and/or markers.
3. plt.xlabel("Your Text") --- Set the X-axis label of the current axes.
4. plt.ylabel("Your Text") --- Set the Y-axis label of the current axes.
5. plt.title("Your Title") --- Set a title of the current axes.
6. plt.show() --- Display a figure.
Example 1: To plot a simple line chart using two lists.
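A possible version of such a program is sketched here. The two lists and the label text are assumed values chosen for illustration; the original program is not reproduced in these notes.

```python
import matplotlib.pyplot as plt

# Hypothetical data: two lists of equal length
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# plot() returns a list with one line object; the comma unpacks it
line, = plt.plot(x, y, color="g")

plt.xlabel("Number")
plt.ylabel("Square")
plt.title("Squares of numbers")
plt.show()
```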
OUTPUT:
Example 2: To add legends, titles and labels to a line plot.
OUTPUT:
[Figure: line plot with callouts marking the chart title, the Y-axis label and the X-axis label]
Multiple Line Charts:
Example 3: To add legends, titles and labels to a line plot with multiple lines.
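One way to write such a program is sketched here, assuming made-up marks for two students. Each plot() call adds one line, and the label argument supplies the text that legend() displays.

```python
import matplotlib.pyplot as plt

# Hypothetical data: marks of two students across five tests
tests = [1, 2, 3, 4, 5]
student_a = [65, 70, 72, 80, 85]
student_b = [55, 60, 75, 78, 90]

# One plot() call per line; label is used by the legend
plt.plot(tests, student_a, label="Student A")
plt.plot(tests, student_b, label="Student B")

plt.title("Marks across tests")
plt.xlabel("Test number")
plt.ylabel("Marks")
legend = plt.legend()   # draws the legend box using the labels above
plt.show()
```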
OUTPUT:
[Figure: multiple line chart with a callout marking the legend]
Different parameters used to customize the Line plot:
1. marker – A marker is any symbol that represents a data value in a line plot or a scatter plot.
Marker                 Description
'.'                    Point marker
','                    Pixel marker
'o'                    Circle marker
'+'                    Plus marker
'x'                    X marker
'D', 'd'               Diamond marker, thin diamond marker
's'                    Square marker
'p'                    Pentagon marker
'*'                    Star marker
'h', 'H'               Hexagon1 marker, hexagon2 marker
'1', '2', '3', '4'     Tri_down, tri_up, tri_left, tri_right markers
'v', '^', '<', '>'     Triangle_down, triangle_up, triangle_left, triangle_right markers
'|', '_'               Vline, hline markers
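2. color – A graph can be customized by changing the colour of the plotted data.
Character    Colour
'b'          blue
'g'          green
'r'          red
'm'          magenta
'y'          yellow
'k'          black
'c'          cyan
'w'          white
3. linestyle – The style of the line joining the data points.
linestyle            Appearance
'solid' or '-'       -------
'dashed' or '--'     - - - -
'dotted' or ':'      .......
'dashdot' or '-.'    -.-.-.-
4. markersize – The size of the marker, given as a number (in points).
5. linewidth – The width of the line, given as a number (in points).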
Example 4: A python program to plot a line chart based on the given data to depict
the changing weekly average temperature in Delhi for four weeks by customizing
with all the parameters.
Week=[1,2,3,4]
Avg_week_temp=[40,42,38,44]
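A sketch of how this program might look, using the two lists given above. The particular marker, colour, line style, sizes and label text are assumptions picked for illustration; any other valid values can be substituted.

```python
import matplotlib.pyplot as plt

Week = [1, 2, 3, 4]
Avg_week_temp = [40, 42, 38, 44]

# Customization values below are illustrative choices, not prescribed ones
line, = plt.plot(
    Week,
    Avg_week_temp,
    marker="*",           # star marker at each data point
    markersize=10,        # marker size in points
    color="r",            # red line
    linestyle="dashdot",  # dash-dot line
    linewidth=2,          # line width in points
)

plt.title("Weekly average temperature in Delhi")
plt.xlabel("Week")
plt.ylabel("Average temperature")
plt.show()
```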
Example 4: Only marker without line
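To show only the markers, the line can be switched off with linestyle="None". The sketch below reuses the weekly temperature lists; the marker and colour are assumed choices.

```python
import matplotlib.pyplot as plt

Week = [1, 2, 3, 4]
Avg_week_temp = [40, 42, 38, 44]

# linestyle="None" suppresses the connecting line, leaving only the markers
points, = plt.plot(Week, Avg_week_temp, marker="o", linestyle="None", color="b")

plt.title("Weekly average temperature (markers only)")
plt.xlabel("Week")
plt.ylabel("Average temperature")
plt.show()
```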
Example 5: plot a multiline chart using DataFrame df.plot()
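A minimal sketch of this approach, assuming a small made-up DataFrame. The plot() method of a DataFrame draws one line per numeric column and returns the matplotlib axes, which can then be labelled.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: marks in two subjects across four tests
df = pd.DataFrame(
    {"Maths": [70, 75, 80, 85], "Science": [65, 72, 78, 90]},
    index=["Test 1", "Test 2", "Test 3", "Test 4"],
)

# One line per column; column names become the legend entries
ax = df.plot(kind="line", title="Marks by subject")
ax.set_xlabel("Test")
ax.set_ylabel("Marks")
plt.show()
```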
Example 6: plot a multiline chart using csv file
Try this out:
Example 7: The following are the runs scored by a team in the first 5 overs:
Draw the line graph for the above data using matplotlib.
Example 8: Draw the line chart using matplotlib for Ice Cream Sales of an ice
cream parlour.
Example 9:
Draw a line graph using matplotlib to show how the height of the plant increased.
Try this out:
Example 10:
Consider the following graph. Write the Python code to plot it. Also add the Title,
label for X and Y axis.
2) BAR PLOT/CHART
A bar chart/bar graph is a very commonly-used two-dimensional data
visualization made up of rectangular bars.
A bar chart represents categorical data with rectangular bars.
Each bar has a height which corresponds to the value it represents.
It can also be used with two data series.
The bars can be plotted vertically or horizontally.
Other characteristics of the chart, such as the width and colour of the
bars, can also be configured.
To make a bar chart with matplotlib, the bar() function can be used.
Anatomy of Bar chart:
Example 1: To plot a simple bar chart with orange bars.
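A sketch of such a program follows. The subjects and marks are made-up values used only for illustration; the colour is set through the color argument of bar().

```python
import matplotlib.pyplot as plt

# Hypothetical data
subjects = ["English", "Maths", "Science"]
marks = [78, 92, 85]

# A single colour name applies to every bar
bars = plt.bar(subjects, marks, color="orange")

plt.title("Marks by subject")
plt.xlabel("Subject")
plt.ylabel("Marks")
plt.show()
```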
OUTPUT:
Example 2: To plot a simple bar chart with different bar colours.
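Passing a list of colours, one per bar, is one way to get differently coloured bars. The data and colour choices in this sketch are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data
fruits = ["Apple", "Banana", "Mango", "Orange"]
quantity = [30, 45, 25, 40]

# A list of colours is applied bar by bar, in order
bars = plt.bar(fruits, quantity, color=["red", "yellow", "green", "orange"])

plt.title("Fruit quantity")
plt.xlabel("Fruit")
plt.ylabel("Quantity")
plt.show()
```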
OUTPUT:
Example 3: To plot a bar chart horizontally.
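For a horizontal chart, pyplot provides barh(), which takes the categories first and the bar lengths second. The sketch uses made-up data; note that the value axis is now the X-axis.

```python
import matplotlib.pyplot as plt

# Hypothetical data
subjects = ["English", "Maths", "Science"]
marks = [78, 92, 85]

# barh() draws horizontal bars; each bar's width is its value
bars = plt.barh(subjects, marks, color="c")

plt.title("Marks by subject")
plt.xlabel("Marks")
plt.ylabel("Subject")
plt.show()
```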
OUTPUT:
Multiple Bar Graph:
Example 4: To plot a multiple bar chart to visualize the "MelaSales.csv" file with
the column Day on the x-axis, as shown in the output.
USING CSV FILE
OUTPUT:
USING DATAFRAME
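A minimal sketch of the DataFrame approach follows. The contents of MelaSales.csv are not given in these notes, so the DataFrame below is a made-up stand-in: only the column name Day comes from the example, and the other column names and all values are assumptions. With the real file, the DataFrame could instead be obtained with pd.read_csv("MelaSales.csv").

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the file contents (only "Day" is named in the notes)
df = pd.DataFrame(
    {
        "Day": ["Mon", "Tue", "Wed"],
        "Week 1": [5000, 5900, 6500],
        "Week 2": [4000, 3000, 5000],
    }
)

# x="Day" puts the Day column on the X-axis; every other column becomes a bar group
ax = df.plot(kind="bar", x="Day", title="Mela sales")
ax.set_ylabel("Sales")
plt.show()
```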
Try this out:
Example 5:
Write a Python code to display a bar chart of the popularity of programming
Languages. Sample data:
Programming languages: Python, Java, PHP, JavaScript
Popularity: 8.6, 8, 7.8, 6.4
Example 6:
Write a Python code to display a bar chart of the number of the students in a class.
Sample data:
Class: I,II,III,IV,V,VI,VII,VIII,IX,X
Strengths: 40,43,45,47,49,38,50,37,43,39
Example 7:
Write a Python code to draw a bar graph representing the total sales in each
quarter. Add suitable Title, Label for X-axis & Y-axis.
Use following data for plotting the graph:
sales=[450,300,500,650]
qtr=['QTR1','Qtr2','Qtr3','Qtr4']
Try this out:
Example 8:
Write suitable Python code to create 'Favourite Hobby' Bar Chart as shown below:
Also give suitable python statement to save this chart.
3) HISTOGRAMS
Histograms are column-charts, where each column represents a range of values, and
the height of a column corresponds to how many values are in that range.
Histogram charts are a graphical display of frequencies, represented as bars.
Basically, histograms are used to represent the given data in the form of some groups.
The X-axis shows the bin ranges, whereas the Y-axis shows the frequency.
It is an accurate graphical representation of the distribution of numerical data.
It is similar to a vertical bar graph but without gaps between the bars.
Histograms are a great way to show results of continuous data, such as: weight,
height, stock prices, waiting time for a customer, etc.
The hist() function of the pyplot module is used to create and plot a histogram from a given
sequence of numbers.
Histograms show what portion of the dataset falls into each category, usually
specified as non-overlapping intervals called bins.
hist() groups the values into bins. By default, hist() uses 10 bins (so
only ten categories, or bars, are computed).
Bins can be customized:
Either by passing an additional positional argument, for example, hist(y, <bins>)
Or by using the bins keyword argument, as hist(y, bins=<bins>)
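Both ways of setting the bins are sketched below with made-up data. In matplotlib the keyword argument is spelled bins; it accepts either a number of equal-width bins or a list of bin edges. hist() returns the counts, the bin edges and the drawn bars.

```python
import matplotlib.pyplot as plt

# Hypothetical data
y = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

# Positional form: 4 equal-width bins between the smallest and largest value
plt.figure()
counts_a, edges_a, bars_a = plt.hist(y, 4)

# Keyword form: explicit bin edges 0-2, 2-4, 4-6, 6-8, 8-10
plt.figure()
counts_b, edges_b, bars_b = plt.hist(y, bins=[0, 2, 4, 6, 8, 10], edgecolor="black")

plt.show()
```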
Components of Histogram:
Components of a histogram plot constitute:
Title: To display heading of the histogram.
Colour: To show the colour of the bar.
Axis: Y-axis and X-axis
Data: The data can be represented as an array.
Width of bars: The width of each bar corresponds to its bin (interval).
Border colour: To display border colour of the bar.
Example 1: To plot a histogram for displaying age of employees in a
particular range.
We can collect the age of each employee in an office and show it in the form of a
histogram to know how many employees are there in the range 10-20 years, 20-30 years
and so on.
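A sketch of such a program, assuming a made-up list of employee ages and bins that are 10 years wide. The ages, colours and label text are illustrative assumptions only.

```python
import matplotlib.pyplot as plt

# Hypothetical ages of employees
ages = [22, 25, 27, 31, 34, 35, 38, 41, 44, 47, 52, 55, 58]

# Bin edges for the ranges 10-20, 20-30, 30-40, 40-50, 50-60
age_bins = [10, 20, 30, 40, 50, 60]

counts, edges, bars = plt.hist(ages, bins=age_bins, color="g", edgecolor="black")

plt.title("Age of employees")
plt.xlabel("Age range (years)")
plt.ylabel("Number of employees")
plt.show()
```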
OUTPUT:
Example 2:
OUTPUT:
histtype – the type of histogram to draw (the default is bar):
• bar
• barstacked
• step
• stepfilled
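The histtype argument of hist() selects among these types. The sketch below draws an unfilled outline with histtype="step", using made-up ages; swapping in "stepfilled" would fill the outline.

```python
import matplotlib.pyplot as plt

# Hypothetical data
ages = [22, 25, 27, 31, 34, 35, 38, 41, 44, 47, 52, 55, 58]

# histtype="step" draws only the outline of the histogram
counts, edges, outline = plt.hist(
    ages, bins=[10, 20, 30, 40, 50, 60], histtype="step", color="b"
)

plt.title("Age of employees (step histogram)")
plt.xlabel("Age range (years)")
plt.ylabel("Number of employees")
plt.show()
```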
Example 3: plot a Histogram using DataFrame
Example 4:
Draw the histogram for
population_ages = [22, 55, 62, 45, 21, 22, 34, 42, 42, 4, 99, 102, 110, 120, 121, 122, 130, 111, 115, 112, 80, 75, 65, 54, 44, 43, 42, 48]
and
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130]
through a Python program with a suitable title and labels.
Example 5:
The heights of 10 students of eighth grade are given below:
Height_cms=[145,141,142,142,143,144,141,140,143,144]
Write suitable Python code to generate a histogram based on the given data,
along with an appropriate chart title and both axis labels.
Also give suitable python statement to save this chart.
HOW TO SAVE PLOT
The active figure (graph/plot/chart) can be saved to a file for future use.
The savefig() method is used to save a plot.
Plots can be saved in file formats such as pdf, svg, png and jpg.
plt.savefig('line_plot.pdf')
plt.savefig('line_plot.svg')
plt.savefig('line_plot.png')
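A short sketch of saving in context, with made-up data. The file format is taken from the file name extension. It is generally safer to call savefig() before show(), because with some interactive backends the figure is discarded once the window is closed and a later savefig() may write an empty image.

```python
import matplotlib.pyplot as plt

# Hypothetical data
plt.plot([1, 2, 3, 4], [10, 20, 15, 25])
plt.title("Line plot")

# Save first; the extension decides the file format
plt.savefig("line_plot.png")
plt.savefig("line_plot.pdf")

# Then display
plt.show()
```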
Example:
Write a Python code to plot a line graph using matplotlib, with x-axis representing the
years (2010-2015) and y-axis representing the corresponding population (in millions) of
a city: 2, 2.5, 3, 3.5, 4, 4.5.
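One possible solution is sketched here, using the years and population values given in the question. The title, label text, marker and output file name are assumed choices.

```python
import matplotlib.pyplot as plt

years = [2010, 2011, 2012, 2013, 2014, 2015]
population = [2, 2.5, 3, 3.5, 4, 4.5]   # in millions

line, = plt.plot(years, population, marker="o")

plt.title("Population of the city (2010-2015)")
plt.xlabel("Year")
plt.ylabel("Population (in millions)")

# Save the chart before displaying it (file name is an assumption)
plt.savefig("population.png")
plt.show()
```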
OUTPUT:
Questions:
1. Write a Python code to plot a line graph to represent the distance covered by a car over a period of 5
hours, with the x-axis representing time (in hours) and the y-axis representing distance (in kilometers).
Data: Time (hours) - 0, 1, 2, 3, 4, 5
Distance (km) - 0, 30, 60, 90, 120, 150
2. Create a line graph using matplotlib to show the temperature (in Celsius) of a city over a period of 7
days (Monday to Sunday).
Data: Day - Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
Temperature (C) - 20, 22, 25, 28, 30, 32, 35
3. Using matplotlib, plot a stacked bar graph to represent the number of boys and girls in each of the four
classes (XI-XII) in a school.
Data: Class - XI, XI, XII, XII
Boys - 50, 60, 70, 80
Girls - 40, 50, 60, 70
4. Plot a histogram using matplotlib to represent the distribution of heights (in cm) of 25 students in a class,
with bins of size 5 (150-155, 156-160, 161-165, ...).
Data: Height (cm) - 155, 160, 165, ..., 185 (25 students)
5. Create a histogram using matplotlib to represent the distribution of marks obtained by 30 students in a
class, with bins of size 10 (0-10, 11-20, 21-30, ...).
Data: Marks - 20, 30, 40, ..., 90 (30 students)
Also write a suitable Python statement in each of the above programs to save the chart it
creates.