Day 5 Session 1 Visualization I

A comprehensive guide for learning R programming language

DATA VISUALIZATION I

Day 5
Session 1

1
What will we learn

Grammar of Graphics
Plotting Systems in R
What is ggplot2?
Bar Plot
Pie Chart
Box-Whisker Plot
Histogram

2
About Data Visualization

What is Data Visualisation?


It is the visual representation of data, generally in the form of graphs and plots.

Why is it important?

• It enables us to see the data and get insights at a glance
• It allows us to grasp difficult or complex data in an easy manner
• It helps us to identify patterns or trends easily

3
Principles of Visualization

Show distribution (overall and by groups)


Show correlation and causality
Show multivariate data; real world is complex
Integration of evidence
Describe and document evidence with appropriate labels,
sources, scales, etc.
Content is King

4
Application Areas

[Figures: Sensex charts, sales charts, survey results]


5
What is ggplot2?
An implementation of Grammar of Graphics by Leland Wilkinson

Written by Hadley Wickham (while he was a graduate student at Iowa State University)

A “third” graphics system for R (along with Base and Lattice)


Available from CRAN via
 install.packages("ggplot2")
 library(ggplot2)

Website: http://ggplot2.org (better documentation)

# R base package and lattice package also provide rich graphics

6
What is ggplot2?

Grammar of Graphics represents an abstraction of graphics ideas/objects

Think “verb”, “noun”, “adjective” for graphics

Allows for a “theory” of graphics on which to build new graphics and graphics objects

Shorten the distance from mind to page

Plots are made up of aesthetics (size, shape, color) and geoms (points, lines)

7
Import Telecom Data Sets

#Import two data Sets

demographic<-read.csv(file.choose(), header=TRUE)
head(demographic)

transaction<-read.csv(file.choose(), header=TRUE)
head(transaction)

8
Data Snapshots

[Figures: snapshots of the demographic and transaction data]

9
Aggregate and Merge

#Aggregating and Merging

tcalls<-aggregate(Calls~CustID, data=transaction, FUN=sum)


head(tcalls)

working<-merge(demographic, tcalls, by=("CustID"), all=TRUE)


head(working)

working$age_group<-cut(working$Age, breaks=c(0,30,45,Inf), labels=c("18-30","30-45",">45"))
head(working)
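The aggregate, merge and cut steps can be tried without the telecom files. The sketch below uses two small invented data frames that reuse the column names from the slides (CustID, Age, Gender, Calls); the values are assumptions for illustration only.

```r
# Toy stand-ins for the two telecom files (all values are invented)
demographic <- data.frame(
  CustID = c(1, 2, 3, 4),
  Age    = c(25, 38, 52, 30),
  Gender = c("F", "M", "F", "M")
)
transaction <- data.frame(
  CustID = c(1, 1, 2, 3, 3, 3),
  Calls  = c(10, 20, 5, 7, 8, 9)
)

# One row per customer: total calls across all of that customer's transactions
tcalls <- aggregate(Calls ~ CustID, data = transaction, FUN = sum)

# all = TRUE keeps customers that have no transactions (their Calls is NA)
working <- merge(demographic, tcalls, by = "CustID", all = TRUE)

# cut() uses right-closed intervals by default: (0,30], (30,45], (45,Inf]
working$age_group <- cut(working$Age, breaks = c(0, 30, 45, Inf),
                         labels = c("18-30", "30-45", ">45"))
print(working)
```

Customer 4 has no transactions, so the merged row has a missing Calls value; such rows are worth checking before plotting totals.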

10
Simple Bar Chart

A Bar Chart is the simplest and most basic form of graph.

For each data item, we draw a ‘bar’ whose length shows its value.

Simple Bar Chart: a chart that shows the values of different categories of
data as rectangular bars of different lengths.
The values are generally :
- Frequency
- Mean
- Totals
- Percentages

11
Simple Bar Chart

# Simple bar chart of count of customers by age group

ggplot(working , aes ( x = age_group)) + geom_bar()

ggplot() is the ggplot2 function that initialises a plot; layers are then added to it

working is the data frame to be used

aes() maps variables to aesthetics such as the x and y axes

geom_bar() adds the bar layer to the plot

12
With no y variable, geom_bar() uses the count stat, which transforms the data
into a data set of age_group values and the number of rows for each
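One way to see what the count stat computes is to inspect the data behind the bar layer and compare it with table(). This is a minimal sketch on an invented data frame (toy); layer_data() is the ggplot2 helper that returns the computed data for a layer.

```r
library(ggplot2)

# Hypothetical age groups for seven customers
toy <- data.frame(age_group = factor(
  c("18-30", "30-45", "18-30", ">45", "30-45", "18-30", ">45"),
  levels = c("18-30", "30-45", ">45")
))

p <- ggplot(toy, aes(x = age_group)) + geom_bar()

# Data computed by the count stat for the first (and only) layer
built <- layer_data(p, 1)
bar_counts <- built$count[order(as.numeric(built$x))]

# The same counts from base R, in factor-level order
table_counts <- as.vector(table(toy$age_group))

print(bar_counts)
print(table_counts)
```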

13
Simple Bar Chart
 ggplot(working , aes ( x = age_group)) +
geom_bar()

14
Simple Bar Chart..

#Bar chart showing age groups on x axis and total calls on y axis

ggplot(working,aes(x=age_group,y=Calls))+
geom_bar(stat="identity",fill="green")+
labs(x="Age Groups",y="Total Calls",title="Bar Diagram")

ggplot() is the ggplot2 function that initialises a plot; layers are then added to it
working is the data frame to be used
aes() maps variables to aesthetics such as the x and y axes
geom_bar() adds the bar layer to the plot
stat="identity" makes the height of each bar represent the values in a
column of the data frame; rows that share an age group are stacked, so each bar shows the total
labs() is used to give axis labels and a title
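Because stat="identity" draws one rectangle per row, rows with the same x value pile up into a single bar. The sketch below, on invented data, compares the top of each stack with totals computed by aggregate(); geom_col() is the ggplot2 shorthand for geom_bar(stat="identity").

```r
library(ggplot2)

# Invented per-customer call counts
toy <- data.frame(
  age_group = factor(c("18-30", "18-30", "30-45", ">45", ">45"),
                     levels = c("18-30", "30-45", ">45")),
  Calls     = c(100, 150, 80, 60, 40)
)

# One rectangle per row; rows sharing an age group are stacked
p_identity <- ggplot(toy, aes(x = age_group, y = Calls)) +
  geom_bar(stat = "identity")

# Summing first gives exactly one bar per group
totals <- aggregate(Calls ~ age_group, data = toy, FUN = sum)
p_col <- ggplot(totals, aes(x = age_group, y = Calls)) + geom_col()

# Top of each stack in the first plot, in x order
built <- layer_data(p_identity, 1)
stack_tops <- as.vector(tapply(built$ymax, as.numeric(built$x), max))

print(stack_tops)
print(totals$Calls)
```

Aggregating first is often easier to reason about, since each bar then corresponds to one row of the data.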

15
Simple Bar Chart
R Output

#Bar chart showing total calls for customers in each age group

16
Simple Bar Chart
Change Order of the Bars

# Order bars as per value

ggplot(working,aes(reorder(age_group,Calls,FUN=sum),Calls))+
geom_bar(stat="identity",fill="green")+
labs(x="Age Groups",y="Total Calls", title="Bar Diagram")

reorder() orders the levels of a factor variable (first argument) by a summary of a second variable, usually numeric (second argument). The summary is the mean by default; FUN=sum is used here so that the order matches the bar heights, which are totals.
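reorder() summarises the second variable within each level, using the mean unless told otherwise, so the ordering can differ from an ordering by totals. A small base R sketch with invented values shows the difference:

```r
# Hypothetical per-customer call counts
toy <- data.frame(
  age_group = factor(c("18-30", "18-30", "18-30", "30-45", ">45", ">45"),
                     levels = c("18-30", "30-45", ">45")),
  Calls = c(10, 10, 10, 25, 20, 15)
)

# Default FUN is mean: 18-30 -> 10, >45 -> 17.5, 30-45 -> 25
by_mean <- levels(reorder(toy$age_group, toy$Calls))

# FUN = sum: 30-45 -> 25, 18-30 -> 30, >45 -> 35
by_sum <- levels(reorder(toy$age_group, toy$Calls, FUN = sum))

print(by_mean)
print(by_sum)
```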

17
Simple Bar Chart
R Output

#Bar chart (ordered by value)

18
Simple Bar Chart
Some modifications

# Replace Age Groups by Gender and use maroon color

ggplot(working,aes(x=Gender,y=Calls))+
geom_bar(stat="identity",fill="maroon")+
labs(x= "Gender" ,y= "Total Calls",title="Bar Diagram")

Note the format of the commands: the graph is built up intuitively by adding one part at a time with a “+” sign

19
Simple Bar Chart
R Output

#Bar chart showing total calls for Males and Females (and in maroon
color)

20
Simple Bar Chart
Horizontal View

#Make bar graph horizontally oriented

ggplot(working,aes(x=age_group,y=Calls))+
geom_bar(stat="identity",fill="orange")+
labs(x="Age Groups",y="Total Calls",title="Bar Diagram")+
coord_flip()

# coord_flip makes bar graph horizontal

21
Simple Bar Chart
R Output

#Horizontal Bar Plot by Age Group

22
Simple Bar Chart
Modify horizontal view

#Obtain horizontal Bar chart with Gender on x axis

ggplot(working,aes(x=Gender,y=Calls))+
geom_bar(stat="identity",fill="seagreen")+
labs(x= "Gender",y="Total Calls",title="Bar Diagram") +
coord_flip()

23
Simple Bar Chart
R Output

#Try with Gender & horizontal Bar Plot (R code on the previous slide)

24
Simple Bar Chart
Show Text

#Obtain bar chart showing average total calls by Gender and print values
on the chart

avgcalls <- aggregate (Calls~Gender,data=working,FUN=mean)

ggplot(avgcalls,aes(x=Gender,y=Calls))+
geom_bar(stat="identity",fill="darkblue")+
geom_text(aes(label=round(Calls)), vjust=0, colour="black",
size=4)+
labs(x="Gender",y="Average Calls",
title="Bar Diagram with Text")
# geom_text adds labels to the chart

25
Simple Bar Chart
Show Text R Output

#Average total calls by Gender

26
Simple Bar Chart

#Total calls v/s Average total calls by Gender

27
Stacked Bar Chart

#Stack the plot with Gender

ggplot(working, aes(x=age_group))+
geom_bar(aes(fill=Gender))+
labs(x = "Age Group", y="No. of customers", title="Stacked bar chart")

#Normalized Bars

Add position="fill" to geom_bar() to produce stacked bars with normalized height

geom_bar(aes(fill=Gender),position="fill")
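With position="fill", each bar is rescaled so that its segments are proportions that sum to 1. The sketch below checks this on an invented data frame by reading the segment heights back with layer_data().

```r
library(ggplot2)

# Invented customers: 3 F and 1 M aged 18-30, 1 F and 1 M aged 30-45
toy <- data.frame(
  age_group = factor(c("18-30", "18-30", "18-30", "18-30", "30-45", "30-45"),
                     levels = c("18-30", "30-45")),
  Gender = c("F", "F", "F", "M", "F", "M")
)

p <- ggplot(toy, aes(x = age_group)) +
  geom_bar(aes(fill = Gender), position = "fill")

# Height of every segment and the total height of each bar
built <- layer_data(p, 1)
segment_heights <- built$ymax - built$ymin
bar_totals <- as.vector(tapply(segment_heights, as.numeric(built$x), sum))

print(sort(segment_heights))
print(bar_totals)
```

The y axis of such a plot shows proportions, so a label like "Share of customers" may suit it better than "No. of customers".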

28
Stacked Bar Chart
R Output

#Stack the plot with Gender

29
Stacked Bar Chart
Some modifications

#Stack the plot with Gender and Add your colours

ggplot(working, aes(x=age_group))+
geom_bar(aes(fill=Gender))+
labs(x="Age Group", y="No. of customers", title="Stacked bar chart")+
scale_fill_manual(values=c("red","green"))

#Change size and style of the plot title, add (with a ‘+’ sign):

theme(plot.title = element_text(size = 20, face = "bold"))

30
Stacked Bar Chart
R Output

#Stack the plot with Gender and Add your colours and customise title
size and style

31
Stacked Bar Chart
Horizontal View

#Try with flipping the coordinate and stacking with age group on
Gender

ggplot(working, aes(x=Gender))+
geom_bar(aes(fill=age_group))+
labs(x="Gender", y="No. of customers", title="Stacked bar chart")+
coord_flip()

32
Stacked Bar Chart
R Output

#Try with flipping the coordinate and stacking with age group on Gender

33
Multiple Bar Chart

#Multiple bars, side by side

ggplot(working, aes(x=age_group))+
geom_bar(aes(fill=Gender), position="dodge")+
labs(x="Age Group", y="No. of customers", title="Multiple bar chart")

34
Multiple Bar Chart
R Output

#Multiple bars, side by side

35
Multiple Bar Chart
Reordering Bars

#Levels of a factor variable


levels(working$age_group)

[1] "18-30" "30-45" ">45"

ggplot(working,
aes(x=factor(age_group,levels=rev(levels(age_group))), fill=Gender))+
geom_bar(position="dodge")+
labs(x="Age Group", y="No. of customers", title="Multiple bar chart (Reordered)")

36
Multiple Bar Chart
Reordering Bars R Output

#Multiple bars, age_group reordered

37
Multiple Bar Chart
Horizontal View

#Try with total calls on y-axis and flipped coordinate

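ggplot(working, aes(x=age_group,y=Calls))+
# stat="summary" with fun="sum" totals the calls within each dodged bar;
# with stat="identity", rows in the same bar would overlap instead of adding up
geom_bar(stat="summary",fun="sum",aes(fill=Gender),
position="dodge")+
labs(x="Age Group", y="Total calls", title="Multiple bar chart")+
coord_flip()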

38
Multiple Bar Chart
R Output

#Try with total calls on y-axis and flipped coordinate

39
Multiple Bar Chart

#Another variation: 'fill=' is with ggplot() instead of geom_bar()

ggplot(working, aes(x=age_group,fill=Gender))+
geom_bar(position="dodge")+
labs(x="Age Group", y="No. of customers", title="Multiple bar chart")

40
Multiple Bar Chart

#Another variation: 'fill=' is with ggplot() instead of geom_bar()

41
Caution with ‘fill’

Caution: fill="blue" in geom_bar() overrides the fill=Gender in ggplot()

ggplot(working, aes(x=age_group,fill=Gender))+
geom_bar(position="dodge", fill="blue")+
labs(x="Age Group", y="No. of customers", title="Bar chart")

42
Caution with ‘fill’…
R Output

Caution: fill="blue" in geom_bar() overrides the fill=Gender in ggplot()

43
Pie Chart

Pie charts are generally used to show percentage or proportional data.

In this graph the entire circle (pie) is sliced in proportion to the values of each category.

44
Pie Chart
#Pie Chart showing distribution of Age Group

Original Dataframe

pie_table <- table(working$age_group)

pie_dataframe <- as.data.frame(pie_table)

45
Pie Chart
#Making required changes to dataframe
names(pie_dataframe)[1] <- "Age_group"
names(pie_dataframe)[2] <- "Count"

#Calculating Percentage
pct <- round(pie_dataframe$Count/sum(pie_dataframe$Count)*100)

pie_dataframe$group_percent <- paste0(pie_dataframe$Age_group," ",pct,"%")

46
Pie Chart
#Pie Chart showing distribution of Age Group

pie(pie_dataframe$Count,labels = pie_dataframe$group_percent,
col=rainbow(length(pie_dataframe$group_percent)),
main="Pie Chart of Age Groups")
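The pie() call above comes from base R graphics. For consistency with the rest of the session, a pie chart can also be sketched in ggplot2 as a single stacked bar drawn in polar coordinates. The counts below are invented, and theme_void() is only one possible choice for hiding the axes.

```r
library(ggplot2)

# Invented counts per age group
pie_dataframe <- data.frame(
  Age_group = c("18-30", "30-45", ">45"),
  Count     = c(40, 35, 25)
)
pie_dataframe$pct <- round(pie_dataframe$Count / sum(pie_dataframe$Count) * 100)

# One stacked bar (x = "") wrapped around a circle by coord_polar()
p <- ggplot(pie_dataframe, aes(x = "", y = Count, fill = Age_group)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  labs(fill = "Age Group", title = "Pie Chart of Age Groups") +
  theme_void()

print(p)
```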

47
Box Plot

Box and Whisker plot summarizes data graphically using 5 measures:


- Minimum
- The Three Quartiles : Q1, Q2 (i.e. Median) and Q3
- Maximum.

Advantages of a Box Plot :


- A boxplot is particularly effective when comparing two sets of data
- It shows us the shape of the data

48
Box Plot

Describing a Box-Plot :
- The rectangle (box) in the middle represents the middle 50% of the data
(between the values that are ¼ and ¾ of the way through the data).
- The lines (whiskers) extend from the box to the smallest and largest
values that are not outliers.
- The diagram also shows the middle value (i.e. the median).
- Outliers are plotted as individual points beyond the whiskers (observations
more than 1.5 times the interquartile range above the upper quartile
or below the lower quartile).
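The quantities described above can be computed by hand in base R. The sketch below uses an invented vector of call counts and the 1.5 × IQR rule to find the quartiles, the whisker ends and the outliers.

```r
# Invented call counts; 95 is deliberately far from the rest
calls <- c(12, 15, 18, 20, 21, 23, 25, 27, 30, 95)

# Quartiles Q1, Q2 (median) and Q3
q <- unname(quantile(calls, c(0.25, 0.5, 0.75)))
iqr <- q[3] - q[1]

# Fences at 1.5 times the interquartile range beyond the box
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Points beyond the fences are outliers; whiskers stop at the most
# extreme values that are still inside the fences
outliers <- calls[calls < lower_fence | calls > upper_fence]
whisker_ends <- range(calls[calls >= lower_fence & calls <= upper_fence])

print(q)
print(whisker_ends)
print(outliers)
```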

49
Box Plot

This plot shows that the distribution of total calls is very nearly
symmetric
50
Box Plot

# Box plot for variable ‘Calls’

ggplot(working, aes(x="", y=Calls))+


geom_boxplot()+
labs( y="Total Calls", title="Boxplot")

51
Box Plot
R Output

# Box plot for variable ‘Calls’

52
Box Plot
By Age Group
# Box plot by Age group

ggplot(working, aes(x=age_group, y=Calls))+


geom_boxplot()+
labs(x="Age Group", y="Total Calls",
title="Boxplot")

53
Box Plot
By Age Group R Output

# Box plot by Age group

54
Box Plot By Age Group
Enhance the plot

#Box plot by Age group; colour the boxes & outlier

ggplot(working, aes(x=age_group, y=Calls))+


geom_boxplot(fill=5, outlier.colour="blue",
outlier.size=2.5)+
labs(x="Age Group", y="Total Calls",
title="Boxplot")

55
Box Plot By Age Group
Enhance the plot R Output

#By Age group; colour the boxes & outlier

56
Box Plot
Adding Gender Facet

#Split each age group by Gender using fill (separate panels would need facet_wrap(~Gender))

ggplot(working, aes(x=age_group, y=Calls))+


geom_boxplot(aes(fill=Gender))+
labs(x="Age Group", y="Total Calls",
title="Boxplot")

57
Box Plot
Adding Gender Facet R Output

#Adding Gender Facet

58
Box Plot
Horizontal View

# Box plot:Horizontal View

ggplot(working, aes(x=age_group, y=Calls))+


geom_boxplot(aes(fill=Gender))+
coord_flip()+
labs( y="Total Calls", x="Age Group",
title="Boxplot")

59
Box Plot
Horizontal View: R Output

# Box plot: Horizontal View

60
Histogram

A Histogram is similar to a bar chart but is used to display continuous data.


Therefore we will use a continuous scale with no ‘gaps’ between the bars.
It is generally used to check the Normality of the data.

• This plot shows that the distribution of Average Call Time is very nearly symmetric

61
Histogram

#Histogram for variable ‘Calls’

ggplot(working, aes(x=Calls))+
geom_histogram(binwidth=40, fill="maroon")+
labs(x="Total Calls", y="No. of customers", title="Customer usage")
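The binwidth argument sets how wide each bar of the histogram is on the x axis. The sketch below, on invented call counts, reads the bins back with layer_data() to show that each bin spans 40 units and that the bin counts add up to the number of customers.

```r
library(ggplot2)

# Invented total calls for ten customers
toy <- data.frame(Calls = c(5, 12, 38, 41, 77, 79, 80, 125, 130, 199))

p <- ggplot(toy, aes(x = Calls)) +
  geom_histogram(binwidth = 40, fill = "maroon")

# Lower edge, upper edge and count of every bin
bins <- layer_data(p, 1)[, c("xmin", "xmax", "count")]
print(bins)
```

A smaller binwidth shows more detail but noisier bars; trying a few values is a common way to choose one.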

62
Histogram
R Output

#Histogram for variable ‘Calls’

63
Make Plots Interactive

install.packages("plotly")
library(plotly)

#Save ggplot as an object


interactive <- ggplot(working, aes(x=age_group,fill=Gender))+
geom_bar(position="dodge")+
labs(x="Age Group", y="No. of customers", title="Multiple bar chart")

ggplotly(interactive)

64
Make Plots Interactive

65
THANK YOU!!

66
