DSA2101

Essential Data Analytics Tools: Data Visualization

Yuting Huang

AY24/25

Week 11: Introduction to ggplot2

Recap: Choosing the right plot
There are many geoms available in the ggplot2 package.
The choice of which one to use largely depends on two questions:
▶ What are you trying to communicate?
▶ What type of variable(s) do you want to show?

Source: Adapted from John F. Ouyang.

Prerequisites
▶ ggplot2 is part of the tidyverse, so loading tidyverse also loads ggplot2.

library(tidyverse)

Artwork by Allison Horst

Outline

1. Aesthetics and geometrical objects

▶ Scatterplot
▶ Smoother line
▶ Histogram and density plot
▶ Line plot
▶ Text annotations
▶ Bar plot
▶ Maps
2. Miscellaneous tasks
▶ Themes
▶ Layouts
▶ Common layers

Bar plot

We use a bar plot to visualize categorical variables.

▶ geom_col() creates bars where the height directly represents
values in the data.
▶ geom_bar() creates bars based on the count of observations in
each group – the counts are obtained through an internal
aggregation.

Some of the aesthetics that these geom functions use are:

▶ x (required)
▶ y (not required by geom_bar())
▶ color
▶ fill
▶ width

Bar plot: geom_col()
Let’s continue working on the murders.csv data set.
▶ Here’s a bar chart of the number of states in each region.

murders <- read.csv("../data/murders.csv")

state_by_region <- murders %>% count(region)
ggplot(state_by_region, aes(x = region, y = n)) +
  geom_col()

[Figure: bar chart of n (number of states) by region: North Central, Northeast, South, West]
Bar plot: geom_bar()
Alternatively, we can use geom_bar() to visualize the data.
▶ The function automatically counts the number of observations for
each x value – there’s no need to summarize the data beforehand.

ggplot(murders, aes(x = region)) +
  geom_bar()

[Figure: bar chart of count by region: North Central, Northeast, South, West]
▶ By default, geom_bar() uses stat = "count", which counts the
number of observations in each group.
▶ To use the values in the data directly, set stat = "identity";
geom_bar(stat = "identity") draws the same bars as geom_col().

ggplot(state_by_region, aes(x = region, y = n)) +
  geom_bar(stat = "identity")

[Figure: bar chart of n by region: North Central, Northeast, South, West]
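A quick way to confirm that geom_bar(stat = "identity") and geom_col() agree is to compare the bar heights each one computes. This sketch again assumes base R's state.region as a stand-in for murders.csv; heights_by_x() is a hypothetical helper written for this comparison, not part of ggplot2.

```r
library(ggplot2)
library(dplyr)

# Pre-counted data like state_by_region, built here from base R's state.region
state_counts <- data.frame(region = state.region) %>% count(region)

# stat = "identity": use n as the bar height instead of counting rows
p_identity <- ggplot(state_counts, aes(x = region, y = n)) +
  geom_bar(stat = "identity")

# geom_col() uses stat_identity() by default
p_col <- ggplot(state_counts, aes(x = region, y = n)) +
  geom_col()

# Hypothetical helper: bar heights from layer_data(), ordered by x position
heights_by_x <- function(p) {
  d <- layer_data(p)
  d$y[order(d$x)]
}

heights_by_x(p_identity)
heights_by_x(p_col)
```

In practice, geom_col() is the shorter way to write the same layer.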
Maps: geom_polygon()

Plotting geo-spatial data is a common visualization task.

▶ The simplest way to draw maps is to use geom_polygon().
▶ We will need the latitude and longitude of the boundaries for
different regions.
▶ For US states, we can obtain the data from the maps package.

# install.packages("maps")
library(maps)
us_states <- map_data("state")

Maps: geom_polygon()
In the object us_states, we have the following variables:
▶ lat and long specify the latitude and longitude of the corners of
a polygon.
▶ group provides a unique id for each polygon. A region drawn in
several pieces (for example, Michigan’s two peninsulas) has several groups.
▶ order provides the drawing order of boundary points.

glimpse(us_states)

## Rows: 15,537
## Columns: 6
## $ long <dbl> -87.46201, -87.48493, -87.52503, -87.53076, -87.570
## $ lat <dbl> 30.38968, 30.37249, 30.37249, 30.33239, 30.32665, 3
## $ group <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
## $ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
## $ region <chr> "alabama", "alabama", "alabama", "alabama", "alabam
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,

Let’s first visualize the data using geom_point().
▶ Each row in the data is plotted as a single point – forming a
scatterplot that effectively shows the outline of every state.

ggplot(data = us_states, aes(x = long, y = lat)) +
  geom_point(size = 0.25)

[Figure: scatterplot of lat against long, tracing the outline of each state]
Maps: geom_polygon()
Now let’s turn this scatter plot into a map using geom_polygon().
▶ group specifies how to connect the coordinates into polygons.
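▶ Within each group, geom_polygon() connects the points in the order
the rows appear. map_data() returns the rows already sorted by
order, so keep that row order.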

ggplot(data = us_states, aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "white", fill = "lightblue")

[Figure: map of the US states drawn with geom_polygon(), lat against long]
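To see why the group aesthetic matters, here is a toy sketch with two made-up square "regions" instead of map_data(). The coordinates are hypothetical; layer_data() is used only to count how many separate groups ggplot2 will draw.

```r
library(ggplot2)

# Two hypothetical square regions; corners listed in drawing order within each group
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = c(1, 1, 1, 1,  2, 2, 2, 2)
)

# With group: each square becomes its own polygon
p_grouped <- ggplot(squares, aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "white", fill = "lightblue")

# Without group: all eight points are joined into ONE polygon,
# with extra edges running between the two squares
p_single <- ggplot(squares, aes(x = long, y = lat)) +
  geom_polygon(color = "white", fill = "lightblue")

n_groups_grouped <- length(unique(layer_data(p_grouped)$group))
n_groups_single  <- length(unique(layer_data(p_single)$group))
```

With the full us_states data, leaving out group = group produces the same kind of stray lines between states.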
▶ Next, we add data to fill each state, for example according to its
population. We continue to use the gun murders data, murders.csv.
▶ The first step is to merge the two data sets.

murders <- read.csv("../data/murders.csv")

df <- murders %>%
  mutate(state = tolower(state)) %>%   # map_data() uses lower-case state names
  left_join(us_states, by = c("state" = "region"))
glimpse(df)

## Rows: 15,539
## Columns: 10
## $ state <chr> "alabama", "alabama", "alabama", "alabama", "alaba
## $ abb <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A
## $ region <chr> "South", "South", "South", "South", "South", "Sout
## $ population <int> 4779736, 4779736, 4779736, 4779736, 4779736, 47797
## $ total <int> 135, 135, 135, 135, 135, 135, 135, 135, 135, 135,
## $ long <dbl> -87.46201, -87.48493, -87.52503, -87.53076, -87.57
## $ lat <dbl> 30.38968, 30.37249, 30.37249, 30.33239, 30.32665,
## $ group <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
## $ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
## $ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
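Alaska and Hawaii are not in map_data("state"), so each keeps a single
row with NA coordinates: 15,537 + 2 = 15,539 rows.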
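The case conversion before the join is what makes the keys match. This sketch uses two hypothetical miniature tables (values are illustrative only) to show what left_join() returns with and without tolower().

```r
library(dplyr)

# Hypothetical miniature tables; not the real murders or map_data() contents
murders_mini <- data.frame(state = c("Alabama", "Alaska"), abb = c("AL", "AK"))
boundaries_mini <- data.frame(
  region = c("alabama", "alabama", "alabama"),  # lower-case, as in map_data()
  long   = c(-87.46, -87.48, -87.53),
  lat    = c(30.39, 30.37, 30.37)
)

# Lower-case the key first: alabama matches 3 boundary rows, alaska matches none
joined <- murders_mini %>%
  mutate(state = tolower(state)) %>%
  left_join(boundaries_mini, by = c("state" = "region"))

# Without tolower(), "Alabama" != "alabama", so no row finds a match
no_lower <- murders_mini %>%
  left_join(boundaries_mini, by = c("state" = "region"))
```

Here joined has four rows (three for alabama, one for alaska with NA coordinates), while every coordinate in no_lower is NA.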
Maps: geom_polygon()

ggplot(df, aes(x = long, y = lat, group = group, fill = region)) +
  geom_polygon(color = "white") +
  theme(legend.position = "top")

[Figure: map of US states filled by region (North Central, Northeast, South, West), lat against long, legend at top]
Maps: geom_polygon()

ggplot(df, aes(x = long, y = lat, group = group, fill = population/1e6)) +
  geom_polygon(color = "white") +
  theme(legend.position = "top")

[Figure: map of US states filled by population/1e+06 (legend breaks 10, 20, 30 at top), lat against long]
▶ Once a map is created, we often need to modify the color
scheme, here with scale_fill_continuous().
▶ theme_void() removes the axes, grid lines, and background, which
add little to a map.

ggplot(df, aes(x = long, y = lat, group = group, fill = population/1e6)) +
  geom_polygon(color = "white") +
  scale_fill_continuous(name = "Population (millions)",
                        low = "lightgray", high = "steelblue") +
  theme_void() +
  theme(legend.position = "top")

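[Figure: map of US states on a light gray to steel blue scale, legend "Population (millions)" with breaks 10, 20, 30, no axes or background]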
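To see a two-color scale at work without the map data, here is a small sketch on three hypothetical tiles. It uses scale_fill_gradient(), ggplot2's two-color gradient scale, which also takes low and high; the data and values are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data: three tiles with populations in millions
tiles <- data.frame(x = 1:3, y = 1, pop = c(5, 15, 30))

p <- ggplot(tiles, aes(x = x, y = y, fill = pop)) +
  geom_tile(color = "white") +
  scale_fill_gradient(name = "Population (millions)",
                      low = "lightgray", high = "steelblue") +
  theme_void()

# layer_data() exposes the hex color ggplot2 assigned to each tile
fills <- layer_data(p)$fill
fills
```

The smallest value gets the color closest to "lightgray" and the largest the color closest to "steelblue", with the middle tile in between.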
More on maps

Other maps available through map_data():

▶ Countries: usa, france, italy, nz
▶ Within the US: county, state
▶ World: world, world2

There are other packages and methods for drawing maps, but you will
need to find the geographic data that define the map boundaries
yourself.

Singapore planning regions
The file sg_masterplan2019.rds contains Singapore’s planning area
boundaries in 2019.
▶ The original data come from the Urban Redevelopment
Authority.
▶ We need to load the sf package before reading the data.

# install.packages("sf")
library(sf)
sg_map <- readRDS("../data/sg_masterplan2019.rds")
class(sg_map)

## [1] "sf" "tbl_df" "tbl" "data.frame"

glimpse(sg_map)

## Rows: 55
## Columns: 2
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((103.9321 1...., MULTIPO
## $ town <chr> "BEDOK", "BOON LAY", "BUKIT BATOK", "BUKIT MERAH", "
Simple features (sf) is a standard model for storing and accessing
geographic features together with their spatial geometries.
▶ At its most basic, an sf object is a data frame with a special
geometry column that holds the spatial aspect of each feature.
▶ In sg_map, this column holds the coordinates that describe the
town boundaries.

Artwork by Allison Horst

Maps: geom_sf()
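We will use a special geom, geom_sf(), to visualize sf objects.
▶ The function uses a special aesthetic, geometry. When the data is an
sf object, geom_sf() finds the geometry column automatically, so
aes(geometry = geometry) is optional.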

ggplot(sg_map) +
  geom_sf(aes(geometry = geometry),
          fill = "lightgray", color = "white") +
  theme_void()

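Since sg_masterplan2019.rds is a course file, here is a self-contained sketch that builds a tiny sf object by hand and draws it with geom_sf(). The two square "towns", their names, and the helper make_square() are hypothetical; st_polygon(), st_sfc(), and st_sf() are the sf constructors for a polygon, a geometry column, and an sf data frame.

```r
library(sf)
library(ggplot2)

# Hypothetical helper: a closed 1 x 1 square whose lower-left corner is (x0, 0)
make_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Toy sf object with two made-up towns; the polygons live in the geometry column
toy_map <- st_sf(
  town = c("TOWN A", "TOWN B"),
  geometry = st_sfc(make_square(0), make_square(2), crs = 4326)
)

# geom_sf() detects the geometry column itself, so no aes(geometry = ...) here
p <- ggplot(toy_map) +
  geom_sf(aes(fill = town), color = "white") +
  theme_void()

class(toy_map)
```

Each row of toy_map is one feature, so the plot contains one filled shape per town, just as sg_map gives one shape per planning area.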
Second summary on ggplot2

Summary of some of the geoms we learned this week:

ggplot() + geom_col() (or geom_bar()) + geom_polygon() (or geom_sf())

ggplot2 themes

By default, a ggplot2 graph has a light gray background (from the
default theme, theme_gray()).

The designers give several reasons for this choice:

1. White grid lines are visible, yet easy to tune out, keeping the
data prominent.
2. The gray background has a typographic color similar to that of the
surrounding text, so the plot does not jump out of the page.
3. It creates a continuous field of color, which ensures that the plot
is perceived as a single visual entity.

You may agree or disagree with these points. To change these
elements, select a different theme for your plot.

Themes

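Themes are added as a layer, like any other ggplot2 function. This sketch uses the built-in mtcars data (an assumption, in place of the course data) to apply three built-in themes, and theme_set() to change the default for all later plots in the session.

```r
library(ggplot2)

# A basic scatterplot using the built-in mtcars data
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

p_gray    <- p                    # default theme_gray(): gray panel, white grid lines
p_bw      <- p + theme_bw()       # white panel with a dark border
p_minimal <- p + theme_minimal()  # minimalistic, no background annotations

# theme_set() changes the default theme and returns the previous one
old_theme <- theme_set(theme_minimal())
p_after   <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
theme_set(old_theme)              # restore the previous default
```

Changing the theme only affects non-data elements such as backgrounds and grid lines; the points themselves are unchanged.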
Layouts
▶ To combine separate ggplots into one figure, we can use patchwork,
an extension to ggplot2.

library(patchwork)
p1 <- ggplot(murders, aes(x = region)) + geom_bar()
p2 <- p1 + theme_minimal() + labs(title = "Minimal")
p3 <- p1 + theme_classic() + labs(title = "Classic")
p1 + p2 + p3

[Figure: three bar charts of count by region side by side: default theme, "Minimal", and "Classic"]
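Besides +, patchwork provides operators that control the arrangement. This sketch assumes base R's state.region as a stand-in for murders.csv: | places plots side by side, / stacks them, and plot_annotation() adds a title for the whole figure.

```r
library(ggplot2)
library(patchwork)

# Built-in base R data (stand-in for murders.csv)
states <- data.frame(region = state.region)

p1 <- ggplot(states, aes(x = region)) + geom_bar()
p2 <- p1 + theme_minimal() + labs(title = "Minimal")
p3 <- p1 + theme_classic() + labs(title = "Classic")

# p1 and p2 side by side on top, p3 underneath, one overall title
combined <- (p1 | p2) / p3 +
  plot_annotation(title = "Number of states by region")

# print(combined) draws the figure
```

The parentheses matter: in R, | has lower precedence than /, so without them the layout would be grouped differently.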
Layered grammar of graphics

Our initial template can be extended to:

ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>),
                  stat = <STAT>, position = <POSITION>) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <SCALE_FUNCTION> +
  <THEME_FUNCTION>

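Here is one way to fill in every slot of the template. It is only an illustration using ggplot2's built-in mpg data; the particular geom, coordinate limits, facet variable, palette, and theme are arbitrary choices.

```r
library(ggplot2)

p <- ggplot(data = mpg) +                                    # <DATA>
  geom_point(mapping = aes(x = displ, y = hwy, color = drv), # <GEOM_FUNCTION>, <MAPPINGS>
             stat = "identity", position = "jitter") +       # <STAT>, <POSITION>
  coord_cartesian(xlim = c(1, 7)) +                          # <COORDINATE_FUNCTION>
  facet_wrap(~ year) +                                       # <FACET_FUNCTION>
  scale_color_brewer(palette = "Dark2") +                    # <SCALE_FUNCTION>
  theme_minimal()                                            # <THEME_FUNCTION>
```

Any slot can be dropped, in which case ggplot2 falls back to its defaults (for example, Cartesian coordinates, no facets, and theme_gray()).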
Figure generation pipelines

Most visualization is done for the purpose of communication.

▶ Who’s your audience?
▶ What’s the insight you’d like to convey?

Visualizations should be “autogenerated” as part of our data
analysis pipeline (which should also be automated).
▶ Ready for printing and sharing, without manual post-processing.
▶ Easy to tweak and regenerate, which happens frequently in data
analysis.

Additional reading: Ch. 28 & 29 in Fundamentals of Data
Visualization (via Canvas).

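A minimal sketch of this idea: wrap the figure in a function and write it to disk with ggsave(), so regenerating it after a change is a single call. The helper make_scatter(), the file name, and the use of mtcars are assumptions for illustration; a temporary directory stands in for a project's output folder.

```r
library(ggplot2)

# Hypothetical helper: building the figure in one function makes it easy to re-run
make_scatter <- function(data) {
  ggplot(data, aes(x = wt, y = mpg)) +
    geom_point() +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
    theme_minimal()
}

# ggsave() writes the plot straight to a file with a fixed size (inches by default),
# so no manual resizing or exporting is needed
out_file <- file.path(tempdir(), "mpg_vs_weight.pdf")
ggsave(out_file, plot = make_scatter(mtcars), width = 6, height = 4)
```

If the data or the design changes, re-running these lines reproduces the figure exactly, with no post-processing by hand.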
ggplot2 and extensions

ggplot2 is a system for declaratively creating graphics, included in
the tidyverse.
▶ Besides the functions we covered in lecture, explore more at
▶ https://ggplot2-book.org/
▶ Also, the ggplot2 extensions:
▶ https://exts.ggplot2.tidyverse.org/gallery/
