Week12 Slides

The document provides essential information regarding the midterm and final exams for a data analytics course, including grading statistics and exam logistics. It emphasizes the importance of data visualization in exploratory data analysis, showcasing various methods to visualize Covid-19 data for Asian countries. Additionally, it discusses color theory and palettes in data visualization, highlighting tools and packages available in R for effective graphical representation.

DSA2101

Essential Data Analytics Tools: Data Visualization

Yuting Huang

AY24/25

Week 12 Exploring data through visualization

1 / 46
Midterm Exam: Feedback

The grades and feedback are available on Canvas:

▶ Median = 23.28, mean = 22.54, std = 6.45.
▶ Some of you did really well: > 27.5/30.
▶ If you scored below 12.75, please schedule a meeting with
your TA to review the midterm.

Final Exam: Date and time

The final exam is worth 40% of your grade.

▶ Time: May 6th 9-11am
▶ Venue: MPSH1A
▶ Open-book, open-notes exam on Examplify with internet access blocked.
▶ R packages required: readxl, stringr, lubridate,
tidyverse.
▶ Data files will be available on Canvas 15 minutes before the
exam.

Final Exam: Question format

The exam consists of

▶ Part I: Multiple choice + Fill-in-the-blank questions (20
marks).
▶ Answer questions directly on Examplify. No submission of
R code is needed.
▶ Part II: Coding questions (20 marks).
▶ Answer questions in a single Rmd file and submit it on both
Examplify and Canvas.

Submission requirements

At the end of the exam at 11am:

▶ Copy and paste the entire Rmd to an Examplify text box.
Submit the exam on Examplify.
▶ Then upload your Rmd to Canvas before 11:15am.
▶ Ensure that the code submissions on Examplify and
Canvas are identical, with the exception of indentation and
alignment. Any discrepancy will be flagged and penalized.
▶ After successful submissions, please keep your Examplify
green confirmation window and the Canvas submission
page open for invigilators to verify.

Explore data through visualization

Visualization is an integral part of exploratory data analysis (EDA).
▶ It is a highly iterative process. We should expect to:
  ▶ Generate questions about our data.
  ▶ Search for answers by visualizing, transforming, and
    modeling our data.
  ▶ Use what we learn to refine our questions and/or generate
    new questions.

coronavirus data
Let’s work with a data set containing daily summaries of Covid-19
cases, deaths, and recoveries for Asian countries and cities.

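library(tidyverse)  # attaches dplyr, ggplot2 and, from tidyverse 2.0.0, lubridate (for ymd())
theme_set(theme_minimal())
coronavirus <- read.csv("../data/wk12_coronavirus.csv") %>%
  select(-X) %>%              # drop column X (presumably a saved row index)
  mutate(date = ymd(date))    # parse the date strings as Date
glimpse(coronavirus)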

## Rows: 154,305
## Columns: 4
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan",
## $ date <date> 2020-01-22, 2020-01-22, 2020-01-22, 2020-01-
## $ type <chr> "confirmed", "death", "recovery", "confirmed"
## $ cases <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

The data contain information on Covid-19 cases for 45 Asian
countries and cities.
▶ Let’s examine confirmed cases for selected countries in
2020.

selected_countries <- c("Singapore", "Malaysia", "Indonesia")

df1 <- coronavirus %>%
  filter(country %in% selected_countries,
         type == "confirmed", date <= "2020-12-31")
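The filter above compares a Date column with a character string. A minimal sketch of why that works, using made-up dates (the vector `d` is hypothetical, not the course data): R converts the string to a Date before comparing, so writing the conversion out with as.Date() should give the same result.

```r
# Hypothetical dates around the end of 2020.
d <- as.Date(c("2020-12-30", "2020-12-31", "2021-01-01"))

# The string on the right-hand side is converted to a Date before comparison.
implicit <- d <= "2020-12-31"

# Equivalent, with the conversion written out.
explicit <- d <= as.Date("2020-12-31")
```

The string needs to be in an unambiguous format such as YYYY-MM-DD for the implicit conversion to succeed.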

Naturally, we can visualize the daily confirmed cases with line
charts.

# Last day of 2020 for each country, used to label the end of each line
df1_text <- df1 %>% filter(date == "2020-12-31")

ggplot(df1, aes(x = date, y = cases, color = country)) +
  geom_line() +
  geom_text(data = df1_text, aes(label = country),
            hjust = "left", nudge_x = 2, size = 3) +
  labs(x = "", y = "Confirmed cases") +
  theme(legend.position = "none") +
  scale_x_date(limits = as.Date(c("2020-01-01", "2021-01-31")))

[Figure: line chart of daily confirmed cases for Indonesia, Malaysia, and Singapore, each line labeled at its right end. x-axis: Jan 2020 to Jan 2021; y-axis: Confirmed cases, 0 to 8000.]

The layer facet_wrap() creates small multiples (i.e., faceted
plots) based on a categorical variable.
▶ Each subplot shows a subset of the data.
▶ By default, it also keeps the scales of the axes fixed, for
easier comparison.

ggplot(df1, aes(x = date, y = cases, color = country)) +
  geom_line() +
  facet_wrap(~ country) +
  labs(x = "", y = "Confirmed cases") +
  theme(legend.position = "none") +
  scale_x_date(limits = as.Date(c("2020-01-01", "2021-01-01")),
               date_breaks = "6 months", date_labels = "%Y-%b")
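Because facet_wrap() keeps the axis scales fixed by default, panels with small values are compressed against the range of the largest panel. A minimal sketch of the alternative, using a small made-up data frame (`toy` and its values are hypothetical, not the course data): setting scales = "free_y" gives each panel its own y-axis, at the cost that heights are no longer directly comparable across panels.

```r
library(ggplot2)

# Hypothetical toy data: two groups with very different magnitudes.
toy <- data.frame(
  day   = rep(1:5, times = 2),
  group = rep(c("large", "small"), each = 5),
  value = c(1000, 3000, 5000, 7000, 8000, 10, 30, 20, 40, 50)
)

base_plot <- ggplot(toy, aes(x = day, y = value)) +
  geom_line()

# Default: all panels share the same x- and y-axis ranges.
p_fixed <- base_plot + facet_wrap(~ group)

# Alternative: each panel gets its own y-axis range.
p_free <- base_plot + facet_wrap(~ group, scales = "free_y")
```

Fixed scales suit comparisons of magnitude across panels; free scales suit comparisons of shape or timing within each panel.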

[Figure: three faceted line charts (Indonesia, Malaysia, Singapore) sharing the same axes. x-axis breaks: 2020-Jun and 2020-Dec; y-axis: Confirmed cases, 0 to 8000.]

We can also visualize the data using a tile chart (heat map).
▶ A numeric variable is mapped to a continuous fill scale.

ggplot(df1, aes(x = date, y = country)) +
  geom_tile(aes(fill = cases/1000)) +
  scale_fill_gradient(low = "white", high = "maroon") +
  labs(title = "Confirmed cases in 2020",
       fill = "Cases (thousands)", x = "", y = "") +
  theme(legend.position = "top")

[Figure: tile chart titled "Confirmed cases in 2020" with one tile per day. Rows: Singapore, Malaysia, Indonesia; x-axis: Apr 2020 to Jan 2021; fill legend: Cases (thousands), 0 to 8.]

Alternatively, we can aggregate daily counts to monthly totals
and visualize the overall trends.

df2 <- df1 %>%
  mutate(month = month(date, label = TRUE, abbr = TRUE)) %>%
  group_by(country, month) %>%
  summarize(cases = sum(cases), .groups = "drop")
glimpse(df2)

## Rows: 36
## Columns: 3
## $ country <chr> "Indonesia", "Indonesia", "Indonesia", "Indon
## $ month <ord> Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep,
## $ cases <int> 0, 0, 1528, 8590, 16355, 29912, 51991, 66420,
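The aggregation step can be checked on a tiny example. A minimal sketch with made-up values (`toy_daily` is hypothetical, not the course data), showing that the same mutate, group_by, and summarize pipeline collapses daily rows into one row per country and month:

```r
library(dplyr)
library(lubridate)

# Hypothetical daily counts for one country across two months.
toy_daily <- data.frame(
  country = "A",
  date    = as.Date(c("2020-01-30", "2020-01-31", "2020-02-01", "2020-02-02")),
  cases   = c(1, 2, 10, 20)
)

toy_monthly <- toy_daily %>%
  mutate(month = month(date, label = TRUE, abbr = TRUE)) %>%  # ordered factor of month labels
  group_by(country, month) %>%
  summarize(cases = sum(cases), .groups = "drop")             # one row per country-month

toy_monthly
```

Note that month() discards the year, so this grouping is safe only when the data cover a single year, as df1 does after filtering to 2020.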

ggplot(df2, aes(x = month, y = country)) +
  geom_tile(aes(fill = cases/1000)) +
  scale_fill_gradient(low = "white", high = "maroon") +
  labs(title = "Confirmed cases in 2020",
       fill = "Cases (thousands)", x = "", y = "") +
  theme(legend.position = "top")

[Figure: tile chart titled "Confirmed cases in 2020" with one tile per month. Rows: Singapore, Malaysia, Indonesia; x-axis: Jan to Dec; fill legend: Cases (thousands), 0 to 200.]

ggplot(df2, aes(x = month, y = country)) +
  geom_tile(aes(fill = cases/1000), show.legend = FALSE) +
  geom_text(aes(label = cases), size = 2.5) +   # print the monthly total inside each tile
  scale_fill_gradient(low = "white", high = "maroon") +
  labs(x = "", y = "", title = "Monthly confirmed cases in 2020")

[Figure: tile chart titled "Monthly confirmed cases in 2020", with the monthly total printed in each tile. The printed values are:]

Country    Jan Feb   Mar    Apr    May    Jun    Jul    Aug     Sep     Oct     Nov     Dec
Singapore   13  89   824  15243  18715   9023   8298   4607     953     250     203     381
Malaysia     8  21  2737   3236   1817    820    337    364    1884   20324   34149   47313
Indonesia    0   0  1528   8590  16355  29912  51991  66420  112212  123080  128795  204315

Colors

Colors can be thought of as a three-dimensional concept consisting of
▶ Hue: Red, green, blue, ...
▶ Saturation: The purity of light, e.g., dull versus vivid.
▶ Brightness: The amount of light present, e.g., light versus dark.

Apart from making graphs prettier and more pleasant to look at, colors add solid functionality to visual representations.

1. Distinguish between different categorical groups.

2. Distinguish between the magnitude of continuous values.

Types of color palettes

These functionalities roughly correspond to different types of color palettes.

▶ Qualitative color palettes for categorical data.
  ▶ To highlight distinction across groups.
▶ Sequential color palettes for continuous data.
  ▶ Use increasing intensity or saturation to indicate larger values.

Types of color palettes

▶ Diverging color palettes for data with a central neutral value.
  ▶ To put equal emphasis on extreme values at both ends of the data range.
  ▶ The value in the middle is represented by lighter colors.
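The three palette types can be generated directly in base R (version 4.0 or later). A minimal sketch; the palette names "Okabe-Ito", "Blues 3", and "Blue-Red 3" are examples chosen here for illustration and do not come from the slides.

```r
# Qualitative: distinct hues with no implied order, for categorical groups.
qualitative <- palette.colors(n = 4, palette = "Okabe-Ito")

# Sequential: a single hue that varies in lightness, for magnitudes.
sequential <- hcl.colors(n = 5, palette = "Blues 3")

# Diverging: two hues that meet at a light, neutral middle color.
diverging <- hcl.colors(n = 5, palette = "Blue-Red 3")

# List the names of all built-in palettes of one type.
diverging_names <- hcl.pals(type = "diverging")
```

Each call returns a character vector of HEX codes that can be passed to scale_*_manual() or to the col argument of base plotting functions.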

Base R colors

To gain control over colors, we first need to define colors or color palettes.
▶ Base R comes with 657 predefined colors.
▶ We can call them by names: col = "steelblue"
▶ The default color palette in R (using version 4.4.2):

palette()

## [1] "black" "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BB

## [8] "gray62"
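These statements can be checked in an R session. A small sketch using only base R; the search pattern "steel" is just an illustration.

```r
# All predefined color names known to base R.
all_colors <- colors()
n_colors <- length(all_colors)

# Names can be searched like any character vector,
# e.g. every color name containing "steel".
steel_colors <- grep("steel", all_colors, value = TRUE)

# The default palette is what numeric color codes (col = 1, 2, ...) refer to.
default_palette <- palette()
```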

HEX color codes

The palette() command returns most colors as 6-digit HEX (hexadecimal) codes.

▶ The six digits encode the red, green, and blue components, two digits each.
  ▶ Black is #000000 and white is #FFFFFF.
▶ We can add two digits at the end to encode the degree of opacity.
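A HEX code can be decoded in base R with col2rgb(). A minimal sketch; the semi-transparent black below is an example value chosen for illustration.

```r
# Decode a 6-digit HEX code into its red, green, and blue components (0 to 255).
white_rgb <- col2rgb("#FFFFFF")

# Two extra digits encode opacity: hexadecimal 80 is 128 out of 255,
# i.e. roughly 50% opaque.
semi_black <- "#00000080"
semi_black_rgba <- col2rgb(semi_black, alpha = TRUE)
```

col2rgb() returns a matrix with one row per component (red, green, blue, and alpha when requested) and one column per color.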

RGB color codes

Colors can also be represented using RGB (red-green-blue) codes.
▶ An additive color model used for screens.
▶ Each code is specified with three parameters, each defining the intensity of one color component as an integer between 0 and 255.
  ▶ rgb(0, 0, 0) is black.
  ▶ rgb(255, 255, 255) is white.
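In R itself, the rgb() function expects intensities between 0 and 1 unless maxColorValue is supplied, so the 0 to 255 convention needs maxColorValue = 255. A minimal sketch:

```r
# rgb() returns the corresponding HEX code as a string.
black <- rgb(0, 0, 0, maxColorValue = 255)
white <- rgb(255, 255, 255, maxColorValue = 255)

# The named color "steelblue" has components (70, 130, 180).
steel <- rgb(70, 130, 180, maxColorValue = 255)

# col2rgb() goes the other way, from a name or HEX code to components.
steel_components <- col2rgb("steelblue")
```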

Using color packages

Most visualization packages, like ggplot2, provide their own color palettes.
There are also many R packages that supply additional color support.
▶ ggthemes provides useful palettes such as the Tableau color palettes.
▶ viridis provides palettes that remain distinguishable for readers with the most common forms of color blindness.
▶ RColorBrewer provides vibrant color palettes that are also widely used in the R community.

ggthemes palettes

There are various color palettes available in ggthemes.

▶ Here’s one of those used in Tableau.

Classic 10

#1f77b4 #ff7f0e #2ca02c #d62728

#9467bd #8c564b #e377c2 #7f7f7f

#bcbd22 #17becf

Color blindness and the viridis palettes

A sizable proportion of the population can distinguish fewer colors than others.
▶ Here’s how the base R palette would appear under different forms of color blindness.

[Figure: the base R palette and the viridis palette, each shown under four conditions: normal vision, deuteranope, protanope, and desaturate.]

RColorBrewer palettes

▶ Sequential: YlOrRd, YlOrBr, YlGnBu, YlGn, Reds, RdPu, Purples, PuRd, PuBuGn, PuBu, OrRd, Oranges, Greys, Greens, GnBu, BuPu, BuGn, Blues
▶ Qualitative: Set3, Set2, Set1, Pastel2, Pastel1, Paired, Dark2, Accent
▶ Diverging: Spectral, RdYlGn, RdYlBu, RdGy, RdBu, PuOr, PRGn, PiYG, BrBG

Custom colors

▶ Specify a single color for a geom:
  ▶ Set color or fill to a specific color outside of aes().
▶ Assign colors by a variable in the data:
  ▶ Map color or fill to the variable of interest inside aes().
▶ Set custom color palettes, for example:
  ▶ scale_*_manual() for a custom set of colors.
  ▶ Additional color packages such as viridis, RColorBrewer,
    and ggthemes.
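The difference between setting and mapping a color can be sketched as follows. The example uses the built-in mtcars data rather than the course data, purely for illustration.

```r
library(ggplot2)

# Setting: one fixed color for the whole geom, given outside aes().
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue")

# Mapping: color varies with a variable, given inside aes().
# factor() makes cyl discrete so that each value gets its own color.
p_map <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()
```

A common slip is writing aes(color = "steelblue"), which maps the constant string to the default color scale instead of drawing steel-blue points.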

Example
Let us continue with the coronavirus data set, now focusing
on cases in Singapore in 2020.

df_sg <- coronavirus %>%
  mutate(date = ymd(date)) %>%
  filter(country == "Singapore", date <= "2020-12-31",
         type %in% c("confirmed", "recovery"))
head(df_sg)

## country date type cases

## 1 Singapore 2020-01-22 confirmed 0
## 2 Singapore 2020-01-22 recovery 0
## 3 Singapore 2020-01-23 confirmed 1
## 4 Singapore 2020-01-23 recovery 0
## 5 Singapore 2020-01-24 confirmed 2
## 6 Singapore 2020-01-24 recovery 0

p1 <- ggplot(df_sg, aes(x = date, y = cases, color = type)) +
  geom_line(lwd = 1) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(x = "", y = "Cases", color = "",
       title = "Confirmed and recovered cases in Singapore, 2020") +
  theme(legend.position = "top")
p1

[Figure: line chart titled "Confirmed and recovered cases in Singapore, 2020", colored by type (confirmed, recovery) with default colors. x-axis: Feb to Jan; y-axis: Cases, 0 to 1000.]

Manually specified colors
▶ Here we specify a discrete color scale with scale_color_manual():

p1 +
  scale_color_manual(values = c("maroon", "gray"))
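With an unnamed values vector, colors are assigned to the levels in order, which for a character column is alphabetical ("confirmed", then "recovery"). A minimal sketch of a more explicit variant, using made-up data (`toy` and `type_colors` are hypothetical names): naming the vector ties each color to a level regardless of the order in which the colors are written.

```r
library(ggplot2)

# Hypothetical toy data with the same two levels as the type column.
toy <- data.frame(
  day   = rep(1:3, times = 2),
  cases = c(5, 8, 6, 1, 3, 7),
  type  = rep(c("confirmed", "recovery"), each = 3)
)

# Names tie each color to a level, independent of the order written here.
type_colors <- c(recovery = "gray", confirmed = "maroon")

p_named <- ggplot(toy, aes(x = day, y = cases, color = type)) +
  geom_line() +
  scale_color_manual(values = type_colors)
```

Named values also keep the colors stable if a level is later dropped from or added to the data.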

[Figure: the same line chart, "Confirmed and recovered cases in Singapore, 2020", with the manually specified colors. x-axis: Feb to Jan; y-axis: Cases, 0 to 1000.]

▶ The same data with geom_area(). Notice that type is now
mapped to the fill aesthetic.
p2 <- ggplot(df_sg, aes(x = date, y = cases, fill = type)) +
  geom_area() +   # areas are stacked by default
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  # Note: color = "" has no effect here because type is mapped to fill,
  # so the legend keeps its default title, "type".
  labs(x = "", y = "Confirmed cases", color = "",
       title = "Confirmed cases in Singapore, 2020") +
  theme(legend.position = "top") +
  scale_fill_manual(values = c("maroon", "gray"))
p2

[Figure: stacked area chart titled "Confirmed cases in Singapore, 2020", filled by type (confirmed, recovery). x-axis: Feb to Jan; y-axis: Confirmed cases, 0 to 2000.]

For RColorBrewer palettes, we use scale_fill_brewer():
▶ The palette argument specifies the name of the palette.

library(RColorBrewer)
p2 + scale_fill_brewer(palette = "Set3")
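To see which colors a Brewer palette contains, the RColorBrewer package can be queried directly. A minimal sketch; choosing three colors is arbitrary.

```r
library(RColorBrewer)

# The first three colors of the "Set3" palette, as HEX codes.
set3_colors <- brewer.pal(n = 3, name = "Set3")

# brewer.pal.info records each palette's maximum size and its category:
# "qual" (qualitative), "seq" (sequential), or "div" (diverging).
set3_info <- brewer.pal.info["Set3", ]

# Draw all palettes, grouped by category.
# display.brewer.all()
```

Checking the category first helps match the palette type to the data, e.g. a qualitative palette such as Set3 for the categorical type variable.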

[Figure: the same stacked area chart, "Confirmed cases in Singapore, 2020", filled with the Set3 palette. x-axis: Feb to Jan; y-axis: Confirmed cases, 0 to 2000.]

▶ We can also apply a viridis palette with scale_fill_viridis().

library(viridis)
p2 + scale_fill_viridis(option = "viridis", discrete = TRUE)
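ggplot2 also ships its own viridis scales, scale_fill_viridis_d() for discrete data and scale_fill_viridis_c() for continuous data, so a similar result should be possible without loading the viridis package. A minimal sketch on made-up data (`toy` is hypothetical):

```r
library(ggplot2)

# Hypothetical toy data with two levels, mirroring the type column.
toy <- data.frame(
  x    = rep(1:3, times = 2),
  y    = c(1, 2, 3, 2, 2, 1),
  type = rep(c("confirmed", "recovery"), each = 3)
)

# Discrete viridis fill scale built into ggplot2.
p_viridis <- ggplot(toy, aes(x = x, y = y, fill = type)) +
  geom_area() +
  scale_fill_viridis_d(option = "viridis")
```

With only two levels, the scale picks the two ends of the palette, a dark purple and a bright yellow.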

[Figure: the same stacked area chart, "Confirmed cases in Singapore, 2020", filled with the viridis palette. x-axis: Feb to Jan; y-axis: Confirmed cases, 0 to 2000.]

More on accessibility

Apart from inclusive colors, we can also include Alt Text (alternative text).
▶ The goal is to make our visuals more accessible to everyone.
▶ Used in HTML pages, often displayed in place of or below the figure.

Source: Mary Cesal.

Examples

▶ In RMarkdown, one way to include Alt Text is through the
fig.cap code chunk option.
  ▶ The text will be displayed below the figure.

[Figure: line chart titled "Daily confirmed cases Singapore, 2020". x-axis: Feb to Jan; y-axis ticks at 500 and 1000.]

Line chart showing daily confirmed COVID-19 cases in Singapore in 2020.
There is a prominent peak in late April; daily cases drop sharply after
August and remain low for the rest of the year.

Summary on ggplot2 functions

▶ Here are some of the geom functions we have learned so far.

ggplot + geom_point + geom_smooth + geom_hline (/vline) + geom_histogram + geom_density
+ geom_line + geom_text (/label) + geom_col (/bar) + geom_polygon (/sf) + geom_tile + geom_area

Visualization principles
Now that we are equipped with the visualization functions, let us
revisit some of the principles covered in Week 9.
1. Include the baseline (typically 0).
2. Pie charts (best to avoid them).
3. Partial transparency and jittering (to handle overplotting).
4. Color coding (sequential palette for a continuous variable; diverging palette for data with a meaningful midpoint).
5. Small multiples (same scale on the axes).
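Principle 3 can be sketched with the mpg data set that ships with ggplot2 (not part of the course data). Both cty and hwy are recorded as whole numbers, so many points land on exactly the same position; the alpha and jitter amounts below are illustrative choices.

```r
library(ggplot2)

# Overplotted: identical (cty, hwy) pairs are drawn on top of each other.
p_overplot <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# Partial transparency plus a small random jitter reveals where points pile up.
p_jitter <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter(alpha = 0.3, width = 0.3, height = 0.3)
```

Jittering moves points slightly away from their true values, so the amount should stay small relative to the spacing of the data.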

In Week 10, we discussed three graphs in the wild:

Your turn

The following data sets are available on Canvas:

▶ wk12_streamingUS.csv
▶ wk12_cereal_consumption.xlsx
▶ wk12_time_use.csv

Identify the geoms used and re-create the plots in ggplot2.

Explore the data and think about possible ways to revise/improve the visualizations.

Plans in Week 13

Week 13:
▶ Review session on Monday. There will be no lecture on
Wednesday.
▶ Tutorials as usual.
▶ Wrap up your group project and submit it by the extended
due date: Saturday, April 19 11:59pm.
▶ Only one submission is required for each group.
