Graphics using R
Master's degree programme in
Work and Organizational Psychology
8 May 2017
Giovanni Luca Lo Magno
INVALSI test
A bad graph
An interesting graph
From “Le Scienze”, September 2011
Hertzsprung–Russell diagram
Pen's parade
Circular layout
Why R for graphics?
Resources for learning
● manuals
● guides
● cookbooks
● tutorials
● blogs
● forums
● www.stackoverflow.com
R is made up of packages
Load a package:
library(packagename)
Main graphic packages
● grDevices
● graphics and grid
● maps (built on graphics) and lattice (built on grid)
Vector vs. raster images
Vector image Raster image
(50 x 50 pixels)
Resolution affects image quality
(50 x 50 pixels) (25 x 25 pixels) (10 x 10 pixels)
Anti-aliasing
No anti-aliasing With anti-aliasing
Alpha blending
One line Two overlapped lines
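A minimal sketch of alpha blending in base R, assuming semi-transparent colours built with adjustcolor() from grDevices. The file name and the line coordinates are illustrative only.

```r
# Semi-transparent red: alpha.f = 0.5 means 50% opacity
semi_red <- adjustcolor("red", alpha.f = 0.5)

out_file <- file.path(tempdir(), "alpha_demo.pdf")  # illustrative file name
pdf(out_file)                      # the pdf device supports transparency
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))
# Where the two thick lines cross, the colour looks darker
lines(c(0, 1), c(0, 1), lwd = 20, col = semi_red)
lines(c(0, 1), c(1, 0), lwd = 20, col = semi_red)
dev.off()
```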
Best practices
● prefer vector graphics
● ≥ 300 dpi for raster images
● check paper output
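The first two practices could look like this in code. This is only a sketch: the file names and the 6 x 4 inch size are assumptions, and the raster part runs only where cairo support is available.

```r
out_dir <- tempdir()

# Vector output: scales to any size, so no resolution is needed
pdf(file.path(out_dir, "figure.pdf"), width = 6, height = 4)  # inches
plot(1:10)
dev.off()

# Raster output: 6 x 4 inches at 300 dpi gives 1800 x 1200 pixels
if (capabilities("cairo")) {
  png(file.path(out_dir, "figure.png"),
      width = 6, height = 4, units = "in", res = 300, type = "cairo")
  plot(1:10)
  dev.off()
}
```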
Let's have a look
demo(graphics)
More examples?
example(plot)
example(hist)
example(barplot)
example(boxplot)
Graphical user interface (GUI) vs. text-based interface
GUI                    Text-based interface
● point-and-click      ● type commands
● easy to learn        ● not easy to learn
● easy to use          ● not easy to remember
● little automation    ● excellent automation
Painters model
First
Second
New paint partially or completely obscures the old
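A small sketch of the painters model, assuming a gray rectangle and a single point as the two layers. Only the drawing order differs between the two panels.

```r
out_file <- file.path(tempdir(), "painters.pdf")  # illustrative file name
pdf(out_file, width = 8, height = 4)
par(mfrow = c(1, 2))

# Left: rectangle first, point second -> the point stays visible
plot.new(); plot.window(c(0, 1), c(0, 1))
title(main = "Rectangle, then point")
rect(0.2, 0.2, 0.8, 0.8, col = "gray")
points(0.5, 0.5, pch = 19, cex = 3)

# Right: point first, rectangle second -> the rectangle hides the point
plot.new(); plot.window(c(0, 1), c(0, 1))
title(main = "Point, then rectangle")
points(0.5, 0.5, pch = 19, cex = 3)
rect(0.2, 0.2, 0.8, 0.8, col = "gray")
dev.off()
```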
Functions in the graphic system
● high-level functions
● low-level functions
● interactive functions
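One possible illustration of the three kinds of function, using plot(), abline(), text() and locator() as representatives. The interactive call is guarded, so it only runs on a screen device in an interactive session.

```r
out_file <- file.path(tempdir(), "levels.pdf")   # illustrative file name
pdf(out_file)

# High-level function: starts a new plot with axes, labels and box
plot(1:10, main = "High-level call")

# Low-level functions: add to the existing plot
abline(h = 5, lty = "dashed")
text(x = 2, y = 9, labels = "added with text()")

# Interactive function: skipped here, because a pdf device is not interactive
if (interactive() && dev.interactive()) {
  clicked <- locator(1)   # waits for one mouse click, returns its coordinates
}
dev.off()
```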
Graphical devices
Graphic commands → Device → Graphic output
One input, several outputs
Input               Device       Output
Graphic commands    windows()    screen
Graphic commands    bitmap()     file .png
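A sketch of the same idea with two file devices that are available on every platform. The function name draw_example and the file names are hypothetical.

```r
# The same graphic commands, wrapped in a function (hypothetical name)
draw_example <- function() {
  plot(1:10, main = "Same commands, different device")
}

pdf_file <- file.path(tempdir(), "example.pdf")
ps_file  <- file.path(tempdir(), "example.ps")

# Only the device changes; the graphic commands stay the same
pdf(pdf_file)
draw_example()
dev.off()

postscript(ps_file)
draw_example()
dev.off()
```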
A typical session
1) Open device bitmap(file="rastertest")
2) Graphic commands plot(1:10)
3) Close device dev.off()
Graphical devices
Screen         File                Other
● x11()        ● postscript()      ● devGTK()
● windows()    ● pdf()             ● devJava()
● quartz()     ● pictex()          ● devSVG()
               ● xfig()
               ● bitmap()
               ● png()
               ● jpeg()
               ● win.metafile()
               ● bmp()
Managing devices
Return open devices
dev.list()
Return current device
dev.cur()
Close current device
dev.off()
Close all open devices
graphics.off()
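The four functions can be tried together as follows. This is a sketch with two throwaway pdf devices; note that the first line closes any device already open in the session.

```r
graphics.off()                       # start from a clean state
pdf(file.path(tempdir(), "a.pdf"))   # first device
pdf(file.path(tempdir(), "b.pdf"))   # second device, now the current one

open_devices <- dev.list()   # named integer vector, one entry per open device
current      <- dev.cur()    # the device that receives graphic commands

dev.set(open_devices[1])     # make the first device current again
plot(1:10)                   # drawn on a.pdf

graphics.off()               # close everything
```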
Cartesian coordinate system
[Figure: a point P(x, y) in the plane, with origin O, horizontal axis X and vertical axis Y]
The graphic box model
[Figure: the plot region lies inside the figure region, surrounded by figure margins 1 to 4; the figure region is in turn surrounded by outer margins 1 to 4. Margins are numbered 1 = bottom, 2 = left, 3 = top, 4 = right.]
The graphic box model: an example
[Figure: an example scatter plot annotated with the plot region and figure margins 1 (bottom), 2 (left), 3 (top) and 4 (right)]
The graphic box model: an example
x = rnorm(50)
y = rnorm(50)
plot(x, y, main="An example graph",
xlim=c(-3, 3), ylim=c(-3, 3))
Adding boxes
plot.new()
box(which="plot")
box(which="figure")
box(which="outer")
Note: no outer margins by default
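Since outer margins are zero by default, the "outer" box only becomes distinguishable once they are set. A sketch, assuming two lines of outer margin on each side and arbitrary colours:

```r
out_file <- file.path(tempdir(), "boxes.pdf")   # illustrative file name
pdf(out_file)
par(oma = c(2, 2, 2, 2))        # outer margins in lines: bottom, left, top, right
plot.new()
box(which = "plot",   col = "black")
box(which = "figure", col = "blue")
box(which = "outer",  col = "red", lwd = 2)
oma_used <- par("oma")          # read back the setting
dev.off()
```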
Exploring the margins
plot.new()
plot.window(c(0,10), c(0,2))
points(c(0,0,10,10), c(0, 2, 0, 2))
(0,2) (10,2)
(0,0) (10,0)
Exploring the margins and the box
plot.new()
plot.window(c(0,10), c(0,2))
points(c(0,0,10,10), c(0, 2, 0, 2))
box()
(0,2) (10,2)
(0,0) (10,0)
Multiple figure regions
[Figure: six figure regions (1 to 6), each containing its own plot region, arranged in a grid of 3 rows and 2 columns and surrounded by outer margins 1 to 4]
Coordinate system in the plot region
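[Figure: a point (x, y) in the plot region; the horizontal axis runs from the min x value to the max x value, the vertical axis from the min y value to the max y value]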
Several types of plot
plot(y, type="p")   # points
plot(y, type="l")   # lines
plot(y, type="b")   # both points and lines
plot(y, type="n")   # nothing (empty plot)
Plot step-by-step: data
> set.seed(123456)
> y <- rnorm(20)
> y
[1] 0.83373317 -0.27604777 -0.35500184 0.08748742
[5] 2.25225573 0.83446013 1.31241551 2.50264541
[9] 1.16823174 -0.42616558 -0.99612975 -1.11394990
[13] -0.05573154 1.17443240 1.05321861 0.05760597
[17] -0.73504289 0.93052842 1.66821097 0.55968789
> range(y)
[1] -1.113950 2.502645
Plot step-by-step: start a new plot
plot.new()
Plot step-by-step: set up coordinate system
plot.window(c(1, 20), c(-1.2, 2.6))
Plot step-by-step: add grid
grid(col="lightgray", lty="solid")
Plot step-by-step: add points
points(y)
Plot step-by-step: add x-axis
axis(1, at=c(1, 10, 20))
Plot step-by-step: add y-axis
axis(2, at=c(-1.2, 0, 2.6))
Plot step-by-step: add x-axis title
title(xlab="X")
Plot step-by-step: add y-axis title
title(ylab="Y")
Plot step-by-step: add main title
title(main="My graph title")
Plot step-by-step: let's review all the code
set.seed(123456)
y <- rnorm(20)
plot.new()
plot.window(c(1, 20), c(-1.2, 2.6))
grid(col="lightgray", lty="solid")
points(y)
axis(1, at=c(1, 10, 20))
axis(2, at=c(-1.2, 0, 2.6))
title(xlab="X")
title(ylab="Y")
title(main="My graph title")
Plot step-by-step: create SVG file
set.seed(123456)
y <- rnorm(20)
# Open SVG device
svg(file="mygraph.svg")
# Graphic commands
plot.new()
plot.window(c(1, 20), c(-1.2, 2.6))
grid(col="lightgray", lty="solid")
points(y)
axis(1, at=c(1, 10, 20))
axis(2, at=c(-1.2, 0, 2.6))
title(xlab="X")
title(ylab="Y")
title(main="My graph title")
# Close device
dev.off()
Best practices: comment and save the script
# Data
set.seed(123456)
y <- rnorm(20)
# Open device
svg(file="final.svg")
# Init frame
plot.new()
plot.window(c(1, 20), c(-1.2, 2.6))
# Grid
grid(col="lightgray", lty="solid")
# Points
points(y)
# Axes
axis(1, at=c(1, 10, 20))
axis(2, at=c(-1.2, 0, 2.6))
# Titles
title(xlab="X")
title(ylab="Y")
title(main="My graph title")
# Close device
dev.off()
Overlapping points: the problem
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)
y <- c(2, 6, 6, 8, 8, 8, 10, 10, 10, 10)
plot(x=x, y=y)
Each location shows a single symbol, although (4, 10), (3, 8), (2, 6) and (1, 2) hold 4, 3, 2 and 1 points respectively.
Overlapping points: jitter (add noise)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)
y <- c(2, 6, 6, 8, 8, 8, 10, 10, 10, 10)
plot(x=jitter(x), y=jitter(y), xlab="x", ylab="y")
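The size of the noise can be controlled. A sketch, assuming the `amount` argument of jitter() and an arbitrary seed so that the result is reproducible:

```r
set.seed(1)   # arbitrary seed, only to make the noise reproducible
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)

# amount = 0.1: each value moves by at most 0.1 in either direction
jx <- jitter(x, amount = 0.1)

max_shift <- max(abs(jx - x))   # largest displacement actually applied
```

With a small amount the points separate visually while staying close to their true position.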
Overlapping points: sunflower plot
[Figure: sunflower symbols representing from 1 to 10 overlapping points]
Overlapping points: a sunflower plot example
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4)
y <- c(2, 6, 6, 8, 8, 8, 10, 10, 10, 10)
sunflowerplot(x=x, y=y)
Overlapping points: another sunflower plot example
sunflowerplot(x=iris$Petal.Length, y=iris$Petal.Width)
Pie chart
data <- c(12, 5, 4)
labels <- c("Italian", "English", "Spanish")
pie(data, labels=labels)
Example:
“Italian” pie slice
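A sketch of how slice sizes follow from the data, assuming each slice angle is proportional to its share of the total (the names `share` and `angles` are introduced here for illustration):

```r
data <- c(12, 5, 4)
labels <- c("Italian", "English", "Spanish")

share  <- data / sum(data)   # proportion of each category
angles <- share * 360        # slice angle in degrees
names(angles) <- labels
round(angles, 1)             # Italian 205.7, English 85.7, Spanish 68.6
```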
From vector data to distribution
> data <- c("No", "Maybe", "Maybe", "Yes", "No",
"Yes", "Yes", "Yes", "No", "Yes")
> distribution <- table(data)
> distribution
data
Maybe No Yes
2 3 5
Pie chart of distribution
> data <- c("No", "Maybe", "Maybe", "Yes", "No",
"Yes", "Yes", "Yes", "No", "Yes")
> distribution <- table(data)
> pie(distribution)
Histogram with absolute frequencies
set.seed(123456)
data <- rnorm(1000)
hist(data)
k = number of bins
n = number of obs.
Histogram with relative frequencies (density)
set.seed(123456)
data <- rnorm(1000)
hist(data, freq=FALSE)
k = number of bins
Histogram with unequal bins
hist(data, breaks=c(-4, 0, 1, 3))
k = number of bins
n = number of obs.
n_i = number of obs. in the i-th bin
w_i = width of the i-th bin
Density of the i-th bin: d_i = n_i / (n · w_i)
Calculate density for histogram: an example
> n <- length(data)
> n1 <- length(data[which(data > -4 & data <=0)])
> f1 <- n1 / n
> f1
[1] 0.479
> w1 <- 4
> d1 <- f1 / w1
> d1
[1] 0.11975
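The same calculation can be done for all bins at once and compared with what hist() stores. This is a sketch: the breaks below are an assumption, chosen so that they always cover the whole sample.

```r
set.seed(123456)
data <- rnorm(1000)

# Breaks chosen to cover the whole sample, whatever its range
breaks <- c(floor(min(data)), 0, 1, ceiling(max(data)))
h <- hist(data, breaks = breaks, plot = FALSE)

n <- length(data)
w <- diff(breaks)                 # bin widths
d <- h$counts / (n * w)           # density = relative frequency / width

all.equal(d, h$density)           # same values as computed by hist()
sum(d * w)                        # bar areas add up to 1
```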
Box plot (or box-and-whisker plot)
● upper whisker: up to the max or other value
● top of the box: third quartile
● line inside the box: second quartile (median)
● bottom of the box: first quartile
● lower whisker: down to the min or other value
Box plot: highlighting outliers
> data <- airquality$Ozone
> s <- summary(data)
> print(s)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.00 18.00 31.50 42.13 63.25 168.00 37
> q1 <- s[[2]]
> q3 <- s[[5]]
> iqr = q3 - q1
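> q1 - 1.5*iqr
[1] -49.875
> q3 + 1.5*iqr
[1] 131.125
> boxplot(airquality$Ozone)
Upper whisker: highest value within 1.5·IQR above the third quartile
Lower whisker: lowest value within 1.5·IQR below the first quartile
Points beyond the whiskers: outliers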
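The values that boxplot() actually draws can be inspected with boxplot.stats(). Note that it works with hinges (from fivenum()), which may differ slightly from the quartiles printed by summary(). A sketch:

```r
ozone <- airquality$Ozone

st <- boxplot.stats(ozone)   # NAs are dropped automatically
st$stats                     # lower whisker, lower hinge, median, upper hinge, upper whisker
st$out                       # values drawn as individual points

# Fences rebuilt from the hinges, with the default coefficient 1.5
hinge_spread <- st$stats[4] - st$stats[2]
lower_fence  <- st$stats[2] - 1.5 * hinge_spread
upper_fence  <- st$stats[4] + 1.5 * hinge_spread
```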
Box plot: highlighting min and max
boxplot(airquality$Ozone, range=0)
max
3Q
2Q
1Q
min
Box plot: groups of data
d <- read.table(file="Employee_data.txt")
boxplot(d$salary ~ d$gender)
boxplot(d$salary ~ d$gender, range=0)
Box plot: horizontal orientation
d <- read.table(file="Employee_data.txt")
boxplot(d$salary ~ d$gender, horizontal=TRUE)
Box plot + data points
d <- read.table("Employee_data.txt")
salaryf <- d$salary[which(d$gender=="Female")]
boxplot(salaryf, range=0)
x <- rep(1, length(salaryf))
points(x, salaryf)
Box plot + jittered data points
d <- read.table("Employee_data.txt")
salaryf <- d$salary[which(d$gender=="Female")]
boxplot(salaryf, range=0)
x <- rep(1, length(salaryf))
x <- jitter(x, factor=8)
points(x, salaryf)
jittering
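An alternative sketch that overlays jittered points on grouped box plots with stripchart(). The built-in iris data is used here as a stand-in for the salary data, and the jitter amount is an arbitrary choice.

```r
out_file <- file.path(tempdir(), "box_points.pdf")  # illustrative file name
pdf(out_file)
# Built-in data used as a stand-in for the salary data
boxplot(Sepal.Length ~ Species, data = iris, range = 0)
stripchart(Sepal.Length ~ Species, data = iris,
           method = "jitter", jitter = 0.15,   # horizontal noise
           vertical = TRUE, add = TRUE, pch = 1)
dev.off()
```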
Bar plot
d <- read.table("Employee_data.txt")
jobcattable <- table(d$jobcat)
barplot(jobcattable)
Stacked bar plot
> d <- read.table("Employee_data.txt")
> subd <- data.frame(gender = d$gender,
jobcat = d$jobcat)
> t <- table(subd)
> print(t)
        jobcat
gender   Clerical Custodial Manager
  Female      206         0      10
  Male        156        27      74
> barplot(t)
Stacked bar plot: add a legend (inside the plot area)
barplot(t, legend.text=c("female", "male"))
Stacked bar plot: relative frequencies
> d <- read.table("Employee_data.txt")
> subd <- data.frame(gender=d$gender, jobcat=d$jobcat)
> t <- table(subd)
> rt <- prop.table(t, 2)
> print(rt)
jobcat
gender Clerical Custodial Manager
Female 0.5690608 0.0000000 0.1190476
Male 0.4309392 1.0000000 0.8809524
> barplot(rt)
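The second argument of prop.table() decides which totals are used. A sketch with the built-in mtcars data as a stand-in (the variable names are illustrative):

```r
# Built-in data as a stand-in: transmission (am) by number of cylinders (cyl)
counts <- table(am = mtcars$am, cyl = mtcars$cyl)

by_col <- prop.table(counts, margin = 2)   # each column sums to 1
by_row <- prop.table(counts, margin = 1)   # each row sums to 1

colSums(by_col)
rowSums(by_row)
```

For a stacked bar plot where every bar reaches 1, the column version (margin = 2) is the one to pass to barplot().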
The device as a state machine: the “par” command
List graphic parameters:
par()
Set a graphic parameter:
par(col=2)
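Because the device keeps its state, it is often useful to restore a parameter after changing it. A sketch, relying on the fact that par() returns the previous values of the parameters it sets:

```r
pdf(file.path(tempdir(), "par_demo.pdf"))  # illustrative file name

before <- par("col")
old <- par(col = 2)     # par() returns the previous values of what it changes
plot(1:10)              # drawn with the new default colour
par(old)                # restore the previous state
after <- par("col")

dev.off()
```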
Multiple plots: basic layouts
par(mfrow=c(3,2))   # panels filled by rows
1 2
3 4
5 6
par(mfcol=c(3,2))   # panels filled by columns
1 4
2 5
3 6
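The filling order can be checked by asking the device where the current panel is. A sketch with a hypothetical helper, panel_position(), that reads par("mfg"):

```r
# Hypothetical helper: returns the (row, column) of the k-th panel drawn
panel_position <- function(k, by_row = TRUE) {
  pdf(tempfile(fileext = ".pdf"))   # throwaway device
  on.exit(dev.off())
  if (by_row) par(mfrow = c(3, 2)) else par(mfcol = c(3, 2))
  for (i in seq_len(k)) plot.new()  # advance k panels
  par("mfg")[1:2]                   # position of the panel being drawn
}

panel_position(2, by_row = TRUE)    # row 1, column 2
panel_position(2, by_row = FALSE)   # row 2, column 1
```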
Multiple plots: designing our first layout
par(mfcol=c(2,1))
Male
Salary
Experience
Female
Salary
Experience
Tip: use paper and pencil when designing a layout
Multiple plots: basic layout
male <- d[d$gender=="Male",]
female <- d[d$gender=="Female",]
par(mfcol=c(2,1))
plot(x=male$prevexp, y=male$salary,
main="Male", xlab="Experience",
ylab="Salary", ylim=c(15000, 135000))
plot(x=female$prevexp, y=female$salary,
main="Female", xlab="Experience",
ylab="Salary", ylim=c(15000, 135000))
More advanced multiple plots: the “layout” command
m <- matrix(c(1,1,2,3), nrow=2, ncol=2, byrow=TRUE)
layout(m, widths=c(4,4), heights=c(2, 3))
1 1 height=2
2 3 height=3
width=4 width=4
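Before drawing, a layout can be previewed with layout.show(). A sketch using the same matrix (the file name is illustrative):

```r
pdf(file.path(tempdir(), "layout_demo.pdf"))  # illustrative file name

m <- matrix(c(1, 1, 2, 3), nrow = 2, ncol = 2, byrow = TRUE)
n_fig <- layout(m, widths = c(4, 4), heights = c(2, 3))

layout.show(n_fig)   # outlines and numbers each figure region

dev.off()
```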
An example of use of the “layout” command
l <- matrix(c(1,1,2,3), nrow=2, ncol=2,
byrow=TRUE)
layout(l, heights=c(2, 3))
barplot(table(d$jobcat), main="Job category")
plot(x=male$prevexp, y=male$salary,
main="Male", xlab="Experience",
ylab="Salary", ylim=c(15000, 135000))
plot(x=female$prevexp, y=female$salary,
main="Female", xlab="Experience",
ylab="Salary", ylim=c(15000, 135000))
An overlapping legend
edudata <- matrix(c(0.4, 0.6, 0.3, 0.7, 0.2, 0.8), nrow=2, ncol=3)
colors <- c("gray50", "gray80")
barplot(edudata, xlab="Education", names.arg=c("low", "medium", "high"),
col=colors, legend.text=c("female", "male"))
Adding legend by using the “layout” command
edudata <- matrix(c(0.4, 0.6, 0.3, 0.7, 0.2, 0.8), nrow=2, ncol=3)
mlayout <- matrix(c(1,2), nrow=2, ncol=1)
colors <- c("gray50", "gray80")
par(mai=c(0.8, 0.6, 0.1, 0.2)) # bottom, left, top, right
layout(mlayout, heights=c(9, 3))
barplot(edudata, xlab="Education", names.arg=c("low", "medium", "high"),
col=colors)
plot.new()
par(mai=c(0, 0, 0, 0)) # bottom, left, top, right
plot.window(xlim=c(0,1), ylim=c(0,1))
legend(x=0.5, y=0.5, xjust=0.5, yjust=0.5, legend = c("female", "male"),
fill = colors)
Multiple graphs setting the figure regions
par(fig=c(0, 0.8, 0, 0.8), new=FALSE)
plot(x=d$prevexp, y=d$salary,
xlab="Experience", ylab="Salary")
par(fig=c(0, 0.8, 0.55, 1), new=TRUE)
boxplot(d$prevexp, horizontal=TRUE,
axes=FALSE)
par(fig=c(0.65, 1, 0, 0.8), new=TRUE)
boxplot(d$salary, axes=FALSE)
Plotting fitted regression line
# Data
n <- 50
x <- 0:(n-1)
real_a <- 5
real_b <- 0.1
logy <- real_a + real_b*x +rnorm(n)
y <- exp(logy)
# Estimation
est <- lm(log(y) ~ x)
# Graph
plot(log(y) ~ x)
abline(est, col="red")
Plotting fitted regression line for log-linear model
# Data
n <- 50
x <- 0:(n-1)
real_a <- 5
real_b <- 0.1
logy <- real_a + real_b*x +rnorm(n)
y <- exp(logy)
# Estimation
est <- lm(log(y) ~ x)
a <- est$coefficients[[1]]
b <- est$coefficients[[2]]
fitted <- exp(a+b*x)
# Graph
plot(y ~ x)
lines(y=fitted, x=x, col="red")
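The back-transformed fitted values can also be obtained with predict() instead of the coefficients. A sketch under the same simulated model; the seed and the file name are assumptions added here for reproducibility.

```r
set.seed(1)   # arbitrary seed, added for reproducibility
n <- 50
x <- 0:(n - 1)
y <- exp(5 + 0.1 * x + rnorm(n))

est <- lm(log(y) ~ x)
a <- est$coefficients[[1]]
b <- est$coefficients[[2]]

by_hand    <- exp(a + b * x)
by_predict <- exp(predict(est))   # fitted values on the log scale, back-transformed

out_file <- file.path(tempdir(), "loglinear.pdf")  # illustrative file name
pdf(out_file)
plot(y ~ x)
lines(x, by_predict, col = "red")
dev.off()
```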
Thanks for your kind attention