Computer Programming with R
The apply family of functions
Salma Akter
Lecturer
Department of Statistics
Jagannath University
The apply family of functions
R has several looping functions known as the apply family of functions. These functions include:
• apply(): Apply a function over the margins of an array
• lapply(): Loop over a list and evaluate a function on each element
• sapply(): Same as lapply() but tries to simplify the result
• tapply(): Apply a function over subsets of a vector
• mapply(): Multivariate version of lapply
apply()
The function apply can be used to apply a function to the rows or columns of a matrix.
Syntax:
apply(X, MARGIN, FUN, ...)
The arguments to apply() are
• X is an array
• MARGIN is an integer vector indicating which margins should be “retained”. With MARGIN = 1 the
function is applied over rows, and with MARGIN = 2 it is applied over columns. With
MARGIN = c(1, 2) the function is applied to every individual element (each row-column combination).
• FUN is a function to be applied
• ... is for other arguments to be passed to FUN
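The role of the ... argument can be shown with a minimal sketch. The matrix m below is hypothetical example data (it is not from these slides); na.rm = TRUE is not an argument of apply() itself but is forwarded through ... to mean() and range().

```r
# Hypothetical 2 x 3 matrix with one missing value (filled column by column)
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 2, ncol = 3)

# Without extra arguments, the NA propagates into the mean of row 1
row_means_raw <- apply(m, 1, mean)

# na.rm = TRUE is passed on to mean() via the ... argument
row_means <- apply(m, 1, mean, na.rm = TRUE)

# FUN may return more than one value; range() returns two values per column,
# so the result is a matrix with one column per column of m
col_ranges <- apply(m, 2, range, na.rm = TRUE)

print(row_means_raw)
print(row_means)
print(col_ranges)
```

When FUN returns a vector of length greater than 1, as range() does here, apply() arranges the results as the columns of a matrix.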
• Ex:
Create a 20 by 10 matrix of Normal random numbers. Then compute the mean of each column.
x <- matrix(rnorm(200), 20, 10)
apply(x, 2, mean)
## We can also compute the sum of each row
apply(x, 1, sum)
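## With MARGIN = c(1, 2) the function is applied to each element separately,
## so sum simply returns the values of x unchanged
apply(x, c(1, 2), sum)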
Ex:
x <- matrix(runif(50), nrow = 5, ncol = 10)
apply(x, 1, mean)   # mean of each row
apply(x, 2, sum)    # sum of each column
For the special case of column/row sums and column/row means of matrices, we have some useful
shortcuts:
• rowSums(x) is equivalent to apply(x, 1, sum)
• rowMeans(x) is equivalent to apply(x, 1, mean)
• colSums(x) is equivalent to apply(x, 2, sum)
• colMeans(x) is equivalent to apply(x, 2, mean)
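A quick way to convince yourself that the shortcuts match the apply() calls is to compare the results directly. This is only a sketch; the seed value is an arbitrary choice made here so that the example is reproducible.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility
x <- matrix(rnorm(200), 20, 10)

# Each comparison should report TRUE (equality up to numerical tolerance)
same_row_sums  <- all.equal(rowSums(x),  apply(x, 1, sum))
same_row_means <- all.equal(rowMeans(x), apply(x, 1, mean))
same_col_sums  <- all.equal(colSums(x),  apply(x, 2, sum))
same_col_means <- all.equal(colMeans(x), apply(x, 2, mean))

print(c(same_row_sums, same_row_means, same_col_sums, same_col_means))
```

all.equal() is used instead of == because the two approaches may differ by tiny floating-point rounding errors.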
lapply
• The lapply() function does the following simple series of operations:
1. it loops over a list, iterating over each element in that list
2. it applies a function to each element of the list (a function that we specify)
3. and returns a list.
Syntax:
lapply(X, FUN, ...)
This function takes three arguments:
X is a list or vector (a data frame is treated as a list of its columns)
FUN is the function to be applied to each element of X
... is for other arguments to be passed to FUN
• Have a look
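n <- list(pop = 8405837, cities = c("Dhaka", "Cumilla", "Rajshahi"))
# Loop version: print the class of each element
for (info in n) {
  print(class(info))
}
# The same task with lapply(); the result is returned as a list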
lapply(n,class)
Ex:
x <- list(a = 1:5, b = rnorm(10))
lapply(x, mean)
Ex:
x <- list(a = 1:4, b= rnorm(20, 1), c = rnorm(100, 5))
lapply(x, mean)
Ex:
x<-c("abc","defghi","d","pqrz")
lapply(x,nchar)
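lapply() also accepts extra arguments for FUN through ..., and FUN can be a function written on the spot (an anonymous function). The following is a minimal sketch; the list x and its values are hypothetical example data.

```r
# Hypothetical list; element b contains a missing value
x <- list(a = 1:4, b = c(2, 4, NA, 8))

# na.rm = TRUE is passed on to mean() through the ... argument
means <- lapply(x, mean, na.rm = TRUE)

# An anonymous function: the spread (max minus min) of each element
spreads <- lapply(x, function(v) max(v, na.rm = TRUE) - min(v, na.rm = TRUE))

print(means)
print(spreads)
```

In both cases the result is a list with the same names as x, one element per element of the input.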
sapply
• The sapply() function behaves similarly to lapply(); the only real difference is in the return value.
sapply() will try to simplify the result of lapply().
• If the result is a list where every element is length 1, then a vector is returned
• If the result is a list where every element is a vector of the same length (> 1), a matrix
is returned.
• If it can’t figure things out, a list is returned.
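The three simplification cases listed above can be seen side by side in a small sketch. The list x below is hypothetical example data chosen so that each case occurs.

```r
x <- list(a = 1:4, b = c(10, 20, 30))

# Case 1: every result has length 1, so a named vector is returned
v <- sapply(x, mean)

# Case 2: every result has the same length (2), so a matrix is returned,
# with one column per element of x
m <- sapply(x, range)

# Case 3: the results have different lengths, so a list is returned
l <- sapply(x, function(e) e[e > 2])

print(v)
print(m)
print(l)
```

Because the return type depends on the data, it is worth checking the result with class() or str() before using it further.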
Syntax:
sapply(X, FUN, ...)
Ex:
x <- list(a = 1:4, b= rnorm(20, 1), c = rnorm(100, 5))
sapply(x, mean)
Ex:
x<-c("abc","defghi","d","pqrz")
sapply(x,nchar)
tapply
• tapply() is used to apply a function over subsets of a vector. It splits the vector into groups
defined by a second variable (usually the levels of a factor) and then applies the function to each group.
Syntax:
tapply (X, INDEX, FUN = NULL, ..., simplify = TRUE)
The arguments to tapply() are as follows:
• X is a vector
• INDEX is a factor or a list of factors (or else they are coerced to factors)
• FUN is a function to be applied
• ... contains other arguments to be passed to FUN
• simplify is a logical value indicating whether the result should be simplified
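The arguments above can be tried on a small example that needs no extra package. The vectors scores and group below are hypothetical example data.

```r
# Hypothetical data: five scores belonging to two groups
scores <- c(70, 85, 90, 60, 75)
group  <- factor(c("A", "B", "A", "B", "A"))

# Mean score within each group; returned as a named array
group_means <- tapply(scores, group, mean)

# With simplify = FALSE the result is kept as a list, one element per group
group_means_list <- tapply(scores, group, mean, simplify = FALSE)

print(group_means)
print(group_means_list)
```

Group A contains the scores 70, 90 and 75, and group B contains 85 and 60, so the function is applied to those two subsets separately.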
• Ex: Consider the dataset dmbp in the rugarch package.
install.packages("rugarch")
library(rugarch)
data(dmbp)
head(dmbp)
The data set contains the daily percentage nominal returns (V1) and a dummy variable (V2)
that takes the value 1 on Mondays and on other days following no trading in
the Deutschemark or British pound/U.S. dollar market during regular European
trading hours, and 0 otherwise.
• If we want to compute the mean daily percentage nominal returns grouped by days (0 or 1), we can
use the tapply function.
ret <- dmbp$V1
The dummy variable should be stored as a factor. We add a vector of labels for
the levels (0 = "Not Monday" and 1 = "Monday").
days <- factor(dmbp$V2, labels = c("Not Monday", "Monday"))
tapply(ret, days, mean) # mean daily % return grouped by days
mapply
• The mapply() function is a multivariate apply of sorts which applies a function in parallel over a set
of arguments. It will apply the specified function to the first element of each argument first,
followed by the second element, and so on.
Ex:
a <- 1:5
b <- 6:10
mapply(sum, a, b)   # sum(1, 6), sum(2, 7), ..., sum(5, 10): returns 7 9 11 13 15
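Two further uses of mapply() are sketched below: a function whose results have different lengths, and a fixed extra argument supplied through MoreArgs. The function add_scaled and the value scale = 2 are hypothetical names and values introduced only for this illustration.

```r
a <- 1:5
b <- 6:10

# rep(1, 3), rep(2, 2), rep(3, 1): the results differ in length,
# so mapply() returns a list instead of a vector
reps <- mapply(rep, 1:3, 3:1)

# Hypothetical helper: add two values, then multiply by a fixed scale
add_scaled <- function(x, y, scale) (x + y) * scale

# a and b are looped over in parallel; scale stays the same for every call
scaled <- mapply(add_scaled, a, b, MoreArgs = list(scale = 2))

print(reps)
print(scaled)
```

Arguments listed in MoreArgs are passed unchanged to every call, while the other arguments are taken element by element.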