R offers a powerful suite of functions known as the apply() family, designed to streamline many iterative tasks that might otherwise require explicit loops. These functions are usually more concise than the equivalent loop and, in some cases, faster.
The apply() family is part of base R and includes functions for repeatedly applying an operation to slices of matrices, arrays, lists, and data frames. They traverse the data in various ways without explicit loop constructs. Each function takes an input list, matrix, or array and applies a specified function to it, optionally passing one or more additional arguments.
These functions are the foundation of more complex operations and enable the execution of tasks with minimal code. Specifically, the family comprises the apply(), lapply(), sapply(), vapply(), mapply(), rapply(), and tapply() functions.
25.1 apply()
Input class: matrix, array, data.frame
Output class: vector, matrix, list (if the results have different lengths)
This function is the foundation of the apply() family. It applies a function over margin 1 (rows), margin 2 (columns), or both, c(1, 2) (rows and columns, that is, each individual cell).
# Create a 3x5 matrix with normally distributed random variables generated by rnorm()
set.seed(123) # Set seed for reproducibility
m <- matrix(rnorm(n = 15, mean = 100, sd = 30), nrow = 3, ncol = 5)
m
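As a minimal sketch of how the margin argument behaves, the block below rebuilds the same matrix m and applies a function over rows, over columns, and over every cell. The variable names (row_means, col_max, col_range, rounded) are illustrative and do not come from the original text.

```r
# Rebuild the matrix from the example above so this block is self-contained
set.seed(123)
m <- matrix(rnorm(n = 15, mean = 100, sd = 30), nrow = 3, ncol = 5)

# MARGIN = 1: the function receives one row at a time (3 results)
row_means <- apply(X = m, MARGIN = 1, FUN = mean)

# MARGIN = 2: the function receives one column at a time (5 results)
col_max <- apply(X = m, MARGIN = 2, FUN = max)

# When FUN returns two values per column, the result is a 2x5 matrix
col_range <- apply(X = m, MARGIN = 2, FUN = range)

# MARGIN = c(1, 2): the function receives each cell, so the shape is kept
rounded <- apply(X = m, MARGIN = c(1, 2), FUN = round)
```

Note that the result is a plain vector only when the function returns a single value per row or column; col_range and rounded are matrices.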
Exercise E
Q1
Use apply() to calculate the standard deviation of each column in m.
Q2
Use apply() to compute the mean of each column in the mtcars dataset.
25.2 lapply()
Input class: list, vector
Output class: list
It applies a function to each element of a list or vector. First, let us examine how it is used with vectors.
# Create a character vector with city names
cities <- c("New York", "Philadelphia", "Boston")
# Apply the nchar() function to each element in cities to count the number of characters
lapply(X = cities, FUN = nchar)
[[1]]
[1] 8
[[2]]
[1] 12
[[3]]
[1] 6
To convert the returned values into a vector, use unlist().
unlist(lapply(X = cities, FUN = nchar))
[1] 8 12 6
Now, let us examine how it works with lists.
# Create a list
l <- list(a = c(1:3), b = c(4:6), c = c(7:9))
l
$a
[1] 1 2 3
$b
[1] 4 5 6
$c
[1] 7 8 9
Calculate the sum for each element in the list:
# Apply the sum function to each element in the list 'l'
lapply(X = l, FUN = sum)
$a
[1] 6
$b
[1] 15
$c
[1] 24
We can also define our own functions to be applied.
# Select the first element from each list item
lapply(l, function(z) {
  z[1]
})
$a
[1] 1
$b
[1] 4
$c
[1] 7
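A custom function can also take extra arguments. The sketch below assumes the same list l as above; the parameter name k and the result names are illustrative only. Any named argument placed after FUN is passed on to the function for every element.

```r
# Same list as in the example above
l <- list(a = c(1:3), b = c(4:6), c = c(7:9))

# k = 2 is forwarded to the anonymous function on every call,
# so this selects the second element of each list item
second <- lapply(X = l, FUN = function(z, k) { z[k] }, k = 2)

# The subsetting operator is itself a function, so this is equivalent
second_short <- lapply(X = l, FUN = `[`, 2)
```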
Exercise F
Q1
Calculate the mean for each vector in the list l.
25.3 sapply()
Input class: list, vector
Output class: vector, matrix, list (if the result cannot be simplified to either of the previous two forms)
sapply() is similar to lapply(), but it tries to simplify the result into a vector or matrix, which is usually easier to read and work with.
# Create a list of temperature measurements over 5 days
temp <- list(
  c(3, 7, 9, 6, -1),  # temperatures for day 1
  c(6, 9, 12, 13, 5), # temperatures for day 2
  c(4, 8, 3, -1, -3), # temperatures for day 3
  c(1, 4, 7, 2, -2),  # temperatures for day 4
  c(5, 7, 9, 4, 2)    # temperatures for day 5
)
temp
To obtain a vector from the output of lapply(), we need to employ unlist().
unlist(lapply(X = temp, FUN = min))
[1] -1 5 -3 -2 2
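For comparison, here is a minimal sketch of the same calculation with sapply(), which returns the vector directly without a call to unlist(). The variable name mins is illustrative.

```r
# Same temperature list as above
temp <- list(
  c(3, 7, 9, 6, -1),
  c(6, 9, 12, 13, 5),
  c(4, 8, 3, -1, -3),
  c(1, 4, 7, 2, -2),
  c(5, 7, 9, 4, 2)
)

# sapply() simplifies the list of single values into a numeric vector
mins <- sapply(X = temp, FUN = min)
mins
```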
What if the function supplied to sapply() returns more than one value?
# Define a function that returns the minimum and maximum values of a vector
getMinMax <- function(x) {
  return(c(min = min(x), max = max(x)))
}
# Apply getMinMax to each element of 'temp' using sapply to return a well-formatted matrix
sapply(X = temp, FUN = getMinMax)
[,1] [,2] [,3] [,4] [,5]
min -1 5 -3 -2 2
max 9 13 8 7 9
If we use lapply():
lapply(X = temp, FUN = getMinMax)
[[1]]
min max
-1 9
[[2]]
min max
5 13
[[3]]
min max
-3 8
[[4]]
min max
-2 7
[[5]]
min max
2 9
After unlist():
unlist(lapply(X = temp, FUN = getMinMax))
min max min max min max min max min max
-1 9 5 13 -3 8 -2 7 2 9
Applying lapply() with getMinMax() to temp yields a list in which each element holds the minimum and maximum of one vector. This structure keeps all the information but is harder to read at a glance. Calling unlist() on it produces a single vector, but the grouping of each min/max pair by day is lost.
On the other hand, sapply() streamlines the process and directly provides a neatly formatted matrix. Each row corresponds to either the minimum or maximum value, and each column represents a vector from the temp dataset. This matrix format delivered by sapply() is more structured and user-friendly for quick analysis and visualization purposes.
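The simplification only happens when every result has the same length. As a sketch of the fallback case listed under "Output class", the function below (an illustrative choice, not from the original text) returns a different number of values per day, so sapply() hands back a list, just as lapply() would.

```r
# Same temperature list as above
temp <- list(
  c(3, 7, 9, 6, -1),
  c(6, 9, 12, 13, 5),
  c(4, 8, 3, -1, -3),
  c(1, 4, 7, 2, -2),
  c(5, 7, 9, 4, 2)
)

# Keep only the sub-zero temperatures of each day; the number of
# values differs from day to day, so no vector or matrix can be built
below_zero <- sapply(X = temp, FUN = function(x) { x[x < 0] })

is.list(below_zero)  # the result is a list, not a vector or matrix
```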
Exercise G
Q1
Obtain the average temperature on each day in temp.
25.4 vapply()
Input class: list, vector
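Output class: vector, matrix (an error is raised if a result does not match the declared type and length)
vapply() works like sapply(), but you must declare the type and length of each result through FUN.VALUE. This makes it safer, because a mismatch raises an error instead of silently changing the shape of the output, and it can also be slightly faster.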
# Tell vapply() we are expecting 1 number for each element
vapply(X = cities, FUN = nchar, FUN.VALUE = numeric(1))
New York Philadelphia Boston
8 12 6
# Return the minimum and maximum
vapply(X = temp, FUN = getMinMax, FUN.VALUE = c(min = 0, max = 0))
[,1] [,2] [,3] [,4] [,5]
min -1 5 -3 -2 2
max 9 13 8 7 9
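The block below is a small sketch of what declaring FUN.VALUE buys you. It reuses temp and getMinMax() from above; the names msg, empty_sapply, and empty_vapply are illustrative. The first call declares the wrong result length on purpose, and tryCatch() captures the resulting error message. The second part compares the two functions on an empty input.

```r
temp <- list(
  c(3, 7, 9, 6, -1),
  c(6, 9, 12, 13, 5),
  c(4, 8, 3, -1, -3),
  c(1, 4, 7, 2, -2),
  c(5, 7, 9, 4, 2)
)
getMinMax <- function(x) {
  return(c(min = min(x), max = max(x)))
}

# getMinMax() returns two values, but FUN.VALUE declares one,
# so vapply() stops with an error instead of returning a matrix
msg <- tryCatch(
  vapply(X = temp, FUN = getMinMax, FUN.VALUE = numeric(1)),
  error = function(e) conditionMessage(e)
)
msg

# On empty input, sapply() returns an empty list, while vapply()
# still returns a vector of the declared type
empty_sapply <- sapply(X = list(), FUN = length)
empty_vapply <- vapply(X = list(), FUN = length, FUN.VALUE = integer(1))
```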
25.5 mapply()
Input class: list, vector
Output class: vector, matrix, list (if the result cannot be simplified)
mapply() is a multivariate version of sapply(). It is multivariate in the sense that the function can take several arguments, each drawn from a different input. mapply() applies the function (its first argument, FUN) to the first elements of all inputs, then to the second elements, and so on.
mapply(FUN = sum, 1:5, 5:1, -5:-1)
[1] 1 2 3 4 5
Here, sum() is called five times with three arguments each: sum(1, 5, -5), sum(2, 4, -4), …, sum(5, 1, -1).
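The same element-by-element pairing works with custom functions and with inputs of different types. In the sketch below, the anonymous function and the names prefixes and reps are illustrative. The first call pairs each city name with a number of leading characters to keep. The second call shows the list fallback: the results have different lengths, so they cannot be simplified into a vector.

```r
cities <- c("New York", "Philadelphia", "Boston")

# Calls substr("New York", 1, 3), substr("Philadelphia", 1, 5),
# and substr("Boston", 1, 2)
prefixes <- mapply(
  FUN = function(city, n) { substr(city, 1, n) },
  cities,
  c(3, 5, 2)
)
prefixes

# Calls rep(x = 1, times = 3), rep(x = 2, times = 2), rep(x = 3, times = 1);
# the results differ in length, so mapply() returns a list
reps <- mapply(FUN = rep, x = 1:3, times = 3:1)
reps
```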
With rapply(), we apply a function that appends "!" to each text string and adds 1000 to each number in the list l1.
# Create a custom function to supply to rapply()
addSomething <- function(x) {
  if (is.character(x)) {
    # If the element within the list is a character, add "!"
    return(paste0(x, "!"))
  } else {
    # If the element is not a character, add 1000 to it
    return(x + 1000)
  }
}
# l1 is assumed to be an existing list that contains text strings and numbers
rapply(object = l1, f = addSomething)
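Because l1 is not defined in this section, the following sketch uses a hypothetical nested list, example_list, to show what the call does. The names example_list, flat, and nested are assumptions made for illustration. By default rapply() flattens the result (how = "unlist"), which coerces the numbers to character strings when text is present; how = "list" keeps the nested structure and the original types.

```r
addSomething <- function(x) {
  if (is.character(x)) {
    return(paste0(x, "!"))
  } else {
    return(x + 1000)
  }
}

# Hypothetical nested list standing in for l1
example_list <- list(a = "hello", b = list(c = 1, d = "world"))

# Default how = "unlist": one flat, named character vector
flat <- rapply(object = example_list, f = addSomething)
flat

# how = "list": same nesting as the input, with numbers left numeric
nested <- rapply(object = example_list, f = addSomething, how = "list")
nested$b$c
```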