Lecture 10 - Programming Basics

In this lecture note, we will delve into the core of R programming by exploring its building blocks. Think of R as a language: to communicate effectively, you need to know the grammar (syntax), vocabulary (functions), and the rules of conversation (flow control and loops). Let us unpack these one by one.

0.1 R Syntax

Function definition

my_function <- function(parameters) {
  # Code to execute
  return(result)
}

  • Example:

    add_numbers <- function(a, b) {
      sum <- a + b
      return(sum)
    }

if statement

if (condition1) {
  # Code to execute if condition1 is TRUE
} else if (condition2) {
  # Code to execute if condition2 is TRUE
} else {
  # Code to execute if condition1 and condition2 are FALSE
}

  • Example:

    number <- 10
    if (number > 5) {
      print("Number is greater than 5")
    } else {
      print("Number is 5 or less")
    }

for loop

for (i in 1:n) {
  # Code to execute for each iteration
}

  • Example:

    for (i in 1:5) {
      print(i)
    }

while loop

while (condition) {
  # Code to execute as long as the condition is TRUE
}

  • Example:

    count <- 1
    while (count <= 5) {
      print(count)
      count <- count + 1
    }

0.2 Core Apply Functions

Function                   | Input                             | Output       | Description
apply(X, MARGIN, FUN)      | Multidimensional (dim)            | Customizable | Applies a function over the rows or columns of a matrix, or over the dimensions of an array.
lapply(X, FUN)             | List-type (coerced with as.list)  | List         | Applies a function to each element of a list or vector and returns a list.
sapply(X, FUN)             | List-type (coerced with as.list)  | Simplified   | Like lapply(), but tries to simplify the output to a vector or matrix.
vapply(X, FUN, FUN.VALUE)  | List-type (coerced with as.list)  | User-defined | Like sapply(), but enforces a specific return type.

1 Functions

Functions are the vocabulary of R. They are pre-written scripts that perform specific tasks, saving you the effort of rewriting common procedures. Just like using the right word in a sentence, using the right function is key to an efficient and effective R program. Think of each function as a tool in your toolbox. The more tools you have and know how to use, the easier it will be to solve programming challenges. We have already mastered many functions for a variety of tasks. Now, our focus will shift to learning how to craft our own tools.

To create custom functions in R, follow this function structure:

function_name <- function(input) {
  # Do something
  return(output)
}

For example,

# create a function to calculate squared values
getSquare <- function(x) {
  squared <- x * x
  return(squared)
}
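
Once defined, the function can be called like any built-in one. The short sketch below repeats the definition so that it runs on its own, then calls getSquare() with a single number and with a vector. The variable names single_result and vector_result are illustrative and not part of the original notes.

```r
# same definition as above, repeated so this block is self-contained
getSquare <- function(x) {
  squared <- x * x
  return(squared)
}

single_result <- getSquare(4)     # one number in, one number out
vector_result <- getSquare(1:3)   # `*` works element by element, so a vector is fine too

print(single_result)
print(vector_result)
```

Because multiplication in R is vectorised, the same function squares every element of a vector without any extra code.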

Let us write a function that calculates the area of a rectangle. This function will require two arguments: the length and the width of the rectangle.

# create a function to calculate area
calculateArea <- function(length, width) {
  area <- length * width
  return(area)
}

In this function named calculateArea, we have two parameters: length and width. We calculate the area by multiplying these two parameters and then return the result.

You can use this function by passing the length and width values as arguments:

rect_area <- calculateArea(5, 3)
print(rect_area)
[1] 15
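
Arguments can also be passed by name, and a parameter can be given a default value that is used when the caller omits it. The following is a small sketch of both ideas; calculateVolume() and its default height of 1 are hypothetical, introduced only for illustration.

```r
# hypothetical helper: 'height' has a default value of 1
calculateVolume <- function(length, width, height = 1) {
  volume <- length * width * height
  return(volume)
}

flat_box  <- calculateVolume(5, 3)                               # height falls back to 1
tall_box  <- calculateVolume(length = 5, width = 3, height = 2)  # arguments passed by name
reordered <- calculateVolume(height = 2, width = 3, length = 5)  # order is free when named

print(flat_box)
print(tall_box)
print(reordered)
```

Naming the arguments makes a call easier to read and protects against accidentally swapping values that are passed by position.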

Exercise A

Q1

Write a function that performs addition on two variables. Test it with the input values \(3\) and \(9\).

Q2

Write a ‘greet’ function that welcomes a user by a specified name. It should return a message like, “Welcome, Amy!” Use the structure provided below.

greetSomeone <- function(name = "user") {
  ## enter your code here
}
# you can test it by
# greetSomeone("Amy")
# greetSomeone("Bob")

2 Flow control and loops

Flow control in R is like choosing your path in a choose-your-own-adventure book. It allows your program to make decisions, repeat actions, and change its course based on certain conditions. Flow control can be tricky. Start with simple if-else structures and gradually build complexity by adding more conditions and integrating loops.

2.1 if statement

The if statement is a fundamental aspect of flow control in programming. It allows your program to execute a block of code only if a specified condition is true. Think of it as a decision-making junction, where your program can take different paths based on certain criteria. In R, the basic syntax of an if statement is straightforward, yet it holds the power to add significant logic and complexity to your code.

The structure of an if statement in R is as follows:

if (condition) {
  # Do something
} else {
  # Do something different
}

For example,

# Assign the value 3 to the variable 'i'
i <- 3

# Check if 'i' is greater than 3
if (i > 3) {
  print("Yes") # If 'i' is greater than 3, print "Yes"
} else {
  print("No") # If 'i' is not greater than 3, print "No"
}
[1] "No"

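Conditions are typically written as logical expressions such as a == b, a != b, a > b, a >= b, a < b, a <= b, or a %% b == c, among others. These can be combined using the logical operators & (AND) and | (OR). Because the condition of an if statement must be a single TRUE or FALSE, the scalar forms && and || are also commonly used there; & and | additionally work element by element on whole vectors.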

# Assign the value 5 to the variable 'i'
i <- 5

# Check if 'i' is greater than 2 AND less than 4
if (i > 2 & i < 4) {
  print("Yes")  # If both conditions are true, print "Yes"
} else {
  print("No")   # If either condition is false, print "No"
}
[1] "No"

# Check if 'i' is greater than 2 OR less than 4
if (i > 2 | i < 4) {
  print("Yes")  # If at least one condition is true, print "Yes"
} else {
  print("No")   # If both conditions are false, print "No"
}
[1] "Yes"

In the first if statement, the condition uses the logical AND operator &, which requires both conditions to be true. In the second if statement, the condition uses the logical OR operator |, which requires at least one condition to be true.
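
The difference between the element-wise operator & and its scalar counterpart && may be easier to see side by side. The sketch below uses an illustrative vector x and stores the outcomes in in_range and answer, names that are assumptions made for this example only.

```r
x <- c(1, 3, 5)

# '&' compares element by element and returns one result per element
in_range <- x > 2 & x < 4
print(in_range)

# '&&' expects single values, which is what an if condition needs
i <- 5
if (i > 2 && i < 4) {
  answer <- "Yes"
} else {
  answer <- "No"
}
print(answer)
```

Only the middle element of x lies strictly between 2 and 4, so in_range has one TRUE; the single value i = 5 fails the second test, so the else branch runs.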

If there are more than two alternatives, we can construct an if...else ladder using else if.

if (condition1) {
  # Do statement 1
} else if (condition2) {
  # Do statement 2
} else if (condition3) {
  # Do statement 3
} else {
  # Do statement 4
}

For example,

# Assign the value 3 to the variable 'i'
i <- 3

# Check if 'i' is greater than 3
if (i > 3) {
    i <- i + 1   # If true, increase 'i' by 1
} else if (i < 3) {  # If the first condition is false, check if 'i' is less than 3
    i <- i - 1   # If true, decrease 'i' by 1
} else {  # If none of the above conditions are true
    print("Good guess")  # Print "Good guess"
}
[1] "Good guess"

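Note the format here. Each condition is enclosed in parentheses, while the statements to execute sit on their own lines inside the curly braces and need no parentheses. Each else or else if must begin on the same line as the closing brace } of the preceding block; at the top level of a script, R otherwise considers the if statement finished and reports an error when it meets the stray else.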
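
An else if ladder is often wrapped in a function so that it can be reused. The sketch below is one possible illustration; getGrade() and its cut-off values are hypothetical and not part of the lecture material.

```r
# hypothetical helper: turns a numeric score into a letter grade
getGrade <- function(score) {
  if (score >= 90) {
    grade <- "A"
  } else if (score >= 80) {
    grade <- "B"
  } else if (score >= 70) {
    grade <- "C"
  } else {
    grade <- "F"
  }
  return(grade)
}

print(getGrade(95))
print(getGrade(85))
print(getGrade(42))
```

The conditions are checked from top to bottom and only the first one that is TRUE is executed, which is why the second branch does not need to test score < 90 again.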

Exercise B

Q1

Without directly utilizing the abs() function, create a custom function to compute the absolute value of any given number: it should return the number unchanged if it is positive, and its positive counterpart if it is negative. Verify the functionality by calculating the absolute values of 7 and -7.

Q2

Without using the max() function, create a custom function that identifies the larger of two numbers with an if statement. Test its accuracy by determining the maximum between 376 and 924.

2.2 Loops

Loops allow you to perform repetitive tasks efficiently. Instead of writing the same code over and over, a loop can iterate through data, performing the same action on each element. Always think about your exit condition. An infinite loop is like a conversation that never ends—not very productive!

R executes loops using the syntax shown below, with for and while being the most common types.

for loop

A for loop is a control flow statement that executes a block of code once for each element of a sequence, assigning that element to the loop variable on every iteration.

for (variable in sequence) {
  # Do something
}

For example,

# Initialize a 'for' loop to iterate over a sequence from 1 to 4
for (i in 1:4) {
  j <- i + 10  # Add 10 to the current value of 'i' and assign it to 'j'
  print(j)     # Print the value of 'j' for each iteration
}
[1] 11
[1] 12
[1] 13
[1] 14
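
The sequence in a for loop does not have to be a range of numbers. As a rough sketch, the loop below first walks over the elements of a character vector directly, then over their positions with seq_along(); the vectors fruits and labels are made up for this example.

```r
fruits <- c("apple", "banana", "cherry")

# loop over the elements themselves
for (fruit in fruits) {
  print(fruit)
}

# loop over positions; seq_along(fruits) gives 1, 2, 3
labels <- character(length(fruits))  # create the result vector up front
for (k in seq_along(fruits)) {
  labels[k] <- paste(k, fruits[k])
}
print(labels)
```

Looping over positions is useful when the result of each iteration has to be written to a matching slot of another vector.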

Exercise C

Q1

Write a for loop to print the mean of each column in the mtcars dataset. Additionally, how would you modify the loop to store all the mean values in a vector?

Q2

Write a for loop that adds numbers from 1 to 10.

while loop

A while loop is a control flow statement that repeatedly executes code as long as a specified boolean condition is true. It can be considered a recurring if statement.

while (condition) {
  # Do something
}

For example,

# Initialize the variable 'i'
i <- 1

# While loop: runs as long as 'i' is less than 5
while (i < 5) {
  print(i)  # Prints the current value of 'i'
  i <- i + 1  # Increments 'i' by 1
}
[1] 1
[1] 2
[1] 3
[1] 4
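
A while loop is most useful when the number of iterations is not known in advance. The sketch below, using the illustrative variables value and steps, keeps doubling a number until it reaches at least 100 and counts how many doublings that took.

```r
value <- 1
steps <- 0

# keep doubling until 'value' reaches at least 100
while (value < 100) {
  value <- value * 2
  steps <- steps + 1
}

print(value)  # 128
print(steps)  # 7
```

The condition is checked before every pass, so the loop ends as soon as value is no longer below 100.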

A common pitfall with while loops is creating an infinite loop, where the loop’s exit condition is never met. Here’s an example of a poorly constructed while loop that will run indefinitely:

# CAUTION: THE FOLLOWING IS AN INFINITE LOOP.
# TERMINATE IT USING Ctrl + C IF IT RUNS.

# Initialize the variable 'i'
i <- 1
 
while (i < 5) {
  print(i)  
  i <- i - 1  # Mistakenly decrements i, so it never reaches 5 and the condition stays TRUE
}

In this example, i starts at 1 and continually decreases. Since i will never reach 5, the condition i < 5 is always TRUE and the loop runs endlessly.

To kill an infinite loop in R:

  • If you are using R in a terminal or console, you can usually stop the process by pressing Ctrl + C.
  • If you are using an IDE like RStudio, there is typically a “Stop” button (usually a red square) in the console or script pane that you can click to terminate the running script.

After stopping the infinite loop, you will want to debug your code to ensure that the exit condition can be met to avoid future infinite loops.
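
While debugging, one defensive option is to cap the number of iterations and leave the loop with break once the cap is hit. The following is only a sketch of that idea: the counter iterations and the limit max_iterations are assumptions added for illustration, and the faulty update is kept on purpose.

```r
i <- 1
iterations <- 0
max_iterations <- 100  # hypothetical safety limit

while (i < 5) {
  i <- i - 1  # same mistake as in the loop above: i moves away from 5
  iterations <- iterations + 1

  if (iterations >= max_iterations) {
    message("Stopped after ", iterations, " iterations; check the exit condition.")
    break  # leave the loop even though i < 5 is still TRUE
  }
}
```

The guard does not fix the bug, but it turns an endless run into a clear message that points at the exit condition.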

Exercise D

Q1

Use a while loop to print the square roots of integers from 10 to 0.

Q2 (Optional)

Employ a while loop to determine the smallest integer \(n\) for which the sum \(\sum_{i=1}^n i^2\) exceeds 1000.

3 The apply() family in R

R offers a powerful suite of functions known as the apply() family, which streamline many iterative tasks that would otherwise require explicit loops. Code written with these functions is usually more concise and, in some cases, faster than the equivalent loop.

The apply() family is part of base R and includes functions for repeatedly applying an operation to slices of data from matrices, arrays, lists, and data frames. They traverse the data in various ways without an explicit loop construct. Each function takes an input list, matrix, or array and applies a specified function to it, optionally passing along one or more additional arguments.

These functions are the foundation of more complex operations and enable the execution of tasks with minimal code. Specifically, the family comprises the apply(), lapply(), sapply(), vapply(), mapply(), rapply(), and tapply() functions.

3.1 apply()

  • Input class: matrix, array, data.frame
  • Output class: vector, matrix/array, or list (depending on what FUN returns)

This function is the foundation of the apply() family. It applies a function over margin 1 (rows), margin 2 (columns), or both c(1, 2) (every individual cell, identified by its row and column).

# Create a 3x5 matrix with normally distributed random variables generated by rnorm()
set.seed(123)  # Set seed for reproducibility
m <- matrix(rnorm(n = 15, mean = 100, sd = 30), nrow = 3, ncol = 5)
m
          [,1]     [,2]      [,3]      [,4]      [,5]
[1,]  83.18573 102.1153 113.82749  86.63014 112.02314
[2,]  93.09468 103.8786  62.04816 136.72245 103.32048
[3,] 146.76125 151.4519  79.39441 110.79441  83.32477

With the apply() function, we can find the mean of each row in m by

apply(X = m, MARGIN = 1, FUN = mean)
[1]  99.55635  99.81288 114.34536

We can verify this result using the rowMeans() function.

rowMeans(m)
[1]  99.55635  99.81288 114.34536

Similarly, to calculate the mean of each column, we can use the apply() function as follows.

apply(X = m, MARGIN = 2, FUN = mean)
[1] 107.68055 119.14861  85.09002 111.38234  99.55613

Verify this result using colMeans():

colMeans(m)
[1] 107.68055 119.14861  85.09002 111.38234  99.55613
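
The result of apply() is not always a plain vector. As a sketch under simple assumptions, the block below uses a small matrix m2 with known values (introduced here for illustration, separate from the random matrix m) to show a function that returns two values per row, an extra argument passed through to FUN, and MARGIN = c(1, 2).

```r
# small matrix with known values: rows are (1, 3, 5) and (2, 4, 6)
m2 <- matrix(1:6, nrow = 2, ncol = 3)

# range() returns two values, so the result is a matrix
# with one column for each row of m2
row_ranges <- apply(X = m2, MARGIN = 1, FUN = range)
print(row_ranges)

# arguments after FUN are passed on to it; here 'probs' goes to quantile()
col_medians <- apply(X = m2, MARGIN = 2, FUN = quantile, probs = 0.5)
print(col_medians)

# MARGIN = c(1, 2) visits every cell and keeps the shape of m2
doubled <- apply(X = m2, MARGIN = c(1, 2), FUN = function(v) v * 2)
print(doubled)
```

Note that when FUN returns several values, apply() stores the result for each row in a column, so the output can look transposed compared with the input.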

Exercise E

Q1

Use apply() to calculate the standard deviation of each column in m.

Q2

Use apply() to compute the mean of each column in the mtcars dataset.

3.2 lapply()

  • Input class: list, vector
  • Output class: list

It applies a function to each element of a list or vector. First, let us examine how it is used with vectors.

# Create a character vector with city names
cities <- c("New York", "Philadelphia", "Boston")

# Apply the nchar() function to each element in cities to count the number of characters
lapply(X = cities, FUN = nchar)
[[1]]
[1] 8

[[2]]
[1] 12

[[3]]
[1] 6

To convert the returned values into a vector, use unlist().

unlist(lapply(X = cities, FUN = nchar))
[1]  8 12  6

Now, let us examine how it works with lists.

# create a list
l <- list(a = c(1:3), b = c(4:6), c = c(7:9))
l
$a
[1] 1 2 3

$b
[1] 4 5 6

$c
[1] 7 8 9

Calculate the sum for each element in the list:

# apply the sum function to each element in the list 'l'
lapply(X = l, FUN = sum)
$a
[1] 6

$b
[1] 15

$c
[1] 24

We can also define our own functions to be applied.

# select the first element from each list item 
lapply(l, function(z) { z[1] })
$a
[1] 1

$b
[1] 4

$c
[1] 7
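
Additional arguments for the applied function can be listed after FUN, and lapply() forwards them on every call. A minimal sketch, assuming an illustrative list named readings in which one vector contains a missing value:

```r
# illustrative list; vector 'a' contains a missing value
readings <- list(a = c(1, 2, NA), b = c(4, 5, 6))

# without na.rm, the sum of 'a' is NA
with_na <- lapply(X = readings, FUN = sum)
print(with_na)

# na.rm = TRUE is forwarded to sum() for every element of the list
totals <- lapply(X = readings, FUN = sum, na.rm = TRUE)
print(totals)
```

This avoids writing a wrapper such as function(z) sum(z, na.rm = TRUE) when only an existing argument needs to be set.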

Exercise F

Q1

Calculate the mean for each vector in the list l.

3.3 sapply()

  • Input class: list, vector
  • Output class: vector, matrix, or list (when the results cannot be simplified to a vector or matrix)

Similar to lapply(), but the results are typically presented in a simpler and more user-friendly format.

# create a list of temperature measurements over 5 days
temp <- list( 
    c(3, 7, 9, 6, -1), # temperatures for day 1
    c(6, 9, 12, 13, 5), # temperatures for day 2
    c(4, 8, 3, -1, -3), # temperatures for day 3
    c(1, 4, 7, 2, -2), # temperatures for day 4
    c(5, 7, 9, 4, 2) # temperatures for day 5
)
temp
[[1]]
[1]  3  7  9  6 -1

[[2]]
[1]  6  9 12 13  5

[[3]]
[1]  4  8  3 -1 -3

[[4]]
[1]  1  4  7  2 -2

[[5]]
[1] 5 7 9 4 2

Let us determine the minimum temperature for each day.

# determine the minimum temperature for each day (returns a vector)
sapply(X = temp, FUN = min)
[1] -1  5 -3 -2  2

Compare the outputs of sapply() and lapply():

# determine the minimum temperature for each day (returns a list)
lapply(X = temp, FUN = min)
[[1]]
[1] -1

[[2]]
[1] 5

[[3]]
[1] -3

[[4]]
[1] -2

[[5]]
[1] 2

To obtain a vector from the output of lapply(), we need to employ unlist().

unlist(lapply(X = temp, FUN = min))
[1] -1  5 -3 -2  2

What if the function supplied to sapply() returns more than one value?

# define a function that returns the minimum and maximum values of a vector
getMinMax <- function(x) { 
    return(c(min = min(x), max = max(x)))
}

# apply getMinMax to each element of 'temp' using sapply to return a well-formatted matrix
sapply(X = temp, FUN = getMinMax)
    [,1] [,2] [,3] [,4] [,5]
min   -1    5   -3   -2    2
max    9   13    8    7    9

If we use lapply():

lapply(X = temp, FUN = getMinMax)
[[1]]
min max 
 -1   9 

[[2]]
min max 
  5  13 

[[3]]
min max 
 -3   8 

[[4]]
min max 
 -2   7 

[[5]]
min max 
  2   9 

After unlist():

unlist(lapply(X = temp, FUN = getMinMax))
min max min max min max min max min max 
 -1   9   5  13  -3   8  -2   7   2   9 

In our examination of the temp dataset, employing lapply() with the getMinMax() function yields a list with each element containing the minimum and maximum values of each vector. This list structure, while retaining detailed information, can be less intuitive for immediate data interpretation. Subsequently, applying unlist() converts this output into a single vector, but this transformation sacrifices the inherent organization that was present in the list.

On the other hand, sapply() streamlines the process and directly provides a neatly formatted matrix. Each row corresponds to either the minimum or maximum value, and each column represents a vector from the temp dataset. This matrix format delivered by sapply() is more structured and user-friendly for quick analysis and visualization purposes.
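
Simplification only works when every call returns a result of the same length. The sketch below shows the fallback case with a shortened, made-up temperature list temp_small: the filter keeps a different number of readings per day, so sapply() hands back a list.

```r
temp_small <- list(c(3, 7, 9), c(6, 9, 12), c(4, 8, 3))

# keep only the readings above 5; the number of values kept differs by day
above_five <- sapply(X = temp_small, FUN = function(v) v[v > 5])

print(is.list(above_five))  # TRUE: the results could not be simplified
print(above_five)
```

Because the output type depends on the data, code that relies on sapply() returning a vector or matrix can break when the input changes.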

Exercise G

Q1

Obtain the average temperature on each day in temp.

3.4 vapply()

  • Input class: list, vector
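  • Output class: vector or matrix, with the type and length of each result fixed by FUN.VALUE

It works like sapply(), but you must declare the type and length of each result through FUN.VALUE. If any result does not match, vapply() stops with an error instead of silently returning a different structure, which makes it the safer choice inside functions and scripts. It can also be somewhat faster, since R does not have to work out the output type itself.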

# tells vapply we're expecting 1 number for each element
vapply(X = cities, FUN = nchar, FUN.VALUE = numeric(1))
    New York Philadelphia       Boston 
           8           12            6 

# returns minimum and maximum
vapply(X = temp, FUN = getMinMax, FUN.VALUE = c(min = 0, max = 0))
    [,1] [,2] [,3] [,4] [,5]
min   -1    5   -3   -2    2
max    9   13    8    7    9
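
Two situations show how vapply() differs from sapply() in practice. The following sketch uses tryCatch() to capture the error raised when a result does not match FUN.VALUE, and then compares the two functions on empty input; the names outcome, empty_sapply, and empty_vapply are illustrative.

```r
# range() returns two values, but FUN.VALUE promises one, so vapply() raises an error
outcome <- tryCatch(
  vapply(X = list(1:3, 4:6), FUN = range, FUN.VALUE = numeric(1)),
  error = function(e) "vapply() stopped with an error"
)
print(outcome)

# with empty input, sapply() returns an empty list,
# while vapply() returns an empty vector of the declared type
empty_sapply <- sapply(X = list(), FUN = length)
empty_vapply <- vapply(X = list(), FUN = length, FUN.VALUE = numeric(1))
print(class(empty_sapply))
print(class(empty_vapply))
```

In both cases vapply() behaves predictably: it either returns exactly the declared type or fails loudly.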

3.5 mapply()

  • Input class: list, vector
  • Output class: vector, matrix, or list (when the results cannot be simplified)

mapply() is a multivariate version of sapply(). It is multivariate in the sense that the function can take several arguments, each supplied from a different vector or list. mapply() calls the function (its first argument, FUN) with the first elements of all inputs, then with the second elements, and so on.

mapply(FUN = sum, 1:5, 5:1, -5:-1)
[1] 1 2 3 4 5

This calls sum() five times, each time with three arguments: sum(1, 5, -5), sum(2, 4, -4), …, sum(5, 1, -1).

Let us see another example:

mapply(rep, 1:4, 4:1)
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4

This is equivalent to calling rep() four times.

list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4
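
mapply() also works with functions you write yourself. As a sketch, the block below pairs up two vectors of rectangle dimensions; rectArea() mirrors the calculateArea() function from earlier and is redefined so the block runs on its own, and the input vectors are made up. The second call shows MoreArgs, which holds arguments that stay fixed across all calls.

```r
# same idea as calculateArea() earlier, redefined so this block is self-contained
rectArea <- function(length, width) {
  return(length * width)
}

rect_lengths <- c(5, 2, 10)
rect_widths  <- c(3, 4, 1)

# pairs rect_lengths[1] with rect_widths[1], rect_lengths[2] with rect_widths[2], ...
areas <- mapply(FUN = rectArea, rect_lengths, rect_widths)
print(areas)  # 15 8 10

# MoreArgs holds arguments that are the same for every call
joined <- mapply(FUN = paste, c("a", "b"), c("x", "y"), MoreArgs = list(sep = "-"))
print(joined)
```

Since multiplication is already vectorised, rect_lengths * rect_widths would give the same areas; mapply() becomes necessary when the function itself only handles one set of inputs at a time.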

3.6 rapply()

  • Input class: list
  • Output class: vector (the default, how = "unlist") or list (how = "list" or how = "replace")

rapply() applies a function recursively to each element of a nested list. It is less commonly used than the other members of the family. For example,

l1 <- list(a = list("A", "B", "C"), b = c(1, 100), c = list("Hey"))
l1
$a
$a[[1]]
[1] "A"

$a[[2]]
[1] "B"

$a[[3]]
[1] "C"


$b
[1]   1 100

$c
$c[[1]]
[1] "Hey"

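We apply a function that appends “!” to each text string and adds 1000 to each number in this list. Because rapply() flattens its result into a single vector by default (how = "unlist"), the numbers appear as character strings in the output.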

# create custom function to supply to rapply
addSomething <- function(x) {
    if (is.character(x)) {
        # if the element is a character, append "!"
        return(paste0(x, "!"))
    } else {
        # if the element isn't a character, add 1000 to it
        return(x + 1000)
    }
}

rapply(object = l1, f = addSomething)
    a1     a2     a3     b1     b2      c 
  "A!"   "B!"   "C!" "1001" "1100" "Hey!" 

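If the nested structure should be preserved, rapply() can be asked to replace elements in place instead of flattening them. The sketch below repeats the list l1 from above so that it runs on its own, and restricts the function to character elements through the classes argument; the result name kept is illustrative.

```r
l1 <- list(a = list("A", "B", "C"), b = c(1, 100), c = list("Hey"))

# only character elements are passed to the function;
# how = "replace" leaves everything else untouched and keeps the list structure
kept <- rapply(
  object = l1,
  f = function(x) paste0(x, "!"),
  classes = "character",
  how = "replace"
)
str(kept)
```

With this setting the numbers in b remain numeric, because nothing is flattened into a single vector.
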
3.7 tapply()

  • Input class: vector
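  • Output class: array

tapply() applies a function to subsets of a vector, where the subsets are defined by a grouping variable (factor) of the same length.

# create vector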
x <- c(1, 2, 3, 10, 20, 30, 100, 200, 300)

#create grouping variable (3 groups)
groups <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")

tapply(X = x, INDEX = groups, FUN = mean)
  a   b   c 
  2  20 200 

When working with a data frame:

head(ChickWeight)
# There are four types of diets (1, 2, 3, 4)
table(ChickWeight$Diet)

  1   2   3   4 
220 120 120 118 

tapply(ChickWeight$weight, ChickWeight$Diet, mean)
       1        2        3        4 
102.6455 122.6167 142.9500 135.2627
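
One way to understand tapply() is as a shortcut for splitting a vector by group and then applying a function to each piece. The sketch below reuses the x and groups vectors from the first example and compares tapply() with a split() plus sapply() combination; the names by_tapply, pieces, and by_split are introduced here for illustration.

```r
x <- c(1, 2, 3, 10, 20, 30, 100, 200, 300)
groups <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")

by_tapply <- tapply(X = x, INDEX = groups, FUN = mean)

# split() cuts x into one vector per group; sapply() then takes the mean of each
pieces <- split(x, groups)
by_split <- sapply(X = pieces, FUN = mean)

print(by_split)
print(all.equal(as.vector(by_tapply), unname(by_split)))
```

Both routes give the same group means; tapply() simply does the splitting and applying in a single call.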