Lecture 2 - R Basics

  1. Arithmetic Operations
  2. Variable Assignment
  3. Objects
  4. Vectors
  5. Matrices
  6. Data frames

flowchart LR
  A(Arithmetic Operations) --> B(Variable Assignment)
  B --> C(Objects)
  C --> D(Vectors)
  C --> E(Matrices)
  C --> F(Data frames)

1 Arithmetic Operations

+, -, *, /, ^ work in the same way as a calculator, e.g., exponentiate:
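As a minimal sketch of these operators at the console (the numbers and variable names are illustrative only):

```r
# Basic arithmetic; the values are arbitrary illustrations
sum_result  <- 7 + 3    # 10
diff_result <- 7 - 3    # 4
prod_result <- 7 * 3    # 21
quot_result <- 7 / 2    # 3.5
pow_result  <- 2 ^ 10   # 1024 (exponentiation)
pow_result
```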

R also supports integer division and modulo: %/% gives the quotient and %% gives the remainder:
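A short sketch with arbitrary numbers (the variable names are assumptions made for illustration):

```r
quotient  <- 17 %/% 5   # integer quotient: 3
remainder <- 17 %% 5    # remainder: 2

# The two results reconstruct the original number
quotient * 5 + remainder == 17   # TRUE
```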

R uses functions to perform operations. To run a function called funcname, we type funcname(input1, input2), where the inputs (or arguments) input1 and input2 tell R how to run the function. A function can have any number of inputs. Maths functions such as square root sqrt(), logarithm log(), exponentiation exp() and absolute value abs() are built in.
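The built-in functions named above can be tried as follows; this is only a sketch, and the inputs and variable names are chosen for illustration:

```r
root_value  <- sqrt(16)              # 4
natural_log <- log(exp(1))           # 1 (log() is the natural logarithm by default)
log_base_10 <- log(100, base = 10)   # 2 (a second argument changes the base)
exp_value   <- exp(0)                # 1
abs_value   <- abs(-3.5)             # 3.5
```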

Exercise A

Q1

Calculate 20 mod 6 (i.e., the remainder of 20 divided by 6).

Q2

Calculate \((2+5)\) squared divided by \(3\times\sqrt{e^{2.4} - |8.6-9.2|}\).

2 Variable Assignment

You can store a value (e.g. 2) or an object (e.g. a data frame) in a variable in R. You can later call this variable’s name to easily access the value or the object that has been stored within this variable. For example, you can assign the value 2 to a variable my_var with the command my_var <- 2.

<- is the assignment operator in R. It can also be replaced by =. Assigning a new value overwrites the previously assigned value. Variables already defined are shown in the environment window. Entering a variable name prints the value associated with that variable.
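A minimal sketch of assignment, overwriting and printing; other_var is a hypothetical name used only to show the = form:

```r
my_var <- 2      # assign the value 2 to my_var
my_var           # typing the name prints the stored value

my_var <- 4      # a new assignment overwrites the old value
my_var

other_var = 10   # = also assigns at the top level
```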

Exercise B

Q1
  1. Define a variable named new_var with value 100,000.
  2. Multiply my_var (which now holds the value 4 in my notes) by new_var.
Q2
  • Define a variable named my_uni with a string "Arcadia".

The value of a variable can be of different types, including the logical (boolean) values TRUE and FALSE.

Warning

When naming your variables, you can use letters, numbers, dots, and underscores.

However, a variable name cannot start with a number, an underscore, or a dot followed by a number.
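The naming rules can be checked with a small sketch. The helper is_valid_name() is hypothetical (it is not part of R); it relies on make.names(), which returns its input unchanged only when the input is already a syntactically valid name:

```r
# Valid names
total_2024 <- 1
my.var     <- 2
.hidden    <- 3   # a leading dot is fine when it is not followed by a digit

# Hypothetical helper: TRUE when x is already a syntactically valid name
is_valid_name <- function(x) make.names(x) == x

is_valid_name("my_var")   # TRUE
is_valid_name("2var")     # FALSE: starts with a number
is_valid_name(".2var")    # FALSE: a dot followed by a number
```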

3 Objects

Common objects in R include:

  • vectors,
  • matrices,
  • lists,
  • data frames (like matrices, but different columns can hold different types of data, e.g., numbers, factors, dates, etc.).

Entries in these objects fall into the following data types:

  • numeric (double precision real numbers),
  • integer (specified by an L suffix),
  • complex,
  • logical,
  • character (non-numeric data, for example, the address of a patient),
  • factors (categorical variables).

A vector can only contain data of the same class, but a list can contain data of different classes.

We could use class() and str() to check the object/data type.

Some people prefer to use the attributes() function.
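As a rough illustration of these three functions, the sketch below uses small made-up objects (all names are assumptions):

```r
num_vec <- c(1.5, 2, 3)
class(num_vec)                    # "numeric"
class(2L)                         # "integer"
class(TRUE)                       # "logical"
class(factor(c("low", "high")))   # "factor"

# A vector holds one class only, so mixed inputs are coerced
mixed_vec <- c(1, "a", TRUE)
class(mixed_vec)                  # "character"

# A list keeps each element's own class
mixed_list <- list(1, "a", TRUE)
str(mixed_list)                   # compact display of the structure

# attributes() reports metadata such as the dimensions of a matrix
small_mat <- matrix(1:4, nrow = 2)
attributes(small_mat)             # $dim is 2 2
```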

Exercise C

Q1

Use class(), str() and attributes() to check the object/data type of iris and 'iris'.

4 Vectors

4.1 Forming a vector

To create a vector of numbers, we use the function c() (for concatenate). Any numbers inside the parentheses are joined together. The following command instructs R to join together the numbers 1, 3, 2, and 5, and to save them as a vector named x. When we type x, it returns the vector.
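The command described above can be written as:

```r
x <- c(1, 3, 2, 5)   # join the four numbers into a vector named x
x                    # typing x prints the vector
```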

The : operator can be used to generate a sequence of consecutive integers.
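A brief sketch with arbitrary end points (the variable names are illustrative):

```r
going_up   <- 3:8   # 3 4 5 6 7 8
going_down <- 8:3   # 8 7 6 5 4 3 (the sequence decreases when the start is larger)
```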

If we would like to create more general arithmetic sequences, we can apply the seq() function.
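Two common ways of calling seq() are sketched below, one fixing the step size and one fixing the number of points; the values are arbitrary:

```r
by_step   <- seq(from = 0, to = 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00
by_length <- seq(from = 1, to = 10, length.out = 4)   # 1 4 7 10
```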

The rep() function is especially useful for creating vectors with repeated elements.

Compare it with
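A minimal sketch of rep(), assuming the comparison of interest is between its times and each arguments (the input vector is arbitrary):

```r
whole_vector_repeated <- rep(c(1, 2), times = 3)   # 1 2 1 2 1 2
each_element_repeated <- rep(c(1, 2), each = 3)    # 1 1 1 2 2 2
```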

Exercise D

Create the vectors:

Q1

(20, 19, ..., 1).

Q2

(1, 2, 3, ..., 19, 20, 19, 18, ..., 2, 1).

Q3

(4, 6, 3) and assign it to the variable called tmp.

Q4

(4, 6, 3, 4, 6, 3, ..., 4, 6, 3) where there are 10 occurrences of 4, 6, 3 based on tmp.

Q5

(4, ..., 4, 6, ..., 6, 3, ..., 3) where there are 10 occurrences of 4, 20 occurrences of 6 and 30 occurrences of 3, based on tmp.

Q6

A vector called boolean_vector containing the three elements: TRUE, FALSE and TRUE (in that order).

4.2 Arithmetic operations

Arithmetic operators apply to vectors where the operation is taken element by element.

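However, when applying arithmetic operators to two vectors x and y, they should normally be the same length; otherwise R recycles the shorter vector (and warns when the longer length is not a multiple of the shorter). We can check their length using the length() function.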
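As a small sketch with two made-up vectors of equal length:

```r
x <- c(1, 6, 2)
y <- c(1, 4, 3)

length(x)   # 3
length(y)   # 3

vec_sum     <- x + y   # 2 10 5  (element by element)
vec_product <- x * y   # 1 24 6
vec_squared <- x ^ 2   # 1 36 4
```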

Tip
  • Hitting \(\uparrow\) multiple times will display previous commands which can be further edited. This is useful when one often wishes to repeat a similar command.

  • The ls() function allows us to review a list of all of the objects, such as data and functions, that we have saved so far.

  • The rm() function can be used to delete variables that are no longer needed.

    It is also possible to remove all objects at once with rm(list = ls()).
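The ls() and rm() functions from the tips above can be tried as follows; scratch_value is a hypothetical throwaway object:

```r
scratch_value <- 42           # hypothetical temporary object
"scratch_value" %in% ls()     # TRUE: ls() lists it among the saved objects

rm(scratch_value)             # delete it
exists("scratch_value", inherits = FALSE)   # FALSE: it is gone

# rm(list = ls())             # would remove every object in the workspace
```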

Exercise E

Q1

Create a vector of \(e^x\) evaluated at x=3, 3.1, 3.2, ..., 6, respectively.

Q2 (Optional)

Create a vector of \((-1^3, 2^3, -3^3, \ldots, -9^3, 10^3)\).

4.3 Selecting Elements

Select the indexing technique appropriate for your problem:

  • Use square brackets to select vector elements by their position, such as vec[2] for the second element of vec.

  • Use negative indexes to exclude elements.

  • Use a vector of indexes to select multiple values.

  • Use a logical vector to select elements based on a condition.

  • Use names to access named elements.
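Each technique in the list above is sketched below on a small named vector; vec and its values are made up for illustration:

```r
vec <- c(a = 10, b = 20, c = 30, d = 40)

second_only  <- vec[2]          # by position: element b (20)
without_1st  <- vec[-1]         # negative index: everything except the first
first_third  <- vec[c(1, 3)]    # vector of indexes: elements a and c
above_15     <- vec[vec > 15]   # logical vector: elements b, c and d
by_name      <- vec["c"]        # by name: element c (30)
```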

4.4 Comparison operations

Comparison operators let us put questions about the data to R directly, rather than inspecting the values by eye.

The (logical) comparison operators known to R are:

  • < for less than
  • > for greater than
  • <= for less than or equal to
  • >= for greater than or equal to
  • == for equal to each other
  • != for not equal to each other

Conveniently, these comparison operators also work on vectors.
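For instance, with a hypothetical vector of scores (the data are invented for this sketch):

```r
scores <- c(72, 95, 58, 88, 64)   # hypothetical data

above_70 <- scores > 70           # TRUE TRUE FALSE TRUE FALSE
equal_58 <- scores == 58          # FALSE FALSE TRUE FALSE FALSE
above_70
```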

This command tests for every element of the vector if the condition stated by the comparison operator is TRUE or FALSE.

Working with comparisons makes data analysis easier. Instead of selecting a subset to investigate yourself, you can simply ask R to return only the elements that satisfy the underlying condition. You can select the desired elements by putting the conditional statement between the square brackets that follow the vector of interest:
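A sketch of selection by condition, again on invented data; scores and its names are assumptions, while my_cond stores the logical vector produced by the comparison:

```r
scores <- c(Ann = 72, Bob = 95, Cat = 58, Dan = 88, Eve = 64)   # hypothetical data

my_cond  <- scores > 70      # logical vector, one entry per element
selected <- scores[my_cond]  # keeps Ann (72), Bob (95) and Dan (88)
selected
```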

R knows what to do when you pass a logical vector in square brackets: it will only select the elements that correspond to TRUE in my_cond.

Exercise F

Q1

Determine which days have a stock price lower than 209.

5 Matrices

5.1 Basics

The matrix() function can be used to create a matrix of numbers.
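A minimal sketch with four arbitrary numbers; note that matrix() fills column by column unless byrow = TRUE is given:

```r
x <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2)
x
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4

# The same numbers filled row by row instead
x_by_row <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)
```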

Note that we could just as well omit typing data =, nrow = and ncol = in the matrix() command above, provided the arguments are given in that order. We could also omit either one of the nrow = and ncol = arguments.

Exercise G

Q1

Construct a matrix with 3 rows that contains the numbers 1 up to 9, arranged in the default format.

We can also combine vectors to obtain a matrix. For instance, we would like to construct a matrix with each of the three vectors as a row.
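One way to do this is with rbind(); the three vectors below are hypothetical stand-ins:

```r
week1 <- c(10, 20, 30)   # hypothetical data
week2 <- c(11, 21, 31)
week3 <- c(12, 22, 32)

by_row <- rbind(week1, week2, week3)   # each vector becomes one row
by_row
dim(by_row)                            # 3 3
```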

If you are interested in finding the sum of all values by rows or by columns, rowSums() and colSums() provide the convenience. Similarly, there are rowMeans() and colMeans(). Each function returns a new vector whose length equals the number of rows (for the row functions) or the number of columns (for the column functions):
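A short sketch on a made-up 2 x 3 matrix (the name sales and its values are assumptions):

```r
sales <- matrix(c(10, 20, 30,
                  40, 50, 60), nrow = 2, byrow = TRUE)

row_totals <- rowSums(sales)    # 60 150     (one value per row)
col_totals <- colSums(sales)    # 50 70 90   (one value per column)
row_means  <- rowMeans(sales)   # 20 50
col_means  <- colMeans(sales)   # 25 35 45
```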

You can easily append additional values to an existing matrix by either rbind() or cbind(), where r and c stand for row and column, respectively.
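As an illustration on an arbitrary small matrix (all names are hypothetical), a row is appended first and then a column:

```r
base_mat  <- matrix(1:4, nrow = 2)             # 2 x 2
with_row  <- rbind(base_mat, c(5, 6))          # append a row: 3 x 2
with_both <- cbind(with_row, c(7, 8, 9))       # append a column: 3 x 3

dim(with_row)    # 3 2
dim(with_both)   # 3 3
```

The appended vector must match the existing number of columns (for rbind()) or rows (for cbind()).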

Exercise H

Q1
  1. First construct a matrix containing prices for three different hotel rooms (row) shown on two different websites (column).

  2. Compute the average price from two websites for each room.

  3. Compute the mean price of the three rooms for each of the two websites.

  4. Add Room 4, whose prices on the two websites are 110 and 120, respectively.

  5. Based on the matrix you created in 4., add prices from the third website for all 4 rooms: 150, 199, 85, 115.

5.2 Extracting matrix elements

Similar to vectors, you can use the square brackets [ ] to select one or multiple elements from a matrix. While vectors have one dimension, matrices have two dimensions. You should therefore use a comma to separate the rows you want to select from the columns. For example:
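A minimal sketch on a made-up 3 x 4 matrix (m and the other names are illustrative):

```r
m <- matrix(1:12, nrow = 3)   # filled column by column

single_entry <- m[2, 3]           # row 2, column 3: 8
sub_block    <- m[1:2, c(1, 4)]   # rows 1-2, columns 1 and 4
sub_block
```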

If you want to select all elements of a row or a column, no number is needed before or after the comma, respectively:
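For example, on an arbitrary 3 x 4 matrix (names are assumptions):

```r
m <- matrix(1:12, nrow = 3)   # filled column by column

second_row   <- m[2, ]   # nothing after the comma: all columns of row 2 (2 5 8 11)
third_column <- m[, 3]   # nothing before the comma: all rows of column 3 (7 8 9)
```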

Exercise I

Q1
  • Working with the matrix created in the previous Exercise, select the prices for Room 1 and Room 4 from the first and the third website.
  • Select prices for Room 3 from all websites.

5.3 Arithmetic with matrices

Similar to what you have learned with vectors, the standard operators like +, -, /, *, etc. work in an element-wise way on matrices in R. For example,
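A sketch of element-wise operations on a small made-up matrix:

```r
a <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)

elementwise_sum     <- a + a   # each entry doubled
elementwise_product <- a * a   # each entry squared
elementwise_half    <- a / 2   # each entry halved
elementwise_product
```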

Note that this is not the standard matrix multiplication for which you should use %*% in R. See below for an example.
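A small sketch contrasting %*% with *, using an arbitrary 2 x 2 matrix and diag(2):

```r
a <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
identity_2 <- diag(2)        # 2 x 2 matrix with 1s on the diagonal

times_identity <- a %*% identity_2   # matrix product with the identity: equals a
matrix_square  <- a %*% a            # matrix product: rows (7, 15) and (10, 22)
matrix_square
```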

where diag() creates an identity matrix (a diagonal matrix whose diagonal elements are all 1) when given a single positive integer.

6 Data frames

6.1 Create data frames

The data frame is probably the most widely used data structure in base R. An example looks like this:
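A minimal sketch using the built-in mtcars data set:

```r
data(mtcars)                    # load the built-in data set
first_rows <- head(mtcars)      # first six rows
first_rows
```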

where mtcars is a built-in data frame in R. The data() function allows us to retrieve data. The head() function lets us view the first six rows of the data frame.

The data.frame() function can be used to create a data frame.
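As an illustration, the sketch below builds a small data frame from invented data; students and its columns are hypothetical:

```r
students <- data.frame(
  name   = c("Ann", "Bob", "Cat"),   # character column
  age    = c(21, 23, 22),            # numeric column
  passed = c(TRUE, FALSE, TRUE)      # logical column
)

students
str(students)   # one line per column, showing its type
```

Each argument becomes one column, and all columns must have the same length.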

Other commonly used approaches to (re)build a data frame are cbind() and rbind(), which are counterparts of the c() function. cbind() binds columns whereas rbind() binds rows, as we did for matrices. More details will be presented in the next section.

Exercise J

Q1

Write the hotel room prices in a data frame whose columns are the prices for Room 1, Room 2 and Room 3, respectively. You can define the column names accordingly.

6.2 Indexing data

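Similar to matrices, we can select a specific entry in the data frame by giving its row and column index in square brackets, separated by a comma.

The first number refers to the row index and the second corresponds to the column index. We could also choose a subset of the data frame by giving a range of rows and leaving the column index empty.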

Note

Notice that by not putting a number for the column index, I have chosen all the columns in the dataset.
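A sketch of these indexing forms, assuming the built-in mtcars data frame; the particular row and column indices are chosen only for illustration:

```r
data(mtcars)

one_entry  <- mtcars[2, 3]           # row 2, column 3 (disp of the second car): 160
first_rows <- mtcars[1:3, ]          # rows 1 to 3, all columns
sub_block  <- mtcars[1:3, c(1, 4)]   # rows 1 to 3, columns 1 (mpg) and 4 (hp)
sub_block
```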

In addition, we can index a column in the data frame by using $ followed by the column name.

Alternatively, we can index by the column name inside square brackets; the result is still a data frame, so the row names are displayed as well.

It helps to first check the column names of the data frame.

If one wants to obtain the index or rownames of a data frame in R, the rownames() function should be used.
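The column and row-name tools mentioned above are sketched below, assuming the built-in mtcars data frame:

```r
data(mtcars)

column_names <- names(mtcars)     # check the available column names first
car_names    <- rownames(mtcars)  # row names (here, the car models)

mpg_values <- mtcars$mpg          # column selected with $
mpg_column <- mtcars["mpg"]       # column selected by name; row names are kept

head(mpg_values)                  # 21.0 21.0 22.8 21.4 18.7 18.1
head(mpg_column)
```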

6.3 Attributes

Remember that we can use the attributes() function to get a general idea of the data frame.

Exercise K

Q1
  1. Use dim(), names(), str(), summary() functions on the data frame mtcars. Describe what you see.
  2. Use the class() function to verify the object types of mtcars$mpg and mtcars["mpg"], respectively.