Lecture 2 - R Basics
- Arithmetic Operations
- Variable Assignment
- Objects
- Vectors
- Matrices
- Data frames
1 Arithmetic Operations
The operators +, -, *, /, and ^ work in the same way as on a calculator; for example, 2^3 raises 2 to the power 3.
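As a small illustration, the operators can be typed directly at the console. The numbers below are arbitrary and are not taken from the lecture.

```r
# Basic arithmetic, exactly as on a calculator
7 + 3    # addition: 10
7 - 3    # subtraction: 4
7 * 3    # multiplication: 21
7 / 3    # division: 2.333333
2^10     # exponentiation: 1024
```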
R also supports integer division: %/% returns the quotient and %% returns the remainder.
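A minimal sketch with arbitrary numbers shows how the two operators fit together:

```r
17 %/% 5                     # quotient of 17 divided by 5: 3
17 %% 5                      # remainder of 17 divided by 5: 2
(17 %/% 5) * 5 + (17 %% 5)   # quotient * divisor + remainder gives back 17
```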
R uses functions to perform operations. To run a function called funcname, we type funcname(input1, input2), where the inputs (or arguments) input1 and input2 tell R how to run the function. A function can have any number of inputs. Maths functions such as square root sqrt(), logarithm log(), exponential exp(), and absolute value abs() are built-in functions.
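The built-in maths functions mentioned above can be called as follows. The input values are arbitrary choices for illustration.

```r
sqrt(16)           # square root: 4
log(1)             # natural logarithm (the default base is e): 0
log(8, base = 2)   # a second argument changes the base: 3
exp(1)             # e to the power 1: 2.718282
abs(-3.5)          # absolute value: 3.5
```

Note that log() takes two inputs here, which illustrates that a function can have more than one argument.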
Exercise A
2 Variable Assignment
You can store a value (e.g. 2) or an object (e.g. a data frame) in a variable in R. You can later use the variable's name to access the value or object stored in it. For example, the command my_var <- 2 assigns the value 2 to a variable called my_var.
<- is the assignment operator in R. It can also be replaced by =. Assigning a new value overwrites the previously assigned one. Variables that have already been defined are shown in the environment window. Entering a variable's name prints the value associated with it.
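The following sketch walks through assignment, printing, and overwriting. The values other than 2 are arbitrary.

```r
my_var <- 2           # assign the value 2 to my_var
my_var                # typing the name prints the value: 2
my_var = 5            # = also assigns; the old value 2 is overwritten
my_var                # 5
my_var <- my_var + 1  # a variable can be used to compute its own new value
my_var                # 6
```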
Exercise B
A variable can hold values of different types, including the logical (Boolean) values TRUE and FALSE.
When naming your variables, you can use letters, numbers, dots, and underscores.
However, a variable name cannot start with a number, an underscore, or a dot followed by a number.
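The naming rules can be tried out as below. All variable names here are made up for illustration; the invalid ones are left as comments because they would stop the script with a syntax error.

```r
is_adult <- TRUE        # a logical (Boolean) value
class(is_adult)         # "logical"

age.in.years <- 42      # dots are allowed
patient_2 <- "Anna"     # underscores and digits are allowed after the first character
.hidden <- 1            # a leading dot is fine when no number follows it

# Invalid names (uncommenting either line gives a syntax error):
# 2nd_patient <- "Ben"  # starts with a number
# .2way <- 3            # starts with a dot followed by a number
```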
3 Objects
Common objects in R include:
- vectors,
- matrices,
- lists,
- data frames (Data frames are like matrices, but instead of numbers, the entries can be numbers or other types of variables, e.g., factors, dates, etc).
Entries in these objects can be categorized into the following data types:
- numeric (double precision real numbers),
- integer (specified by an L suffix),
- complex,
- logical,
- character (non-numeric data, for example, the address of a patient),
- factors (categorical variables).
A vector can only contain data of the same class, but a list can contain data of different classes.
We could use class() and str() to check the object/data type.
Some people prefer to use the attributes() function.
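As a rough illustration of the data types and checking functions listed above, the sketch below creates one value of each type. All variable names and values are invented for this example.

```r
num <- 3.14                            # numeric (double precision)
int <- 3L                              # integer, marked by the L suffix
cpx <- 1 + 2i                          # complex
lgl <- TRUE                            # logical
chr <- "12 High Street"                # character
fct <- factor(c("low", "high", "low")) # factor (categorical variable)

class(num)   # "numeric"
class(int)   # "integer"
class(cpx)   # "complex"
class(lgl)   # "logical"
class(chr)   # "character"
class(fct)   # "factor"

str(fct)          # compact summary of the structure
attributes(fct)   # the levels and the class of the factor

# A vector holds one class only, so mixed inputs are converted ...
mixed_vec <- c(1, "a", TRUE)
class(mixed_vec)  # "character"

# ... whereas a list keeps each element's own class
mixed_list <- list(1, "a", TRUE)
sapply(mixed_list, class)  # "numeric" "character" "logical"
```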
Exercise C
4 Vectors
4.1 Forming a vector
To create a vector of numbers, we use the function c() (for concatenate). Any numbers inside the parentheses are joined together. The following command instructs R to join together the numbers 1, 3, 2, and 5, and to save them as a vector named x. When we type x, it returns the vector.
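The command described above can be written as:

```r
x <- c(1, 3, 2, 5)   # join the numbers 1, 3, 2, and 5 into a vector named x
x                    # prints 1 3 2 5
```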
The : operator can be used to generate a sequence of consecutive integers.
If we would like to create more general arithmetic sequences, we can apply the seq() function.
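A brief sketch of both approaches follows; the end points and step sizes are arbitrary.

```r
1:10                                     # consecutive integers from 1 to 10
10:1                                     # the same sequence in decreasing order
seq(from = 0, to = 1, by = 0.25)         # fixed step size: 0.00 0.25 0.50 0.75 1.00
seq(from = 1, to = 10, length.out = 4)   # fixed number of values: 1 4 7 10
```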
The rep() function is especially useful for creating vectors with repeated elements. Its times argument repeats the whole vector, whereas its each argument repeats every element in turn.
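One possible pair of calls to compare is sketched below. The input vector is an arbitrary choice, and the original lecture may have used a different comparison.

```r
rep(c(1, 2), times = 3)   # repeat the whole vector three times: 1 2 1 2 1 2
rep(c(1, 2), each = 3)    # repeat each element three times:     1 1 1 2 2 2
```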
Exercise D
Create the vectors:
4.2 Arithmetic operations
Arithmetic operators apply to vectors element by element.
When applying arithmetic operators to two vectors x and y, the vectors should normally have the same length; otherwise R recycles the shorter one, and warns if the longer length is not a multiple of the shorter. We can check their lengths using the length() function.
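A short sketch of element-wise arithmetic follows. The vector y and the two-element vector in the last line are assumptions made for illustration.

```r
x <- c(1, 3, 2, 5)
y <- c(10, 20, 30, 40)

length(x)          # 4
length(y)          # 4

x + y              # 11 23 32 45
x * y              # 10 60 60 200
x^2                # 1 9 4 25

# A shorter vector is recycled along the longer one
x + c(100, 200)    # 101 203 102 205
```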
Hitting \(\uparrow\) multiple times will display previous commands which can be further edited. This is useful when one often wishes to repeat a similar command.
The ls() function allows us to review a list of all of the objects, such as data and functions, that we have saved so far.
The rm() function can be used to delete variables that are no longer needed.
It is also possible to remove all objects at once:
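The sketch below shows ls() and rm() in action. It is wrapped in local() purely as a precaution, so that the example works in its own scratch workspace and does not delete anything from your session; at the console you would type the calls directly. The variable names are invented.

```r
remaining <- local({
  a <- 1
  b <- c(2, 3)
  d <- "text"

  print(ls())       # lists the objects defined so far: "a" "b" "d"
  rm(a)             # delete a single variable
  print(ls())       # "b" "d"

  rm(list = ls())   # remove all objects at once
  ls()              # nothing is left: character(0)
})
remaining
```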
Exercise E
4.3 Selecting Elements
Select the indexing technique appropriate for your problem:
- Use square brackets to select vector elements by their position, such as vec[2] for the second element of vec.
- Use negative indexes to exclude elements.
- Use a vector of indexes to select multiple values.
- Use a logical vector to select elements based on a condition.
- Use names to access named elements.
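Each of the indexing techniques above can be sketched on a small named vector. The names and values in vec are made up for this example.

```r
vec <- c(a = 10, b = 20, c = 30, d = 40)

vec[2]          # by position: the second element (b = 20)
vec[-1]         # negative index: everything except the first element
vec[c(1, 3)]    # vector of indexes: the first and third elements
vec[vec > 25]   # logical vector: the elements larger than 25 (c and d)
vec["d"]        # by name: the element called d
```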
4.4 Comparison operations
Comparison operators let us ask R questions about our data directly, rather than inspecting the values ourselves.
The (logical) comparison operators known to R are:
- < for less than
- > for greater than
- <= for less than or equal to
- >= for greater than or equal to
- == for equal to each other
- != for not equal to each other
The nice thing about R is that you can also use these comparison operators on vectors.
Such a comparison tests, for every element of the vector, whether the condition stated by the comparison operator is TRUE or FALSE.
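For instance, with a made-up vector of scores:

```r
scores <- c(55, 72, 90, 48)

scores > 60     # FALSE  TRUE  TRUE FALSE
scores == 90    # FALSE FALSE  TRUE FALSE
scores != 48    #  TRUE  TRUE  TRUE FALSE
```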
Working with comparisons will make your data analytical life easier. Instead of selecting a subset to investigate yourself, you can simply ask R to return only the elements that satisfy the underlying condition. You can select the desired elements by putting the conditional statement between the square brackets that follow the vector of interest:
R knows what to do when you pass a logical vector (called my_cond here) in square brackets: it selects only the elements that correspond to TRUE in my_cond.
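A minimal sketch of this kind of selection is given below. The vector scores is invented for illustration, while my_cond follows the name used in the text.

```r
scores <- c(55, 72, 90, 48)

my_cond <- scores > 60   # logical vector: FALSE TRUE TRUE FALSE
scores[my_cond]          # keeps the elements where my_cond is TRUE: 72 90

# The condition can also be written directly inside the brackets
scores[scores <= 55]     # 55 48
```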
Exercise F
5 Matrices
5.1 Basics
The matrix() function can be used to create a matrix of numbers.
Note that we could just as well omit data =, nrow =, and ncol = in a matrix() command and pass the arguments by position. We could also omit either nrow = or ncol =, because R infers the missing dimension from the length of the data.
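The different ways of calling matrix() described above can be sketched as follows; the data values are arbitrary.

```r
# Fully named arguments; the data fill the matrix column by column
m1 <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2)
m1
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4

m2 <- matrix(c(1, 2, 3, 4), 2, 2)      # the same call without argument names
m3 <- matrix(c(1, 2, 3, 4), nrow = 2)  # ncol is worked out from the data

identical(m1, m2)   # TRUE
identical(m1, m3)   # TRUE
```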
Exercise G
We can also combine vectors to obtain a matrix. For instance, we would like to construct a matrix with each of the three vectors as a row.
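Two possible ways of doing this are sketched below, since the lecture's original command is not shown here. The vectors v1, v2, and v3 are hypothetical.

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
v3 <- c(7, 8, 9)

# Option 1: stack the vectors as rows with rbind()
m_rbind <- rbind(v1, v2, v3)
m_rbind

# Option 2: join the vectors and fill the matrix row by row
m_byrow <- matrix(c(v1, v2, v3), nrow = 3, byrow = TRUE)

all(m_rbind == m_byrow)   # both hold the same values: TRUE
```

The rbind() version also keeps the vector names as row names.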
If you are interested in the sum of all values by row or by column, rowSums() and colSums() do the job. Similarly, there are rowMeans() and colMeans(). Each of these functions returns a new vector with one entry per row (for the row functions) or one entry per column (for the column functions):
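With a small example matrix (chosen arbitrarily), the four functions give:

```r
m <- matrix(1:6, nrow = 2)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

rowSums(m)    # one value per row:    9 12
colSums(m)    # one value per column: 3 7 11
rowMeans(m)   # 3 4
colMeans(m)   # 1.5 3.5 5.5
```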
You can easily append additional values to an existing matrix by either rbind() or cbind(), where r and c stand for row and column, respectively.
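A short sketch of appending a row and a column follows. The matrix and the appended values are made up.

```r
m <- matrix(1:6, nrow = 2)            # a 2 x 3 matrix

m_more_rows <- rbind(m, c(7, 8, 9))   # append a row: the new row needs 3 values
m_more_cols <- cbind(m, c(10, 11))    # append a column: the new column needs 2 values

dim(m_more_rows)   # 3 3
dim(m_more_cols)   # 2 4
```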
Exercise H
5.2 Extracting matrix elements
Similar to vectors, you can use the square brackets [ ] to select one or multiple elements from a matrix. While vectors have one dimension, matrices have two dimensions. You should therefore use a comma to separate the rows you want to select from the columns. For example:
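The row and column positions can be tried on a small matrix. The matrix below is an arbitrary example.

```r
m <- matrix(1:9, nrow = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

m[1, 2]        # row 1, column 2: 4
m[1:2, 2:3]    # rows 1 to 2 and columns 2 to 3
```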
To select all elements of a row, leave the position after the comma empty; to select all elements of a column, leave the position before the comma empty:
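Using an arbitrary 3 x 3 matrix as an example:

```r
m <- matrix(1:9, nrow = 3)

m[2, ]   # the whole second row:   2 5 8
m[, 3]   # the whole third column: 7 8 9
```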
Exercise I
5.3 Arithmetics with matrices
Similar to what you have learned with vectors, the standard operators like +, -, /, *, etc. work in an element-wise way on matrices in R. For example,
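A small sketch of element-wise operations is given below; the matrices A and B are invented for illustration.

```r
A <- matrix(1:4, nrow = 2)            # rows: 1 3 / 2 4
B <- matrix(c(2, 0, 0, 2), nrow = 2)  # rows: 2 0 / 0 2

A + B    # each entry of A plus the matching entry of B: 3 3 / 2 6
A * B    # each entry of A times the matching entry of B: 2 0 / 0 8
A * 2    # every entry doubled: 2 6 / 4 8
```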
Note that this is not standard matrix multiplication, for which you should use %*% in R. See below for an example.
The diag() function creates a diagonal matrix; for example, diag(2) is the 2-by-2 identity matrix, whose diagonal elements are all 1.
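The difference between * and %*% can be sketched as follows. The matrix A is an arbitrary example and I2 is a name chosen here for the identity matrix.

```r
A  <- matrix(1:4, nrow = 2)   # rows: 1 3 / 2 4
I2 <- diag(2)                 # 2 x 2 identity matrix

A %*% I2   # matrix multiplication by the identity returns A itself
A * I2     # element-wise product keeps only the diagonal: 1 0 / 0 4
A %*% A    # matrix product of A with itself: 7 15 / 10 22
```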
6 Data frames
6.1 Create data frames
The data frame is probably the most widely used data format in base R. An example is mtcars, a built-in data frame in R.
The data() function loads a built-in data set into the workspace. The head() function lets us view the first six rows of a data frame.
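A minimal sketch of loading and inspecting mtcars:

```r
data(mtcars)          # load the built-in data set
head(mtcars)          # first six rows
head(mtcars, n = 3)   # the n argument changes the number of rows shown
dim(mtcars)           # 32 rows and 11 columns
```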
The data.frame() function can be used to create a data frame.
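As an illustration, the sketch below builds a small data frame from three vectors. The data frame patients and all its values are invented.

```r
patients <- data.frame(
  name   = c("Ann", "Bob", "Cara"),   # character column
  age    = c(34, 51, 29),             # numeric column
  smoker = c(FALSE, TRUE, FALSE)      # logical column
)

patients        # print the data frame
str(patients)   # one line per column with its type
dim(patients)   # 3 rows and 3 columns
```

Each argument becomes one column, and all columns must have the same length.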
Other commonly used approaches to (re)build a data frame are cbind() and rbind(), which are counterparts of the c() function. The cbind() function means binding columns whereas rbind() combines rows, like what we did for matrices. More details will be presented in the next section.
Exercise J
6.2 Indexing data
Similar to matrices, we can select a specific entry in a data frame by giving two numbers, separated by a comma, inside square brackets.
The first number refers to the row index and the second to the column index. We can also choose a subset of the data frame by giving a range of row indexes.
Notice that leaving the column index empty selects all the columns in the dataset.
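The indexing described above can be sketched on mtcars. The particular row and column numbers are arbitrary choices, not necessarily those used in the lecture.

```r
mtcars[1, 2]           # row 1, column 2 (cyl of the Mazda RX4): 6
mtcars[1:3, ]          # rows 1 to 3, all columns
mtcars[1:3, c(1, 4)]   # rows 1 to 3, columns 1 and 4 only
```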
In addition, we can index a column in the data frame by using $. For instance,
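A brief sketch using the mpg column of mtcars (the choice of column is an assumption):

```r
mtcars$mpg           # the mpg column as a numeric vector
length(mtcars$mpg)   # 32 values, one per car
mean(mtcars$mpg)     # the result can be passed straight to other functions: 20.09062
```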
Alternatively, we can put the column name, in quotes, inside square brackets. The result is a data frame, so the row names are displayed as well.
It helps to check the column names of the data frame first, for example with colnames().
To obtain the row names of a data frame in R, use the rownames() function.
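These steps might look as follows on mtcars. Selecting the column with mtcars["mpg"] is one plausible reading of selecting by column name and is an assumption here.

```r
colnames(mtcars)             # check which columns are available
head(mtcars["mpg"], 3)       # select by name; row names are shown alongside
head(rownames(mtcars), 3)    # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
```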
6.3 Attributes
Remember that we can use the attributes() function to get a general idea of a data frame.
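For example, applied to mtcars (attrs is a name chosen here for the result):

```r
attrs <- attributes(mtcars)

names(attrs)   # includes "names", "row.names", and "class"
attrs$class    # "data.frame"
attrs$names    # the column names, the same as colnames(mtcars)
```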