Flowchart: R → RStudio → R Packages
Lecture 1 - Get Started
- R
- RStudio
- R packages
1 R
1.1 Why learn R?
- Statistical Analysis
- R provides powerful tools for conducting complex statistical analyses, which is essential for many scientific and social research disciplines.
- Data Visualization
- R excels in creating high-quality, publishable graphics, enabling clear communication of data insights.
- Data Manipulation
- It includes extensive libraries for handling and transforming data, making it easier to prepare large datasets for analysis.
- Open Source
- R is free to use, with a vast community that contributes packages and support, reducing software costs and increasing accessibility.
- Career Opportunities
- Proficiency in R is valued in careers such as data science, economics, actuarial science, and biostatistics, which improves job prospects. For example, the Society of Actuaries (SOA) requires R as the main tool in its Advanced Topics in Predictive Analytics (ATPA) assessment.
- Ease of Learning
- While Python is often considered beginner-friendly, R has a significant advantage for data-related tasks once you master the basics. Because R was designed specifically for data manipulation and analysis, learning core data science skills (data manipulation, visualization, and machine learning) can be more straightforward in R.
1.2 What are R and RStudio?
R is a language and environment for statistical computing and graphics. It is a free, open-source program for which there are abundant online resources to support its use. R can be downloaded from http://cran.r-project.org.
RStudio is an integrated development environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management. RStudio can be downloaded from https://www.rstudio.com/products/rstudio/download/.
1.3 What is CRAN?
R is downloaded from the Comprehensive R Archive Network (CRAN). You will see later that we can also download packages from CRAN.
1.4 Exercise A
2 RStudio
2.1 Script Window vs Console Window
- The Script Window is the place to enter and run your code so that it is easily edited and saved for future use. Usually the Script Window is shown at the top left in RStudio. If this window is not shown, it will be visible when you open a previously saved R script, or when you create a new R Script.
- To execute code in the R script, place your cursor anywhere on the line of code, then either click Run or press Cmd/Ctrl + Enter on your keyboard.
- To execute code in the Console Window, enter the code directly and press Enter. The commands that you run are shown in the History Window at the top right of RStudio. You can save these commands for future use, but this is not recommended as a way to document your code.
- Comments in R are preceded by the # symbol; anything following this symbol on the same line is not executed. Even so, writing comments in your code is important for documentation purposes.
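As a small illustration of the points above, the following sketch could be typed into the Script Window and run line by line with Cmd/Ctrl + Enter. The variable names x and mean_x are made up for this example.

```r
# Everything after the # symbol is a comment and is not executed
x <- c(2, 4, 6)    # store three numbers in a vector called x
mean_x <- mean(x)  # compute the average of x
print(mean_x)      # show the result in the Console
```

Running the same lines in the Console Window gives the same result, but only the script can be saved and rerun later.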
2.2 Saving and Opening R Script Files
- Saving an R Script: To save your work in R, you create what is known as an R script. This is a plain text file containing the code you’ve written, which can be run in R to perform tasks like data analysis, visualization, etc. Here’s how to save an R script:
- Click on File > New File > R Script in the RStudio menu bar.
- Write your code in the R script window.
- Click on File > Save As in the RStudio menu bar.
- Choose a location on your computer to save the file, and give it a name ending in .R (e.g., myscript.R).
- Click Save.
- Opening an R Script: To open an existing R script for editing or execution, follow these steps:
- Click on File > Open File in the RStudio menu bar.
- Navigate to the location of the R script file on your computer.
- Click Open.
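Because an R script is just a plain text file, the same save-and-run cycle can also be sketched in code. This is only an illustration, not a replacement for the menu steps above: the temporary folder, the file contents, and the variable name total are assumptions chosen for the example.

```r
# Hypothetical file location: a temporary folder, used here so that the
# example does not overwrite any of your own files
script_path <- file.path(tempdir(), "myscript.R")

# An R script is plain text: write two lines of code to the file
writeLines(c("# myscript.R", "total <- 1 + 2"), script_path)

# Run every line of the saved script, as if you had typed it yourself
source(script_path, local = TRUE)
print(total)
```

The source() function is handy once a script is finished and you want to rerun it in full without opening it.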
2.3 Exercise B
3 R Packages
3.1 What is an R package?
While Base R includes numerous built-in functions for statistical computing and plotting, its capabilities can be limited. Thanks to R’s open-source nature, developers can create packages that enhance its basic functionality. You could consider a package as a collection of functions and code or sometimes data as well which is wrapped up in a nice, complete format. If you would like to develop your own packages, check out Hadley Wickham’s book from O’Reilly, “R Packages”.
3.2 What are repositories?
A repository is a central location where many developed packages are located and available for download. There are three major repositories:
- CRAN (Comprehensive R Archive Network): R’s main repository (currently 21132 packages available!)
- GitHub: A very popular, open source repository (not R specific!)
- BioConductor: A repository mainly for bioinformatic-focused packages
3.3 CRAN task views
CRAN task views aim to provide guidance on which CRAN packages are relevant for tasks related to a certain topic. Each task view gives a brief overview of the included packages and is intended to have a sharp focus, so that it is sufficiently clear which packages should be included (or excluded). An excerpt of the current task views can be found in Figure 2; the task views are still being updated.
R Documentation is a useful search engine for packages and functions from CRAN and BioConductor.
3.4 Install packages from CRAN
We will be focusing on installing packages from CRAN in this course. If you are interested, you can Google installation instructions for packages from GitHub or BioConductor.
Use install.packages() function
You can install a package by calling the install.packages() function in your R console with the package name in quotes. To install multiple packages at once, pass a character vector of package names.
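A minimal sketch of both forms is given below. The package names are illustrative (dplyr in particular is an assumption, not something this lecture requires), and the calls are wrapped in interactive() purely so that nothing is downloaded if the script is run in batch mode.

```r
# Install a single package from CRAN; the package name goes in quotes
single_pkg <- "ggplot2"

# Several packages can be installed at once via a character vector
several_pkgs <- c("dplyr", "readxl")

# Guarded with interactive() so nothing is downloaded when the script
# is run in batch mode; an internet connection is required
if (interactive()) {
  install.packages(single_pkg)
  install.packages(several_pkgs)
}
```

In everyday use you would usually type install.packages("ggplot2") directly in the Console, since a package only needs to be installed once per computer.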
Use RStudio graphical interface
As seen in Figure 3 and Figure 4, click the Packages tab in the panel that holds the Plots and Packages tabs, then find the Install button and click it. Once a window pops up, type in the name of the package you would like to install. Click Install and R will do the rest.
3.5 Load packages
After installing a package, you MUST load it before you start to use functions in the package. You can do this using the library() function, e.g., library(ggplot2).
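The sketch below shows loading in action. It uses the tools package only because it ships with every R installation, so the example runs without any install step; the file name is made up.

```r
# 'tools' ships with R itself, so no install step is needed here
library(tools)

# file_ext() comes from the tools package and is available once loaded
ext <- file_ext("myscript.R")
print(ext)

# search() lists everything currently attached, including 'package:tools'
print(search())
```

A package has to be loaded again in every new R session, which is why library() calls are usually placed at the top of a script.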
3.6 Update, remove, unload packages
Checking packages:
Before considering updating or removing packages, you might want to check which packages you have already installed. You can do this by calling either installed.packages() or library() with nothing between the parentheses. Alternatively, the RStudio Packages tab also presents a list of installed packages.
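As a rough sketch, the result of installed.packages() can be stored and queried like a table. The object names inst, pkg_names, and has_stats are chosen for this example; stats is used because it is part of every R installation.

```r
# installed.packages() returns a matrix with one row per installed package
inst <- installed.packages()

# The row names are the package names
pkg_names <- rownames(inst)
head(pkg_names)

# Check whether a particular package is installed
has_stats <- "stats" %in% pkg_names
print(has_stats)

# Look up the installed version of that package
inst["stats", "Version"]
```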
Updating packages:
Use old.packages() to list the installed packages for which a newer version is available in the repository. To update all packages, use update.packages(). If you only want to update a specific package, simply call install.packages('packagename') again. Alternatively, you can update packages using the RStudio graphical interface under the Packages tab.
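A short sketch of the update workflow follows. The version threshold "3.0.0" is an arbitrary value picked for illustration, and the calls that contact CRAN are guarded with interactive() so that they only run in a live session with an internet connection.

```r
# Check the installed version of a package (works offline)
installed_version <- packageVersion("stats")
print(installed_version)

# Versions can be compared directly against a version string
is_recent <- installed_version >= "3.0.0"
print(is_recent)

# These calls contact the repository, so they are only run interactively
if (interactive()) {
  print(old.packages())  # installed packages with a newer version on CRAN
  update.packages()      # asks before updating each package
}
```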
Uninstalling packages:
Running remove.packages() with the package name in quotes uninstalls a package you no longer need. If you are using the RStudio graphical interface, a package can be removed by clicking the ‘X’ at the right end of its row in the packages list.
Unloading packages:
Sometimes you may want to unload a package in the middle of a script, possibly due to conflicts with another package. To unload a given package you can use the detach() function, e.g., detach("package:ggplot2", unload = TRUE) would unload the ggplot2 package (that we loaded earlier). Within the RStudio interface, under the Packages tab, you can unload a package by unchecking the box in front of the package name.
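The following sketch shows a load and unload round trip. It uses the splines package rather than ggplot2 only because splines ships with R, so the example runs on any installation; the object names are chosen for illustration.

```r
# Load a package that ships with R
library(splines)
attached_before <- "package:splines" %in% search()
print(attached_before)

# Unload it again; note the "package:" prefix inside the quotes
detach("package:splines", unload = TRUE)
attached_after <- "package:splines" %in% search()
print(attached_after)
```

After detach(), functions from the package are no longer available until library() is called again.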
3.7 Use of Help files
To open a function’s help file, type a question mark followed by the function name in the R console. In the file, you will see a list of the function’s arguments as well as a detailed explanation of each argument. At the bottom of the file, you can find examples of how to use the function. Note that some arguments are optional, and some arguments are inherited from, or passed on to, another function.
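As a hedged illustration, the sketch below uses the built-in sd() function, chosen here only as an example. In the Console you would type ?sd; inside a script block the equivalent print(help("sd")) is used, and it is guarded with interactive() so that the help viewer only opens in a live session.

```r
# In the Console, typing ?sd or help("sd") opens the help file for sd()
if (interactive()) {
  print(help("sd"))  # same page as ?sd
  example(sd)        # runs the examples from the bottom of the help file
}

# The argument list can also be inspected directly
arg_names <- names(formals(sd))
print(arg_names)

# na.rm is optional: it defaults to FALSE, and setting it to TRUE
# drops missing values before the standard deviation is computed
result <- sd(c(1, 2, 3, NA), na.rm = TRUE)
print(result)
```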
3.8 Exercise C
If interested, you can find more details about the ‘readxl’ package here.