Lecture 12 - Quarto
Reporting and Communicating
Reporting and communicating is the final part of the data science process. If you cannot communicate your results to other humans, it does not matter how great your analysis is.
In the realm of data science, effective communication is paramount. It bridges the gap between complex data analysis and decision-making processes, enabling stakeholders to grasp insightful conclusions and make informed choices.
Traditional tools like Microsoft Word, while ubiquitous in many professional settings, often fall short when it comes to handling the dynamic and interactive needs of data science reporting:
- They lack the ability to seamlessly integrate statistical analysis and visualizations.
- Their static formats can hinder the audience’s understanding and engagement, particularly when dealing with multifaceted data sets.
- They challenge the reproducibility of reports; without the right tools, reproducing results can be labor-intensive and prone to errors, which in turn can compromise the credibility of the findings.
- They do not allow for the display of code alongside its results, which is crucial for transparency and understanding in data science communication.
Addressing these issues, Quarto emerges as a powerful ally for users of R. Quarto is an open-source scientific and technical publishing system that enhances the creation of dynamic and reproducible reports.
- Its compatibility with R allows analysts to embed live R code into documents, which can then be converted into various formats including HTML, PDF, and Word.
- This integration not only ensures accuracy but also enhances the reproducibility of documents.
- With Quarto, reports gain interactivity with elements such as expandable code outputs and interactive visualizations, which significantly boost reader engagement and comprehension.
- Quarto’s ability to produce multiple output formats from a single source document efficiently meets diverse audience needs, simplifying the workflow for R users.
1 Quarto Basics
1.1 Get Started
Quarto is a command line interface tool, not an R package. This means that help is, by-and-large, not available through ?. Instead, as you work through this chapter, and use Quarto in the future, you should refer to the Quarto Cheatsheet or the Quarto documentation.
You need the Quarto command line interface, but you do not need to explicitly install it or load it, as RStudio automatically does both when needed. The easiest way to create a new Quarto document is using the RStudio IDE, i.e., File -> New File -> Quarto Document….
Selecting “Quarto Document…” will lead to the New Quarto Document dialog window, where you can choose the type of desired output document you would like to create. The default option is HTML, which is a good choice if you want to publish your work online or in an email, or if you have not made up your mind yet about how you would like to output your final document. Changing to a different format later is typically as easy as changing one line of text in the document, or a few clicks in the IDE.
After you make your selection and click Create, you will get a basic Quarto template.
A Quarto file is a plain text file that has the extension .qmd. See the following example.
It contains three important types of content:
- An (optional) YAML header surrounded by ---s.
- Chunks of R code surrounded by ```.
- Text mixed with simple text formatting like # heading and _italics_.
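A minimal sketch of such a file, showing all three types of content. The title, the chunk label, and the use of the built-in mtcars data are illustrative assumptions, not the contents of the template RStudio generates:

````markdown
---
title: "A Quarto example"
format: html
---

# Fuel economy

The `mtcars` data has `r nrow(mtcars)` rows. Text can use _italics_ and **bold**.

```{r}
#| label: summary-stats
summary(mtcars$mpg)
```
````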
When you open a .qmd document in RStudio, you get a notebook interface where code and output are interleaved. You can run each code chunk by clicking the Run icon (it looks like a play button at the top of the chunk), or by pressing Cmd/Ctrl + Shift + Enter. RStudio executes the code and displays the results inline with the code.
To produce a complete report containing all text, code, and results, click Render or press Cmd/Ctrl + Shift + K. You can also do this programmatically with quarto::quarto_render("AQuartoExample.qmd"). This will display the report in the viewer pane and create an HTML file.
When you render the document, Quarto sends the .qmd file to knitr, https://yihui.org/knitr/, which executes all of the code chunks and creates a new markdown (.md) document which includes the code and its output. The markdown file generated by knitr is then processed by pandoc, https://pandoc.org, which is responsible for creating the finished file. This process is shown in the figure below. The advantage of this two step workflow is that you can create a very wide range of output formats.
Exercise A
1.2 Visual vs. Source Editors
If you are new to computational documents like .qmd files but have experience using tools like Google Docs or MS Word, the easiest way to get started with Quarto in RStudio is the visual editor. The Visual editor in RStudio provides a WYSIWYM interface for authoring Quarto documents. Under the hood, prose in Quarto documents (.qmd files) is written in Markdown, a lightweight set of conventions for formatting plain text files.
In the visual editor you can either use the buttons on the menu bar to insert images, tables, cross-references, etc. or you can use the catch-all Cmd + / or Ctrl + / shortcut to insert just about anything. If you are at the beginning of a line, you can also enter just / to invoke the shortcut.
You can also edit Quarto documents using the Source editor in RStudio, without the assistance of the Visual editor. While the Visual editor will feel familiar to those with experience writing in tools like Google Docs, the Source editor will feel familiar to those with experience writing R scripts or R Markdown documents. The Source editor can also be useful for debugging any Quarto syntax errors since it is often easier to catch these in plain text.
The guide below shows how to use Pandoc’s Markdown for authoring Quarto documents in the source editor.
Text formatting
------------------------------------------------------------
*italic* or _italic_
**bold** __bold__
`code`
~~strikeout~~
superscript^2^ and subscript~2~
[underline]{.underline} [small caps]{.smallcaps}
Headings
------------------------------------------------------------
# 1st Level Header
## 2nd Level Header
### 3rd Level Header
Lists
------------------------------------------------------------
- Bulleted list item 1
- Item 2
    - Item 2a
    - Item 2b
1. Numbered list item 1
2. Item 2.
   The numbers are incremented automatically in the output.
Links
------------------------------------------------------------
<http://example.com>
[linked phrase](http://example.com)
The best way to learn these is simply to try them out. It will take a few days, but soon they will become second nature, and you will not need to think about them. If you forget, you can get to a handy reference sheet with Help -> Markdown Quick Reference.
Exercise B
In the previous AQuartoExample.qmd template file we worked with:
1.3 Code Chunks
To run code inside a Quarto document, you need to insert a chunk. There are three ways to do so:
- The keyboard shortcut Cmd + Option + I (Mac) or Ctrl + Alt + I (Windows).
- The “+C” icon in the editor toolbar.
- By manually typing the chunk delimiters ```{r} and ```.
It is highly recommended that you learn the keyboard shortcut. It will save you a lot of time in the long run!
You can continue to run the code line by line using the keyboard shortcut that you know by now: Cmd/Ctrl + Enter. However, chunks get a new keyboard shortcut: Cmd/Ctrl + Shift + Enter, which runs all the code in the chunk. Think of a chunk like a function. A chunk should be relatively self-contained, and focused around a single task.
The following sections describe the chunk header which consists of ```{r}, followed by an optional chunk label and various other chunk options, each on their own line, marked by #|.
Chunk Label
Chunks can be given an optional label, set with #| label: in the chunk header.
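A minimal sketch of a labeled chunk; the label simple-addition is only an illustrative name:

````markdown
```{r}
#| label: simple-addition
1 + 1
```
````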
This has three advantages:
- You can more easily navigate to specific chunks using the drop-down code navigator in the bottom-left of the script editor.
- Graphics produced by the chunks will have useful names that make them easier to use elsewhere.
- You can set up networks of cached chunks to avoid re-performing expensive computations on every run.
Your chunk labels should be short but evocative and should not contain spaces. We recommend using dashes (-) to separate words (instead of underscores, _) and avoiding other special characters in chunk labels.
You are generally free to label your chunk however you like, but there is one chunk name that imbues special behavior: setup. When you are in notebook mode, the chunk named setup will be run automatically once, before any other code is run.
Additionally, chunk labels cannot be duplicated. Each chunk label must be unique.
Exercise C
Chunk Options
Chunk output can be customized with options, which are arguments supplied to the chunk header. knitr provides almost 60 options that you can use to customize your code chunks. Here we will cover the most important chunk options that you will use frequently. You can see the full list at http://yihui.name/knitr/options/.
The most important set of options controls if your code block is executed and what results are inserted in the finished report:
- eval = FALSE prevents code from being evaluated. (And obviously if the code is not run, no results will be generated.) This is useful for displaying example code, or for disabling a large block of code without commenting each line.
- include = FALSE runs the code, but doesn’t show the code or results in the final document. Use this for setup code that you don’t want cluttering your report.
- echo = FALSE prevents code, but not the results, from appearing in the finished file. Use this when writing reports aimed at people who don’t want to see the underlying R code.
- message = FALSE or warning = FALSE prevents messages or warnings from appearing in the finished file.
- results = "hide" hides printed output. fig.show = 'hide' hides plots.
- error = TRUE causes the render to continue even if code returns an error. This is rarely something you’ll want to include in the final version of your report, but can be very useful if you need to debug exactly what is going on inside your .qmd. It’s also useful if you’re teaching R and want to deliberately include an error. The default, error = FALSE, causes rendering to fail if there is a single error in the document.
Each of these chunk options gets added to the header of the chunk, following #|, e.g., in the following chunk the result is not printed since eval is set to false.
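As a sketch (the label is an illustrative name), a chunk with eval set to false has its code displayed in the report but is never run, so no result appears:

````markdown
```{r}
#| label: simple-multiplication
#| eval: false
2 * 2
```
````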
The following table summarizes which types of output each option suppresses:
| Option | Run code | Show code | Output | Plots | Messages | Warnings |
|---|---|---|---|---|---|---|
| eval = FALSE | ❌ | | ❌ | ❌ | ❌ | ❌ |
| include = FALSE | | ❌ | ❌ | ❌ | ❌ | ❌ |
| echo = FALSE | | ❌ | | | | |
| results = "hide" | | | ❌ | | | |
| fig.show = "hide" | | | | ❌ | | |
| message = FALSE | | | | | ❌ | |
| warning = FALSE | | | | | | ❌ |
Exercise D
Global Options
As you work more with knitr, you will discover that some of the default chunk options do not fit your needs and you want to change them.
You can do this by adding the preferred options in the document YAML, under execute. For example, if you are preparing a report for an audience who does not need to see your code but only your results and narrative, you might set echo: false at the document level. That will hide the code by default, showing only the chunks you deliberately choose to show (with echo: true). You might consider setting message: false and warning: false, but that would make it harder to debug problems because you would not see any messages in the final document.
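A minimal sketch of such a header, with a placeholder title. Note that echo is nested under execute:

```markdown
---
title: "My report"
execute:
  echo: false
---
```

An individual chunk can then opt back in by adding #| echo: true to its own header.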
Since Quarto is designed to be multi-lingual (it works with R as well as other languages like Python, Julia, etc.), not all of the knitr options are available at the document execution level, since some of them only work with knitr and not with the other engines Quarto uses for running code in other languages (e.g., Jupyter). You can, however, still set these as global options for your document under the knitr field, under opts_chunk. For example, when writing books and tutorials we set:
title: "Tutorial"
knitr:
  opts_chunk:
    comment: "#>"
    collapse: true
This uses our preferred comment formatting and ensures that the code and output are kept closely entwined.
Exercise E
Inline code
There is one other way to embed R code into a Quarto document: directly into the text, with: ` r `. This can be very useful if you mention properties of your data in the text. For example, we can write something like:
There are `r nrow(mtcars)` cars. The mean miles per gallon is `r mean(mtcars$mpg)`.
When the report is rendered, the results of these computations are inserted into the text:
There are 32 cars. The mean miles per gallon is 20.090625.
When inserting numbers into text, format() is your friend. It allows you to set the number of digits so you don’t print to a ridiculous degree of precision, and a big.mark to make numbers easier to read.
format(.12358124331, digits = 2)
#> [1] "0.12"
format(3452345, big.mark = ",")
#> [1] "3,452,345"
Hence, we can write:
There are 32 cars. The mean miles per gallon is 20.1.
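One way to produce the rounded sentence is to wrap the inline computation in format(), for instance writing `r format(mean(mtcars$mpg), digits = 3)` in the text. The sketch below checks the formatted values in plain R before they go into the document; the variable names are illustrative.

```r
# Values that the inline expressions would insert into the sentence.
n_cars <- nrow(mtcars)
mean_mpg <- format(mean(mtcars$mpg), digits = 3)

# Build the same sentence in plain R to preview what the reader will see.
sentence <- paste0(
  "There are ", n_cars, " cars. ",
  "The mean miles per gallon is ", mean_mpg, "."
)
print(sentence)
```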
Exercise F
2 Quarto Advanced
2.1 Callout Blocks
Callouts are an excellent way to draw extra attention to certain concepts, or to more clearly indicate that certain content is supplemental or applicable to only some scenarios.
Callout Types
There are five different types of callouts available.
- note
- warning
- important
- tip
- caution
The color and icon will be different depending upon the type that you select. Here is what the various types look like in HTML output:
Note that there are five types of callouts, including: note, tip, warning, caution, and important.
Callouts provide a simple way to attract attention, for example, to this warning.
Danger, callouts will really improve your writing.
This is an example of a callout with a title.
This is an example of a “collapsed” caution callout that can be expanded by the user. You can use collapse="true" to collapse it by default or collapse="false" to make a collapsible callout that is expanded by default.
Markdown Syntax
Create callouts in markdown using the following syntax (note that the first markdown heading used within the callout is used as the callout heading):
::: {.callout-note}
Note that there are five types of callouts, including:
`note`, `warning`, `important`, `tip`, and `caution`.
:::
::: {.callout-tip}
## Tip with Title
This is an example of a callout with a title.
:::
::: {.callout-caution collapse="true"}
## Expand To Learn About Collapse
This is an example of a 'folded' caution callout that can be expanded by the user. You can use `collapse="true"` to collapse it by default or `collapse="false"` to make a collapsible callout that is expanded by default.
:::
Note that above callout titles are defined by using a heading at the top of the callout. If you prefer, you can also specify the title using the title attribute. For example:
::: {.callout-tip title="Tip with Title"}
This is a callout with a title.
:::
Exercise A
2.2 Figures
The figures in a Quarto document can be embedded (e.g., a PNG or JPEG file) or generated as a result of a code chunk. Below is the syntax for inserting a figure file which is in your current working directory.
{fig-alt="optional alt text"}
For example:
{fig-alt="insert quarto logo"}
results in the following output:
Note that when specifying options in {}, do NOT put spaces before or after the =.
Alternatively, to embed an image from an external file, you can use the Insert menu in the Visual Editor in RStudio and select Figure / Image. This will pop open a menu where you can browse to the image you want to insert as well as add alternative text or caption to it and adjust its size. In the visual editor you can also simply paste an image from your clipboard into your document and RStudio will place a copy of that image in your project folder.
If you include a code chunk that generates a figure (e.g., includes a ggplot() call), the resulting figure will be automatically included in your Quarto document.
Figure Sizing
External file
By default figures are displayed using their actual size (subject to the width constraints imposed by the page they are rendered within). You can change the display size by adding the width and height attributes to the figure. For example
{width=300}
Note that if only width is specified then height is calculated automatically. If you need to modify the default behavior just add an explicit height attribute.
The default units for width and height are pixels. You can also specify sizes using a percentage or a conventional measurement like inches or millimeters. For example:
{width=80%}
{width=2in}
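For reference, the attribute block in braces attaches directly to a standard markdown image. A sketch of the full syntax, where images/example.png, the caption, and the alt text are hypothetical placeholders:

```markdown
![A placeholder caption](images/example.png){width=300}

![A placeholder caption](images/example.png){width=80%}

![A placeholder caption](images/example.png){width=2in fig-alt="Placeholder alt text"}
```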
If you have several figures that appear as a group, you can create a figure division to enclose them. For example:
::: {#fig-logos layout-ncol=2}
{#fig-Qurto}
{#fig-RMarkdown}
Reporting in R
:::
Note that the empty lines between the figures (and between the last figure and the caption) are required: they indicate that these images belong to their own paragraphs rather than being multiple images within the same paragraph.
Note also that we used a layout-ncol attribute to specify a two-column layout.
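A sketch of a complete figure division, with the required empty lines in place. The identifiers, file names, and captions here are hypothetical placeholders:

```markdown
::: {#fig-examples layout-ncol=2}

![First placeholder](images/first.png){#fig-first}

![Second placeholder](images/second.png){#fig-second}

A shared caption for both figures
:::
```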
Exercise B
Graph generated by R code
Getting the right size and shape for a graph created by R in Quarto can be more challenging. There are five main chunk options that control figure sizing: fig-width, fig-height, fig-asp, out-width and out-height. Image sizing is challenging because there are two sizes (the size of the figure created by R and the size at which it is inserted in the output document), and multiple ways of specifying the size (i.e. height, width, and aspect ratio: pick two of three).
We recommend three of the five options:
- Plots tend to be more aesthetically pleasing if they have consistent width. To enforce this, set fig-width: 6 (6”) and fig-asp: 0.618 (the golden ratio) in the defaults. Then in individual chunks, only adjust fig-asp.
- Control the output size with out-width and set it to a percentage of the body width of the output document. We suggest out-width: "70%" and fig-align: center. That gives plots room to breathe, without taking up too much space.
- To put multiple plots in a single row, set layout-ncol to 2 for two plots, 3 for three plots, etc. This effectively sets out-width to “50%” for each of your plots if layout-ncol is 2, “33%” if layout-ncol is 3, etc. Depending on what you are trying to illustrate (e.g., show data or show plot variations), you might also tweak fig-width, as discussed below.
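As a sketch, the recommended values can be written as chunk options on a single chunk (they could equally be set once as document defaults). The label and the base R plot are illustrative choices:

````markdown
```{r}
#| label: mpg-vs-weight
#| fig-width: 6
#| fig-asp: 0.618
#| out-width: "70%"
#| fig-align: center
plot(mtcars$wt, mtcars$mpg)
```
````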
If you find that you are having to squint to read the text in your plot, you need to tweak fig-width. If fig-width is larger than the size the figure is rendered in the final doc, the text will be too small; if fig-width is smaller, the text will be too big. You will often need to do a little experimentation to figure out the right ratio between the fig-width and the eventual width in your document. To illustrate the principle, the following three plots have fig-width of 4, 6, and 8 respectively:
If you want to make sure the font size is consistent across all your figures, whenever you set out-width, you will also need to adjust fig-width to maintain the same ratio with your default out-width. For example, if your default fig-width is 6 and out-width is “70%”, when you set out-width: “50%” you will need to set fig-width to 4.3 (6 * 0.5 / 0.7).
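The ratio rule above is simple arithmetic, and a small helper makes it easy to reuse. This is a sketch under the stated defaults (fig-width of 6 and out-width of 70%); the function name is hypothetical.

```r
# Defaults recommended above.
default_fig_width <- 6
default_out_width <- 0.70

# fig-width needed to keep the same fig-width / out-width ratio
# when a chunk uses a different out-width (given as a proportion).
scaled_fig_width <- function(new_out_width) {
  default_fig_width * new_out_width / default_out_width
}

round(scaled_fig_width(0.50), 1)
#> [1] 4.3
```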
Figure sizing and scaling is an art and science and getting things right can require an iterative trial-and-error approach. You can learn more about figure sizing in the taking control of plot scaling blog post.
Exercise C
2.3 Tables
Similar to figures, you can include two types of tables in a Quarto document. They can be markdown tables that you create directly in your Quarto document (using the Insert Table menu) or they can be tables generated as a result of a code chunk.
Table Created by Markdown Syntax
A table can be created by the following syntax.
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
| 12 | 12 | 12 | 12 |
| 123 | 123 | 123 | 123 |
| 1 | 1 | 1 | 1 |
As a result, we will get:
| Right | Left | Default | Center |
|---|---|---|---|
| 12 | 12 | 12 | 12 |
| 123 | 123 | 123 | 123 |
| 1 | 1 | 1 | 1 |
Exercise D
Table Created in Code Chunks
In this section we will focus on tables generated via computation.
By default, Quarto prints data frames and matrices as you would see them in the console:
head(mtcars)
If you prefer that data be displayed with additional formatting you can use the knitr::kable() function. Take a look at the output generated from the code below.
knitr::kable(head(mtcars))

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
Refer to the documentation for ?knitr::kable to see the other ways in which you can customize the table. For even deeper customization, consider the gt, huxtable, reactable, kableExtra, xtable, stargazer, pander, tables, and ascii packages. Each provides a set of tools for returning formatted tables from R code.
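As a small illustration of that customization, the sketch below uses the digits and caption arguments of knitr::kable(). The column selection and caption text are illustrative, and format = "pipe" is set only so the result prints as a markdown table in the console.

```r
library(knitr)

# Keep three columns, round to one decimal place, and add a caption.
tbl <- kable(
  head(mtcars[, c("mpg", "cyl", "wt")]),
  format = "pipe",
  digits = 1,
  caption = "First rows of mtcars"
)
print(tbl)
```

Inside a Quarto document you would normally leave out format and let knitr pick the one that matches the output.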
Exercise E
2.4 YAML Header
You can control many other “whole document” settings by tweaking the parameters of the YAML header. You might wonder what YAML stands for: it is “YAML Ain’t Markup Language”, which is designed for representing hierarchical data in a way that is easy for humans to read and write. Quarto uses it to control many details of the output. Here we will discuss self-contained documents and document parameters.
Self-contained
HTML documents typically have a number of external dependencies (e.g., images, CSS style sheets, JavaScript, etc.) and, by default, Quarto places these dependencies in a _files folder in the same directory as your .qmd file. If you publish the HTML file on a hosting platform (e.g., QuartoPub, https://quartopub.com/), the dependencies in this directory are published with your document and hence are available in the published report. However, if you want to email the report to a colleague, you might prefer to have a single, self-contained, HTML document that embeds all of its dependencies. You can do this by specifying the embed-resources option:
format:
html:
embed-resources: true
The resulting file will be self-contained, such that it will need no external files and no internet access to be displayed properly by a browser.
Parameters
Quarto documents can include one or more parameters whose values can be set when you render the report. Parameters are useful when you want to re-render the same report with distinct values for various key inputs. For example, you might be producing sales reports per branch, exam results by student, or demographic summaries by country. To declare one or more parameters, use the params field.
This example uses a my_class parameter to determine which class of cars to display:
---
format: html
params:
  my_class: "suv"
---
```{r}
#| label: setup
#| include: false
library(tidyverse)
class <- mpg |> filter(class == params$my_class)
```
# Fuel economy for `r params$my_class`s
```{r}
#| message: false
ggplot(class, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(se = FALSE)
```
As you can see, parameters are available within the code chunks as a read-only list named params.
You can write atomic vectors directly into the YAML header. You can also run arbitrary R expressions by prefacing the parameter value with !expr. This is a good way to specify date/time parameters.
---
format: html
params:
  start: !expr as.Date("2010-01-01")
---
```{r}
#| label: setup
#| include: false
library(tidyverse)
economics_subset <- economics |>
  filter(date >= params$start)
```
```{r}
#| message: false
economics_subset |>
  ggplot(aes(x = date, y = unemploy / pop)) +
  geom_line()
```
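To re-render one report with several parameter values, one option is the execute_params argument of quarto::quarto_render(). The sketch below assumes the my_class example is saved under the hypothetical name fuel-economy.qmd; the output file names are also illustrative. Rendering is skipped when the quarto package or the input file is missing.

```r
# Hypothetical file name; assumed to contain the my_class example.
report <- "fuel-economy.qmd"
classes <- c("suv", "compact", "pickup")

# One output file name per parameter value, e.g. "fuel-economy-suv.html".
outputs <- paste0("fuel-economy-", classes, ".html")

# Render only when the quarto package and the input file are both available.
if (requireNamespace("quarto", quietly = TRUE) && file.exists(report)) {
  for (i in seq_along(classes)) {
    quarto::quarto_render(
      input = report,
      output_file = outputs[i],
      execute_params = list(my_class = classes[i])
    )
  }
}
```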
Exercise F
2.5 Troubleshooting and More
Troubleshooting
Troubleshooting Quarto documents can be challenging because you are no longer in an interactive R environment, and you will need to learn some new tricks. Additionally, the error could be due to issues with the Quarto document itself or due to the R code in the Quarto document.
One common error in documents with code chunks is duplicated chunk labels, which are especially pervasive if your workflow involves copying and pasting code chunks. To address this issue, all you need to do is to change one of your duplicated labels.
If the errors are due to the R code in the document, the first thing you should always try is to recreate the problem in an interactive session. Restart R, then “Run all chunks”, either from the Code menu, under Run region or with the keyboard shortcut Cmd + Option + R (Mac) or Ctrl + Alt + R (Windows). If you are lucky, that will recreate the problem, and you can figure out what is going on interactively.
If that does not help, there must be something different between your interactive environment and the Quarto environment. You are going to need to systematically explore the options. The most common difference is the working directory: the working directory of a Quarto document is the directory in which it lives. Check that the working directory is what you expect by including getwd() in a chunk.
Next, brainstorm all the things that might cause the bug. You will need to systematically check that they are the same in your R session and your Quarto session. The easiest way to do that is to set error: true on the chunk causing the problem, then use print() and str() to check that settings are as you expect.
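A sketch of what the contents of such a diagnostic chunk might look like. The label and the particular settings inspected are illustrative; swap in whatever you suspect differs between the two sessions.

```r
#| label: debug-environment
#| error: true

# Working directory the code actually runs in.
wd <- getwd()
print(wd)

# Settings and attached packages that commonly differ between sessions.
str(options()[c("digits", "OutDec")])
print(search())
```

Running the same lines in the console and comparing the two outputs usually narrows down the difference quickly.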
More to Explore
In this chapter we introduced you to Quarto for authoring and publishing reproducible computational documents that include your code and your prose in one place. You have learned about writing Quarto documents in RStudio with the visual or the source editor, how code chunks work and how to customize options for them, and how to include figures and tables in your Quarto documents. Additionally, you have learned about adjusting YAML header options for creating self-contained or parametrized documents. We have also given you some troubleshooting tips.
While this introduction should be sufficient to get you started with Quarto, there is still a lot more to learn. Quarto is still relatively young, and is still growing rapidly. The best place to stay on top of innovations is the official Quarto website: https://quarto.org.
There is another important topic that we have not covered here: collaboration. Collaboration is a vital part of modern data science, and you can make your life much easier by using version control tools, like Git and GitHub. We recommend reading “Happy Git with R”, a user-friendly introduction to Git and GitHub for R users, by Jenny Bryan. The book is freely available online: https://happygitwithr.com.