Lecture 11 - Introduction to Machine Learning

In this section, we provide a brief introduction to machine learning using R. Let us first clarify what machine learning is by distinguishing it from related terms in the field, such as artificial intelligence (AI) and data science.

The following map shows what machine learning involves.

We will introduce only a few of the classical machine learning techniques listed here. The materials above are taken from the blog post “Machine Learning for Everyone.” You are encouraged to read the full article for a more comprehensive understanding.

The classical methods in machine learning originated in statistics in the 1950s. They were developed to solve formal mathematical problems and to establish the theory behind model construction. In practice, these methods fit models to data to capture numerical patterns, which can then be used to summarize the data or make predictions.

Building a model often involves partitioning the data into subsets that serve different purposes: training, validation, and testing. Unless specified otherwise, we will use the following approach to split data in the modeling process.
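Because the splitting diagram is not reproduced here, the sketch below assumes one common scheme: hold out about 20% of the rows as a test set, then split the remainder 75/25 into training and validation sets (roughly 60/20/20 overall). It uses `initial_split()`, `training()`, and `testing()` from the rsample package on the built-in `mtcars` data; the proportions, the seed, and the added `id` column are illustrative assumptions, not necessarily the course's own choices.

```r
# Minimal sketch of a train/validation/test split with rsample.
# Assumptions: rsample is installed; the ~60/20/20 proportions are illustrative.
library(rsample)

set.seed(123)  # make the random split reproducible

cars <- mtcars
cars$id <- seq_len(nrow(cars))  # hypothetical row id, used only to check the split

# Step 1: hold out about 20% of the rows as the final test set
outer_split <- initial_split(cars, prop = 0.8)
train_val   <- training(outer_split)
test_set    <- testing(outer_split)

# Step 2: split the remaining rows into training (~75%) and validation (~25%)
inner_split <- initial_split(train_val, prop = 0.75)
train_set   <- training(inner_split)
valid_set   <- testing(inner_split)

# Number of rows in each subset
sapply(list(train = train_set, validation = valid_set, test = test_set), nrow)
```

Models are fit on `train_set` and compared on `valid_set`, while `test_set` is kept aside until the final evaluation. Recent versions of rsample also provide `initial_validation_split()`, which produces a three-way split in a single call.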

In this course, we will introduce tidymodels, a collection of packages that is becoming the tidyverse toolkit for machine learning. Its development is led by Max Kuhn, formerly of Pfizer and now with RStudio (since renamed Posit). He also developed the caret package, which provides a uniform interface to the wide range of machine learning models available in R.

The following diagram illustrates which step each package covers in a typical data science project.

source

Although modeling is only a single step in this workflow, model development still benefits from a tidyverse-friendly interface. This is where tidymodels comes in.

Like the tidyverse, tidymodels is an umbrella collection of packages. In this introductory section, we will showcase functions from four tidymodels packages.

The following diagram illustrates each modeling step and lines up the tidymodels packages that we will use in this section:
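Because the diagram is not reproduced here, the following is a minimal end-to-end sketch that assumes the common tidymodels steps and packages: rsample for splitting, recipes for preprocessing, parsnip for specifying and fitting the model, and yardstick for evaluation. These may not be exactly the four packages this section uses. The `mtcars` data, the `mpg ~ wt + hp` model, and the seed are illustrative assumptions; the native pipe `|>` requires R 4.1 or later.

```r
# Minimal sketch of split -> preprocess -> fit -> evaluate with tidymodels packages.
# Assumptions: rsample, recipes, parsnip, and yardstick are installed; R >= 4.1.
library(rsample)    # data splitting
library(recipes)    # preprocessing
library(parsnip)    # model specification and fitting
library(yardstick)  # performance metrics

set.seed(2024)
split <- initial_split(mtcars, prop = 0.75)
train <- training(split)
test  <- testing(split)

# Preprocess: center and scale the numeric predictors, estimated on training data only
rec <- recipe(mpg ~ wt + hp, data = train) |>
  step_normalize(all_numeric_predictors())
rec_prep    <- prep(rec, training = train)
train_baked <- bake(rec_prep, new_data = NULL)  # processed training data
test_baked  <- bake(rec_prep, new_data = test)  # test data processed with training estimates

# Specify and fit a linear regression using the "lm" engine
lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ ., data = train_baked)

# Predict on the test set and compute the root mean squared error
preds       <- predict(lm_fit, new_data = test_baked)
results     <- data.frame(mpg = test_baked$mpg, .pred = preds$.pred)
metrics_tbl <- rmse(results, truth = mpg, estimate = .pred)
metrics_tbl
```

Keeping each step in its own package means one step can be changed, for example swapping in a different model specification, without rewriting the splitting or preprocessing code.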