Chapter 1 Introduction

Welcome to the R lecture notes for PHC 6089. In this chapter, we will introduce R and show you how to get R and RStudio up and running on your computer.

1.1 What is R?

R is the open-source statistical language that seems poised to “take over the world” of statistics and data science. R is more than a statistical package: it is a language and environment designed for statistical analysis and the production of high-quality graphics (for more information, see www.r-project.org/about.html).

R was originally developed by two statisticians at the University of Auckland as a dialect of the S statistical language. Since 1997, its development as an open-source implementation of the S language has been overseen by a core team of some 20 professional statisticians. R is both open source and open development (for more information, see www.r-project.org/contributors.html).

Why should we use R?

  • R is a powerful, flexible, and free (open-source) language designed specifically for statistical computing.
  • There is an extensive collection of packages created by R users that extend R and implement modern statistical techniques.
  • Furthermore, R is an interpreted, high-level language, which means that we can write code and run it line by line in real time without needing to worry about low-level programming details such as memory management.
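As a small illustration of running code line by line, the sketch below can be typed into the R console one line at a time, with each result appearing as soon as the line is run. The variable names x and total are made up for this example.

```r
# Create a numeric vector; no memory allocation or type declaration is needed
x <- c(2, 4, 6, 8)

# Each line can be run on its own and returns its result immediately
mean(x)          # average of the four values: 5

# Store a result under a name so that later lines can reuse it
total <- sum(x)
total            # 20
```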

Why should we not use R?

  • R has a fairly steep learning curve because it is programming oriented with a minimal interface. For students without prior programming experience in a language such as Python, C/C++, or Java, learning to program in R might be daunting initially.
  • There is little centralized support for R, and help largely relies on the online community and package developers.
  • R can be slower and more memory intensive than general-purpose programming languages such as C/C++, Java, Perl, and Python.

In this course, we will learn R from the ground up and try to get you through that initial steep learning curve (whether or not you have previous programming experience). By the end of this course, you should be able to:

  • Read data into R from external files (e.g. CSV or Excel files)
  • Recode and manipulate datasets
  • Write R functions and use add-on packages
  • Make exploratory plots
  • Understand basic programming syntax (e.g. variable assignments, if/else statements, for loops, etc.)
  • Perform basic statistical tests (e.g. t-tests, linear regression, etc.)
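To give a first taste of the programming syntax listed above, here is a minimal sketch that combines a variable assignment, a function, a for loop, and an if/else statement. The data values, the cutoff of 1.75, and the names heights_cm, cm_to_m, and height_group are all invented for illustration.

```r
# Variable assignment: store three (made-up) heights in centimetres
heights_cm <- c(160, 172, 181)

# A small user-written function (hypothetical name) that converts cm to m
cm_to_m <- function(cm) {
  cm / 100
}

# A for loop with an if/else inside: label each height using an arbitrary cutoff
height_group <- character(length(heights_cm))
for (i in seq_along(heights_cm)) {
  if (cm_to_m(heights_cm[i]) >= 1.75) {
    height_group[i] <- "tall"
  } else {
    height_group[i] <- "not tall"
  }
}

height_group   # "not tall" "not tall" "tall"
```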

1.2 Installing R and RStudio

For this course, you will need the most up-to-date versions of R and RStudio. R is freely available to download from the R Project website, and RStudio is freely available to download from the RStudio website.

  1. Install R
    • Go to R project
    • Select the link to download R under the Getting Started section
    • Select a CRAN mirror in the country closest to you (all mirrors are copies of the same CRAN server)
    • Select the R download for your operating system (Windows/Mac/Linux)
    • Download the most recent version of base R
  2. Install RStudio
    • Go to RStudio
    • In the menu, go to Products > RStudio
    • Select download RStudio Desktop
    • Select Download for RStudio Desktop (free) and select the download for your operating system.
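After installing, one way to confirm that everything works is to open RStudio and run the following lines in the console. This is only a quick check; the version reported will depend on when you installed R.

```r
# Print the version of R that this session is running
R.version.string

# Show the R version, operating system, and currently loaded packages
sessionInfo()
```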

Once you have R and RStudio installed, you have base R and its associated packages. We will use many add-on packages throughout the course. You can install them as we go by using install.packages("package_name") or by going to Tools > Install Packages in RStudio and searching for the package to install. You can install many of the add-on packages used in this course by running the following command:

install.packages(c("tidyverse", "lubridate", "readxl", "broom", 
                   "survival", "rmarkdown", "shiny", "tidyselect"))

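If you have already installed some of these packages, a sketch like the one below installs only the ones that are missing. The helper names missing_packages and install_missing are hypothetical and written for this example; installed.packages() and install.packages() are standard R functions.

```r
# Packages used in this course (same list as in the command above)
course_pkgs <- c("tidyverse", "lubridate", "readxl", "broom",
                 "survival", "rmarkdown", "shiny", "tidyselect")

# Hypothetical helper: return the packages in pkgs that are not yet installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only the missing packages
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Uncomment the next line to run the installation (requires internet access)
# install_missing(course_pkgs)
```

Installing a package only needs to be done once per computer. To use an installed package in a session, load it with library(), for example library(tidyverse).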
1.3 Useful (Free) R Resources