Chapter 3 Basic R

In this chapter, we begin our introduction to R programming by exploring

  • The R console
  • R scripts and comments
  • R variables
  • Vectors

Before we get into some programming, let’s point out some common new user frustrations when working with R.

  1. Different versions of R - R is updated frequently, and users update their packages as well. This can create frustration when old code stops working after an update. To make sure that the packages we will use work, you need to have the most up-to-date version of R.
  2. Data type problems - If you are familiar with other programming languages, then you are familiar with variable types such as integers and strings. R is dynamically typed, so variables are not declared with a fixed type and the same variable can hold values of different types over the course of a program. Keeping track of what type of data is stored in a variable throughout a program can be tricky and lead to frustration when you get an unexpected type mismatch error from your code.
  3. Working directories - R looks for files relative to its current working directory, which often leads to frustration when trying to read files that R “can’t find”. Using RStudio projects can help with this. We will learn more about directories in the Data IO lesson.
  4. R is case sensitive - Typos in R such as trying to use the variable x instead of X will result in an error (or in using the wrong variable), because R is case sensitive. These are considered two different variables in R. RStudio can help you avoid this problem with code completion (pressing tab with partially complete code will search for variables/functions that finish what you started).

Now, before we move on to explaining basic R programming, a note about the code and output in these lecture notes. In the notes, a command (we’ll also call them code or a code chunk) will look like this

print("I'm code")
[1] "I'm code"

Then, printed directly after it, will be the output of the code. So print("I'm code") is the code chunk and [1] "I'm code" is the output.

3.1 R as a Calculator

In R, we can type our code directly into the console. When you first open R/RStudio, you will notice something similar to the following output in the console:

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 

Notice the > at the end. The > is the R prompt. It means R is waiting for you to do something. You can type your code on this line next to the prompt and hit enter to submit the code and run it. Try typing or pasting the following code examples into the R console:

2 + 2
[1] 4
2 * 4
[1] 8
2^8
[1] 256
2 + (2 * 3) ^ 2
[1] 38
(1 + 3) / 2 + 45
[1] 47

Note a couple of things:

  1. When you type your command into the R console, R inherently thinks you want to print the result.
  2. Spaces don’t matter - 2+2 and 2 + 2 are the same, but use spaces to make your code easier to read.
  3. The standard mathematical operators are +, -, *, /, and ^.
  4. The standard order of operations is respected and can be altered using parentheses.
  5. All answers are preceded by [1]. (This is because the output is a single number, a vector of length 1. More on this later.)
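
As a small illustration of points 3 and 4, the sketch below shows how parentheses change a result. The expressions are our own, chosen only for illustration. Note in particular that ^ is evaluated before a leading minus sign.

```r
# Multiplication is evaluated before addition
2 + 2 * 3      # 2 + 6 = 8

# Parentheses force the addition to happen first
(2 + 2) * 3    # 4 * 3 = 12

# Exponentiation is evaluated before the leading minus sign
-2^2           # -(2^2) = -4
(-2)^2         # 4
```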

Try evaluating the following expressions in R. You should obtain 0.5, 3, and 15 as your answers.

2 + 2 * 3 / 4 - 3
2 * 3 / 4 * 2
2^4 - 1

3.2 Scripts and Comments

It is generally better to write your code in a script, RMarkdown, or other document, so that you can easily save, edit, and share your code. You can create an R script by going to File > New File > R Script or by using the button that looks like a sheet of paper with a plus in a green circle in the upper left-hand corner of RStudio and selecting R Script.

R scripts also allow you to add comments to document your code. Comments remind you what you were trying to do when you come back to old code, and they help a new user quickly get up to speed on what your code does when you share it with others (as is common in many professions). In R, # is the comment symbol. Anything to the right of a comment symbol on the same line will be ignored by R.

# this is a comment
# nothing to its right is evaluated
# this # is still a comment
### you can use as many #'s as you want
1+2 # can be to the right of code
[1] 3

Note that 1+2 is still evaluated because it was to the left of the comment symbol.
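
To show how comments might document a short calculation in a script, here is a minimal sketch. The temperature conversion and the input value 98.6 are our own example and not part of the course material.

```r
# Convert a temperature from Fahrenheit to Celsius.
# 98.6 is an arbitrary example input.
(98.6 - 32) * 5 / 9  # subtract 32 first, then scale by 5/9 (about 37)
```

The full-line comments say what the code is for, while the comment at the end of the line explains the individual step.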

3.3 R Variables

You can create variables from within the R environment and from files on your computer. R uses = or <- to assign values to a variable name. Note that variable names are case sensitive, so x and X are different variables in R. Let’s create a variable called x and store the value 2 in it.

x = 2 # Same as x <- 2
x
[1] 2
x * 4
[1] 8
x + 2
[1] 4

Once we have created the variable x that holds the numeric value 2, we can use that variable in expressions. For example, when we write x * 4, R substitutes the value stored in x into the expression and evaluates 2 * 4 to get 8.

We can update the value in x as often as we want by reassigning its value using = or <-. Let’s update x to hold the value of x + 5.

x
[1] 2
x <- x + 5
x
[1] 7
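
The following sketch illustrates the earlier point that variable names are case sensitive. The names total and Total are hypothetical and chosen only for illustration. It also shows that an assignment on its own prints nothing, while wrapping it in parentheses assigns and prints in one step.

```r
# Hypothetical variable names, used only to illustrate case sensitivity
total <- 10   # lower-case name
Total <- 25   # a different variable: capital T

total + 1     # uses 10, gives 11
Total + 1     # uses 25, gives 26

# Parentheses around an assignment make R print the assigned value
(total <- total * 2)   # prints 20
```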

Later, we will learn more about the different data types and classes in R such as data.frames, which are R datasets similar to an Excel spreadsheet with variables as columns and observations as rows. For now, we are going to introduce the most basic data class, a vector. Vectors form the columns of a data.frame. Vectors can hold multiple values, but all those values have to be of the same type. The two most basic types are numeric and character.

x
[1] 7
class(x)
[1] "numeric"
y = "hello world!"
class(y)
[1] "character"

Try assigning your full name to an R variable called name.

name = "Robert Parker"
name
[1] "Robert Parker"

Note that my name is stored as a single character string. How can we store both the first and last name separately in a single object? One way is by using a vector. The function c() concatenates or combines single R objects into a vector of R objects. It is most commonly used for creating vectors of numbers, character strings, and other simple data types. For example, we can create a vector that contains the numbers 1, 4, 6, and 8.

x <- c(1, 4, 6, 8)
x
[1] 1 4 6 8
class(x)
[1] "numeric"

Note that the class of x is numeric since the vector contains numeric values.

Now try assigning your first and last name as two separate character strings into a single vector called name2.

name2 = c("Robert", "Parker")
name2
[1] "Robert" "Parker"
class(name2)
[1] "character"
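
The function c() also accepts existing vectors, not just single values, and returns one flat vector. A minimal sketch, where first_half, second_half, and combined are hypothetical names used only for illustration:

```r
# Hypothetical vectors used only for illustration
first_half  <- c(1, 4)
second_half <- c(6, 8)

# c() joins the two vectors and the extra value into one flat vector
combined <- c(first_half, second_half, 10)
combined          # 1 4 6 8 10
class(combined)   # "numeric"
```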

In statistics and in general programming, we often want to know how long a vector is. length() gets or sets the length of vectors (including lists, discussed later) and factors, and any other object for which the method has been defined. What is the length of x?

x
[1] 1 4 6 8
length(x)
[1] 4
y
[1] "hello world!"
length(y)
[1] 1

Note that the length of y is 1 since "hello world!" is a single character string even though it consists of two words. What do you expect the lengths of name and name2 to be?

name
[1] "Robert Parker"
length(name)
[1] 1
name2
[1] "Robert" "Parker"
length(name2)
[1] 2

Note that name2 is of length 2 since we separated the first and last name into two separate character strings.
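
If you want the number of characters in a string rather than the number of elements in a vector, base R provides nchar(). This function is not covered in the notes above, so treat the following as a side-by-side sketch only. The names greeting and parts are hypothetical.

```r
greeting <- "hello world!"   # same string as y above, under a hypothetical name
length(greeting)   # 1: the vector holds one character string
nchar(greeting)    # 12: characters in the string, counting the space and "!"

parts <- c("hello", "world!")
length(parts)      # 2: two character strings
nchar(parts)       # 5 6: one count per element
```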

R is designed to make vector operations fast and easy. For example, we can perform mathematical operations to an entire vector.

x
[1] 1 4 6 8
x + 2 # add 2 to each element of the vector x
[1]  3  6  8 10
x * 3 # multiply each element of x by 3
[1]  3 12 18 24
x + c(1, 2, 3, 4) # add elements together in matching positions
[1]  2  6  9 12
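
The same element-by-element behaviour holds for the other arithmetic operators. A short sketch using the same x, where w is a hypothetical second vector of the same length:

```r
x <- c(1, 4, 6, 8)   # same vector as above
w <- c(2, 2, 3, 4)   # hypothetical second vector of the same length

x - 1    # 0 3 5 7
x / 2    # 0.5 2.0 3.0 4.0
x^2      # 1 16 36 64
x - w    # -1 2 3 4 (elements subtracted in matching positions)
```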

Note that we have not modified x itself with any of the above operations. The results are printed but not stored. If we want to save the results we need to assign them to a variable.

y
[1] "hello world!"
y = x + c(1, 2, 3, 4)
y
[1]  2  6  9 12

Note that the R object y is no longer "hello world!". It has been overwritten by assigning new data to the variable and is now a numeric vector instead of a character string. This is an example of the dynamically typed nature of R: a variable name is not tied to one type of data.

Be careful though. Just because we can overwrite a character string with a numeric vector does not mean that we can ignore classes/types. We can only perform mathematical operations on numeric vectors/objects. If we try to apply arithmetic operations to a character vector, we will receive an error.

name2
[1] "Robert" "Parker"
name2 + 4
Error in name2 + 4: non-numeric argument to binary operator
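
One way to check ahead of time whether arithmetic will work is to ask R about the type of an object. The base R functions is.numeric() and is.character() are not introduced in the notes above, so this is only a sketch of how such a check might look.

```r
name2 <- c("Robert", "Parker")
x <- c(1, 4, 6, 8)

is.numeric(name2)     # FALSE: arithmetic on name2 would fail
is.character(name2)   # TRUE
is.numeric(x)         # TRUE: arithmetic on x is fine
```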

You can get more attributes than just class from an R object by using the str() function to get the structure of the object.

str(x)
 num [1:4] 1 4 6 8
str(y)
 num [1:4] 2 6 9 12

Note that str() gives us the type, length, and the (first few) elements of the vector.
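
For comparison, here is a sketch of str() applied to a character vector and to a single number. The expected output is shown in the comments.

```r
name2 <- c("Robert", "Parker")
str(name2)   # prints: chr [1:2] "Robert" "Parker"
str(7)       # prints: num 7 (no [1:n] range for a single value)
```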

3.4 Exercises

  1. create a new variable called my.num that contains 6 numbers
  2. multiply my.num by 4
  3. create a second variable called my.char that contains 5 character strings
  4. combine the two variables my.num and my.char into a variable called both
  5. what is the length of both?
  6. what class is both?
  7. divide both by 3, what happens?
  8. create a vector with elements 1 2 3 4 5 6 and call it x
  9. create another vector with elements 10 20 30 40 50 and call it y
  10. what happens if you try to add x and y together? why?
  11. append the value 60 onto the vector y (hint: you can use the c() function)
  12. add x and y together
  13. multiply x and y together. pay attention to how R performs operations on vectors of the same length.