In this lecture, we will discuss controls structures in R and creating R functions. In particular, we will cover
The basic form an if/else clause in R is as follows:
Note that the condition above must evaluate to a single TRUE or FALSE and not a vector of TRUEs and FALSEs.
In the first example, we will calculate the median of a dataset. Recall that the median can be found by ordering the dataset from smallest to largest and then taking the median to be the observation that is in position (n+1)/2 if the dataset has an odd number of observations and the average of the observations in the positions n/2 and (n+1)/2 if the dataset has an even number of observations.
x <- c(5,4,2,8,0,10) #data which is out of order
x <- sort(x) #sort the data
x
## [1] 0 2 4 5 8 10
n <- length(x) #number of observations
n%%2 #n is even
## [1] 0
if (n%%2 == 0) { #see if n is even
median <- 0.5*x[n/2] + 0.5*x[n/2 + 1]
} else {
median <- x[(n+1)/2]
}
median
## [1] 4.5
x <- c(5,4,2,8,0,10,11) #data which is out of order
x <- sort(x) #sort the data
x
## [1] 0 2 4 5 8 10 11
n <- length(x) #number of observations
n%%2 #n is odd
## [1] 1
if (n%%2 == 0) { #see if n is even
median <- 0.5*x[n/2] + 0.5*x[n/2 + 1]
} else {
median <- x[(n+1)/2]
}
median
## [1] 5
We can also nest multiple if/else clauses for making multiple decisions. Let’s use if/else statements to classify a patient to as either underweight, normal, overweight, or obeses based on BMI (body mass index).
BMI <- 32 #obese
if (BMI < 18.5) {
BMI.class <- "Underweight"
} else if (BMI < 25) {
BMI.class <- "Normal"
} else if (BMI < 30) {
BMI.class <- "Overweight"
} else BMI.class <- "Obese"
BMI.class
## [1] "Obese"
If we want to apply an if/else clause to each element of a vector or matrix, then we can use the ifelse() function.
(x <- seq(0,2,len=6)) #a vector of length 6
## [1] 0.0 0.4 0.8 1.2 1.6 2.0
#For each element in x if x <= 1 then return small else return big
ifelse(x <= 1, "small", "big")
## [1] "small" "small" "small" "big" "big" "big"
(y <- matrix(1:8, nrow=2))
## [,1] [,2] [,3] [,4]
## [1,] 1 3 5 7
## [2,] 2 4 6 8
#For each element in the matrix, if y > 3 AND y < 7, then return 1 else return 0.
ifelse(y>3 & y <7, 1, 0)
## [,1] [,2] [,3] [,4]
## [1,] 0 0 1 0
## [2,] 0 1 1 0
We can also nest ifelse() function calls to make multiple decisions.
BMI <- c(32,27,23,17) #obese, overweight, normal, underweight
ifelse(BMI < 18.5, "Underweight", ifelse(BMI < 25, "Normal", ifelse(BMI < 30, "Overweight", "Obese")))
## [1] "Obese" "Overweight" "Normal" "Underweight"
There are three main loop structures in R: for, while, and repeat.
The basic structre of the while loop is as follows:
Let’s use a while loop to calculate \(10!=10\cdot 9\cdot 8\cdot 7\cdot 6\cdot 5\cdot 4\cdot 3\cdot \cdot 2\cdot 1\).
i <- 10
f <- 1
while(i>1) {
f <- i*f
i <- i-1
cat(i, f, "\n")
}
## 9 10
## 8 90
## 7 720
## 6 5040
## 5 30240
## 4 151200
## 3 604800
## 2 1814400
## 1 3628800
f
## [1] 3628800
factorial(10) #check using R's factorial function
## [1] 3628800
The basic structure of the for loop is
Again let’s calculate 10!.
f <- 1
for(i in 1:10) {
f <- f*i
cat(i, f, "\n")
}
## 1 1
## 2 2
## 3 6
## 4 24
## 5 120
## 6 720
## 7 5040
## 8 40320
## 9 362880
## 10 3628800
f
## [1] 3628800
factorial(10)
## [1] 3628800
The repeate loop has the following structure
Let’s calculate 10! one more time.
f <- 1
i <- 10
repeat {
if (i > 1) {
f <- i*f
i <- i - 1
cat(i, f, "\n")
} else break
}
## 9 10
## 8 90
## 7 720
## 6 5040
## 5 30240
## 4 151200
## 3 604800
## 2 1814400
## 1 3628800
f
## [1] 3628800
factorial(10)
## [1] 3628800
To define a function you make a call to function()
Below is an example of function documentation.It is good practice to add a comment before a user defined function to say what the function does, what are its arguments, and what will it return if anything. This is informative for others who will use your code, and as a reminder to yourself if you come back to the code after not using it for a while. This first function will create a confidence for a population proportion.
#########################################################################################
# Calculate the point estimate, standard error, and confidence intervale for one proportion
# Input: x = count of successes
# n = sample size
# conf.level = confidence level
# Return: A list containing the sample proportion, standard error, and confidence interval
##########################################################################################
prop.CI <- function(x, n, conf.level = 0.95) {
p.hat <- x/n
std.err <- sqrt(p.hat*(1-p.hat)/n)
alpha <- 1 - conf.level
CI.lower <- p.hat - qnorm(1 - alpha/2)*std.err
CI.upper <- p.hat + qnorm(1 - alpha/2)*std.err
list("p.hat" = p.hat, "std.err" = std.err,
"CI.lower" = CI.lower, "CI.upper" = CI.upper)
}
unlist(prop.CI(30, 100)) #conf.level has a defualt value of 0.95 so we don't need to specify it
## p.hat std.err CI.lower CI.upper
## 0.30000000 0.04582576 0.21018317 0.38981683
unlist(prop.CI(30, 100, conf.level = 0.9)) #change conf.level to 0.9 for a 90% CI
## p.hat std.err CI.lower CI.upper
## 0.30000000 0.04582576 0.22462334 0.37537666
#if we use the name of the arguments, then they can be given out of order.
unlist(prop.CI(n = 100, x = 30, conf.level = 0.9))
## p.hat std.err CI.lower CI.upper
## 0.30000000 0.04582576 0.22462334 0.37537666
result <- prop.CI(30, 100)
result$p.hat
## [1] 0.3
By default, the last line of code is returned from a function. We can be more explicit in what we want to return by adding a return statement to our function.
# Returning objects from a function
# By default R will return the last statement in a function. In the previous example, we returned
# a list.
# We can also return objects by using return() or invisible()
# return() - returns and prints the object
# invisible() - returns but does not print
# if we don't want to return anything, but only print a result. Then we can use cat() or print()
# to output text.
prop.CI <- function(x, n, conf.level = 0.95) {
p.hat <- x/n
std.err <- sqrt(p.hat*(1-p.hat)/n)
alpha <- 1 - conf.level
CI.lower <- p.hat - qnorm(1 - alpha/2)*std.err
CI.upper <- p.hat + qnorm(1 - alpha/2)*std.err
CI.result <- list("p.hat" = p.hat, "std.err" = std.err,
"CI.lower" = CI.lower, "CI.upper" = CI.upper)
return(CI.result)
}
prop.CI(30, 100)
## $p.hat
## [1] 0.3
##
## $std.err
## [1] 0.04582576
##
## $CI.lower
## [1] 0.2101832
##
## $CI.upper
## [1] 0.3898168
#invisible can be useful when you want to return an object but not print it.
#For example, we will update prop.CI to print some output and return the list
#but we don't want to print the list as well.
prop.CI <- function(x, n, conf.level = 0.95, quietly = TRUE) {
p.hat <- x/n
std.err <- sqrt(p.hat*(1-p.hat)/n)
alpha <- 1 - conf.level
CI.lower <- p.hat - qnorm(1 - alpha/2)*std.err
CI.upper <- p.hat + qnorm(1 - alpha/2)*std.err
CI.result <- list("p.hat" = p.hat, "std.err" = std.err,
"CI.lower" = CI.lower, "CI.upper" = CI.upper)
if (quietly == FALSE) {
cat("The sample proportion is ", p.hat, ".\n",
"The standard error of p-hat is ", std.err, ".\n",
"The ", 100*conf.level, "% confidence interval for p is (",
CI.lower, ",", CI.upper, ").", sep="")
}
invisible(CI.result)
}
prop.CI(30, 100) #note that nothing is printed
prop.CI(30, 100, quietly = FALSE) #Note that the list was not printed.
## The sample proportion is 0.3.
## The standard error of p-hat is 0.04582576.
## The 95% confidence interval for p is (0.2101832,0.3898168).
#but we could have saved it to use later
result <- prop.CI(30, 100, quietly = FALSE)
## The sample proportion is 0.3.
## The standard error of p-hat is 0.04582576.
## The 95% confidence interval for p is (0.2101832,0.3898168).
result
## $p.hat
## [1] 0.3
##
## $std.err
## [1] 0.04582576
##
## $CI.lower
## [1] 0.2101832
##
## $CI.upper
## [1] 0.3898168
A variables scope refers to where the variable is defined and can be accessed. This depends on where the variable was created and whether a local or global assignment was used.
v1 <- 3
#v2 doesn't exist outside this function
scope.fct <- function(a,b) {
#v1 can be seen in here
v2 <- a + b + v1
v1 <- v1 + 1
return(v2)
}
v1 #function hasn't been called yet so still 3
## [1] 3
#v2 #function was defined but v2 only exists within the function uncomment to see that it gives an error
scope.fct(2,3) #v1 was used in the function
## [1] 8
v1 #v1 unchanged by local assignment inside the function
## [1] 3
v1 <- 4
scope.fct(2,3) #changing v1 out here changes v1 inside the function
## [1] 9
#v2 #v2 still only exists inside the function uncomment to see that it gives an error
#global assignemnts allow us to make changes to variables that hold outside the current scope
v1 <- 3
#v2 doesn't exist outside this function
scope.fct2 <- function(a,b) {
#v1 can be seen in here
v2 <<- a + b + v1
v1 <<- v1 + 1
return(v2)
}
scope.fct2(2,3)
## [1] 8
v1 #now the addtion of 1 inside the function is visible out here
## [1] 4
v2 #v2 exists globally now
## [1] 8