The R package ggplot2 is built for statistical graphics based on Wilkinson's grammar of graphics. The grammar of graphics allows plots to be built in layers from different objects such as data, aesthetic mappings, geometric objects (geoms), scales, and facets.

qplot

The ggplot2 package has a function qplot, which stands for “quick plot”, and this function is designed to resemble R’s plot function. The basic syntax is the same: qplot(x, y). Note that ggplot2 3.4.0 and later mark qplot as deprecated.

library(ggplot2)
qplot(SYSBP,DIASBP,data=whas500)

qplot(SYSBP,log(DIASBP),data=whas500)

With plot, applying colors and plotting characters to different subgroups of the data requires you to supply a vector of colors or plotting characters by hand, one entry per observation. qplot does this for you and creates a legend. In the following plot, we look at the subgroups defined by GENDER and CVD (history of cardiovascular disease: 0 = no, 1 = yes).

qplot(SYSBP,log(DIASBP),data=whas500,colour=GENDER,shape=CVD)

Colour, size and shape are all examples of aesthetic attributes, visual properties that affect the way observations are displayed. For each aesthetic attribute, there is a function called a scale which maps data values to valid values for that aesthetic. For example, in the above plot the scale for colour maps male to red and female to blue.
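To see a scale at work, the sketch below inspects the colours that the default colour scale assigns to the levels of a factor. The data frame demo is a simulated stand-in for whas500 (an assumption, so that the example runs on its own), and layer_data is ggplot2's helper for retrieving the computed data of a layer.

```r
library(ggplot2)

# Simulated stand-in for whas500 (hypothetical values, not the study data)
set.seed(1)
demo <- data.frame(
  SYSBP  = rnorm(100, mean = 145, sd = 30),
  DIASBP = rnorm(100, mean = 80, sd = 20),
  GENDER = factor(sample(c("male", "female"), 100, replace = TRUE))
)

p_scale <- ggplot(demo, aes(SYSBP, DIASBP, colour = GENDER)) + geom_point()

# layer_data() returns the layer's data after the scales have been applied,
# so the colour column holds the colour actually drawn for each point
scale_colours <- unique(layer_data(p_scale, 1)$colour)
scale_colours
```

The result should contain one colour per level of GENDER, which is the mapping the colour scale performs behind the scenes.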

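We can also add regression lines and smoothers using the geom argument. The warnings in the output below appear to arise because qplot passes the extra arguments to every geom, so the point layer reports the smoother arguments it does not use; the smooth layer still receives them.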

qplot(SYSBP, DIASBP,data=whas500,geom=c("point","smooth"),method="loess",span=0.4,se=FALSE) #adds loess curve to scatterplot
## Warning: Ignoring unknown parameters: method, span, se

qplot(SYSBP, DIASBP,data=whas500,geom=c("point","smooth"),method="lm") #adds linear regression line
## Warning: Ignoring unknown parameters: method
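One way to avoid these warnings, sketched here with the layered ggplot syntax, is to pass the smoother arguments only to geom_smooth. The data frame demo is a simulated stand-in for whas500 (an assumption), and formula = y ~ x is spelled out only to silence the message about the default formula.

```r
library(ggplot2)

# Simulated stand-in for whas500 (hypothetical values, not the study data)
set.seed(1)
demo <- data.frame(
  SYSBP  = rnorm(100, mean = 145, sd = 30),
  DIASBP = rnorm(100, mean = 80, sd = 20)
)

# The point layer takes no extra arguments; the smoother arguments
# go only to the layer that understands them
p_smooth <- ggplot(demo, aes(SYSBP, DIASBP)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.4, se = FALSE)

# Fitted curve computed by the smooth layer (layer 2)
smooth_fit <- layer_data(p_smooth, 2)
head(smooth_fit[, c("x", "y")])
```

Replacing method = "loess" and span with method = "lm" gives the linear regression line instead.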

Just like with plot, we are not restricted to only scatterplots. We can make different types of plots by using a different geom.

qplot(GENDER,DIASBP,data=whas500,geom="boxplot",ylab="Diastolic Blood Pressure (mmHg)") #side by side boxplots

qplot(DIASBP,data=whas500,geom="histogram",bins=50)

qplot(DIASBP,data=whas500,geom="histogram",binwidth=5)

qplot(DIASBP,data=whas500,geom="histogram",binwidth=10)

qplot(DIASBP,data=whas500,geom="histogram",binwidth=15)

qplot(DIASBP,data=whas500,geom="histogram",binwidth=30)

qplot(DIASBP,data=whas500,geom="density",adjust=0.2)

qplot(DIASBP,data=whas500,geom="density",adjust=0.4)

qplot(DIASBP,data=whas500,geom="density",adjust=0.6)

qplot(DIASBP,data=whas500,geom="density",adjust=0.8)
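The effect of binwidth can also be checked numerically. As a rough sketch on simulated data (demo is a hypothetical stand-in for whas500), the code below counts the bars that geom_histogram produces for each of the bin widths used above.

```r
library(ggplot2)

# Simulated stand-in for the DIASBP column (hypothetical values)
set.seed(1)
demo <- data.frame(DIASBP = rnorm(100, mean = 80, sd = 20))

bin_widths <- c(5, 10, 15, 30)

# For each bin width, build the histogram and count the bars:
# layer_data() returns one row per bin
bar_counts <- sapply(bin_widths, function(bw) {
  p_hist <- ggplot(demo, aes(DIASBP)) + geom_histogram(binwidth = bw)
  nrow(layer_data(p_hist, 1))
})
names(bar_counts) <- bin_widths
bar_counts
```

Wider bins give fewer bars and a coarser picture of the distribution; the adjust argument plays a similar role for density plots.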

Faceting is used to create grids of plots based on subgroups.

qplot(DIASBP,..density..,data=whas500,facets = GENDER ~ CVD,geom="histogram",bins=20)
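The same kind of grid can be sketched with the layered syntax, where facet_grid takes the rows ~ columns formula. This example assumes ggplot2 3.3.0 or later, where after_stat(density) is the current spelling of ..density.., and it uses a simulated stand-in for whas500 named demo.

```r
library(ggplot2)

# Simulated stand-in for whas500 (hypothetical values, not the study data)
set.seed(1)
demo <- data.frame(
  DIASBP = rnorm(100, mean = 80, sd = 20),
  GENDER = factor(sample(c("male", "female"), 100, replace = TRUE)),
  CVD    = factor(sample(0:1, 100, replace = TRUE))
)

# Rows of the grid are GENDER levels, columns are CVD levels
p_facet <- ggplot(demo, aes(DIASBP, after_stat(density))) +
  geom_histogram(bins = 20) +
  facet_grid(GENDER ~ CVD)

# Each combination of GENDER and CVD gets its own panel
n_panels <- length(unique(layer_data(p_facet, 1)$PANEL))
n_panels
```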

Other options familiar from plot are also available: main, xlab, ylab, xlim, ylim. To add more to the plot, we will need to use layers. (Note that qplot is not generic like plot, so it cannot be applied to arbitrary R objects.)

The Power of ggplot2: Layers

The function qplot is useful for quick plots, but it does not use the full power of ggplot2. The grammar of graphics allows plots to be built in layers, and lets us update individual pieces of a graphic without generating an entirely new plot. Here, we start from a single ggplot object that holds the dataset, then update different aspects of it to form different types of plots and to add layers on top of existing plots.

p <- ggplot(whas500,aes(x=SYSBP,y=DIASBP))
p

Note that no data appear in the plot; recent versions of ggplot2 draw only the empty axes. We have defined the dataset to read from and what will go on the x and y axes, but we have not defined the geom to plot.
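This can be confirmed by counting the layers stored in the plot object. The sketch below uses a simulated stand-in for whas500 named demo (an assumption) and the layers element of a ggplot object.

```r
library(ggplot2)

# Simulated stand-in for whas500 (hypothetical values, not the study data)
set.seed(1)
demo <- data.frame(
  SYSBP  = rnorm(100, mean = 145, sd = 30),
  DIASBP = rnorm(100, mean = 80, sd = 20)
)

# Data and aesthetic mappings only: no geom yet, so no layers
base_plot <- ggplot(demo, aes(x = SYSBP, y = DIASBP))
n_before <- length(base_plot$layers)

# Adding a geom creates the first layer
with_points <- base_plot + geom_point()
n_after <- length(with_points$layers)

c(before = n_before, after = n_after)
```

Note that adding the geom returns a new object; base_plot itself is left unchanged and can be reused for other plots.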

p2 <- p + geom_point()
p2

Now that we have added geom_point, a scatterplot is made. We can also add a regression line by using geom_smooth.

p3 <- p2 + geom_smooth(method="lm")
p3

Let’s update some of the aesthetics like color and shape.

p4 <- p3 + aes(shape=CVD,color=GENDER,group=1)
p4

Let’s add a regression line for each gender.

p5 <- p4 + geom_smooth(aes(group=GENDER),method="lm",se=FALSE)
p5
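The group aesthetic decides how many lines geom_smooth fits. As a minimal sketch on simulated data (demo is a hypothetical stand-in for whas500), the code below compares group = 1 with group = GENDER by counting the groups in the computed smooth layer.

```r
library(ggplot2)

# Simulated stand-in for whas500 (hypothetical values, not the study data)
set.seed(1)
demo <- data.frame(
  SYSBP  = rnorm(100, mean = 145, sd = 30),
  DIASBP = rnorm(100, mean = 80, sd = 20),
  GENDER = factor(sample(c("male", "female"), 100, replace = TRUE))
)

points_only <- ggplot(demo, aes(SYSBP, DIASBP)) +
  geom_point(aes(colour = GENDER))

# group = 1 puts every observation in one group: a single regression line
one_line <- points_only +
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x, se = FALSE)

# group = GENDER fits a separate regression line for each level
two_lines <- points_only +
  geom_smooth(aes(group = GENDER, colour = GENDER),
              method = "lm", formula = y ~ x, se = FALSE)

n_groups_one <- length(unique(layer_data(one_line, 2)$group))
n_groups_two <- length(unique(layer_data(two_lines, 2)$group))
c(one_line = n_groups_one, two_lines = n_groups_two)
```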

Now, let’s go back and make a boxplot of SYSBP by GENDER.

p6 <- p + aes(x=GENDER,y=SYSBP) + geom_boxplot(fill=I(c("red","blue")))
p6

Controlling Legends, Labels, and Axes

Adding labels to the axes is so common that ggplot2 has helper functions for it: xlab, ylab, and labs.

p.lab <- p2 + labs(x="Systolic Blood Pressure (mmHg)",y="Diastolic Blood Pressure (mmHg)")
p.lab

For the axes, there are two kinds of functions: scale_x_discrete (for categorical data) and scale_x_continuous (for continuous data), with analogous functions for the y-axis. In these functions we can adjust the tick marks on the axis with breaks, the labels at the tick marks with labels, and the range of the axis with limits.

p.axis <- p.lab + scale_x_continuous(breaks=seq(50,250,by=25)) + scale_y_continuous(breaks=seq(0,200,by=25))
p.axis

p6 + scale_x_discrete(breaks=c("male","female"),labels=c("M","F"))
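The examples above set only breaks (and labels for the discrete axis). A sketch that combines breaks, labels, and limits on a continuous axis might look like the following; demo is a simulated stand-in for whas500, and the particular tick positions and the "mmHg" suffix are illustrative choices, not values from the study.

```r
library(ggplot2)

# Simulated stand-in for whas500 (hypothetical values, not the study data)
set.seed(1)
demo <- data.frame(
  SYSBP  = rnorm(100, mean = 145, sd = 30),
  DIASBP = rnorm(100, mean = 80, sd = 20)
)

tick_positions <- seq(50, 250, by = 50)

# breaks: where the ticks go; labels: the text shown at each tick
# (one per break); limits: the range of the axis
x_scale <- scale_x_continuous(
  breaks = tick_positions,
  labels = paste(tick_positions, "mmHg"),
  limits = c(50, 250)
)

p_axis_demo <- ggplot(demo, aes(SYSBP, DIASBP)) + geom_point() + x_scale

# The settings are stored on the scale object
x_scale$limits
x_scale$labels
```

Observations that fall outside limits are dropped from the plot with a warning, so limits should be chosen to cover the data you want to show.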

What goes in the legend is controlled by the scale for color, shape, or whichever aesthetic the legend displays. The scale can be generated by default, or you can set the labels and colors/shapes manually.

p3 + aes(shape=CVD,color=GENDER) + 
  scale_color_manual(breaks=c("male","female"),labels=c("M","F"),values=I(c("purple","green"))) +
  ggtitle("Worcester Heart Attack Study")
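When setting colours manually, naming the entries of values ties each colour to a factor level explicitly, so the result does not depend on the order of the levels. The following is a minimal sketch on simulated data; demo is a hypothetical stand-in for whas500.

```r
library(ggplot2)

# Simulated stand-in for whas500 (hypothetical values, not the study data)
set.seed(1)
demo <- data.frame(
  SYSBP  = rnorm(100, mean = 145, sd = 30),
  DIASBP = rnorm(100, mean = 80, sd = 20),
  GENDER = factor(sample(c("male", "female"), 100, replace = TRUE))
)

p_legend <- ggplot(demo, aes(SYSBP, DIASBP, colour = GENDER)) +
  geom_point() +
  scale_colour_manual(
    breaks = c("male", "female"),                  # order of the legend keys
    labels = c("M", "F"),                          # legend text, one per break
    values = c(male = "purple", female = "green")  # matched to levels by name
  )

# Colours actually used in the point layer
legend_colours <- unique(layer_data(p_legend, 1)$colour)
legend_colours
```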

For more on ggplot2, see the book ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham, which is freely available through UF on SpringerLink. There are many more options to explore, such as themes for changing font styles and sizes, background and border colors, and many other polishing features of the plot. There are also other topics, such as using viewports to make panels of plots (similar to mfrow/mfcol in par for regular plots).