Let’s work through some examples to get some practice with these interpretations.

Example 1. The skin cancer data set contains 49 observations on skin cancer mortality rates by state, along with the latitude (North/South position) at the center of each state. Let

We should always start by exploring our data with plots. In this case, since we are interested in regression, we need to make a scatterplot to see if there is an approximately linear relationship.
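As a rough sketch of this plot-first workflow in R, the code below makes a scatterplot, fits the simple linear regression, and then computes the correlation. The data frame `toy.dat` and its values are simulated stand-ins invented for illustration (they are not the skin cancer data), and only the column names `Mort` and `Lat` are borrowed from the model call in this example.

```r
# Simulated stand-in data: NOT the skin cancer data set from the text.
# The generating values (390, -6, sd = 19) are arbitrary assumptions.
set.seed(1)
toy.dat <- data.frame(Lat = runif(49, min = 25, max = 48))
toy.dat$Mort <- 390 - 6 * toy.dat$Lat + rnorm(49, mean = 0, sd = 19)

# Step 1: scatterplot to check for an approximately linear relationship
plot(Mort ~ Lat, data = toy.dat,
     xlab = "Latitude", ylab = "Mortality rate")

# Step 2: fit the simple linear regression and print the summary table
toy.fit <- lm(Mort ~ Lat, data = toy.dat)
print(summary(toy.fit))

# Step 3: Pearson's correlation coefficient between the two variables
r <- cor(toy.dat$Lat, toy.dat$Mort)
print(paste("r = ", round(r, 4)))
```

Because the data here are simulated, the numbers this prints will differ from the output for the actual skin cancer data discussed in the rest of this example.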

The scatterplot looks reasonably linear with a negative slope, so linear regression is appropriate. As latitude increases, the skin cancer mortality rate tends to decrease. The following output is from R, where we fit the simple linear regression model

\[\widehat{Mortality}=\beta_0+\beta_1Latitude\]

## 
## Call:
## lm(formula = Mort ~ Lat, data = skin.dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.972 -13.185   0.972  12.006  43.938 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 389.1894    23.8123   16.34  < 2e-16 ***
## Lat          -5.9776     0.5984   -9.99 3.31e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.12 on 47 degrees of freedom
## Multiple R-squared:  0.6798, Adjusted R-squared:  0.673 
## F-statistic:  99.8 on 1 and 47 DF,  p-value: 3.309e-13
## [1] "r =  -0.825"

The estimated regression equation is \(\widehat{Mort}=389.19-5.98Lat\) with \(R^2=67.98\%\) and correlation coefficient \(r=-0.825\).
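To see how these numbers are used, here is a short worked calculation with the rounded values above. The latitude of 40 is a value chosen only for illustration, not one taken from the data description. The slope says that each one-unit increase in latitude is associated with a decrease of about 5.98 in the predicted mortality rate, so the predicted rate at a latitude of 40 is

\[\widehat{Mort}=389.19-5.98(40)=389.19-239.20=149.99.\]

In simple linear regression the correlation coefficient is the square root of \(R^2\), taking the sign of the slope:

\[r=-\sqrt{R^2}=-\sqrt{0.6798}\approx-0.8245,\]

which agrees with the reported \(r=-0.825\) up to rounding.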

Example 2: Teen Birth Rate and Poverty Level Data. This dataset contains 51 observations for the 50 states and Washington D.C. The variables are

The plot of the data below shows that the relationship is reasonably linear with a positive slope. As poverty level increases, the birth rate for 15- to 17-year-old females tends to increase.

The output below is for the linear regression model

\[\widehat{Birth Rate}=\beta_0+\beta_1Poverty.\]

## 
## Call:
## lm(formula = Brth15to17 ~ PovPct, data = poverty.dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.2275  -3.6554  -0.0407   2.4972  10.5152 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.2673     2.5297   1.687    0.098 .  
## PovPct        1.3733     0.1835   7.483 1.19e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.551 on 49 degrees of freedom
## Multiple R-squared:  0.5333, Adjusted R-squared:  0.5238 
## F-statistic:    56 on 1 and 49 DF,  p-value: 1.188e-09
## [1] "r =  0.7303"

The estimated regression equation is \(\hat{y}=4.267+1.373x\) with \(R^2=53.33\%\) and \(r=0.7303\).
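As a reading aid for the output above, the following sketch shows how several of the reported numbers relate to one another in simple linear regression. It uses the rounded values printed by R, so the results match only up to rounding. The t value for the slope is the estimate divided by its standard error, and with a single predictor the F-statistic is the square of that t value:

\[t=\frac{\text{Estimate}}{\text{Std. Error}}=\frac{1.3733}{0.1835}\approx 7.48,\qquad t^2\approx 7.483^2\approx 56.0=F.\]

Likewise, the correlation coefficient is the square root of \(R^2\), with the sign of the slope (positive here):

\[r=+\sqrt{R^2}=\sqrt{0.5333}\approx 0.7303.\]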

Learn By Doing: Lung Function in 6 to 10 Year Old Children

This dataset contains 345 observations on children between 6 and 10 years old. The variables are

Below is a scatterplot of age vs FEV along with the simple linear regression output. Use this output to answer the following questions.

## 
## Call:
## lm(formula = FEV ~ age, data = fev.dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.57539 -0.34567 -0.04989  0.32124  2.12786 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.431648   0.077895   5.541 4.36e-08 ***
## age         0.222041   0.007518  29.533  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5675 on 652 degrees of freedom
## Multiple R-squared:  0.5722, Adjusted R-squared:  0.5716 
## F-statistic: 872.2 on 1 and 652 DF,  p-value: < 2.2e-16
## [1] "r =  0.7565"

The estimated regression equation is \(\hat{y}=0.432 + 0.222x\) with \(R^2=57.22\%\) and \(r=0.7565\).

  1. Is linear regression appropriate?
  2. Interpret the slope.
  3. Interpret the y-intercept. Does the y-intercept interpretation make sense?
  4. Interpret \(R^2\).
  5. Interpret Pearson’s correlation coefficient, \(r\).

Solutions