Now it's your turn. Use the skin cancer dataset (csv) to obtain the following output.

  1. Fit the linear regression model
$$MORT=\beta_0+\beta_1 Lat+\beta_2 Long+\beta_{12} Lat \times Long+\varepsilon$$
  2. Obtain the confidence intervals for the regression parameters (the $\beta$'s).
  3. Create a plot of the residuals vs the fitted values.
  4. Create a histogram and QQ plot of the residuals.
  5. Calculate the confidence interval for the mean mortality rate and the prediction interval when Lat = 33 and Long = 86.
  6. Fit the reduced regression model without the interaction term.

Solutions

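Questions 1-4 can be answered with a single PROC REG call. The CLB option in the MODEL statement requests confidence limits for the parameter estimates, and the fit diagnostics panel that PROC REG produces contains the residual vs fitted plot, the residual histogram, and the QQ plot. PROC REG does not accept product terms in the MODEL statement, so we first need to calculate the product of Lat and Long in a DATA step for the interaction term.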

In [2]:
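/* Point the Survey libref at the course data folder. */
LIBNAME Survey "H:\BiostatCourses\PHC6937SurveryBiostat\Lectures\MLR\Data";

/* Import the CSV; the first row holds the variable names. */
PROC IMPORT datafile="H:\BiostatCourses\PHC6937SurveryBiostat\Lectures\MLR\Data\skincancer.csv"
out=Survey.cancer dbms=csv replace;
getnames=Yes;
RUN;

/* Build the interaction term as its own variable. */
DATA cancer_temp;
SET Survey.cancer;
LatLong = Lat*Long;
RUN;

/* CLB adds 95% confidence limits for the parameter estimates. */
PROC REG DATA=cancer_temp;
MODEL Mort = Lat Long LatLong / CLB;
RUN;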
Out[2]:
SAS Output

The SAS System

The REG Procedure

Model: MODEL1

Dependent Variable: Mort

Number of Observations

Number of Observations Read 49
Number of Observations Used 49

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 36778 | 12259 | 32.72 | <.0001 |
| Error | 45 | 16860 | 374.65766 | | |
| Corrected Total | 48 | 53637 | | | |

Fit Statistics

Root MSE 19.35608 R-Square 0.6857
Dependent Mean 152.87755 Adj R-Sq 0.6647
Coeff Var 12.66116    
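
As a quick check on how these fit statistics follow from the Analysis of Variance table, the arithmetic below recomputes them from the printed sums of squares. It uses only numbers shown in the output, so the last digits can differ slightly because of rounding.

$$R^2=\frac{SS_{Model}}{SS_{Total}}=\frac{36778}{53637}\approx 0.6857$$

$$R^2_{adj}=1-\frac{MS_{Error}}{SS_{Total}/(n-1)}=1-\frac{374.65766}{53637/48}\approx 0.6647$$

$$\text{Root MSE}=\sqrt{MS_{Error}}=\sqrt{374.65766}\approx 19.356$$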

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | 95% CL Lower | 95% CL Upper |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | 489.37255 | 185.66083 | 2.64 | 0.0115 | 115.43245 | 863.31265 |
| Lat | 1 | -8.08381 | 4.49542 | -1.80 | 0.0789 | -17.13804 | 0.97043 |
| Long | 1 | -1.10396 | 1.98941 | -0.55 | 0.5817 | -5.11084 | 2.90293 |
| LatLong | 1 | 0.02318 | 0.04794 | 0.48 | 0.6312 | -0.07339 | 0.11974 |
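
To see where the 95% confidence limits come from, here is the calculation for the Lat coefficient, worked by hand. It assumes the usual t-based interval with the error degrees of freedom (45), for which $t_{0.975,\,45}\approx 2.0141$.

$$\hat\beta_1 \pm t_{0.975,\,45}\times SE(\hat\beta_1) = -8.08381 \pm 2.0141\times 4.49542 \approx -8.08381 \pm 9.0542$$

$$\Rightarrow\ (-17.138,\ 0.970)$$

This matches the limits in the table up to rounding. The interval contains 0, which is consistent with the p-value of 0.0789 being above 0.05.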

[Figure: Fit Diagnostics panel for Mort]

[Figure: Residual Plots panel, scatterplots of residuals by regressors for Mort]
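
As an aside, the hand-built LatLong variable is only needed because the MODEL statement in PROC REG takes plain variable names. The following is a minimal sketch of an alternative, not the approach used in these solutions: PROC GLM accepts the product term `Lat*Long` directly, and its CLPARM option plays the role of CLB. It assumes the Survey libref from the first cell is still assigned, and the parameter estimates should agree with the PROC REG fit.

```sas
/* Sketch only: same model as above, fitted with PROC GLM instead of PROC REG. */
/* Assumes the Survey libref has already been assigned.                        */
PROC GLM DATA=Survey.cancer PLOTS=DIAGNOSTICS;
/* With no CLASS statement, Lat*Long is the product of two continuous variables. */
MODEL Mort = Lat Long Lat*Long / CLPARM;
RUN;
QUIT;
```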

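For number 5, we need to add a row to the dataset with Lat = 33, Long = 86, and a missing value for Mort. PROC REG leaves that row out of the fit but still computes its predicted value, so we can use the OUTPUT statement to get the confidence and prediction intervals.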

In [6]:
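/* One new row: Mort is missing, LatLong = 33*86 = 2838. */
DATA cancer_temp2;
INPUT Mort Lat Long LatLong;
DATALINES;
. 33 86 2838
;
RUN;

/* Append the new row to the analysis dataset. */
DATA cancer_temp;
SET cancer_temp cancer_temp2;
RUN;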

ODS SELECT NONE;
PROC REG DATA=cancer_temp;
MODEL Mort = Lat Long LatLong / CLB;
OUTPUT OUT=pred(where=(Mort=.)) p=predicted lcl=LCL_Pred ucl=UCL_Pred
LCLM=LCLM_Pred UCLM=UCLM_Pred;
RUN;
ODS SELECT ALL;

PROC PRINT DATA=pred;
VAR Lat Long LatLong predicted LCLM_Pred UCLM_Pred LCL_Pred UCL_Pred;
RUN;
Out[6]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.PRED

| Obs | Lat | Long | LatLong | predicted | LCLM_Pred | UCLM_Pred | LCL_Pred | UCL_Pred |
|---|---|---|---|---|---|---|---|---|
| 1 | 33 | 86 | 2838 | 193.439 | 182.645 | 204.233 | 152.987 | 233.891 |
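
The numbers in this row can be checked by hand against the parameter estimates of the full model. The arithmetic below uses the printed (rounded) coefficients, the printed Mean Square Error of 374.65766, and $t_{0.975,\,45}\approx 2.0141$, so small rounding differences are expected.

$$\hat{y}=489.37255-8.08381(33)-1.10396(86)+0.02318(2838)\approx 193.45$$

The half-width of the confidence interval for the mean is $204.233-193.439=10.794$, which implies a standard error of the mean of about $10.794/2.0141\approx 5.359$. The prediction interval adds the error variance for a single new observation:

$$t_{0.975,\,45}\sqrt{MS_{Error}+SE_{mean}^2}=2.0141\sqrt{374.65766+5.359^2}\approx 40.45$$

This agrees with the prediction limits, $193.439\pm 40.452=(152.987,\ 233.891)$, and shows why the prediction interval is much wider than the confidence interval for the mean.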

For number 6, we need to run PROC REG again without the interaction term in the MODEL statement. The original dataset can be used here because LatLong is no longer needed.

In [3]:
PROC REG DATA=Survey.cancer;
MODEL Mort = Lat Long / CLB;
RUN;
Out[3]:
SAS Output

The SAS System

The REG Procedure

Model: MODEL1

Dependent Variable: Mort

Number of Observations

Number of Observations Read 49
Number of Observations Used 49

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 36690 | 18345 | 49.79 | <.0001 |
| Error | 46 | 16947 | 368.41600 | | |
| Corrected Total | 48 | 53637 | | | |

Fit Statistics

Root MSE 19.19417 R-Square 0.6840
Dependent Mean 152.87755 Adj R-Sq 0.6703
Coeff Var 12.55525    

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | 95% CL Lower | 95% CL Upper |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | 400.67551 | 28.05118 | 14.28 | <.0001 | 344.21142 | 457.13960 |
| Lat | 1 | -5.93084 | 0.60381 | -9.82 | <.0001 | -7.14625 | -4.71542 |
| Long | 1 | -0.14665 | 0.18727 | -0.78 | 0.4376 | -0.52362 | 0.23031 |

[Figure: Fit Diagnostics panel for Mort]

[Figure: Residual Plots panel, scatterplots of residuals by regressors for Mort]
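
One way to relate the reduced model to the full model is a partial F-test for the interaction term. The sketch below uses the rounded error sums of squares printed in the two Analysis of Variance tables (16947 on 46 DF for the reduced model, 16860 on 45 DF for the full model), so it is only approximate.

$$F=\frac{(SSE_{reduced}-SSE_{full})/(df_{reduced}-df_{full})}{MSE_{full}}=\frac{(16947-16860)/1}{374.65766}\approx 0.23$$

This is, up to rounding, the square of the t value reported for LatLong in the full model ($0.48^2\approx 0.23$, p = 0.6312), so the two outputs tell the same story: the data give little evidence that the interaction term is needed, and the adjusted R-square is slightly higher for the reduced model (0.6703 vs 0.6647).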