Multiple Regression

The linear model has the following general form

\[Y_i = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon_i\]

  • $Y_i$: The response for the $i$th subject
  • $X_j$: The $j$th covariate (predictor)
  • $\beta_0$: The intercept
  • $\beta_j, j=1,\ldots,p$: The $j$th regression coefficient
  • $\varepsilon_i$: The error term for subject $i$

The linear model has the following assumptions:

  1. Independence: Observations are not related to and do not influence each other.
  2. Linearity: There is a true underlying linear relationship between the mean of the response and the predictors.
  3. Normality: At any given value of the predictors, the response variable is normally distributed. (Equivalently, the error term is normally distributed.)
  4. Homoscedasticity: The variance of the response is constant for all values of the predictors.

Let's examine the relationship between systolic blood pressure (SBP) and the Quetelet index (BMI) by fitting the simple linear regression model \[y_i = \beta_0 + \beta_1 Quet_i + \varepsilon_i.\]

In [2]:
LIBNAME mreg "H:\BiostatCourses\PublicHealthComputing\Lectures\Week9MultipleReg\SAS";

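/* Simple linear regression of SBP on the Quetelet index.
   CLB: confidence limits for the coefficients;
   CLM: confidence limits for the mean response;
   CLI: prediction limits for an individual response. */
PROC REG DATA=mreg.sbp_quet;
    MODEL SBP = quet / CLM CLI CLB;
    OUTPUT OUT = diag r = residuals;  /* save residuals for the normality check */
RUN;
QUIT;

/* Test for normality of the residuals */
PROC UNIVARIATE DATA=diag normal;
    var residuals;
RUN;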
Out[2]:
SAS Output

The SAS System

The REG Procedure

Model: MODEL1

Dependent Variable: SBP SBP

Number of Observations Read 32
Number of Observations Used 32

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 1 3537.94574 3537.94574 36.75 <.0001
Error 30 2888.02301 96.26743    
Corrected Total 31 6425.96875      

Fit Statistics

Root MSE 9.81160 R-Square 0.5506
Dependent Mean 144.53125 Adj R-Sq 0.5356
Coeff Var 6.78856    

Parameter Estimates
Variable Label DF Parameter Estimate Standard Error t Value Pr > |t| 95% Confidence Limits
Intercept Intercept 1 85.62057 9.87116 8.67 <.0001 65.46098 105.78016
QUET QUET 1 2.14917 0.35451 6.06 <.0001 1.42515 2.87318

Output Statistics
Obs Dependent Variable Predicted Value Std Error Mean Predict 95% CL Mean 95% CL Predict Residual
1 135 132.3864 2.6499 126.9747 137.7982 111.6306 153.1423 2.6136
2 122 140.4458 1.8608 136.6456 144.2460 120.0507 160.8409 -18.4458
3 130 137.2006 2.1144 132.8824 141.5187 116.7026 157.6985 -7.2006
4 148 151.5570 2.0860 147.2968 155.8172 131.0712 172.0428 -3.5570
5 146 134.6001 2.3858 129.7276 139.4725 113.9782 155.2219 11.3999
6 129 130.5382 2.8873 124.6416 136.4347 109.6506 151.4257 -1.5382
7 162 149.4078 1.9119 145.5032 153.3125 128.9930 169.8227 12.5922
8 160 148.2043 1.8372 144.4522 151.9565 127.8181 168.5905 11.7957
9 144 121.4687 4.1810 112.9299 130.0074 99.6873 143.2501 22.5313
10 180 170.2333 4.5807 160.8782 179.5884 148.1191 192.3475 9.7667
11 166 153.8996 2.3230 149.1553 158.6439 133.3077 174.4915 12.1004
12 138 157.2308 2.7197 151.6764 162.7852 136.4373 178.0243 -19.2308
13 152 159.0361 2.9552 153.0008 165.0714 138.1090 179.9632 -7.0361
14 138 149.5153 1.9194 145.5953 153.4353 129.0975 169.9331 -11.5153
15 140 147.1297 1.7866 143.4809 150.7785 126.7623 167.4972 -7.1297
16 134 135.0084 2.3401 130.2294 139.7875 114.4085 155.6084 -1.0084
17 145 142.7884 1.7581 139.1978 146.3790 122.4313 163.1455 2.2116
18 142 135.5672 2.2792 130.9124 140.2220 114.9957 156.1387 6.4328
19 135 138.7265 1.9812 134.6803 142.7727 118.2841 159.1689 -3.7265
20 142 143.6696 1.7403 140.1155 147.2237 123.3189 164.0203 -1.6696
21 150 148.5482 1.8567 144.7562 152.3401 128.1546 168.9418 1.4518
22 144 151.1917 2.0531 146.9986 155.3847 130.7197 171.6636 -7.1917
23 137 141.4129 1.8091 137.7182 145.1077 121.0372 161.7887 -4.4129
24 132 139.5647 1.9182 135.6471 143.4822 119.1473 159.9820 -7.5647
25 149 141.5204 1.8042 137.8358 145.2050 121.1465 161.8943 7.4796
26 132 135.4168 2.2954 130.7290 140.1046 114.8378 155.9958 -3.4168
27 120 130.5167 2.8901 124.6143 136.4190 109.6275 151.4058 -10.5167
28 126 134.1058 2.4425 129.1175 139.0940 113.4563 154.7553 -8.1058
29 161 152.2447 2.1511 147.8516 156.6379 131.7309 172.7586 8.7553
30 170 159.3800 3.0013 153.2505 165.5094 138.4255 180.3344 10.6200
31 152 155.7264 2.5335 150.5523 160.9005 135.0312 176.4216 -3.7264
32 164 156.7580 2.6601 151.3254 162.1906 135.9967 177.5193 7.2420

Residual Statistics

Sum of Residuals 0
Sum of Squared Residuals 2888.02301
Predicted Residual SS (PRESS) 3476.56899

[Figure: Panel of fit diagnostics for SBP.]

[Figure: Scatter plot of residuals by QUET for SBP.]

[Figure: Scatterplot of SBP by QUET overlaid with the fit line, a 95% confidence band, and lower and upper 95% prediction limits.]

The UNIVARIATE Procedure

Variable: residuals (Residual)

Moments
N 32 Sum Weights 32
Mean 0 Sum Observations 0
Std Deviation 9.6520481 Variance 93.1620326
Skewness 0.13173666 Kurtosis -0.2581232
Uncorrected SS 2888.02301 Corrected SS 2888.02301
Coeff Variation . Std Error Mean 1.70625717

Basic Statistical Measures
Location Variability
Mean 0.00000 Std Deviation 9.65205
Median -1.60386 Variance 93.16203
Mode . Range 41.76214
    Interquartile Range 15.27812

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 0 Pr > |t| 1.0000
Sign M -2 Pr >= |M| 0.5966
Signed Rank S 4 Pr >= |S| 0.9418

Tests for Normality
Test Statistic p Value
Shapiro-Wilk W 0.97006 Pr < W 0.5011
Kolmogorov-Smirnov D 0.107078 Pr > D >0.1500
Cramer-von Mises W-Sq 0.072792 Pr > W-Sq >0.2500
Anderson-Darling A-Sq 0.432666 Pr > A-Sq >0.2500

Quantiles (Definition 5)
Level Quantile
100% Max 22.53133
99% 22.53133
95% 12.59216
90% 11.79569
75% Q3 8.11743
50% Median -1.60386
25% Q1 -7.16069
10% -10.51667
5% -18.44582
1% -19.23081
0% Min -19.23081

Extreme Observations
Lowest Highest
Value Obs Value Obs
-19.23081 12 11.3999 5
-18.44582 2 11.7957 8
-11.51530 14 12.1004 11
-10.51667 27 12.5922 7
-8.10578 28 22.5313 9

Let's examine the regression assumptions for this example.

  1. Independence: We cannot assess this assumption here because we do not know how the subjects were sampled.
  2. Linearity: Looking at the scatterplot given in the fit plot, we can see that the relationship between systolic blood pressure and BMI does look linear.
  3. Normality: The QQ-plot and histogram of residuals show no major deviations from normality. Furthermore, the Shapiro-Wilk and Kolmogorov-Smirnov tests show no evidence of deviation from normality, with p-values of 0.5011 and >0.15, respectively.
  4. Homoscedasticity: The plots of residuals vs. predicted values and the original scatter plot show no evidence of non-constant variance (usually seen as "fanning out").

The ANOVA F-test and the t-test (which are equivalent in the simple linear regression case) show that the relationship is statistically significant (p < 0.0001), with an estimated slope of $\hat{\beta}_{quet}=2.15$. This means that for each 1-unit increase in BMI (Quetelet index), the mean systolic blood pressure is estimated to increase by 2.15 mmHg. The CLB option in the MODEL statement provides 95% confidence intervals for the intercept and the regression coefficients. The 95% confidence interval for $\beta_{quet}$ is (1.43, 2.87). The CLM option provides confidence intervals for the mean response $E(Y|X)$, and the CLI option provides prediction intervals for individual responses.
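
As a rough sketch of how CLM and CLI can be used at a covariate value that is not in the data, one common approach is to append an observation with a missing response. PROC REG leaves that row out of the fit but still prints its predicted value and limits. The data set names `newobs` and `sbp_plus` and the value Quet = 20 are assumptions chosen only for illustration.

```sas
/* Hypothetical new subject: Quet = 20 (illustrative value), SBP unknown */
DATA newobs;
    quet = 20;
    sbp = .;
RUN;

/* Stack the new row under the original data */
DATA sbp_plus;
    SET mreg.sbp_quet newobs;
RUN;

/* The row with missing SBP is not used in fitting,
   but its predicted value, CLM, and CLI are printed */
PROC REG DATA=sbp_plus;
    MODEL sbp = quet / CLM CLI;
RUN;
QUIT;
```

The appended subject appears as the last row of the Output Statistics table, with the CLM interval narrower than the CLI interval because the latter also includes the variability of an individual response.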

Note that in this case, $R^2=0.5506$, so 55.06% of the variation in systolic blood pressure is explained by the linear regression on the Quetelet index (BMI). If we control for other covariates, we may be able to develop a better model. Let's add age and see if the model fits better.
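
The figures quoted above can be reproduced by hand from the printed output, which is a useful check when reading a regression table. The critical value $t_{0.975,30} \approx 2.042$ is an assumption taken from a standard t table; it is not part of the SAS output.

\[
\begin{aligned}
t^2 &= \left(\frac{2.14917}{0.35451}\right)^2 \approx 6.0624^2 \approx 36.75 = F \\
\hat{\beta}_{quet} \pm t_{0.975,30}\, SE(\hat{\beta}_{quet}) &= 2.14917 \pm 2.042 \times 0.35451 \approx (1.43,\ 2.87) \\
R^2 &= \frac{SS_{Model}}{SS_{Total}} = \frac{3537.94574}{6425.96875} \approx 0.5506
\end{aligned}
\]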

In [5]:
PROC SGSCATTER DATA=mreg.sbp_quet;
matrix sbp age quet/ diagonal = (histogram kernel);
RUN;

PROC REG DATA=mreg.sbp_quet;
MODEL SBP = quet age;
RUN;
QUIT;
Out[5]:
The SGSCATTER Procedure

The REG Procedure

Model: MODEL1

Dependent Variable: SBP SBP

Number of Observations Read 32
Number of Observations Used 32

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 4120.59225 2060.29612 25.92 <.0001
Error 29 2305.37650 79.49574    
Corrected Total 31 6425.96875      

Fit Statistics

Root MSE 8.91604 R-Square 0.6412
Dependent Mean 144.53125 Adj R-Sq 0.6165
Coeff Var 6.16893    

Parameter Estimates
Variable Label DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept Intercept 1 62.14895 12.47519 4.98 <.0001
QUET QUET 1 0.97507 0.54025 1.80 0.0815
AGE AGE 1 1.04516 0.38606 2.71 0.0113

[Figure: Panel of fit diagnostics for SBP.]

[Figure: Panel of scatterplots of residuals by regressors for SBP.]

The new regression equation is \[\widehat{SBP}=62.15 + 0.98\,Quet + 1.05\,Age\]

  • Note that the Quet regression coefficient decreased (from 2.15 to 0.98) with the inclusion of age.
  • As always, $R^2$ increased when adding a new variable, but the adjusted $R^2$ also increased (from 0.5356 to 0.6165), indicating that age has at least slightly improved the model. (The adjusted $R^2$ has a penalty term for the number of predictors.)
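
As a worked check of the penalty mentioned above, the usual adjusted $R^2$ formula with $n = 32$ subjects and $p$ predictors reproduces both reported values. The formula is the standard textbook definition and is stated here as an assumption; it is not printed in the SAS output.

\[
R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
\]

\[
\begin{aligned}
\text{Quet only } (p=1):&\quad 1 - (1 - 0.5506)\tfrac{31}{30} \approx 0.5356 \\
\text{Quet and Age } (p=2):&\quad 1 - (1 - 0.6412)\tfrac{31}{29} \approx 0.6165
\end{aligned}
\]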

If we want to evaluate the regression equation at a particular point, we can use the ESTIMATE statement in PROC GLM; the example below uses Quet = 20 and Age = 25.

After that, for the next model, we will consider a categorical predictor, smoking (0 = no, 1 = yes).

In [5]:
PROC GLM DATA = mreg.sbp_quet;
MODEL SBP = quet age / solution;
ESTIMATE 'Quet = 20 Age = 25' intercept 1 quet 20 age 25;
RUN;
QUIT;
Out[5]:
The GLM Procedure

Number of Observations Read 32
Number of Observations Used 32

Dependent Variable: SBP SBP

Overall ANOVA

Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 4120.592245 2060.296123 25.92 <.0001
Error 29 2305.376505 79.495742    
Corrected Total 31 6425.968750      

Fit Statistics

R-Square Coeff Var Root MSE SBP Mean
0.641241 6.168935 8.916038 144.5313

Type I Model ANOVA

Source DF Type I SS Mean Square F Value Pr > F
QUET 1 3537.945739 3537.945739 44.50 <.0001
AGE 1 582.646506 582.646506 7.33 0.0113

Type III Model ANOVA

Source DF Type III SS Mean Square F Value Pr > F
QUET 1 258.9618700 258.9618700 3.26 0.0815
AGE 1 582.6465058 582.6465058 7.33 0.0113

Estimates

Parameter Estimate Standard Error t Value Pr > |t|
Quet = 20 Age = 25 107.779347 8.20687420 13.13 <.0001

Solution

Parameter Estimate Standard Error t Value Pr > |t|
Intercept 62.14894871 12.47519150 4.98 <.0001
QUET 0.97507319 0.54024560 1.80 0.0815
AGE 1.04515739 0.38605667 2.71 0.0113

[Figure: Contour fit plot for SBP.]
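
The ESTIMATE result is the fitted equation evaluated at Quet = 20 and Age = 25, which can be confirmed by hand from the Solution table:

\[
\widehat{SBP} = 62.14895 + 0.97507(20) + 1.04516(25) \approx 107.78
\]

This agrees with the reported estimate of 107.779347. The reported standard error (8.21) reflects the uncertainty in all three estimated coefficients.
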
In [6]:
PROC REG DATA=mreg.sbp_quet;
MODEL SBP = quet smk;
RUN;
QUIT;
Out[6]:
The REG Procedure

Model: MODEL1

Dependent Variable: SBP SBP

Number of Observations Read 32
Number of Observations Used 32

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 4120.36649 2060.18325 25.91 <.0001
Error 29 2305.60226 79.50353    
Corrected Total 31 6425.96875      

Fit Statistics

Root MSE 8.91647 R-Square 0.6412
Dependent Mean 144.53125 Adj R-Sq 0.6165
Coeff Var 6.16924    

Parameter Estimates
Variable Label DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept Intercept 1 79.35696 9.26430 8.57 <.0001
QUET QUET 1 2.21156 0.32300 6.85 <.0001
SMK SMK 1 8.57101 3.16670 2.71 0.0113

[Figure: Panel of fit diagnostics for SBP.]

[Figure: Panel of scatterplots of residuals by regressors for SBP.]
In [9]:
PROC GLM DATA=mreg.sbp_quet;
CLASS smk (ref = "0");
MODEL SBP = quet smk / solution;
RUN;
QUIT;
Out[9]:
The GLM Procedure

Class Level Information
Class Levels Values
SMK 2 1 0

Number of Observations

Number of Observations Read 32
Number of Observations Used 32

Dependent Variable: SBP SBP

Overall ANOVA

Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 4120.366493 2060.183247 25.91 <.0001
Error 29 2305.602257 79.503526    
Corrected Total 31 6425.968750      

Fit Statistics

R-Square Coeff Var Root MSE SBP Mean
0.641205 6.169237 8.916475 144.5313

Type I Model ANOVA

Source DF Type I SS Mean Square F Value Pr > F
QUET 1 3537.945739 3537.945739 44.50 <.0001
SMK 1 582.420754 582.420754 7.33 0.0113

Type III Model ANOVA

Source DF Type III SS Mean Square F Value Pr > F
QUET 1 3727.268332 3727.268332 46.88 <.0001
SMK 1 582.420754 582.420754 7.33 0.0113

Solution

Parameter Estimate Standard Error t Value Pr > |t|
Intercept 79.35695590 B 9.26429554 8.57 <.0001
QUET 2.21156035   0.32299564 6.85 <.0001
SMK 1 8.57101456 B 3.16670062 2.71 0.0113
SMK 0 0.00000000 B . . .

Note: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

[Figure: Analysis of covariance plot for SBP by QUET, categorized by SMK.]

With this categorical predictor, we get two regression equations: one for smokers and one for non-smokers.

  • $\widehat{SBP} = 79.36 + 2.21*Quet$ (non-smokers)
  • $\widehat{SBP} = 87.93 + 2.21*Quet$ (smokers)
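
The two intercepts differ by exactly the SMK coefficient, since SMK = 1 for smokers and 0 for non-smokers. Using the Solution table:

\[
\widehat{SBP}_{smokers} = (79.35696 + 8.57101) + 2.21156\,Quet \approx 87.93 + 2.21\,Quet
\]

So at any fixed Quetelet index, smokers are estimated to have a mean SBP about 8.57 mmHg higher than non-smokers.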

Note that in this model, the slopes for the two equations are forced to be the same. If we want to allow the effect of the Quetelet index on SBP to differ between smokers and non-smokers, we need to include an interaction term.

In [10]:
PROC GLM DATA=mreg.sbp_quet;
CLASS smk (ref = "0");
MODEL SBP = quet|smk / solution;
RUN;
QUIT;
Out[10]:
The GLM Procedure

Class Level Information
Class Levels Values
SMK 2 1 0

Number of Observations

Number of Observations Read 32
Number of Observations Used 32

Dependent Variable: SBP SBP

Overall ANOVA

Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 4184.107589 1394.702530 17.42 <.0001
Error 28 2241.861161 80.066470    
Corrected Total 31 6425.968750      

Fit Statistics

R-Square Coeff Var Root MSE SBP Mean
0.651125 6.191040 8.947987 144.5313

Type I Model ANOVA

Source DF Type I SS Mean Square F Value Pr > F
QUET 1 3537.945739 3537.945739 44.19 <.0001
SMK 1 582.420754 582.420754 7.27 0.0117
QUET*SMK 1 63.741095 63.741095 0.80 0.3799

Type III Model ANOVA

Source DF Type III SS Mean Square F Value Pr > F
QUET 1 3590.846203 3590.846203 44.85 <.0001
SMK 1 140.094758 140.094758 1.75 0.1966
QUET*SMK 1 63.741095 63.741095 0.80 0.3799

Solution

Parameter Estimate Standard Error t Value Pr > |t|
Intercept 67.72373692 B 16.01336515 4.23 0.0002
QUET 2.63028254 B 0.57034924 4.61 <.0001
SMK 1 25.61422160 B 19.36402171 1.32 0.1966
SMK 0 0.00000000 B . . .
QUET*SMK 1 -0.61847849 B 0.69317067 -0.89 0.3799
QUET*SMK 0 0.00000000 B . . .

Note: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

[Figure: Analysis of covariance plot for SBP by QUET, categorized by SMK.]

Now we have the following two regression equations for smokers and non-smokers:

  • $\widehat{SBP} = 67.72 + 2.63*Quet$ (non-smokers)
  • $\widehat{SBP} = 93.34 + 2.01*Quet$ (smokers)
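
With the interaction included, the smokers' intercept and slope are each the non-smoker value plus the corresponding SMK term from the Solution table:

\[
\begin{aligned}
\text{Intercept (smokers)} &= 67.72374 + 25.61422 \approx 93.34 \\
\text{Slope (smokers)} &= 2.63028 - 0.61848 \approx 2.01
\end{aligned}
\]

The slope difference of $-0.62$ is the QUET*SMK estimate, and its test (p = 0.3799) gives little evidence that the slopes actually differ between the two groups.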

Model Selection

  • Ideally, there is a physiologic model for how the process works, so you can simply specify those terms in the model and be done.
  • Other times, usually in the exploratory phase, we have many possible models.
    • We may want a parsimonious model (few covariates) with good predictions.
    • We may want to determine which predictors explain independent proportions of the variability in the outcome.
    • It depends on your goal: Are you interested in prediction? Are you interested in identifying a group of important predictors?
  • We will look at a few common variable selection techniques available in SAS
    • Forward selection
    • Backward selection
    • Stepwise regression
    • All subsets
In [12]:
/* Forward Selection */
PROC REG DATA=mreg.sbp_quet;
model sbp = age smk quet / selection = f sle = 0.05;
RUN;
QUIT;
Out[12]:
The REG Procedure

Model: MODEL1

Dependent Variable: SBP SBP

Forward Selection Method

Number of Observations Read 32
Number of Observations Used 32

Forward Selection: Step 1

Variable AGE Entered: R-Square = 0.6009 and C(p) = 18.7414

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 1 3861.63038 3861.63038 45.18 <.0001
Error 30 2564.33838 85.47795    
Corrected Total 31 6425.96875      

Parameter Estimates
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept 59.09163 12.81626 1817.11840 21.26 <.0001
AGE 1.60450 0.23872 3861.63038 45.18 <.0001

Bounds on condition number: 1, 1

Forward Selection: Step 2

Variable SMK Entered: R-Square = 0.7298 and C(p) = 5.6481

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 4689.68423 2344.84211 39.16 <.0001
Error 29 1736.28452 59.87188    
Corrected Total 31 6425.96875      

Parameter Estimates
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept 48.04960 11.12956 1115.95464 18.64 0.0002
AGE 1.70916 0.20176 4296.58607 71.76 <.0001
SMK 10.29439 2.76811 828.05385 13.83 0.0009

Bounds on condition number: 1.0198, 4.0794

No other variable met the 0.0500 significance level for entry into the model.

Summary of Forward Selection
Step Variable Entered Label Number Vars In Partial R-Square Model R-Square C(p) F Value Pr > F
1 AGE AGE 1 0.6009 0.6009 18.7414 45.18 <.0001
2 SMK SMK 2 0.1289 0.7298 5.6481 13.83 0.0009
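
The F value reported for SMK at step 2 is the partial F statistic for adding SMK to the model that already contains AGE: the gain in the model sum of squares divided by the mean squared error of the larger model. A worked check from the two ANOVA tables:

\[
F_{SMK \mid AGE} = \frac{4689.68423 - 3861.63038}{59.87188} = \frac{828.05385}{59.87188} \approx 13.83,
\qquad
\text{Partial } R^2 = 0.7298 - 0.6009 = 0.1289
\]

QUET is not added at a third step because its partial F given AGE and SMK (3.65, p = 0.0664, as reported for the full model under backward elimination) does not meet the 0.05 entry level.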

[Figure: Panel of fit diagnostics for SBP.]

[Figure: Panel of scatterplots of residuals by regressors for SBP.]
In [13]:
/* Backward Selection */
PROC REG DATA=mreg.sbp_quet;
model sbp = age smk quet / selection = b sls = 0.05;
RUN;
QUIT;
Out[13]:
The REG Procedure

Model: MODEL1

Dependent Variable: SBP SBP

Backward Elimination Method

Number of Observations Read 32
Number of Observations Used 32

Backward Elimination: Step 0

All Variables Entered: R-Square = 0.7609 and C(p) = 4.0000

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 4889.82570 1629.94190 29.71 <.0001
Error 28 1536.14305 54.86225    
Corrected Total 31 6425.96875      

Parameter Estimates
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept 51.11791 10.77421 1234.94960 22.51 <.0001
AGE 1.21271 0.32382 769.45920 14.03 0.0008
SMK 9.94557 2.65606 769.23345 14.02 0.0008
QUET 0.85924 0.44987 200.14147 3.65 0.0664

Bounds on condition number: 2.867, 20.152

Backward Elimination: Step 1

Variable QUET Removed: R-Square = 0.7298 and C(p) = 5.6481

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 4689.68423 2344.84211 39.16 <.0001
Error 29 1736.28452 59.87188    
Corrected Total 31 6425.96875      

Parameter Estimates
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept 48.04960 11.12956 1115.95464 18.64 0.0002
AGE 1.70916 0.20176 4296.58607 71.76 <.0001
SMK 10.29439 2.76811 828.05385 13.83 0.0009

Bounds on condition number: 1.0198, 4.0794

All variables left in the model are significant at the 0.0500 level.

Summary of Backward Elimination
Step Variable Removed Label Number Vars In Partial R-Square Model R-Square C(p) F Value Pr > F
1 QUET QUET 2 0.0311 0.7298 5.6481 3.65 0.0664

[Figure: Panel of fit diagnostics for SBP.]

[Figure: Panel of scatterplots of residuals by regressors for SBP.]
In [14]:
/* Stepwise Selection */
PROC REG DATA=mreg.sbp_quet;
model sbp = age smk quet / selection = stepwise sls =0.05 sle = 0.05;
RUN;
QUIT;
Out[14]:
The REG Procedure

Model: MODEL1

Dependent Variable: SBP SBP

Stepwise Selection Method

Number of Observations Read 32
Number of Observations Used 32

Stepwise Selection: Step 1

Variable AGE Entered: R-Square = 0.6009 and C(p) = 18.7414

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 1 3861.63038 3861.63038 45.18 <.0001
Error 30 2564.33838 85.47795    
Corrected Total 31 6425.96875      

Parameter Estimates
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept 59.09163 12.81626 1817.11840 21.26 <.0001
AGE 1.60450 0.23872 3861.63038 45.18 <.0001

Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable SMK Entered: R-Square = 0.7298 and C(p) = 5.6481

Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 4689.68423 2344.84211 39.16 <.0001
Error 29 1736.28452 59.87188    
Corrected Total 31 6425.96875      

Parameter Estimates
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept 48.04960 11.12956 1115.95464 18.64 0.0002
AGE 1.70916 0.20176 4296.58607 71.76 <.0001
SMK 10.29439 2.76811 828.05385 13.83 0.0009

Bounds on condition number: 1.0198, 4.0794

All variables left in the model are significant at the 0.0500 level.

No other variable met the 0.0500 significance level for entry into the model.

Summary of Stepwise Selection
Step Variable Entered Variable Removed Label Number Vars In Partial R-Square Model R-Square C(p) F Value Pr > F
1 AGE   AGE 1 0.6009 0.6009 18.7414 45.18 <.0001
2 SMK   SMK 2 0.1289 0.7298 5.6481 13.83 0.0009

[Figure: Panel of fit diagnostics for SBP.]

[Figure: Panel of scatterplots of residuals by regressors for SBP.]
In [16]:
/* All subset selection */
/* This uses Mallows' C_p: look for a small C_p that is close to p, the number of parameters */
PROC REG DATA=mreg.sbp_quet;
model sbp = age smk quet / selection = cp BEST = 8;
RUN;
QUIT;
Out[16]:
The REG Procedure

Model: MODEL1

Dependent Variable: SBP

C(p) Selection Method

Number of Observations Read 32
Number of Observations Used 32

Number in Model C(p) R-Square Variables in Model
3 4.0000 0.7609 AGE SMK QUET
2 5.6481 0.7298 AGE SMK
2 16.0212 0.6412 AGE QUET
2 16.0253 0.6412 SMK QUET
1 18.7414 0.6009 AGE
1 24.6414 0.5506 QUET
1 81.9640 0.0612 SMK
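
Mallows' $C_p$ for a candidate model compares its error sum of squares with the mean squared error of the full model. Below is a worked check using the standard definition, where $p$ counts the parameters including the intercept and $n = 32$. The formula is stated here as an assumption; it is not printed in the SAS output.

\[
C_p = \frac{SSE_p}{MSE_{full}} - (n - 2p)
\]

\[
\text{AGE, SMK } (p = 3):\quad \frac{1736.28452}{54.86225} - (32 - 6) \approx 31.648 - 26 = 5.648
\]

For the full model, $C_p$ always equals $p$ (here 4), so in practice one looks for smaller models whose $C_p$ is close to their own $p$.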


[Figure: Panel of fit diagnostics for SBP.]

[Figure: Panel of scatterplots of residuals by regressors for SBP.]