Poisson Regression in SAS

If the response variable is a count, then we may want to use a Poisson regression model. In this model,

$$E(Y)=\mu\text{ is modeled by } \log(\mu)=\beta_0+\beta_1X$$

Exponentiating gives $\mu=e^{\beta_0} e^{\beta_1X}$, so a 1 unit increase in X has a multiplicative effect of $e^{\beta_1}$ on $\mu$. More frequently, though, the counts are measured over some period of time, and we are more interested in the rate of occurrence of the event. So instead of modeling $\mu$, we model $\lambda=\mu/t$, assuming that the response has a Poisson distribution with mean $\mu=\lambda t$. (Note that $t$ doesn't necessarily have to be time; it could be a volume or some other 'exposure unit'.) In the case of rates and a single X, we have the model

$$\log(\lambda)=\log\left(\dfrac{\mu}{t}\right)=\beta_0+\beta_1X$$

or

$$\log(\mu)=\log(t)+\beta_0+\beta_1X$$
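To see why the two forms are equivalent, and what $\beta_1$ means on the rate scale, the steps can be written out. This is a short derivation using only the symbols defined above:

$$\log\left(\dfrac{\mu}{t}\right)=\log(\mu)-\log(t)=\beta_0+\beta_1X \;\Longrightarrow\; \mu=t\,e^{\beta_0+\beta_1X}$$

$$\dfrac{\lambda(X+1)}{\lambda(X)}=\dfrac{e^{\beta_0+\beta_1(X+1)}}{e^{\beta_0+\beta_1X}}=e^{\beta_1}$$

So $\log(t)$ enters the linear predictor with its coefficient fixed at 1, and $e^{\beta_1}$ is the rate ratio for a 1 unit increase in X.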

The term $\log(t)$ is called an offset. Let's look at an example concerning bladder cancer. In this study, there were 31 male patients treated for bladder cancer. The outcome measured was $N=\text{ number of recurrent tumors}$. Also measured were:

  • X: 0 = primary tumor < 3cm, 1 = primary tumor >= 3cm
  • time: time period in months in which the subject was observed
In [2]:
data bladder;
  input time x n;
  logtime=log(time);  /* offset: log of the observation time in months */
cards;
2   0   1
3   0   1
6   0   1
8   0   1
9   0   1
10  0   1
11  0   1
13  0   1
14  0   1
16  0   1
21  0   1
22  0   1
24  0   1
26  0   1
27  0   1
7   0   2
13  0   2
15  0   2
18  0   2
23  0   2
20  0   3
24  0   4
1   1   1
5   1   1
17  1   1
18  1   1
25  1   1
18  1   2
25  1   2
4   1   3
19  1   4
;
run;

PROC PRINT DATA=bladder (obs=5);
RUN;
Out[2]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.BLADDER

Obs time x n logtime
1 2 0 1 0.69315
2 3 0 1 1.09861
3 6 0 1 1.79176
4 8 0 1 2.07944
5 9 0 1 2.19722

To fit a Poisson model, we will use PROC GENMOD. Note that we have to calculate log(time) ourselves and supply it as the offset; SAS does not do this for you.
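If a data set already holds the exposure time but not its log, the offset column can be added in a separate DATA step before calling PROC GENMOD. A minimal sketch, where `bladder_raw` and `bladder_offset` are hypothetical data set names (not part of the original example) and the guard against non-positive times is an added assumption:

```sas
/* bladder_raw is a hypothetical input data set with columns time, x, n */
data bladder_offset;
  set bladder_raw;
  /* log() is undefined for time <= 0, so leave the offset missing there */
  if time > 0 then logtime = log(time);
run;
```

The new column is then passed to the MODEL statement through `offset=logtime`, in the same way as for the bladder data.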

In [4]:
proc genmod data=bladder;
  /* Poisson model for the count n with a log link; logtime enters as the offset */
  model n=x / offset=logtime dist=P link=log;
run;
Out[4]:
SAS Output

The SAS System

The GENMOD Procedure

Model Information
Data Set WORK.BLADDER
Distribution Poisson
Link Function Log
Dependent Variable n
Offset Variable logtime

Number of Observations

Number of Observations Read 31
Number of Observations Used 31

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 29 25.4189 0.8765
Scaled Deviance 29 25.4189 0.8765
Pearson Chi-Square 29 38.5938 1.3308
Scaled Pearson X2 29 38.5938 1.3308
Log Likelihood   -33.3234  
Full Log Likelihood   -48.1150  
AIC (smaller is better)   100.2301  
AICC (smaller is better)   100.6586  
BIC (smaller is better)   103.0980  

Convergence Status

Algorithm converged.

Analysis Of Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates
Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept   1   -2.3394          0.1768  -2.6859  -1.9929                     175.13      <.0001
x           1    0.2292          0.3062  -0.3709   0.8293                       0.56      0.4541
Scale       0    1.0000          0.0000   1.0000   1.0000

Note:The scale parameter was held fixed.

This results in the following model

$$\log(\hat{\mu})=-2.34+0.23X+\log(t)$$

This means the rate of tumor recurrence is modeled by

$$\hat{\lambda}=e^{-2.34+0.23X}$$

So for smaller tumors (X=0), the estimated monthly rate of tumor recurrence is $e^{-2.34}=0.096$ (or 1/0.096=10.4 months per recurrence on average). For larger tumors (X=1), it is $e^{-2.34+0.23}=0.121$ (or 8.3 months per recurrence). The estimated rate ratio ($\lambda_{\text{large}}/\lambda_{\text{small}}$) is $e^{\hat{\beta}_1}=1.26$, an estimated 26% increase in the rate of recurrent tumors for those with large baseline tumors. Note, however, that the Wald test for x in the output above gives p = 0.4541, so this increase is not statistically significant.
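The arithmetic above can be double-checked in a DATA step. This is only a sketch: the two coefficients are typed in by hand from the GENMOD output (to four decimals), and the variable names are illustrative.

```sas
data _null_;
  b0 = -2.3394;  /* intercept estimate copied from the GENMOD output */
  b1 =  0.2292;  /* estimate for x copied from the GENMOD output */

  rate_small = exp(b0);       /* monthly recurrence rate when X=0 */
  rate_large = exp(b0 + b1);  /* monthly recurrence rate when X=1 */
  rate_ratio = exp(b1);       /* large versus small */

  months_small = 1 / rate_small;  /* average months per recurrence, X=0 */
  months_large = 1 / rate_large;  /* average months per recurrence, X=1 */

  /* write the results to the SAS log */
  put rate_small= 6.4 rate_large= 6.4 rate_ratio= 6.4;
  put months_small= 5.1 months_large= 5.1;
run;
```

The values written to the log should agree, up to rounding, with the 0.096, 0.121 and 1.26 quoted in the text.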

In [6]:
proc genmod data=bladder;
  model n=x / offset=logtime dist=P link=log;
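  /* the exp option also reports each estimate on the rate scale */
  estimate 'small' intercept 1 /exp;       /* X=0: exp(b0) */
  estimate 'large' intercept 1 x 1 /exp;   /* X=1: exp(b0+b1) */
  estimate 'rate ratio' x 1 /exp;          /* large vs. small: exp(b1) */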
run;
Out[6]:
SAS Output

The SAS System

The GENMOD Procedure

Model Information
Data Set WORK.BLADDER
Distribution Poisson
Link Function Log
Dependent Variable n
Offset Variable logtime

Number of Observations

Number of Observations Read 31
Number of Observations Used 31

Parameter Information
Parameter Effect
Prm1 Intercept
Prm2 x

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 29 25.4189 0.8765
Scaled Deviance 29 25.4189 0.8765
Pearson Chi-Square 29 38.5938 1.3308
Scaled Pearson X2 29 38.5938 1.3308
Log Likelihood   -33.3234  
Full Log Likelihood   -48.1150  
AIC (smaller is better)   100.2301  
AICC (smaller is better)   100.6586  
BIC (smaller is better)   103.0980  

Convergence Status

Algorithm converged.

Analysis Of Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates
Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept   1   -2.3394          0.1768  -2.6859  -1.9929                     175.13      <.0001
x           1    0.2292          0.3062  -0.3709   0.8293                       0.56      0.4541
Scale       0    1.0000          0.0000   1.0000   1.0000

Note:The scale parameter was held fixed.

ESTIMATE Statement Results

Contrast Estimate Results
Label            Mean Estimate  Mean Lower CL  Mean Upper CL  L'Beta Estimate  Standard Error  Alpha  L'Beta Lower CL  L'Beta Upper CL  Chi-Square  Pr > ChiSq
small            0.0964         0.0682         0.1363         -2.3394          0.1768          0.05   -2.6859          -1.9929          175.13      <.0001
Exp(small)                                                    0.0964           0.0170          0.05   0.0682           0.1363
large            0.1212         0.0743         0.1979         -2.1102          0.2500          0.05   -2.6002          -1.6202          71.25       <.0001
Exp(large)                                                    0.1212           0.0303          0.05   0.0743           0.1979
rate ratio       1.2576         0.6901         2.2917         0.2292           0.3062          0.05   -0.3709          0.8293           0.56        0.4541
Exp(rate ratio)                                               1.2576           0.3851          0.05   0.6901           2.2917
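The Exp(...) rows can be reproduced by hand from the L'Beta rows. A sketch for the rate ratio, assuming the standard error of the exponentiated estimate comes from the delta method (this is consistent with the numbers printed, though the output itself does not say so):

$$e^{0.2292}=1.2576,\qquad \left(e^{-0.3709},\,e^{0.8293}\right)=(0.6901,\,2.2917),\qquad \text{SE}\approx e^{0.2292}\times 0.3062=0.3851$$

The same steps give the Exp(small) and Exp(large) rows. The interval for the rate ratio contains 1, which agrees with the p-value of 0.4541 for x.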

Logistic Regression with PROC GENMOD

In [10]:
Data ecg;
input ecg CA $ gend $ cnt;
datalines;
0 absence female_0 11
0 presence female_0 4
1 presence female_0 8
1 absence female_0 10
0 presence male_1 9
0 absence male_1 9
1 presence male_1 21
1 absence male_1 6
;
run;

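/* desc: model the probability of CA='presence' rather than 'absence' */
PROC GENMOD DATA=ecg desc;
  class gend (ref="female_0") / param=ref ref=first;
  model CA=gend ecg / dist=b link = logit;
  weight cnt;  /* each row is a cell of the table; cnt is its count */
run;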
Out[10]:
SAS Output

The SAS System

The GENMOD Procedure

Model Information
Data Set WORK.ECG
Distribution Binomial
Link Function Logit
Dependent Variable CA
Scale Weight Variable cnt

Number of Observations

Number of Observations Read 8
Number of Observations Used 8
Sum of Weights 78
Number of Events 4
Number of Trials 8

Class Level Information
Class Value Design Variables
gend female_0 0
  male_1 1

Response Profile
Ordered Value  CA        Total Frequency  Total Weight
1              presence  4                42
2              absence   4                36

PROC GENMOD is modeling the probability that CA='presence'.

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Log Likelihood   -47.9498  
Full Log Likelihood   -47.9498  
AIC (smaller is better)   101.8996  
AICC (smaller is better)   107.8996  
BIC (smaller is better)   102.1379  

Convergence Status

Algorithm converged.

Analysis Of Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates
Parameter          DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept           1   -1.1747          0.4854  -2.1260  -0.2234                       5.86      0.0155
gend       male_1   1    1.2770          0.4980   0.3009   2.2530                       6.58      0.0103
ecg                 1    1.0545          0.4980   0.0785   2.0305                       4.48      0.0342
Scale               0    1.0000          0.0000   1.0000   1.0000

Note:The scale parameter was held fixed.
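The estimates above can be read on the odds scale in the same way the Poisson coefficients were read on the rate scale. A worked sketch, where $\text{male}$ is shorthand introduced here for the design variable that equals 1 for gend = male_1 and 0 for female_0:

$$\text{logit}\,\hat{P}(\text{CA}=\text{presence})=-1.1747+1.2770\,\text{male}+1.0545\,\text{ecg}$$

$$e^{1.2770}\approx 3.59,\qquad e^{1.0545}\approx 2.87$$

So the estimated odds of CA = presence are about 3.59 times higher for males than for females at the same ecg value, and about 2.87 times higher for ecg = 1 than for ecg = 0 within the same gender.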
