Poisson Regression in SAS

If the response variable is a count, then we may want to use a Poisson regression model. In this model,

$$E(Y)=\mu\text{ is modeled by } \log(\mu)=\beta_0+\beta_1X$$

This gives $\mu=e^{\beta_0} e^{\beta_1X}$, so a one-unit increase in X multiplies $\mu$ by $e^{\beta_1}$. More often, though, the counts are collected over some period of time, and we are more interested in the rate at which the event occurs. So instead of modeling $\mu$, we model the rate $\lambda=\mu/t$, assuming that the response has a Poisson distribution with mean $\mu=\lambda t$. (The exposure does not have to be time; it could be a volume or some other 'exposure unit'.) For rates and a single X, the model is

$$\log(\lambda)=\log\left(\dfrac{\mu}{t}\right)=\beta_0+\beta_1X$$

or

$$\log(\mu)=\log(t)+\beta_0+\beta_1X$$
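To see why $e^{\beta_1}$ is interpreted as a rate ratio, compare the rate at $X+1$ with the rate at $X$. The intercept cancels, and because the exposure $t$ is held fixed, the same ratio applies to the expected counts:

$$\frac{\lambda(X+1)}{\lambda(X)}=\frac{e^{\beta_0+\beta_1(X+1)}}{e^{\beta_0+\beta_1X}}=e^{\beta_1},\qquad \frac{\mu(X+1)}{\mu(X)}=\frac{t\,\lambda(X+1)}{t\,\lambda(X)}=e^{\beta_1}$$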

The term $\log(t)$ is called an offset: it is a known quantity in the linear predictor whose coefficient is fixed at 1 rather than estimated. Let's look at an example concerning bladder cancer. In this study, 31 male patients were treated for bladder cancer. The outcome measured was $N=\text{number of recurrent tumors}$. Also measured were:

X: 0 = primary tumor < 3cm, 1 = primary tumor >= 3cm
time: time period in months in which the subject was observed

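data bladder;
  input time x n;          /* months observed, tumor-size indicator, number of recurrent tumors */
  logtime=log(time);       /* log of exposure time, used as the offset in PROC GENMOD */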
cards;
2   0   1
3   0   1
6   0   1
8   0   1
9   0   1
10  0   1
11  0   1
13  0   1
14  0   1
16  0   1
21  0   1
22  0   1
24  0   1
26  0   1
27  0   1
7   0   2
13  0   2
15  0   2
18  0   2
23  0   2
20  0   3
24  0   4
1   1   1
5   1   1
17  1   1
18  1   1
25  1   1
18  1   2
25  1   2
4   1   3
19  1   4
;
run;

PROC PRINT DATA=bladder (obs=5);
RUN;

To fit a Poisson model, we use PROC GENMOD. Note that we had to compute log(time) ourselves in the DATA step and supply it as the offset: OFFSET= adds the named variable to the linear predictor as is, so SAS does not take the log for you.
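Because logtime enters the linear predictor with its coefficient fixed at 1, the model implies that expected counts grow in proportion to follow-up time. For example, for two subjects in the data who both have x = 0 but were observed for 2 and 27 months, the coefficients cancel and only the exposure ratio remains:

$$\mu_i=\exp\{\log(t_i)+\beta_0+\beta_1x_i\}=t_i\,e^{\beta_0+\beta_1x_i}\quad\Longrightarrow\quad\frac{\mu_{(t=27)}}{\mu_{(t=2)}}=\frac{27\,e^{\beta_0}}{2\,e^{\beta_0}}=13.5$$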

proc genmod data=bladder;
  /* Poisson response with log link; logtime enters with its coefficient fixed at 1 */
  model n=x / offset=logtime dist=P link=log;
run;

This results in the following model

$$\log(\hat{\mu})=-2.34+0.23X+\log(t)$$

This means the rate of tumor recurrence is modeled by

$$\hat{\lambda}=e^{-2.34+0.23X}$$

So for smaller tumors (X=0), the estimated monthly rate of tumor recurrence is $e^{-2.34}=0.096$ (or $1/0.096=10.4$ months per recurrence on average). For larger tumors (X=1), it is $e^{-2.34+0.23}=0.121$ (or 8.3 months per recurrence). The estimated rate ratio, $\lambda(\text{large})/\lambda(\text{small})$, is $e^{\hat{\beta}_1}=1.26$, an estimated 26% increase in the rate of recurrent tumors for patients with large baseline tumors. This difference is not statistically significant, however: the Wald test for x gives p = 0.45, and the 95% confidence interval for the rate ratio (0.69 to 2.29) includes 1.
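As a hand check on these estimates (a calculation from the data listed above, not part of the SAS output): when the only predictor is a single 0/1 indicator, the Poisson likelihood equations force each group's fitted total count to equal its observed total, so each fitted rate is simply total tumors divided by total months of follow-up in that group. The 22 patients with small tumors had 32 recurrences over 332 months, and the 9 patients with large tumors had 16 recurrences over 132 months:

$$\hat\lambda_{\text{small}}=\frac{32}{332}=0.0964=e^{-2.3394},\qquad \hat\lambda_{\text{large}}=\frac{16}{132}=0.1212=e^{-2.3394+0.2292},\qquad \frac{\hat\lambda_{\text{large}}}{\hat\lambda_{\text{small}}}=\frac{16\cdot 332}{132\cdot 32}=1.2576=e^{0.2292}$$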

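proc genmod data=bladder;
  model n=x / offset=logtime dist=P link=log;
  /* each ESTIMATE reports L'beta; the EXP option also reports exp(L'beta) */
  estimate 'small' intercept 1 /exp;        /* rate for x=0: exp(b0)           */
  estimate 'large' intercept 1 x 1 /exp;    /* rate for x=1: exp(b0+b1)        */
  estimate 'rate ratio' x 1 /exp;           /* large vs. small ratio: exp(b1)  */
run;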
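Each ESTIMATE statement forms a linear combination $L'\beta$ of the coefficients $(\beta_0,\beta_1)$, and the EXP option exponentiates the estimate and its Wald confidence limits. Using $\hat\beta_0=-2.3394$ and $\hat\beta_1=0.2292$ (SE 0.3062) from the output, the three statements correspond to the following calculations, which match the Exp rows of the Contrast Estimate Results table:

$$\begin{aligned}
\text{small: } L=(1,0):&\quad e^{\hat\beta_0}=e^{-2.3394}=0.0964\\
\text{large: } L=(1,1):&\quad e^{\hat\beta_0+\hat\beta_1}=e^{-2.1102}=0.1212\\
\text{rate ratio: } L=(0,1):&\quad e^{\hat\beta_1}=e^{0.2292}=1.2576,\quad 95\%\text{ CI}=\left(e^{0.2292-1.96(0.3062)},\,e^{0.2292+1.96(0.3062)}\right)\approx(0.690,\,2.292)
\end{aligned}$$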

Logistic Regression with PROC GENMOD

Data ecg;
input ecg CA $ gend $ cnt;   /* cnt = number of subjects in this ecg/CA/gend cell */
datalines;
0 absence female_0 11
0 presence female_0 4
1 presence female_0 8
1 absence female_0 10
0 presence male_1 9
0 absence male_1 9
1 presence male_1 21
1 absence male_1 6
;
run;

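PROC GENMOD DATA=ecg desc;   /* DESC: model P(CA='presence') */
class gend (ref="female_0") / param=ref ref=first;   /* reference coding, female_0 as baseline */
model CA=gend ecg / dist=b link = logit;
weight cnt;   /* scales each row's log-likelihood contribution by its cell count */
run;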
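Written out, the model fitted above is as follows. Here $\text{male}$ is our notation for the 0/1 indicator of gend = male_1 (female_0 is the reference level), $p=P(\text{CA}=\text{presence})$ because of the DESC option, and $y_i=1$ when row $i$ has CA = presence. Since each of the 8 rows is a cell of the table rather than a single subject, WEIGHT cnt multiplies each row's binomial log-likelihood contribution by its count:

$$\log\left(\frac{p}{1-p}\right)=\beta_0+\beta_1\,\text{male}+\beta_2\,\text{ecg},\qquad \ell(\beta)=\sum_{i=1}^{8}\text{cnt}_i\left[y_i\log p_i+(1-y_i)\log(1-p_i)\right]$$

So $e^{\beta_1}$ is the odds ratio for presence comparing males with females at the same ecg value, and $e^{\beta_2}$ is the odds ratio for ecg = 1 versus ecg = 0 within the same gender.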

Obs	time	n	logtime
1	2	1	0.69315
2	3	1	1.09861
3	6	1	1.79176
4	8	1	2.07944
5	9	1	2.19722

Model Information
Data Set	WORK.BLADDER
Distribution	Poisson
Link Function	Log
Dependent Variable	n
Offset Variable	logtime

Criteria For Assessing Goodness Of Fit
Criterion	DF	Value	Value/DF
Deviance	29	25.4189	0.8765
Scaled Deviance	29	25.4189	0.8765
Pearson Chi-Square	29	38.5938	1.3308
Scaled Pearson X2	29	38.5938	1.3308
Log Likelihood		-33.3234
Full Log Likelihood		-48.1150
AIC (smaller is better)		100.2301
AICC (smaller is better)		100.6586
BIC (smaller is better)		103.0980

Analysis Of Maximum Likelihood Parameter Estimates
Parameter	DF	Estimate	Standard Error	Wald 95% Confidence Limits		Wald Chi-Square	Pr > ChiSq
Intercept	1	-2.3394	0.1768	-2.6859	-1.9929	175.13	<.0001
x	1	0.2292	0.3062	-0.3709	0.8293	0.56	0.4541
Scale	0	1.0000	0.0000	1.0000	1.0000
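The derived columns in the two tables above can be reproduced by hand. Value/DF divides each criterion by its degrees of freedom (31 observations minus 2 estimated parameters = 29), and each Wald chi-square is the squared ratio of an estimate to its standard error:

$$\frac{25.4189}{29}=0.8765,\qquad \frac{38.5938}{29}=1.3308,\qquad \left(\frac{-2.3394}{0.1768}\right)^2\approx175.1,\qquad \left(\frac{0.2292}{0.3062}\right)^2\approx0.56$$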

Contrast Estimate Results
Label	Mean Estimate	Mean Lower CL	Mean Upper CL	L'Beta Estimate	Standard Error	Alpha	L'Beta Lower CL	L'Beta Upper CL	Chi-Square	Pr > ChiSq
small	0.0964	0.0682	0.1363	-2.3394	0.1768	0.05	-2.6859	-1.9929	175.13	<.0001
Exp(small)				0.0964	0.0170	0.05	0.0682	0.1363
large	0.1212	0.0743	0.1979	-2.1102	0.2500	0.05	-2.6002	-1.6202	71.25	<.0001
Exp(large)				0.1212	0.0303	0.05	0.0743	0.1979
rate ratio	1.2576	0.6901	2.2917	0.2292	0.3062	0.05	-0.3709	0.8293	0.56	0.4541
Exp(rate ratio)				1.2576	0.3851	0.05	0.6901	2.2917

Model Information
Data Set	WORK.ECG
Distribution	Binomial
Link Function	Logit
Dependent Variable	CA
Scale Weight Variable	cnt

Number of Observations Read	8
Number of Observations Used	8
Sum of Weights	78
Number of Events	4
Number of Trials	8

Response Profile
Ordered Value	CA	Total Frequency	Total Weight
1	presence	4	42
2	absence	4	36