Data Set WORK.BLADDER
Obs | time | x | n | logtime |
---|---|---|---|---|
1 | 2 | 0 | 1 | 0.69315 |
2 | 3 | 0 | 1 | 1.09861 |
3 | 6 | 0 | 1 | 1.79176 |
4 | 8 | 0 | 1 | 2.07944 |
5 | 9 | 0 | 1 | 2.19722 |
If the response variable is a count, then we may want to use a Poisson regression model. In this model,
$$E(Y)=\mu\text{ is modeled by } \log(\mu)=\beta_0+\beta_1X$$This results in $\mu=e^{\beta_0} e^{\beta_1X}$, so a 1-unit increase in X has a multiplicative effect of $e^{\beta_1}$ on $\mu$. More frequently, though, the counts are measured over some period of time. In this case, we are more interested in the rate of occurrence of the event, so instead of modeling $\mu$ we model $\lambda=\mu/t$. In this model, we assume that the response has a Poisson distribution with $\mu=\lambda t$. (Note that the exposure doesn't have to be time; it could be a volume or some other 'exposure unit'.) In the case of rates and a single X, we have the model
$$\log(\lambda)=\log\left(\dfrac{\mu}{t}\right)=\beta_0+\beta_1X$$or
$$\log(\mu)=\log(t)+\beta_0+\beta_1X$$The term $\log(t)$ is called an offset. Let's look at an example concerning bladder cancer. In this study, 31 male patients were treated for bladder cancer. The outcome measured was $N=\text{ number of recurrent tumors}$. Also measured were the follow-up time in months (time) and an indicator X of baseline tumor size (X=0 for smaller tumors, X=1 for larger ones).
data bladder;
input time x n;
logtime=log(time);
cards;
2 0 1
3 0 1
6 0 1
8 0 1
9 0 1
10 0 1
11 0 1
13 0 1
14 0 1
16 0 1
21 0 1
22 0 1
24 0 1
26 0 1
27 0 1
7 0 2
13 0 2
15 0 2
18 0 2
23 0 2
20 0 3
24 0 4
1 1 1
5 1 1
17 1 1
18 1 1
25 1 1
18 1 2
25 1 2
4 1 3
19 1 4
;
run;
PROC PRINT DATA=bladder (obs=5);
RUN;
To fit a Poisson model, we will use PROC GENMOD. Note that we have to calculate log(time) ourselves (done in the DATA step as logtime) so that it can be supplied as the offset; SAS does not do this for you.
proc genmod data=bladder;
model n=x / offset=logtime dist=P link=log;
run;
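To see what the offset in the MODEL statement does, it may help to exponentiate the model. The offset enters the linear predictor with its coefficient fixed at 1 rather than estimated, so the expected count is proportional to the follow-up time. As a worked step, in the notation used earlier:

$$\log(\mu)=1\cdot\log(t)+\beta_0+\beta_1X \quad\Longrightarrow\quad \mu=t\,e^{\beta_0+\beta_1X}=\lambda t$$

So, for a fixed X, doubling the follow-up time doubles the expected number of recurrent tumors while leaving the rate $\lambda$ unchanged.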
This results in the following fitted model:
$$\log(\hat{\mu})=-2.34+0.23X+\log(t)$$This means the rate of tumor recurrence is modeled by
$$\hat{\lambda}=e^{-2.34+0.23X}$$So for smaller tumors (X=0), the estimated monthly rate of tumor recurrence is $e^{-2.34}=0.096$ (or 1/0.096=10.4 months per recurrence on average). For larger tumors (X=1), it is $e^{-2.34+0.23}=0.121$ (or 8.3 months per recurrence). The estimated rate ratio ($\lambda(\text{large})/\lambda(\text{small})$) is $e^{\hat{\beta}_1}=1.26$, showing an estimated 26% increase in the rate of recurrent tumors for those with large baseline tumors. The same quantities can be requested directly by adding ESTIMATE statements with the EXP option to the PROC GENMOD call:
proc genmod data=bladder;
model n=x / offset=logtime dist=P link=log;
estimate 'small' intercept 1 /exp;
estimate 'large' intercept 1 x 1 /exp;
estimate 'rate ratio' x 1 /exp;
run;
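As a rough guide to what these ESTIMATE statements request: each one forms a linear combination of the coefficients, and the EXP option reports its exponential. In the notation used earlier:

$$\text{small: } e^{\hat{\beta}_0} \qquad \text{large: } e^{\hat{\beta}_0+\hat{\beta}_1} \qquad \text{rate ratio: } e^{\hat{\beta}_1}$$

Because X is the only predictor and takes just two values, the fitted rates can also be checked by hand. Setting the Poisson score equations to zero makes each group's fitted rate equal to its total number of recurrences divided by its total follow-up time. Using totals added up from the data lines in the DATA step (32 recurrences over 332 months for X=0, 16 recurrences over 132 months for X=1):

$$\hat{\lambda}_{\text{small}}=\dfrac{32}{332}\approx 0.096 \qquad \hat{\lambda}_{\text{large}}=\dfrac{16}{132}\approx 0.121 \qquad \dfrac{\hat{\lambda}_{\text{large}}}{\hat{\lambda}_{\text{small}}}\approx 1.26$$

These agree with the reported coefficients, since $\log(32/332)\approx -2.34$ and $\log(1.26)\approx 0.23$.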
Data ecg;
input ecg CA $ gend $ cnt;
datalines;
0 absence female_0 11
0 presence female_0 4
1 presence female_0 8
1 absence female_0 10
0 presence male_1 9
0 absence male_1 9
1 presence male_1 21
1 absence male_1 6
;
run;
PROC GENMOD DATA=ecg desc;
class gend (ref="female_0") / param=ref ref=first;
model CA=gend ecg / dist=b link = logit;
weight cnt;
run;
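Read as a formula, the PROC GENMOD step above appears to fit the following logistic model. This is a sketch based on the code alone: it assumes that the DESC option makes 'presence' the modeled level of CA, that the reference coding yields an indicator for male_1 (written $G$ here, a label introduced only for illustration), and that ecg enters as a numeric 0/1 predictor.

$$\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1G+\beta_2\,\text{ecg}, \qquad \pi=P(\text{CA}=\text{presence})$$

Under this reading, $e^{\beta_1}$ is the odds ratio of presence for males versus females at a fixed ecg value, and $e^{\beta_2}$ is the odds ratio for ecg=1 versus ecg=0 at a fixed gender.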