Statistics
treat | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
1 | 57 | 0.4430 | 0.3981 | 0.0527 | 0 | 1.0000 |
2 | 54 | 0.6852 | 0.3704 | 0.0504 | 0 | 1.0000 |
Diff (1-2) | | -0.2422 | 0.3849 | 0.0731 | | |
When dealing with correlated data, it is important to account for the dependence structure. Most of the standard methods we have discussed assume a random sample (independent units). If this assumption is violated, as it is for correlated data, these methods will give incorrect standard errors and type I error rates. Last week we looked at linear mixed models as a way to handle correlated data for quantitative responses. This week we will discuss how to handle correlated binary and count data. For independent binary and count data, we have discussed logistic and Poisson regression models (generalized linear models). For correlated binary and count data, we have two options: generalized estimating equations (GEE) and generalized linear mixed models (GLMM).
Here we will very briefly discuss these models and learn how to fit them with SAS and R. Let's begin with an example of correlated binary data. This data is from a clinical trial comparing two treatments for a respiratory illness.
For this data, the subjects are independent of each other, but the repeated measures on each subject over the 4 time points are correlated.
Before we go into GEE, note that we can still do a simple analysis using a summary measure. Our goal here is to see if the treatment has an effect on respiratory status. Since respiratory status is binary, we can take the proportion of visits with good status (response = 1) for each subject as the summary measure, and then compare the two treatment groups with a t-test on these per-subject proportions.
data resptrial;
input id center treat sex age bl v1-v4;
cards;
1 1 1 1 46 0 0 0 0 0
2 1 1 1 28 0 0 0 0 0
3 1 2 1 23 1 1 1 1 1
4 1 1 1 44 1 1 1 1 0
5 1 1 2 13 1 1 1 1 1
6 1 2 1 34 0 0 0 0 0
7 1 1 1 43 0 1 0 1 1
8 1 2 1 28 0 0 0 0 0
9 1 2 1 31 1 1 1 1 1
10 1 1 1 37 1 0 1 1 0
11 1 2 1 30 1 1 1 1 1
12 1 2 1 14 0 1 1 1 0
13 1 1 1 23 1 1 0 0 0
14 1 1 1 30 0 0 0 0 0
15 1 1 1 20 1 1 1 1 1
16 1 2 1 22 0 0 0 0 1
17 1 1 1 25 0 0 0 0 0
18 1 2 2 47 0 0 1 1 1
19 1 1 2 31 0 0 0 0 0
20 1 2 1 20 1 1 0 1 0
21 1 2 1 26 0 1 0 1 0
22 1 2 1 46 1 1 1 1 1
23 1 2 1 32 1 1 1 1 1
24 1 2 1 48 0 1 0 0 0
25 1 1 2 35 0 0 0 0 0
26 1 2 1 26 0 0 0 0 0
27 1 1 1 23 1 1 0 1 1
28 1 1 2 36 0 1 1 0 0
29 1 1 1 19 0 1 1 0 0
30 1 2 1 28 0 0 0 0 0
31 1 1 1 37 0 0 0 0 0
32 1 2 1 23 0 1 1 1 1
33 1 2 1 30 1 1 1 1 0
34 1 1 1 15 0 0 1 1 0
35 1 2 1 26 0 0 0 1 0
36 1 1 2 45 0 0 0 0 0
37 1 2 1 31 0 0 1 0 0
38 1 2 1 50 0 0 0 0 0
39 1 1 1 28 0 0 0 0 0
40 1 1 1 26 0 0 0 0 0
41 1 1 1 14 0 0 0 0 1
42 1 2 1 31 0 0 1 0 0
43 1 1 1 13 1 1 1 1 1
44 1 1 1 27 0 0 0 0 0
45 1 1 1 26 0 1 0 1 1
46 1 1 1 49 0 0 0 0 0
47 1 1 1 63 0 0 0 0 0
48 1 2 1 57 1 1 1 1 1
49 1 1 1 27 1 1 1 1 1
50 1 2 1 22 0 0 1 1 1
51 1 2 1 15 0 0 1 1 1
52 1 1 1 43 0 0 0 1 0
53 1 2 2 32 0 0 0 1 0
54 1 2 1 11 1 1 1 1 0
55 1 1 1 24 1 1 1 1 1
56 1 2 1 25 0 1 1 0 1
57 2 1 2 39 0 0 0 0 0
58 2 2 1 25 0 0 1 1 1
59 2 2 1 58 1 1 1 1 1
60 2 1 2 51 1 1 0 1 1
61 2 1 2 32 1 0 0 1 1
62 2 1 1 45 1 1 0 0 0
63 2 1 2 44 1 1 1 1 1
64 2 1 2 48 0 0 0 0 0
65 2 2 1 26 0 1 1 1 1
66 2 2 1 14 0 1 1 1 1
67 2 1 2 48 0 0 0 0 0
68 2 2 1 13 1 1 1 1 1
69 2 1 1 20 0 1 1 1 1
70 2 2 1 37 1 1 0 0 1
71 2 2 1 25 1 1 1 1 1
72 2 2 1 20 0 0 0 0 0
73 2 1 2 58 0 1 0 0 0
74 2 1 1 38 1 1 0 0 0
75 2 2 1 55 1 1 1 1 1
76 2 2 1 24 1 1 1 1 1
77 2 1 2 36 1 1 0 0 1
78 2 1 1 36 0 1 1 1 1
79 2 2 2 60 1 1 1 1 1
80 2 1 1 15 1 0 0 1 1
81 2 2 1 25 1 1 1 1 0
82 2 2 1 35 1 1 1 1 1
83 2 2 1 19 1 1 0 1 1
84 2 1 2 31 1 1 1 1 1
85 2 2 1 21 1 1 1 1 1
86 2 2 2 37 0 1 1 1 1
87 2 1 1 52 0 1 1 1 1
88 2 2 1 55 0 0 1 1 0
89 2 1 1 19 1 0 0 1 1
90 2 1 1 20 1 0 1 1 1
91 2 1 1 42 1 0 0 0 0
92 2 2 1 41 1 1 1 1 1
93 2 2 1 52 0 0 0 0 0
94 2 1 2 47 0 1 1 0 1
95 2 1 1 11 1 1 1 1 1
96 2 1 1 14 0 0 0 1 0
97 2 1 1 15 1 1 1 1 1
98 2 1 1 66 1 1 1 1 1
99 2 2 1 34 0 1 1 0 1
100 2 1 1 43 0 0 0 0 0
101 2 1 1 33 1 1 1 0 1
102 2 1 1 48 1 1 0 0 0
103 2 2 1 20 0 1 1 1 1
104 2 1 2 39 1 0 1 0 0
105 2 2 1 28 0 1 0 0 0
106 2 1 2 38 0 0 0 0 0
107 2 2 1 43 1 1 1 1 1
108 2 2 2 39 0 1 1 1 1
109 2 2 1 68 0 1 1 1 1
110 2 2 2 63 1 1 1 1 1
111 2 2 1 31 1 1 1 1 1
;
run;
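*Per-subject summary measures over the 4 visits;
data resp2; set resptrial;
ngood=sum(of v1-v4); *number of visits with good status;
visits=4;
mnstatus=mean(of v1-v4); *proportion of visits with good status;
arcsin=arsin(mnstatus); *note that the usual variance-stabilizing form is arsin(sqrt(mnstatus));
run;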
proc ttest data=resp2;
class treat;
var mnstatus arcsin;
run;
The arcsine transformation is used when comparing proportions to improve the normal approximation. In either case, with or without the arcsine transform, we conclude that the treatment did improve respiratory status.
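As a quick arithmetic check on the pooled t-test, the group summaries in the Statistics table are enough to reproduce the pooled standard deviation, the standard error of the difference, and the t statistic. The table appears to be the output for mnstatus (its maximum is 1). The sketch below is in Python rather than SAS, purely for illustration; the function name `pooled_t` is an assumption and not part of the course code.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic computed from group summaries."""
    df = n1 + n2 - 2
    # Pooled SD: weighted average of the two group variances.
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    # Standard error of the difference in means.
    se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t_stat = (mean1 - mean2) / se_diff
    return pooled_sd, se_diff, t_stat, df

# Group summaries copied from the Statistics table (treat 1, then treat 2).
pooled_sd, se_diff, t_stat, df = pooled_t(57, 0.4430, 0.3981, 54, 0.6852, 0.3704)
print(round(pooled_sd, 4), round(se_diff, 4), round(t_stat, 2), df)
```

The pooled standard deviation and standard error agree with the Diff (1-2) row of the table, and the t statistic is about -3.31 on 109 degrees of freedom.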
GEE is an example of a marginal (or population-averaged) model. The term marginal refers to the fact that the mean response is modeled as a function of the covariates only, without conditioning on random effects; the within-subject correlation is handled separately. The model can be viewed as a repeated cross-sectional GLM analysis at each repeated measure.
To specify the marginal model, we need three pieces:
A model for the mean response: for example, for logistic regression, $\mathrm{logit}(\mu)=x'\beta$, where $\mu=E(Y)$.
The form of the variance of the response: for example, for logistic regression, the variance of a binary response is $\mathrm{Var}(Y)=\phi\mu(1-\mu)$. An additional parameter $\phi$ is usually included to allow for overdispersion.
A correlation structure for the within-subject correlations: we discussed some last time, such as independence, compound symmetry (exchangeable), and AR(1).
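To make the correlation structures concrete, here is a minimal sketch that builds the working correlation matrix for 4 repeated measures under independence, exchangeable, and AR(1). It is written in Python for illustration only; the function name `working_corr` and the value `alpha = 0.5` are assumptions chosen for the example, not estimates from the respiratory data.

```python
def working_corr(kind, n_times, alpha=0.0):
    """Return an n_times x n_times working correlation matrix as nested lists."""
    if kind not in ("independence", "exchangeable", "ar1"):
        raise ValueError("kind must be 'independence', 'exchangeable', or 'ar1'")
    matrix = []
    for j in range(n_times):
        row = []
        for k in range(n_times):
            if j == k:
                row.append(1.0)                    # a measure with itself
            elif kind == "independence":
                row.append(0.0)                    # no within-subject correlation
            elif kind == "exchangeable":
                row.append(alpha)                  # same correlation for every pair
            else:
                row.append(alpha ** abs(j - k))    # AR(1): decays with the time lag
        matrix.append(row)
    return matrix

# Hypothetical alpha, for illustration only.
for kind in ("independence", "exchangeable", "ar1"):
    print(kind)
    for row in working_corr(kind, 4, alpha=0.5):
        print("  ", row)
```

An unstructured working correlation would instead use a separate parameter for each pair of time points, which is 6 parameters for 4 visits.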
This model formulation does not specify a full likelihood, so an iterative method known as GEE is used instead. GEE is implemented in PROC GENMOD. Before we look at code, here are a few points about GEE
Let's fit a model using GEE with different correlation structures.
*Make data into long form;
data respl; set resptrial;
array vis[4] v1-v4;
do time=1 to 4;
status=vis{time};
output;
end;
drop v1 v2 v3 v4;
run;
proc genmod data=respl desc;
class id;
model status=center treat sex age time bl /d=b;
repeated subject=id / type=ind; *adding the modelse option would also report model-based (non-sandwich) standard errors;
estimate 'treatment' treat 1 /exp;
run;
proc genmod data=respl desc;
class id;
model status=center treat sex age time bl /d=b;
repeated subject=id / type=un; *adding the modelse option would also report model-based (non-sandwich) standard errors;
estimate 'treatment' treat 1 /exp;
run;
proc genmod data=respl desc;
class id;
model status=center treat sex age time bl /d=b;
repeated subject=id / type=cs; *adding the modelse option would also report model-based (non-sandwich) standard errors;
estimate 'treatment' treat 1 /exp;
run;
proc genmod data=respl descending;
class id;
model status=center treat sex age time bl /d=b; *d=b is short for dist=binomial;
repeated subject=id / type=ar(1); *adding the modelse option would also report model-based (non-sandwich) standard errors;
estimate 'treatment' treat 1 /exp;
run;
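The /exp option on the estimate statement reports the exponentiated estimate, which for a logit link is an odds ratio. A minimal sketch of that back-transformation in Python, assuming a Wald-type interval on the log-odds scale; the function name `odds_ratio_with_ci` is an assumption, and the numbers in the usage line are hypothetical placeholders, not output from the models above.

```python
import math

def odds_ratio_with_ci(log_odds, std_err, z=1.959964):
    """Exponentiate a log-odds estimate and its Wald limits (95% by default)."""
    lower = log_odds - z * std_err
    upper = log_odds + z * std_err
    return math.exp(log_odds), math.exp(lower), math.exp(upper)

# Hypothetical estimate and standard error, for illustration only.
odds_ratio, ci_lower, ci_upper = odds_ratio_with_ci(1.0, 0.3)
print(round(odds_ratio, 2), round(ci_lower, 2), round(ci_upper, 2))
```

Because the interval is built on the log-odds scale and then exponentiated, it is not symmetric around the odds ratio, and it excludes 1 exactly when the interval for the coefficient excludes 0.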
With the GLMM approach, we introduce random effects that are allowed to vary from one subject to another. As with linear mixed models, adding random effects to the mean response model induces correlations, but not in as simple a way, since the model is non-linear through the link function. These models are more complicated and sometimes cannot even be fit. We will look at just a simple random intercept model. This model assumes that, conditional on the random intercepts, the data follow a usual logistic regression model. The random intercepts are assumed to be normally distributed, just as in linear mixed models.
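Written out, the random intercept logistic model described above takes the following form. The notation is an assumption introduced here for illustration: $Y_{ij}$ is the response for subject $i$ at visit $j$, $x_{ij}$ is the covariate vector, $b_i$ is the random intercept for subject $i$, and $\sigma_b^2$ is its variance.

```latex
\mathrm{logit}\, P(Y_{ij} = 1 \mid b_i) = x_{ij}'\beta + b_i,
\qquad b_i \sim N(0, \sigma_b^2),
\qquad Y_{i1}, \dots, Y_{i4} \text{ independent given } b_i .
```

All visits from the same subject share the same $b_i$, which is what induces the within-subject correlation.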
Interpretation of parameters:
proc glimmix data=respl noclprint;
class id;
model status(desc)=center treat sex age time bl
/ d=binary solution ddfm=kr;
random int / subject=id;
estimate 'treatment' treat 1 /exp;
run;
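One point worth keeping in mind when comparing this fit with the GEE fits is that the random intercept model describes subject-specific (conditional) effects, while GEE describes population-averaged (marginal) effects, and for a logit link the two are generally not equal. The sketch below illustrates this numerically in Python by averaging the conditional probabilities over a normal random intercept. The function names and the values of `beta0`, `beta1`, and `sigma` are hypothetical choices for illustration, not estimates from the respiratory data.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(eta, sigma, n_grid=4000, z_max=8.0):
    """Average expit(eta + sigma*z) over z ~ N(0, 1) using a midpoint rule."""
    step = 2.0 * z_max / n_grid
    num = 0.0
    den = 0.0
    for i in range(n_grid):
        z = -z_max + (i + 0.5) * step
        weight = math.exp(-0.5 * z * z)   # unnormalized standard normal density
        num += weight * expit(eta + sigma * z)
        den += weight
    return num / den                      # dividing by den normalizes the weights

def marginal_log_odds_ratio(beta0, beta1, sigma):
    """Population-averaged log odds ratio implied by a random intercept model."""
    p0 = marginal_prob(beta0, sigma)
    p1 = marginal_prob(beta0 + beta1, sigma)
    return logit(p1) - logit(p0)

# Hypothetical conditional effect of 1 on the log-odds scale.
print(marginal_log_odds_ratio(0.0, 1.0, sigma=0.0))  # no random intercept variance
print(marginal_log_odds_ratio(0.0, 1.0, sigma=2.0))  # substantial between-subject variance
```

With no random intercept variance the marginal and conditional effects coincide; as the variance grows, the marginal effect shrinks toward zero, so a GLMM coefficient is typically larger in magnitude than the corresponding GEE coefficient.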