Correlated Data

Subjects in medical/health studies are usually followed ober time (prospective studies). In clinical trials there is often a single outcome measure after a period of time that is then analyzed using techniques we've discussed so far such as linear regression, anova, logistic regresseion etc. We have also examine time to event data (survival analysis data) where we follow patients across time until an event happens or they are censored. Methods for this type of data include the Kaplan-Meier estimatro and the proportional hazards regression model.

All of the methods discussed so far assume that all of the responses are independent. In many applications this is not the case. For example, in longitudinal data an outcome is measured multple times on each patient through time. These responses measured through time on the same subject can no longer be assumed to be independent and different methods are needed for analysis. Longitudinal data is not the only type of data where we need to consider correlations. Consider a multi-site clinical trial. It can be assumed that patients from different sites are indpendent but patients within a site may be correlated due to common measurement procedures, level of care, and other factors. This type of data is uaually referred to as clustered, hierarchical, or multilevel data (The site forms one level and the patients another level nested within each site).

We will focus mainly on longitudinal data (which is interested in change over time) here but these methods can be applied to clustered data as well. We will look at

  • Graphical methods
  • Simple analysis methods through derived variables (summary measures analysis)
  • Linear Mixed Models
  • GEE and Genearlized Mixed Effects Models (GLMM)

Let's consider an example data set from a clinical trila with 40 subjects randomly assigned to two treatment groups and measured on the Brief Psychiatric Rating Scale (BPRS) at baseline and weekly for 8 weeks.

  • BPRS: 18 item measure used to screen for schizophrenia (range is 18-26)
  • First 20 subjects in dataset are Trt 1 and the next 20 are Trt2
In [1]:
data bprs;  
  input id x0-x8;
  grp=1;
  if _n_>20 then grp=2;
  id=100*grp+id;
cards; *datalines;
    1   42  36  36  43  41  40  38  47  51
    2   58  68  61  55  43  34  28  28  28
    3   54  55  41  38  43  28  29  25  24
    4   55  77  49  54  56  50  47  42  46
    5   72  75  72  65  50  39  32  38  32
    6   48  43  41  38  36  29  33  27  25
    7   71  61  47  30  27  40  30  31  31
    8   30  36  38  38  31  26  26  25  24
    9   41  43  39  35  28  22  20  23  21
    10  57  51  51  55  53  43  43  39  32
    11  30  34  34  41  36  36  38  36  36
    12  55  52  49  54  48  43  37  36  31
    13  36  32  36  31  25  25  21  19  22
    14  38  35  36  34  25  27  25  26  26
    15  66  68  65  49  36  32  27  30  37
    16  41  35  45  42  31  31  29  26  30
    17  45  38  46  38  40  33  27  31  27
    18  39  35  27  25  29  28  21  25  20
    19  24  28  31  28  29  21  22  23  22
    20  38  34  27  25  25  27  21  19  21
    1   52  73  42  41  39  38  43  62  50
    2   30  23  32  24  20  20  19  18  20
    3   65  31  33  28  22  25  24  31  32
    4   37  31  27  31  31  26  24  26  23
    5   59  67  58  61  49  38  37  36  35
    6   30  33  37  33  28  26  27  23  21
    7   69  52  41  33  34  37  37  38  35
    8   62  54  49  39  55  51  55  59  66
    9   38  40  38  27  31  24  22  21  21
    10  65  44  31  34  39  34  41  42  39
    11  78  95  75  76  66  64  64  60  75
    12  38  41  36  27  29  27  21  22  23
    13  63  65  60  53  52  32  37  52  28
    14  40  37  31  38  35  30  33  30  27
    15  40  36  55  55  42  30  26  30  37
    16  54  45  35  27  25  22  22  22  22
    17  33  41  30  32  46  43  43  43  43
    18  28  30  29  33  30  26  36  33  30
    19  52  43  26  27  24  32  21  21  21
    20  47  36  32  29  25  23  23  23  23
;
run;

PROC PRINT DATA=BPRS(OBS=5);
RUN;
SAS Connection established. Subprocess id is 4788

Out[1]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.BPRS

Obs id x0 x1 x2 x3 x4 x5 x6 x7 x8 grp
1 101 42 36 36 43 41 40 38 47 51 1
2 102 58 68 61 55 43 34 28 28 28 1
3 103 54 55 41 38 43 28 29 25 24 1
4 104 55 77 49 54 56 50 47 42 46 1
5 105 72 75 72 65 50 39 32 38 32 1

Note that this dataset is in wide form, where we have one row per subject (mutltiple observations on each subject per row). This form is good for calculating summary measures. Long form is more useful for plots and analyses such as mixed models.

In [4]:
*Baseline as covatiate;
data bprslb;
  set bprs;
  array xs x1-x8;
  bprs0=x0;
  do week=1 to 8;
    bprs=xs{week};
    output;
  end;
  keep id grp week bprs0 bprs;
run;

PROC PRINT DATA=bprslb(obs=5);
run;
Out[4]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.BPRSLB

Obs id grp bprs0 week bprs
1 101 1 42 1 36
2 101 1 42 2 36
3 101 1 42 3 43
4 101 1 42 4 41
5 101 1 42 5 40

Graphical Displays for Longitudinal Data

Graphical displays of longitudinal data are always useful for examining patterns and variability features of the data on both an individual level and at a population level (by group). This information may be informative to help determine the details of subsequent formal modeling.

Some simple guidlines given by Diggle et al, 2002 for graphical displays of longitudinal data

  • Show as much of the raw data as possible
  • Highlight aggregate patterns of potential interest
  • Identify both crossectional and longitudinal patterns
  • Try to make identification of unusual individuals or observations simple

One useful plot is called a "spaghetti plot"

In [5]:
proc sgpanel data=bprsl ;
  panelby grp / spacing=10;
  series y=bprs x=week /group=id ;
  Title 'All subjects by group';
run;
Out[5]:
SAS Output

SAS Output

The SGPANEL Procedure

The SGPanel Procedure

The SGPanel Procedure
  • BPRS is decreasing on an individual level for almost everyone over the course of the study
  • Those who start higher (or lower) tend to stay higher (or lower).
  • There are individual differences in change (they dont all share the same slope for example).

To detect unusual observations, it may be helpful to standardize the reponses within each week.

In [6]:
proc sort data=bprsl;
 by week;
run;
proc stdize data=bprsl out=bprslz method=std;
  var bprs;
  by week;
run;
proc sgpanel data=bprslz ;
  panelby grp / spacing=10;
  series y=bprs x=week /group=id ;
  title ‘standardized scores by time and trt’;
run;
Out[6]:
SAS Output

SAS Output

The SGPANEL Procedure

The SGPanel Procedure

The SGPanel Procedure

When looking at a dataset with a large number of observations, much of the information that can be gleaned by looking at the individual responses will be obscured. In this case, it is often useful to only plot some of the data (such as a randomly chosen subset of the data) or to look at the mean plots by trt group.

In [7]:
proc sgplot data=bprsl;
  vline week / response=bprs stat=mean group=grp limitstat=stderr groupdisplay=cluster;
  Title 'Means by week and TRT';
run;
Out[7]:
SAS Output

SAS Output

The SGPLOT Procedure

The SGPlot Procedure

The SGPlot Procedure

This plot suggests little difference between groups with respect to mean BPRS score. Another similar plot, would be to look at side by side boxplots by week.

In [8]:
proc sgplot data=bprsl;
  vbox bprs / category=week group=grp;* nocaps;
  Title 'Boxplots by week and TRT';
run;
Out[8]:
SAS Output

SAS Output

The SGPLOT Procedure

The SGPlot Procedure

The SGPlot Procedure

Summary Measures Analysis

This simple methods of repeated measures analysis uses a summary of the repeated measures for each individual to capture the information about the responses in a single value. Then we have a single measurement for each individual which are independent, so we can then apply standard analyses for independent groups that we have already covered. Which summary statistic to use depends on the question of interest in the study.

Let's try this approach on the BPRS data. Is the overall mean BPRS score different between the two treatments post baseline? We could calculate the mean of the bprs score for each individual post baseline and perform a two sample t-test

In [10]:
data bprsm; *Calculate means using wide dataset;
 set bprs;
 mnbprs=mean(of x1-x8);
run;
proc sgplot data=bprsm;;
  vbox mnbprs / category=grp ;
 Title "boxplots of mean outomes";
run;
proc ttest data=bprsm;  *t-test for mean outcome;
  class grp;
  var mnbprs;
run;
Out[10]:
SAS Output

SAS Output

The SGPLOT Procedure

The SGPlot Procedure

The SGPlot Procedure

The TTEST Procedure

mnbprs


boxplots of mean outomes

The TTEST Procedure

Variable: mnbprs

Statistics

grp N Mean Std Dev Std Err Minimum Maximum
1 20 36.1688 8.3691 1.8714 24.8750 52.6250
2 20 36.5625 12.2090 2.7300 22.0000 71.8750
Diff (1-2)   -0.3937 10.4667 3.3098    

Confidence Limits

grp Method Mean 95% CL Mean Std Dev 95% CL Std Dev
1   36.1688 32.2519 40.0856 8.3691 6.3646 12.2236
2   36.5625 30.8485 42.2765 12.2090 9.2848 17.8321
Diff (1-2) Pooled -0.3937 -7.0942 6.3067 10.4667 8.5538 13.4892
Diff (1-2) Satterthwaite -0.3937 -7.1229 6.3354      

T-Tests

Method Variances DF t Value Pr > |t|
Pooled Equal 38 -0.12 0.9059
Satterthwaite Unequal 33.626 -0.12 0.9060

Equality of Variances

Equality of Variances
Method Num DF Den DF F Value Pr > F
Folded F 19 19 2.13 0.1083

Summary Panel

Summary Panel for mnbprs

Q-Q Plots

Q-Q Plots for mnbprs

It may be better to include the baseline measurement as a coviarate in a regression model becuase these baseline values are often corrlated with the outcome. Including these baseline measurements can increase precision. (This is ANCOVA)

In [11]:
proc means data=bprsm;  *Compare baseline means;
 class grp;
 var x0;
run;

proc glm data=bprsm;  *ANCOVA;
  class grp;
  model mnbprs=x0 grp  /solution;
  title "ANCOVA for mean outcome";
run;
Out[11]:
SAS Output

SAS Output

boxplots of mean outomes

The MEANS Procedure

The MEANS Procedure

Summary statistics

Analysis Variable : x0
grp N Obs N Mean Std Dev Minimum Maximum
1 20 20 47.0000000 13.6034051 24.0000000 72.0000000
2 20 20 49.0000000 14.9243707 28.0000000 78.0000000

ANCOVA for mean outcome

The GLM Procedure

The GLM Procedure

Data

Class Levels

Class Level Information
Class Levels Values
grp 2 1 2

Number of Observations

Number of Observations Read 40
Number of Observations Used 40

ANCOVA for mean outcome

The GLM Procedure

Dependent Variable: mnbprs

Analysis of Variance

mnbprs

Overall ANOVA

Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 1871.515626 935.757813 15.10 <.0001
Error 37 2292.965234 61.972033    
Corrected Total 39 4164.480859      

Fit Statistics

R-Square Coeff Var Root MSE mnbprs Mean
0.449400 21.64745 7.872232 36.36563

Type I Model ANOVA

Source DF Type I SS Mean Square F Value Pr > F
x0 1 1868.066649 1868.066649 30.14 <.0001
grp 1 3.448977 3.448977 0.06 0.8148

Type III Model ANOVA

Source DF Type III SS Mean Square F Value Pr > F
x0 1 1869.965235 1869.965235 30.17 <.0001
grp 1 3.448977 3.448977 0.06 0.8148

Solution

Parameter Estimate   Standard
Error
t Value Pr > |t|
Intercept 12.49017488 B 4.72259392 2.64 0.0119
x0 0.49127194   0.08943408 5.49 <.0001
grp 1 0.58879388 B 2.49583596 0.24 0.8148
grp 2 0.00000000 B . . .

Note:The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

ANCOVA Plot

Analysis of Covariance for mnbprs by x0 categorized by grp

Another question of interest may be in how does the rate of change in the BPRS scores differ between the treatment groups? In this case, we could use the slope from a simple linear regression of the outcome bprs on time for each subject as the summary measure.

In [12]:
proc sort data=bprsl;
by id;
run;

proc reg data=bprsl outest=regco noprint;
 by grp id x0 ;
 model bprs=week;
run;

data reggrp; set regco;
   bprsrate=week;
   keep id grp bprsrate x0;
run;

proc sgplot data=reggrp;
  vbox bprsrate / category=grp ;
 Title ‘boxplots of mean rates’;
run;

proc ttest data=reggrp;  *t-test for reg coef;
  class grp;
  var bprsrate;
run;
Out[12]:
SAS Output

SAS Output

The SGPLOT Procedure

The SGPlot Procedure

The SGPlot Procedure

The TTEST Procedure

bprsrate


‘boxplots of mean rates’

The TTEST Procedure

Variable: bprsrate

Statistics

grp N Mean Std Dev Std Err Minimum Maximum
1 20 -2.6283 1.8740 0.4190 -6.2833 1.1667
2 20 -1.9125 1.5539 0.3475 -4.2333 1.3833
Diff (1-2)   -0.7158 1.7214 0.5444    

Confidence Limits

grp Method Mean 95% CL Mean Std Dev 95% CL Std Dev
1   -2.6283 -3.5054 -1.7513 1.8740 1.4252 2.7372
2   -1.9125 -2.6398 -1.1852 1.5539 1.1817 2.2696
Diff (1-2) Pooled -0.7158 -1.8178 0.3862 1.7214 1.4068 2.2186
Diff (1-2) Satterthwaite -0.7158 -1.8191 0.3874      

T-Tests

Method Variances DF t Value Pr > |t|
Pooled Equal 38 -1.31 0.1964
Satterthwaite Unequal 36.74 -1.31 0.1967

Equality of Variances

Equality of Variances
Method Num DF Den DF F Value Pr > F
Folded F 19 19 1.45 0.4217

Summary Panel

Summary Panel for bprsrate

Q-Q Plots

Q-Q Plots for bprsrate

Again, we could also control for the baseline measurment by performing ANCOVA.

In [13]:
*Ancova on slopes;
proc glm data=reggrp;
  class grp;
  model bprsrate=x0 grp /solution;
  title 'ANCOVA for mean outcome';
run;
Out[13]:
SAS Output

SAS Output

ANCOVA for mean outcome

The GLM Procedure

The GLM Procedure

Data

Class Levels

Class Level Information
Class Levels Values
grp 2 1 2

Number of Observations

Number of Observations Read 40
Number of Observations Used 40

ANCOVA for mean outcome

The GLM Procedure

Dependent Variable: bprsrate

Analysis of Variance

bprsrate

Overall ANOVA

Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 48.2429139 24.1214570 12.84 <.0001
Error 37 69.4884680 1.8780667    
Corrected Total 39 117.7313819      

Fit Statistics

R-Square Coeff Var Root MSE bprsrate Mean
0.409771 -60.36010 1.370426 -2.270417

Type I Model ANOVA

Source DF Type I SS Mean Square F Value Pr > F
x0 1 40.79852340 40.79852340 21.72 <.0001
grp 1 7.44439053 7.44439053 3.96 0.0539

Type III Model ANOVA

Source DF Type III SS Mean Square F Value Pr > F
x0 1 43.11874032 43.11874032 22.96 <.0001
grp 1 7.44439053 7.44439053 3.96 0.0539

Solution

Parameter Estimate   Standard
Error
t Value Pr > |t|
Intercept 1.742894941 B 0.82212572 2.12 0.0408
x0 -0.074599897   0.01556900 -4.79 <.0001
grp 1 -0.865033127 B 0.43448388 -1.99 0.0539
grp 2 0.000000000 B . . .

Note:The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

ANCOVA Plot

Analysis of Covariance for bprsrate by x0 categorized by grp

Mixed Effect Models

With longitudinal/clustered data, we have correlated observations. If we ignore this correlation and fit standard linear regression models/ANOVA that assume independence, then our estimates of the parameter will still be valid (assuming the model is correct), but our inferences will be wrong (the standard errors/p-values will be incorect). Usually the estimation and inference for the regression parameters is of primary interest but there are cases where the correlation itself is important. In either case, we need to choose a model for the covariance structure.

We will fit linear mixed models using PROC MIXED. PROC MIXED has several built in correlation structures to choose from. We will describe the most common ones now:

  • Independence - no correlation is assumed between observations made on the same subject and so assumes all correlations are 0. This is the same as using a standard linear regression model.
$$\left(\begin{array}{ccc} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2\end{array}\right)$$
  • Compound symmetry - Every obervation is equally correlated with every other observation from the same subject. Only one additional parameter to estimate.
$$\left(\begin{array}{ccc} \sigma^2 & \rho & \rho \\ \rho & \sigma^2 & \rho \\ \rho & \rho & \sigma^2\end{array}\right)$$
  • Autoregressive (AR(1)) - assumes observations are related to their own past observations such that obesrvations closer to each other in time are more highly correlated than observations further apart in time. The correlation is assumed to drop off exponentially with time.
$$\left(\begin{array}{ccc} \sigma^2 & \rho & \rho^2 \\ \rho & \sigma^2 & \rho \\ \rho^2 & \rho & \sigma^2\end{array}\right)$$
  • Unstructured - no restrictions on the structure of the correlation, so each correlation gets its own parameter. This adds a lot of parameters and requires a lot of data to reliably estimate all parameters, but can match any correlation structure.
$$\left(\begin{array}{ccc} \sigma^2 & \rho_{12} & \rho_{13} \\ \rho_{21} & \sigma^2 & \rho_{23} \\ \rho_{31} & \rho_{32} & \sigma^2\end{array}\right)$$

PROC MIXED can be a rather complicated procedure and to fully discuss how the options work would require looking at the matrix form of the general linear mixed effects model. We will not persue that discussion here. We will instead look at a few examples of using proc mixed and try to heuristically describe the options for a given model.

First, let's discuss a simple linear regression model with possible random coefficients.

$$y_{i,j}=\beta_0 + \beta_1t_j + u_{i0} + u_{i1}*t_j\varepsilon_{i,j}$$

The fixed effects are $\beta_0$ and $\beta_1$ and the random effects are $u_{i0}$ and $u_{i1}$. You can think of this model as describing the population mean regression line through the betas and inddividuals follow this general shape but vary around this line with the random error for the idividual described by the random effects $u_{i0}$ and $u_{i1}$.

Unfortunately, we do not have enough time to discuss how to derive a model from a given dataset. We will only discuss how to fit a given model.

For out first example, lets consider the following dental dataset. This study followed 16 boys and 11 girls for dental visits at the ages of 8,10,12, and 14 where the distance between the center of the pituiatry gland to the pteryomaxillary fissure was measured in mm. The goals of the study were to describe the distance in boys and girls as simple functions of age, and then to compare the growth functions for boys and girls.

In [16]:
TITLE ;
Data dental; *Inputted as “Wide Form”;
input ID gender $ D8 D10 D12 D14;
datalines;
 1  F 21.0    20.0    21.5    23.0
 2  F 21.0    21.5    24.0    25.5
 3  F 20.5    24.0    24.5    26.0
 4  F 23.5    24.5    25.0    26.5
 5  F 21.5    23.0    22.5    23.5
 6  F 20.0    21.0    21.0    22.5
 7  F 21.5    22.5    23.0    25.0
 8  F 23.0    23.0    23.5    24.0
 9  F 20.0    21.0    22.0    21.5
10  F 16.5    19.0    19.0    19.5
11  F 24.5    25.0    28.0    28.0
12  M 26.0    25.0    29.0    31.0
13  M 21.5    22.5    23.0    26.5
14  M 23.0    22.5    24.0    27.5
15  M 25.5    27.5    26.5    27.0
16  M 20.0    23.5    22.5    26.0
17  M 24.5    25.5    27.0    28.5
18  M 22.0    22.0    24.5    26.5
19  M 24.0    21.5    24.5    25.5
20  M 23.0    20.5    31.0    26.0
21  M 27.5    28.0    31.0    31.5
22  M 23.0    23.0    23.5    25.0
23  M 21.5    23.5    24.0    28.0
24  M 17.0    24.5    26.0    29.5
25  M 22.5    25.5    25.5    26.0
26  M 23.0    24.5    26.0    30.0
27  M 22.0    21.5    23.5    25.0
;
RUN;

data DentL;
  set dental;
  array DR[4] d8 d10 d12 d14;
  do age=8 to 14 by 2;
    Dist=DR{(Age-6)/2};
    output;
  end;
  keep id gender age dist;
run;

PROC PRINT DATA=DENTL (obs=20) NOOBS;
RUN;
Out[16]:
SAS Output

SAS Output

The PRINT Procedure

Data Set WORK.DENTL

ID gender age Dist
1 F 8 21.0
1 F 10 20.0
1 F 12 21.5
1 F 14 23.0
2 F 8 21.0
2 F 10 21.5
2 F 12 24.0
2 F 14 25.5
3 F 8 20.5
3 F 10 24.0
3 F 12 24.5
3 F 14 26.0
4 F 8 23.5
4 F 10 24.5
4 F 12 25.0
4 F 14 26.5
5 F 8 21.5
5 F 10 23.0
5 F 12 22.5
5 F 14 23.5
In [17]:
proc sgpanel data=DentL noautolegend;
  panelby gender / spacing=10;
  series y=Dist x=age /group=ID ;
  Title 'All subjects by group';
run;

*Means plot;
proc sgplot data=dentl;
  vline age /response=dist stat=mean 
         group=gender limitstat=stderr;  
run;
Out[17]:
SAS Output

SAS Output

The SGPANEL Procedure

The SGPanel Procedure

The SGPanel Procedure

The SGPLOT Procedure

The SGPlot Procedure

The SGPlot Procedure
In [21]:
proc mixed data=dentl method=REML covtest;
Class gender;
model dist=age gender age*gender/solution ddfm=kr outp=pred outpm=outm residual   ;
random Intercept /subject=id type=un solution G;
ods output solutionr=t;
run;
Out[21]:
SAS Output

SAS Output

All subjects by group

The Mixed Procedure

The MIXED Procedure

Model Information

Model Information
Data Set WORK.DENTL
Dependent Variable Dist
Covariance Structure Unstructured
Subject Effect ID
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Level Information
Class Levels Values
gender 2 F M

Dimensions

Dimensions
Covariance Parameters 2
Columns in X 6
Columns in Z per Subject 1
Subjects 27
Max Obs per Subject 4

Number of Observations

Number of Observations
Number of Observations Read 108
Number of Observations Used 108
Number of Observations Not Used 0

Iteration History

Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 483.55911746  
1 1 433.75724920 0.00000000

Convergence Status

Convergence criteria met.

Estimated G Matrix

Estimated G Matrix
Row Effect Subject Col1
1 Intercept 1 3.2986

Covariance Parameter Estimates

Covariance Parameter Estimates
Cov Parm Subject Estimate Standard
Error
Z Value Pr > Z
UN(1,1) ID 3.2986 1.0716 3.08 0.0010
Residual   1.9221 0.3058 6.28 <.0001

Fit Statistics

Fit Statistics
-2 Res Log Likelihood 433.8
AIC (Smaller is Better) 437.8
AICC (Smaller is Better) 437.9
BIC (Smaller is Better) 440.3

Null Model Likelihood Ratio Test

Null Model Likelihood Ratio Test
DF Chi-Square Pr > ChiSq
1 49.80 <.0001

Solution for Fixed Effects

Solution for Fixed Effects
Effect gender Estimate Standard
Error
DF t Value Pr > |t|
Intercept   16.3406 0.9813 104 16.65 <.0001
age   0.7844 0.07750 79 10.12 <.0001
gender F 1.0321 1.5374 104 0.67 0.5035
gender M 0 . . . .
age*gender F -0.3048 0.1214 79 -2.51 0.0141
age*gender M 0 . . . .

Solution for Random Effects

Solution for Random Effects
Effect Subject Estimate Std Err Pred DF t Value Pr > |t|
Intercept 1 -1.1109 0.8324 62.9 -1.33 0.1868
Intercept 2 0.3075 0.8324 62.9 0.37 0.7131
Intercept 3 0.9621 0.8324 62.9 1.16 0.2521
Intercept 4 1.9441 0.8324 62.9 2.34 0.0227
Intercept 5 -0.01984 0.8324 62.9 -0.02 0.9811
Intercept 6 -1.3291 0.8324 62.9 -1.60 0.1153
Intercept 7 0.3075 0.8324 62.9 0.37 0.7131
Intercept 8 0.6348 0.8324 62.9 0.76 0.4485
Intercept 9 -1.3291 0.8324 62.9 -1.60 0.1153
Intercept 10 -3.6203 0.8324 62.9 -4.35 <.0001
Intercept 11 3.2534 0.8324 62.9 3.91 0.0002
Intercept 12 2.4276 0.7819 77.6 3.10 0.0027
Intercept 13 -1.3911 0.7819 77.6 -1.78 0.0791
Intercept 14 -0.6274 0.7819 77.6 -0.80 0.4248
Intercept 15 1.4457 0.7819 77.6 1.85 0.0683
Intercept 16 -1.7184 0.7819 77.6 -2.20 0.0310
Intercept 17 1.2274 0.7819 77.6 1.57 0.1205
Intercept 18 -1.0638 0.7819 77.6 -1.36 0.1776
Intercept 19 -0.9547 0.7819 77.6 -1.22 0.2258
Intercept 20 0.1364 0.7819 77.6 0.17 0.8620
Intercept 21 3.9551 0.7819 77.6 5.06 <.0001
Intercept 22 -1.1729 0.7819 77.6 -1.50 0.1377
Intercept 23 -0.6274 0.7819 77.6 -0.80 0.4248
Intercept 24 -0.6274 0.7819 77.6 -0.80 0.4248
Intercept 25 -0.08183 0.7819 77.6 -0.10 0.9169
Intercept 26 0.7910 0.7819 77.6 1.01 0.3149
Intercept 27 -1.7184 0.7819 77.6 -2.20 0.0310

Type 3 Tests of Fixed Effects

Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
age 1 79 108.36 <.0001
gender 1 104 0.45 0.5035
age*gender 1 79 6.30 0.0141

Marginal Residual Plots

Residuals

Panel of marginal residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Studentized Residuals

Panel of marginal studentized residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Pearson Residuals

Panel of marginal Pearson residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Conditional Residual Plots

Residuals

Panel of conditional residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Studentized Residuals

Panel of conditional studentized residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Pearson Residuals

Panel of conditional Pearson residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.
In [22]:
proc sgpanel data=pred noautolegend;
  panelby gender / spacing=10;
  series y=pred x=age /group=ID ;
  Title 'Individual Predictions by Gender';
run;

proc sgplot data=outm noautolegend;
  series y=pred x=age /group=ID ;
  Title 'Means by Gender';
run;
Out[22]:
SAS Output

SAS Output

The SGPANEL Procedure

The SGPanel Procedure

The SGPanel Procedure

The SGPLOT Procedure

The SGPlot Procedure

The SGPlot Procedure
In [23]:
/* Equivalent model using repeated instead of random*/
proc mixed data=dentl method=REML;
Class gender;
model dist=age gender age*gender/solution ddfm=kr;
repeated  /subject=ID type=CS r rcorr;
run;
Out[23]:
SAS Output

SAS Output

Means by Gender

The Mixed Procedure

The MIXED Procedure

Model Information

Model Information
Data Set WORK.DENTL
Dependent Variable Dist
Covariance Structure Compound Symmetry
Subject Effect ID
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Level Information
Class Levels Values
gender 2 F M

Dimensions

Dimensions
Covariance Parameters 2
Columns in X 6
Columns in Z 0
Subjects 27
Max Obs per Subject 4

Number of Observations

Number of Observations
Number of Observations Read 108
Number of Observations Used 108
Number of Observations Not Used 0

Iteration History

Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 483.55911746  
1 1 433.75724920 0.00000000

Convergence Status

Convergence criteria met.

Estimated R Matrix for Subject 1

Estimated R Matrix for Subject 1
Row Col1 Col2 Col3 Col4
1 5.2207 3.2986 3.2986 3.2986
2 3.2986 5.2207 3.2986 3.2986
3 3.2986 3.2986 5.2207 3.2986
4 3.2986 3.2986 3.2986 5.2207

Estimated R Correlation Matrix for Subject 1

Estimated R Correlation Matrix for Subject 1
Row Col1 Col2 Col3 Col4
1 1.0000 0.6318 0.6318 0.6318
2 0.6318 1.0000 0.6318 0.6318
3 0.6318 0.6318 1.0000 0.6318
4 0.6318 0.6318 0.6318 1.0000

Covariance Parameter Estimates

Covariance Parameter Estimates
Cov Parm Subject Estimate
CS ID 3.2986
Residual   1.9221

Fit Statistics

Fit Statistics
-2 Res Log Likelihood 433.8
AIC (Smaller is Better) 437.8
AICC (Smaller is Better) 437.9
BIC (Smaller is Better) 440.3

Null Model Likelihood Ratio Test

Null Model Likelihood Ratio Test
DF Chi-Square Pr > ChiSq
1 49.80 <.0001

Solution for Fixed Effects

Solution for Fixed Effects
Effect gender Estimate Standard
Error
DF t Value Pr > |t|
Intercept   16.3406 0.9813 104 16.65 <.0001
age   0.7844 0.07750 79 10.12 <.0001
gender F 1.0321 1.5374 104 0.67 0.5035
gender M 0 . . . .
age*gender F -0.3048 0.1214 79 -2.51 0.0141
age*gender M 0 . . . .

Type 3 Tests of Fixed Effects

Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
age 1 79 108.36 <.0001
gender 1 104 0.45 0.5035
age*gender 1 79 6.30 0.0141

Next, we will add in a random intercept to the model.

In [24]:
proc mixed data=dentl method=REML covtest;
Class gender;
model dist=age gender age*gender /solution ddfm=kr outp=predIS residual;
random Intercept age /subject=id type=un ;
run;
Out[24]:
SAS Output

SAS Output

Means by Gender

The Mixed Procedure

The MIXED Procedure

Model Information

Model Information
Data Set WORK.DENTL
Dependent Variable Dist
Covariance Structure Unstructured
Subject Effect ID
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Level Information
Class Levels Values
gender 2 F M

Dimensions

Dimensions
Covariance Parameters 4
Columns in X 6
Columns in Z per Subject 2
Subjects 27
Max Obs per Subject 4

Number of Observations

Number of Observations
Number of Observations Read 108
Number of Observations Used 108
Number of Observations Not Used 0

Iteration History

Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 483.55911746  
1 1 432.58166150 0.00000000

Convergence Status

Convergence criteria met.

Covariance Parameter Estimates

Covariance Parameter Estimates
Cov Parm Subject Estimate Standard
Error
Z Value Pr Z
UN(1,1) ID 5.7864 5.1352 1.13 0.1299
UN(2,1) ID -0.2896 0.4152 -0.70 0.4855
UN(2,2) ID 0.03252 0.03732 0.87 0.1918
Residual   1.7162 0.3303 5.20 <.0001

Fit Statistics

Fit Statistics
-2 Res Log Likelihood 432.6
AIC (Smaller is Better) 440.6
AICC (Smaller is Better) 441.0
BIC (Smaller is Better) 445.8

Null Model Likelihood Ratio Test

Null Model Likelihood Ratio Test
DF Chi-Square Pr > ChiSq
3 50.98 <.0001

Solution for Fixed Effects

Solution for Fixed Effects
Effect gender Estimate Standard
Error
DF t Value Pr > |t|
Intercept   16.3406 1.0185 25 16.04 <.0001
age   0.7844 0.08600 25 9.12 <.0001
gender F 1.0321 1.5957 25 0.65 0.5237
gender M 0 . . . .
age*gender F -0.3048 0.1347 25 -2.26 0.0326
age*gender M 0 . . . .

Type 3 Tests of Fixed Effects

Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
age 1 25 88.00 <.0001
gender 1 25 0.42 0.5237
age*gender 1 25 5.12 0.0326

Marginal Residual Plots

Residuals

Panel of marginal residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Studentized Residuals

Panel of marginal studentized residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Pearson Residuals

Panel of marginal Pearson residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Conditional Residual Plots

Residuals

Panel of conditional residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Studentized Residuals

Panel of conditional studentized residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.

Pearson Residuals

Panel of conditional Pearson residuals for Dist. The panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and summary statistics for the residuals and the model fit.
In [25]:
proc sgpanel data=predIS noautolegend;
  panelby gender / spacing=10;
  series y=pred x=age /group=ID ;
  Title 'Individual Predictions by Gender';
run;
Out[25]:
SAS Output

SAS Output

The SGPANEL Procedure

The SGPanel Procedure

The SGPanel Procedure
In [26]:
*using AR(1) form;
proc mixed data=dentl method=REML;
Class gender;
model dist=age gender age*gender/solution ddfm=kr;
Random intercept /subject=id type=un;
repeated  /subject=ID type=AR(1) r rcorr;
run;
Out[26]:
SAS Output

SAS Output

Individual Predictions by Gender

The Mixed Procedure

The MIXED Procedure

Model Information

Model Information
Data Set WORK.DENTL
Dependent Variable Dist
Covariance Structures Unstructured, Autoregressive
Subject Effects ID, ID
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Kenward-Roger
Degrees of Freedom Method Kenward-Roger

Class Level Information

Class Level Information
Class Levels Values
gender 2 F M

Dimensions

Dimensions
Covariance Parameters 3
Columns in X 6
Columns in Z per Subject 1
Subjects 27
Max Obs per Subject 4

Number of Observations

Number of Observations
Number of Observations Read 108
Number of Observations Used 108
Number of Observations Not Used 0

Iteration History

Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 483.55911746  
1 2 433.70815006 0.00000031
2 1 433.70811232 0.00000000

Convergence Status

Convergence criteria met.

Estimated R Matrix for Subject 1

Estimated R Matrix for Subject 1
Row Col1 Col2 Col3 Col4
1 1.8854 -0.07077 0.002656 -0.00010
2 -0.07077 1.8854 -0.07077 0.002656
3 0.002656 -0.07077 1.8854 -0.07077
4 -0.00010 0.002656 -0.07077 1.8854

Estimated R Correlation Matrix for Subject 1

Estimated R Correlation Matrix for Subject 1
Row Col1 Col2 Col3 Col4
1 1.0000 -0.03753 0.001409 -0.00005
2 -0.03753 1.0000 -0.03753 0.001409
3 0.001409 -0.03753 1.0000 -0.03753
4 -0.00005 0.001409 -0.03753 1.0000

Covariance Parameter Estimates

Covariance Parameter Estimates
Cov Parm Subject Estimate
UN(1,1) ID 3.3355
AR(1) ID -0.03753
Residual   1.8854

Fit Statistics

Fit Statistics
-2 Res Log Likelihood 433.7
AIC (Smaller is Better) 439.7
AICC (Smaller is Better) 439.9
BIC (Smaller is Better) 443.6

Null Model Likelihood Ratio Test

Null Model Likelihood Ratio Test
DF Chi-Square Pr > ChiSq
2 49.85 <.0001

Solution for Fixed Effects

Solution for Fixed Effects
Effect gender Estimate Standard
Error
DF t Value Pr > |t|
Intercept   16.3252 0.9747 55.7 16.75 <.0001
age   0.7854 0.07686 36.1 10.22 <.0001
gender F 1.0510 1.5271 55.7 0.69 0.4942
gender M 0 . . . .
age*gender F -0.3062 0.1204 36.1 -2.54 0.0154
age*gender M 0 . . . .

Type 3 Tests of Fixed Effects

Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
age 1 36.1 110.31 <.0001
gender 1 55.7 0.47 0.4942
age*gender 1 36.1 6.47 0.0154
In [ ]: