Mixed Effect Models
With longitudinal or clustered data, observations on the same subject or cluster are correlated. If we ignore this correlation and fit standard linear regression or ANOVA models that assume independence, our estimates of the regression parameters will still be valid (assuming the mean model is correct), but our inferences will be wrong: the standard errors, and therefore the p-values, will be incorrect. Usually estimation and inference for the regression parameters are of primary interest, but there are cases where the correlation itself is important. In either case, we need to choose a model for the covariance structure.
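To make the effect on standard errors concrete, here is a minimal NumPy sketch (an illustration only, not part of the SAS workflow in these notes). It assumes a hypothetical subject with `n = 4` equally correlated measurements, variance `sigma2 = 1`, and correlation `rho = 0.5`; all three values are made up for the example. It compares the true variance of the subject's mean with the variance computed under independence.

```python
import numpy as np

# Illustrative values (assumptions, not taken from any dataset)
n, sigma2, rho = 4, 1.0, 0.5

# Covariance matrix with the same correlation rho between every pair of observations
V = sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))

w = np.full(n, 1.0 / n)   # weights that form the sample mean
true_var = w @ V @ w      # Var(mean) when the correlation is accounted for
naive_var = sigma2 / n    # Var(mean) if independence is (wrongly) assumed

print(true_var, naive_var)
```

For this particular quantity the true variance exceeds the naive one by a factor of $1 + (n-1)\rho = 2.5$, so a standard error computed under independence would be too small.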
We will fit linear mixed models using PROC MIXED. PROC MIXED has several built-in correlation structures to choose from. The most common ones are described below, each shown for three observations on one subject:
- Independence - observations made on the same subject are assumed to be uncorrelated, so all correlations are 0. This is the same as using a standard linear regression model.
$$\left(\begin{array}{ccc} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2\end{array}\right)$$
- Compound symmetry - every observation is equally correlated with every other observation from the same subject. Only one additional parameter, the common correlation $\rho$, needs to be estimated.
$$\sigma^2\left(\begin{array}{ccc} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1\end{array}\right)$$
- Autoregressive (AR(1)) - assumes observations are related to their own past observations, such that observations closer to each other in time are more highly correlated than observations further apart in time. The correlation is assumed to drop off exponentially with the time lag: observations $k$ time points apart have correlation $\rho^k$.
$$\sigma^2\left(\begin{array}{ccc} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1\end{array}\right)$$
- Unstructured - no restrictions on the covariance structure, so each time point gets its own variance and each pair of time points gets its own covariance. This adds a lot of parameters and requires a lot of data to estimate them all reliably, but it can match any covariance structure.
$$\left(\begin{array}{ccc} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2\end{array}\right)$$
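The structures above can be sketched numerically. The NumPy snippet below is only an illustration and is not how PROC MIXED builds these matrices internally. The function names and the values `sigma2 = 4.0` and `rho = 0.5` are assumptions made for the example. Off-diagonal entries are computed as covariances, that is, the variance times the correlation.

```python
import numpy as np

def independence(n, sigma2):
    """Diagonal covariance: all off-diagonal covariances are 0."""
    return sigma2 * np.eye(n)

def compound_symmetry(n, sigma2, rho):
    """Same variance everywhere, same correlation rho for every pair."""
    return sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))

def ar1(n, sigma2, rho):
    """Correlation rho**k for observations k time points apart."""
    idx = np.arange(n)
    lag = np.abs(idx[:, None] - idx[None, :])
    return sigma2 * rho ** lag

def n_params_unstructured(n):
    """Free parameters when every variance and covariance is separate."""
    return n * (n + 1) // 2

# Hypothetical values, chosen only for illustration
n, sigma2, rho = 3, 4.0, 0.5
print(independence(n, sigma2))
print(compound_symmetry(n, sigma2, rho))
print(ar1(n, sigma2, rho))
print(n_params_unstructured(4))  # parameter count for four time points
```

Comparing the parameter counts shows the trade-off: compound symmetry and AR(1) each need two parameters regardless of the number of time points, while the unstructured count grows quadratically.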
PROC MIXED can be a rather complicated procedure, and to fully discuss how its options work would require looking at the matrix form of the general linear mixed effects model. We will not pursue that discussion here. Instead, we will look at a few examples of using PROC MIXED and try to describe heuristically what the options do for a given model.
First, let's discuss a simple linear regression model with possible random coefficients.
$$y_{i,j}=\beta_0 + \beta_1 t_j + u_{i0} + u_{i1}t_j + \varepsilon_{i,j}$$
The fixed effects are $\beta_0$ and $\beta_1$, and the random effects are $u_{i0}$ and $u_{i1}$. You can think of this model as describing the population mean regression line through the betas. Individuals follow this general shape but vary around the line: subject $i$ has its own intercept $\beta_0 + u_{i0}$ and its own slope $\beta_1 + u_{i1}$, and $\varepsilon_{i,j}$ is the residual error for subject $i$ at time $t_j$.
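As a rough illustration of what this model generates, the following NumPy sketch simulates a few subjects from it. Every numeric value (the betas, the standard deviations, the time points, the number of subjects) is hypothetical. The random intercepts and slopes are drawn independently of each other, which is a simplifying assumption and not something the model requires.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values chosen only for illustration
beta0, beta1 = 20.0, 0.5               # fixed effects: population intercept and slope
sd_u0, sd_u1, sd_eps = 2.0, 0.2, 1.0   # SDs of random intercept, random slope, residual
t = np.array([0.0, 1.0, 2.0, 3.0])     # measurement times t_j
n_subjects = 5

u0 = rng.normal(0.0, sd_u0, size=n_subjects)              # u_{i0}, one per subject
u1 = rng.normal(0.0, sd_u1, size=n_subjects)              # u_{i1}, one per subject
eps = rng.normal(0.0, sd_eps, size=(n_subjects, t.size))  # residual errors

# y[i, j] = beta0 + beta1*t_j + u_{i0} + u_{i1}*t_j + eps_{i,j}
y = beta0 + beta1 * t + u0[:, None] + u1[:, None] * t + eps

population_line = beta0 + beta1 * t                                # mean line from the betas
subject_lines = (beta0 + u0)[:, None] + (beta1 + u1)[:, None] * t  # each subject's own line

print(y.shape)
```

Each row of `subject_lines` is a subject-specific line that deviates from `population_line` only through that subject's random effects; the observations `y` then scatter around the subject's line by the residual error.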
Unfortunately, we do not have enough time to discuss how to derive a model from a given dataset. We will only discuss how to fit a given model.
For our first example, let's consider the following dental dataset. This study followed 16 boys and 11 girls over dental visits at ages 8, 10, 12, and 14, at which the distance from the center of the pituitary gland to the pterygomaxillary fissure was measured in mm. The goals of the study were to describe the distance in boys and girls as simple functions of age, and then to compare the growth functions for boys and girls.