Generalized Linear Latent and Mixed Models

Speakers: Sophia Rabe-Hesketh, Andrew Pickles and Colin Taylor

We describe a Stata program, gllamm, that can fit a wide range of generalised linear latent and mixed models. These models extend the random intercept models that can be fitted in Stata 6 to cross-sectional time-series or other clustered data using xtreg, xtlogit, xtpois, etc.

The xt models all include a random intercept for clusters in the linear predictor. With a single explanatory variable $x$, the linear predictor is

$$\eta_{ij} = \beta_0 + \beta_1 x_{ij} + u_i$$

where the index ij refers to the jth “level 1” unit clustered within the ith “level 2” unit, e.g. observation times within subjects or pupils within schools, and the random effect $u_i$ of cluster i is usually assumed to have a normal distribution with mean zero.
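To make the indexing concrete, the sketch below computes the random intercept linear predictor for data stored one row per level 1 unit, looking up each row's cluster effect $u_i$ by its cluster identifier. It is a minimal Python illustration, not gllamm code; the function name `linear_predictor` and all values are hypothetical.

```python
def linear_predictor(b0, b1, x, cluster, u):
    """Random intercept linear predictor eta_ij = b0 + b1*x_ij + u_i.

    x and cluster are parallel lists with one entry per level 1 unit;
    u maps each cluster identifier i to its random intercept u_i.
    """
    return [b0 + b1 * x_row + u[i] for x_row, i in zip(x, cluster)]


# Two clusters (e.g. two subjects) with two and three observations.
x = [0.0, 1.0, 0.0, 1.0, 2.0]
cluster = ["a", "a", "b", "b", "b"]
u = {"a": 0.5, "b": -0.25}  # hypothetical values of the random intercepts
print(linear_predictor(1.0, 2.0, x, cluster, u))
# [1.5, 3.5, 0.75, 2.75, 4.75]
```

Every row of a cluster shares the same $u_i$, which is what induces the within-cluster correlation.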

The program gllamm allows any combination of five basic extensions to the random intercept model: (1) discrete random effects distributions, (2) multi-level models, (3) random coefficients, (4) factor loadings and (5) mixed responses.

Like the xt programs, gllamm requires the data to be in “long” form, with all responses stacked into a single variable and the cluster index (or indices) stored in separate variable(s).
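In Stata, converting one-row-per-subject data to long form is usually done with `reshape long`, e.g. `reshape long y, i(id) j(occasion)`. The plain-Python sketch below mimics that restructuring for hypothetical variables `id`, `sex` (constant within subject) and repeated responses `y1`, `y2`, `y3`, producing one row per response with an `occasion` index.

```python
def wide_to_long(rows, stub, occasions):
    """Stack columns stub1, stub2, ... into a single response column.

    rows: list of dicts, one per cluster (wide form).
    Returns a list of dicts, one per (cluster, occasion) pair (long form).
    """
    stacked = {f"{stub}{t}" for t in occasions}
    long_rows = []
    for row in rows:
        # Variables other than the stacked responses are constant within
        # the cluster and are copied onto each of its long-form rows.
        fixed = {k: v for k, v in row.items() if k not in stacked}
        for t in occasions:
            long_rows.append({**fixed, "occasion": t, stub: row[f"{stub}{t}"]})
    return long_rows


wide = [
    {"id": 1, "sex": 0, "y1": 3.1, "y2": 2.9, "y3": 3.4},
    {"id": 2, "sex": 1, "y1": 2.2, "y2": 2.8, "y3": 2.5},
]
for r in wide_to_long(wide, "y", [1, 2, 3]):
    print(r)
```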

The program simply uses Stata’s ml commands with method deriv0 (so that derivatives are obtained numerically) to maximise the likelihood, which is evaluated by numerical integration over the random effects. The five extensions to the random intercept model are described below, using the examples that are also used in the talk.
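The likelihood being maximised is marginal over the random effects: for each cluster, the conditional likelihood of its responses given $u_i$ is integrated against the normal density of $u_i$, and the logs of these integrals are summed over clusters. The Python sketch below does this for a random intercept logistic model. Programs of this kind typically use Gaussian quadrature; to stay dependency-free, the sketch swaps in a plain midpoint rule on a grid of ±8 standard deviations. Function names and grid settings are illustrative choices, not gllamm's.

```python
import math


def normal_pdf(v, sd):
    """Density of N(0, sd^2) at v."""
    return math.exp(-0.5 * (v / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def cluster_likelihood(y, x, b0, b1, sd_u, n_points=400, width=8.0):
    """Marginal likelihood of one cluster's binary responses y (covariate x)
    under logit(pi_ij) = b0 + b1*x_ij + u_i with u_i ~ N(0, sd_u^2),
    integrated by the midpoint rule on [-width*sd_u, width*sd_u]."""
    h = 2.0 * width * sd_u / n_points
    total = 0.0
    for k in range(n_points):
        u = -width * sd_u + (k + 0.5) * h
        # Responses are conditionally independent given u.
        cond = 1.0
        for y_ij, x_ij in zip(y, x):
            p = inv_logit(b0 + b1 * x_ij + u)
            cond *= p if y_ij == 1 else 1.0 - p
        total += cond * normal_pdf(u, sd_u) * h
    return total


def log_likelihood(clusters, b0, b1, sd_u):
    """Sum of log marginal likelihoods over (y, x) pairs, one per cluster."""
    return sum(math.log(cluster_likelihood(y, x, b0, b1, sd_u))
               for y, x in clusters)


clusters = [([1, 0, 1], [0.0, 1.0, 2.0]), ([0, 0], [0.5, 1.5])]
print(log_likelihood(clusters, b0=-0.5, b1=0.8, sd_u=1.0))
```

An optimiser (in gllamm's case, Stata's ml) would then search over `b0`, `b1` and `sd_u` to maximise this function.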

1. Discrete random effects distributions: We may wish to assume that there are several latent classes, or groups of subjects, where all subjects within a class share the same value of the random effect.
2. Multi-level models: The level 2 units may be clustered within level 3 units. For example, there may be multiple observations (k) per person (j) clustered in families (i). The linear predictor now includes two random effects, one for families (level 3) and one for subjects (level 2):
$$\eta_{ijk} = \beta_0 + \beta_1 x_{ijk} + u_i^{(3)} + u_j^{(2)}.$$
3. Random coefficients: The coefficient of an explanatory variable may differ between level 2 units. For example, the effect of a pupil's (j) maths score in year 3 on their maths score in year 5 may differ between schools (i). The linear predictor now has a random intercept $u_i^{(0)}$ and a random slope $u_i^{(1)}$, and the two random effects may be correlated:
$$\eta_{ij} = (\beta_0 + u_i^{(0)}) + (\beta_1 + u_i^{(1)})\,\mathrm{math3}_{ij}.$$
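4. Factor loadings: Several variables may load on a latent variable. For example, the results of subject i on the items j of a psychometric test could be modelled by a two-parameter logistic item response model (the Rasch model is the special case in which all items have the same loading), with linear predictor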
$$\eta_{ij} = \beta_j + u_i \lambda_j.$$
Here the random effect $u_i$ is a measure of the subject’s ability, $-\beta_j$ is a measure of the difficulty of item j and $\lambda_j$ is a “factor loading” representing the effect of the subject’s ability on their performance on item j. A separate loading is estimated for each item.
5. Mixed responses: If the items or variables are of different types, e.g. continuous and dichotomous, then different generalised linear models (families and links) need to be specified for different observations. An example is logistic regression with a continuous explanatory or “exposure” variable that is subject to measurement error. We then need to model the measured exposure and the dichotomous outcome simultaneously, using
$$\mu_{ij} = \beta_0 + u_i$$

and

$$\mathrm{logit}(\pi_{ij}) = \beta_1 u_i,$$

respectively.
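For extension (1), replacing the normal random intercept by a discrete distribution turns the integral in the marginal likelihood into a finite sum: each cluster's likelihood is a mixture over latent classes c of the conditional likelihood at the class location $e_c$, weighted by the class probability $p_c$ (notation introduced here for illustration). The Python sketch assumes a random intercept logistic model with hypothetical, fixed locations and probabilities; in practice these would be estimated along with the regression coefficients. It also returns the posterior probability of each class for a cluster.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def class_terms(y, x, b0, b1, locations, probs):
    """p_c * f(y | u_i = e_c) for each class c, under
    logit(pi_ij) = b0 + b1*x_ij + u_i."""
    terms = []
    for e_c, p_c in zip(locations, probs):
        cond = 1.0
        for y_ij, x_ij in zip(y, x):
            p = inv_logit(b0 + b1 * x_ij + e_c)
            cond *= p if y_ij == 1 else 1.0 - p
        terms.append(p_c * cond)
    return terms


def cluster_likelihood(y, x, b0, b1, locations, probs):
    """Marginal likelihood: a finite mixture over the latent classes."""
    return sum(class_terms(y, x, b0, b1, locations, probs))


def posterior_classes(y, x, b0, b1, locations, probs):
    """Posterior probability that the cluster belongs to each class."""
    terms = class_terms(y, x, b0, b1, locations, probs)
    total = sum(terms)
    return [t / total for t in terms]


# Two hypothetical classes: a "low" and a "high" class.
locations, probs = [-1.0, 1.5], [0.6, 0.4]
y, x = [1, 1, 0, 1], [0.0, 1.0, 2.0, 3.0]
print(cluster_likelihood(y, x, 0.0, 0.2, locations, probs))
print(posterior_classes(y, x, 0.0, 0.2, locations, probs))
```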
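For extension (2), each observation picks up two random effects: its family's $u_i^{(3)}$ and its person's $u_j^{(2)}$. The sketch below computes the three-level linear predictor for long-form data, assuming person identifiers are unique across families (all data hypothetical). For the special case of a continuous response with identity link and independent normal effects with variances $\sigma_3^2$, $\sigma_2^2$ and residual variance $\sigma_e^2$ (an assumption made for illustration), it also gives the implied correlations between two observations on the same person, $(\sigma_3^2+\sigma_2^2)/(\sigma_3^2+\sigma_2^2+\sigma_e^2)$, and on two different persons in the same family, $\sigma_3^2/(\sigma_3^2+\sigma_2^2+\sigma_e^2)$.

```python
def linear_predictor_3level(b0, b1, x, family, person, u_family, u_person):
    """eta_ijk = b0 + b1*x_ijk + u_i^(3) + u_j^(2), one entry per observation.

    x, family and person are parallel lists in long form; u_family and
    u_person map family and person identifiers to their random effects.
    """
    return [b0 + b1 * x_k + u_family[i] + u_person[j]
            for x_k, i, j in zip(x, family, person)]


def intraclass_correlations(var_family, var_person, var_resid):
    """Correlations implied by a normal, identity-link three-level model:
    (same person, same family but different persons)."""
    total = var_family + var_person + var_resid
    return (var_family + var_person) / total, var_family / total


x = [1.0, 2.0, 1.0, 3.0]
family = [1, 1, 1, 1]
person = [10, 10, 11, 11]  # two persons in family 1
u_family = {1: 0.5}
u_person = {10: -0.25, 11: 0.25}
print(linear_predictor_3level(0.0, 1.0, x, family, person, u_family, u_person))
# [1.25, 2.25, 1.75, 3.75]
print(intraclass_correlations(1.0, 1.0, 2.0))
# (0.5, 0.25)
```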
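For extension (3), the intercept and slope deviations $(u_i^{(0)}, u_i^{(1)})$ of each school are commonly assumed to be bivariate normal with standard deviations $\sigma_0$, $\sigma_1$ and correlation $\rho$. One consequence is that the between-school variance of the linear predictor, $\mathrm{Var}(u_i^{(0)} + u_i^{(1)} t) = \sigma_0^2 + 2\rho\sigma_0\sigma_1 t + \sigma_1^2 t^2$, depends on the year-3 score t. The Python sketch below draws correlated school effects through a Cholesky factor of the 2×2 covariance matrix and evaluates that variance; the bivariate normal assumption and all parameter values are illustrative.

```python
import math
import random


def draw_school_effects(sd0, sd1, rho, rng):
    """One draw of (u0, u1) from a bivariate normal with mean zero,
    standard deviations sd0, sd1 and correlation rho (Cholesky construction)."""
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u0 = sd0 * z0
    u1 = sd1 * (rho * z0 + math.sqrt(1.0 - rho ** 2) * z1)
    return u0, u1


def linear_predictor(b0, b1, u0, u1, math3):
    """eta_ij = (b0 + u0_i) + (b1 + u1_i) * math3_ij for one pupil."""
    return (b0 + u0) + (b1 + u1) * math3


def between_school_variance(sd0, sd1, rho, t):
    """Var(u0 + u1 * t) at year-3 score t."""
    return sd0 ** 2 + 2.0 * rho * sd0 * sd1 * t + sd1 ** 2 * t ** 2


rng = random.Random(1)
u0, u1 = draw_school_effects(1.0, 0.2, -0.5, rng)
print(linear_predictor(10.0, 0.7, u0, u1, math3=25.0))
print(between_school_variance(1.0, 0.2, -0.5, t=25.0))
```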
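For extension (4), with dichotomous items and a logit link (assumed here, as is usual for this model), the probability that subject i answers item j correctly given ability $u_i$ is $\mathrm{logit}^{-1}(\beta_j + u_i\lambda_j)$. The sketch computes these probabilities and the conditional likelihood of a response pattern. Its test also illustrates why a constraint is needed for identification: multiplying every loading by a constant and dividing $u_i$ by the same constant leaves every probability unchanged, so either a loading or the variance of $u_i$ must be fixed. Item parameters are hypothetical.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def item_probability(beta_j, lambda_j, u_i):
    """P(correct response to item j | ability u_i) in the two-parameter
    logistic model with eta_ij = beta_j + u_i * lambda_j."""
    return inv_logit(beta_j + u_i * lambda_j)


def pattern_likelihood(responses, betas, lambdas, u_i):
    """Conditional likelihood of a 0/1 response pattern given ability u_i,
    assuming responses are independent given u_i."""
    lik = 1.0
    for y_ij, beta_j, lambda_j in zip(responses, betas, lambdas):
        p = item_probability(beta_j, lambda_j, u_i)
        lik *= p if y_ij == 1 else 1.0 - p
    return lik


betas = [1.0, 0.0, -1.5]   # -beta_j measures item difficulty
lambdas = [1.0, 1.4, 0.6]  # hypothetical factor loadings
print(pattern_likelihood([1, 1, 0], betas, lambdas, u_i=0.3))
```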
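For extension (5), a subject's rows in long form hold two kinds of observation that share the same $u_i$: repeated exposure measurements and a dichotomous outcome. The sketch below assumes the measurements are normal with identity link, mean $\mu_{ij} = \beta_0 + u_i$ and residual standard deviation `sd_e` (the family for the exposure is an assumption), and that the outcome follows $\mathrm{logit}(\pi_{ij}) = \beta_1 u_i$. The subject's joint likelihood is the product of normal densities and a Bernoulli probability, integrated over $u_i$, here by the same midpoint rule as a stand-in for quadrature. Names and values are hypothetical.

```python
import math


def normal_pdf(v, mean, sd):
    """Density of N(mean, sd^2) at v."""
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def subject_likelihood(w, y, b0, b1, sd_e, sd_u, n_points=400, width=8.0):
    """Joint marginal likelihood for one subject with exposure measurements w
    (normal, mean b0 + u) and binary outcome y (logit(pi) = b1 * u),
    integrating u ~ N(0, sd_u^2) by the midpoint rule."""
    h = 2.0 * width * sd_u / n_points
    total = 0.0
    for k in range(n_points):
        u = -width * sd_u + (k + 0.5) * h
        # Exposure part: measurements scatter around b0 + u.
        cond = 1.0
        for w_ij in w:
            cond *= normal_pdf(w_ij, b0 + u, sd_e)
        # Outcome part: the same u enters the logistic model.
        p = inv_logit(b1 * u)
        cond *= p if y == 1 else 1.0 - p
        total += cond * normal_pdf(u, 0.0, sd_u) * h
    return total


print(subject_likelihood([2.1, 1.8], 1, b0=2.0, b1=0.9, sd_e=0.5, sd_u=1.0))
```

Summing the result over y = 0 and y = 1 recovers the likelihood of the exposure measurements alone, which is a convenient check on the integration.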