Linear mixed models
Stata’s mixed-models estimation makes it easy to specify and to fit
multilevel and hierarchical random-effects models.
To fit a model of SAT scores with fixed coefficient on x1 and random
coefficient on x2 at the school level and with random intercepts at both
the school and class-within-school level, you type
. xtmixed sat x1 x2 || school: x2 || class:
Estimation is either by ML or by REML, and standard errors and confidence
intervals (in fact, the full covariance matrix) are estimated for all
variance components.
xtmixed provides four random-effects variance
structures—identity, independent, exchangeable, and
unstructured—and you can combine them to form even more complex
block-diagonal structures.
After estimation, you can obtain best linear unbiased predictions (BLUPs) of
random effects, their standard errors, their conditional means
(fitted values), and other diagnostic measures.
For details, see
[XT] xtmixed.
After estimation with xtmixed,
- estat recovariance reports the estimated
variance–covariance matrix of the random effects for each level.
- estat group summarizes the composition of the nested
groups, providing minimum, average, and maximum group size for each
level in the model.
predict after xtmixed can compute best linear unbiased
predictions (BLUPs) for each random effect and its standard error.
It can also compute the linear predictor, the standard error of the
linear predictor, the fitted values (linear predictor plus
contributions of random effects), the residuals, and the standardized
residuals.
Examples
Here are some examples from the xtmixed manual entry.
Example 1: Two-level model, random intercept
Consider a longitudinal dataset used by both Ruppert, Wand, and Carroll
(2003) and Diggle et al.
(2002), consisting of weight measurements of 48
pigs on nine successive weeks. Pigs are identified by variable id.
Below is a plot of the growth curves for the first ten pigs.
It seems clear that each pig experiences a linear trend in growth and that
overall weight measurements vary from pig to pig. Because we are not really
interested in these particular 48 pigs per se, we instead treat them
as a random sample from a larger population and model the between-pig
variability as a random effect or, equivalently, as a random-intercept term
at the pig level. We thus wish to fit the following model
weight_ij = b_0 + b_1 week_ij + u_j + e_ij (3)
for i = 1,...,9 and j = 1,...,48. The fixed portion of the
model, b_0 + b_1*week_ij, simply states that we want
one overall regression line representing the population average. The random
effect u_j serves to shift this regression line up or down for each
pig. Because the random effects occur at the pig-level, we fit the model by
typing
At this point, a guided tour of the model specification and output is in
order:
- By typing weight week we specified the response,
weight, and the fixed portion of the model in the same way we
would if we were using regress or any other estimation command.
Our fixed effects are a coefficient on week and a constant term.
- When we added “|| id:”, we specified random effects
at the level identified by group variable id, i.e., the
pig-level. Because we only wanted a random intercept, that is all we had
to type.
- The output title informs us that our model was fitted using ML, the
default. For REML estimates, use option reml. Because this model is
a simple random-intercept model, our results are
equivalent to using xtreg, also with option mle.
- The first estimation table is for the fixed effects. We estimate
b_0 = 19.36 and b_1 = 6.21.
- The second estimation table shows the estimated variance components.
The first section of the table is labeled “id: Identity”,
meaning that these are random effects at the id (pig) level
and that their variance–covariance matrix is a multiple of the
identity matrix.
Because we only have one random effect at this level, xtmixed knew
that Identity is the only possible covariance structure.
In any case, the standard deviation of the level-two errors,
sigma_u, is estimated as 3.85 with standard error 0.406.
If you prefer variance estimates, e.g., sigma^2_u, to standard
deviation estimates, sigma_u, then specify the
variance option either at estimation or on replay.
- The row labeled “sd(Residual)” displays the estimated
standard deviation of the overall error term, i.e.,
sigma_e = 2.09.
- Finally, a likelihood-ratio test comparing the model to ordinary linear
regression is provided and is highly significant for these data.
Example 2: Two-level model, random coefficient
Extending (3) to also allow for a random slope on week yields the
model
weight_ij = b_0 + b_1 week_ij + u_oj + u_1j week_ij + e_ij (4)
fitted using xtmixed.
Because we did not specify a covariance structure for the random effects
(u_0j, u_1j), xtmixed used the default Independent
structure and estimated sigma_u0 = 2.60 and sigma_u1 = 0.61.
Our point estimates of the fixed effects are, for all intents and purposes,
identical, but this does not hold generally. Given the 95% confidence
interval for sigma_u1, it would seem that the random slope is
significant, and we can use lrtest to verify this fact:
Example 3: Three-level model
Baltagi, Song, and Jung (2001) estimate a
Cobb–Douglas production function examining the productivity of public
capital in each state’s private output. Originally provided by
Munnell (1990), the data were recorded over the
period 1970–1986 for 48 states grouped into nine regions.
Because the states are nested within regions, we fit a two-level mixed model
with random intercepts at both the region and at the state-within-region
levels.
Some items of note:
- This is a three-level model, with the observations level one,
the states level two, and the regions level three.
- Our model now has two random-effects equations, separated by ||.
The first is a random intercept (constant-only) at the region
level, the second a random intercept at the state level. The
order in which these are specified (from left to right) is
significant—xtmixed assumes that state is nested
within region.
- The variance component estimates are now organized and labeled according
to level. After adjusting for the nested-level error structure, we
find that the highway and water components of public capital had
significant positive effects upon private output while the other
public buildings component had a negative effect.
xtmixed also allows you to model the correlation structure of the
level one residuals. Pierson and Ginther (1987)
analyzed data from the estrus cycles of mares, whose number of ovarian
follicles larger than 10mm was measured daily. Because of the daily nature
of the data, an autoregressive residual-error structure could be
appropriate:
Besides autoregressive (AR), other available residual structures are moving
average (MA), unstructured, banded, exponential, exchangeable, and Toeplitz.
Finally, xtmixed can also be used with survey data; see
Multilevel
models with survey data for more information.
References
- Baltagi, B. H., S. H. Song, and B. C. Jung. 2001.
- Journal of Econometrics 101: 357–381.
- Diggle, P. J., P. Heagerty, K.-Y. Liang, and S. L. Zeger. 2002.
- Analysis of Longitudinal Data. 2nd edition.
Oxford: Oxford University Press.
- Munnell, A. 1990.
- Why has productivity growth declined? Productivity
and public investment.
- New England Economic Review. Jan. / Feb.:3–22.
- Pierson, R. A., and O. J. Ginther. 1987.
- Follicular population dynamics during the estrous cycle of the mare.
Animal Reproduction Science 14: 219–231.
- Ruppert, D., M. P. Wand, and R. J. Carroll. 2003.
- Semiparametric Regression. Cambridge: Cambridge University Press.
|
Stata 12
Overview: Why use Stata?
Stata/MP
Capabilities
New in Stata 12
Supported platforms
Which Stata?
Technical support
User comments
|