st: Re: Multilevel modelling: questions about independence, residuals, balance

From (Roberto G. Gutierrez, StataCorp)
Subject   st: Re: Multilevel modelling: questions about independence, residuals, balance
Date   Tue, 01 Feb 2011 17:06:45 -0600

Owen Corrigan asks:

> I have some general questions about Multilevel Modeling (MLM). I've been
> reading the Stata Press book by Rabe-Hesketh and Skrondal (2008, 2nd edn.)
> but there are some specifics I'm unsure about. The dataset consists of
> individual person-level (level-1) observations (N = 23759) over 14 level-2
> units (units are countries), but the level-1 data is spread very unevenly
> (unbalanced) over the level-2 units, with the min.  observations per group =
> 300, and the max obs per group = 3000 approx.  The dataset is actually two
> merged datasets from 2006 and 2007 (independent observations though,
> non-longitudinal). I have covariates at both levels; the point of the entire
> exercise is really to test whether a hypothesised level-2 covariate is
> significant.

> 1. Diagnostics: my level-2 residuals --predict varname, reffects-- after
> --xtmixed-- are non-normal. Does the violation of this assumption mean that
> I cannot be confident in the standard errors for my key level-2 variable
> (which proved significant, incidentally)? I have read (Maas & Hox 2004) that
> this only matters for the standard errors of the random effects; but this is
> of no interest to me and I am only concerned with the beta/standard error
> for this one level-2 variable.

I would tend to agree.  The main regression parameters (the betas) and their
standard errors are very robust to departures from normality in the
random effects.  Estimates of the variance components, and the standard
errors of the random effects, are not as robust.
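
For reference, the normality check on the level-2 residuals described in
the question could be sketched as follows.  The variable names (y, x1, z,
country) are hypothetical placeholders, not taken from the original post,
and the model is assumed to have a random intercept only.

```stata
* Hypothetical names: y (outcome), x1 (level-1 covariate),
* z (level-2 covariate), country (level-2 identifier)
xtmixed y x1 z || country:, mle

* Predicted random intercepts (BLUPs); constant within country
predict u0, reffects

* Flag one observation per country so each country counts once
egen byte pickone = tag(country)

* Normal quantile plot and Shapiro-Wilk test on the country-level values
qnorm u0 if pickone
swilk u0 if pickone
```

With only 14 countries, a formal test such as -swilk- has little power, so
the quantile plot is probably the more informative of the two.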

> --->QUESTION: If this variable is significant despite non-normality of
> level-2 residuals can I content myself with this? Or do I have reason to
> doubt the significance?  -AND, Maas & Hox say that "Robust standard errors
> turn out to be more reliable than the asymptotic standard errors based on
> maximum likelihood" - so should I call for robust standard errors? This is
> only possible with GLS estimation as opposed to ML/REML; do I lose out on
> something by choosing this estimation method? (Gllamm is a non-runner.)

Given the above, I don't think you need to resort to GLS.  What you
would lose with GLS is the ability to fit more complex
random-coefficient models and models with more than two levels, but that
doesn't appear to be an issue here.
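
If one nevertheless wanted to see how much the standard error of the
level-2 covariate changes, a rough comparison might look like the sketch
below.  It assumes a random-intercept model and uses hypothetical variable
names (y, x1, z, country); -xtreg, re- is the GLS random-effects estimator.

```stata
* ML fit with model-based standard errors
xtmixed y x1 z || country:, mle

* GLS random-effects fit with standard errors robust to
* clustering on the panel variable (country)
xtset country
xtreg y x1 z, re vce(robust)
```

The coefficient and standard error for z can then be compared across the
two sets of output.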

> -ALSO, the level-1 and level-2 residuals must be independent and
> non-correlated. How to test for this? If I just generate one variable
> containing the lvl-1 residuals and another containing the lvl-2 residuals
> can I just do a --pwcorr-- on the two and assess it like that? Or does this
> require something more fancy/complicated?

This is reasonable.  Such diagnostics usually take the form of simple
graphs and tests.
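
A minimal sketch of that check, assuming a random-intercept model and
hypothetical variable names (y, x1, z, country):

```stata
xtmixed y x1 z || country:, mle

* Level-2 residuals: predicted random intercepts
predict u_c, reffects

* Level-1 residuals: response minus fitted values, where the
* fitted values include the predicted random effects
predict e_ij, residuals

* Pairwise correlation with its significance level, plus a scatterplot
pwcorr e_ij u_c, sig
scatter e_ij u_c
```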

> 2. Balance: as I said, my dataset is really quite unbalanced; does this have
> any implications for inference or standard errors? Could weighting play any
> role here, and how and why might one go about weighting, say, for some
> countries which may have relatively very few cases (or is this even
> something I need to worry about?)?

No weighting is necessary, as the unbalanced nature of the data works
itself out in the likelihood calculations and the resulting parameter
estimates and standard errors.  One general caveat is that REML/ML results are
asymptotic, but given the number of groups and the group sizes in your data,
this should not be a concern.
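
As an informal illustration (hypothetical variable names again), the group
sizes can be listed and the unweighted model fit by both REML and ML to see
whether the two sets of estimates differ appreciably:

```stata
* Number of observations per country
tabulate country

* Unweighted fits by REML and by ML
xtmixed y x1 z || country:, reml
xtmixed y x1 z || country:, mle
```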
