
st: Multilevel modelling: questions about independence, residuals, balance

From   Owen Corrigan <[email protected]>
To   [email protected]
Subject   st: Multilevel modelling: questions about independence, residuals, balance
Date   Mon, 31 Jan 2011 13:12:13 +0000

I have some general questions about multilevel modelling (MLM). I've been
reading the Stata Press book by Rabe-Hesketh and Skrondal (2008, 2nd edn.),
but there are some specifics I'm unsure about. The dataset consists of
23,759 individual (level-1) observations nested in 14 level-2 units
(countries), but the level-1 data are spread very unevenly (unbalanced)
across the level-2 units: the minimum is 300 observations per country and
the maximum is roughly 3,000. The dataset is actually two merged datasets
from 2006 and 2007 (independent observations, though, not longitudinal).
I have covariates at both levels; the point of the whole exercise is to
test whether a hypothesised level-2 covariate is significant.
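For reference, the setup described above can be sketched as a two-level random-intercept model. This is an assumed form for illustration only: the post does not say whether random slopes are used, and the symbols (one level-1 covariate x_ij, the hypothesised level-2 covariate z_j, variances psi and theta) are introduced here, not taken from the original analysis.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 z_j + \zeta_j + \epsilon_{ij},
\qquad \zeta_j \sim N(0, \psi), \qquad \epsilon_{ij} \sim N(0, \theta),
\qquad i = 1, \dots, n_j, \quad j = 1, \dots, 14
```

Here zeta_j and epsilon_ij are assumed independent of each other and of the covariates. The level-2 residuals from --predict, reffects-- are predictions of zeta_j, and the test of interest is whether beta_2 = 0.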

1. Diagnostics: my level-2 residuals (--predict varname, reffects--
after --xtmixed--) are non-normal. Does the violation of this
assumption mean that I cannot be confident in the standard error for
my key level-2 variable (which proved significant, incidentally)? I
have read (Maas & Hox 2004) that this only matters for the standard
errors of the random effects, but those are of no interest to me: I
am only concerned with the beta and standard error for this one
level-2 variable.

--->QUESTION: If this variable is significant despite the non-normality
of the level-2 residuals, can I content myself with that, or do I have
reason to doubt its significance?
-AND: Maas & Hox say that "Robust standard errors turn out to be more
reliable than the asymptotic standard errors based on maximum
likelihood", so should I request robust standard errors? This is
only possible with GLS estimation, as opposed to ML/REML; do I lose
anything by choosing that estimation method? (Gllamm is a

-ALSO: the level-1 and level-2 residuals must be independent of each
other (and therefore uncorrelated). How can I test this? If I generate
one variable containing the level-1 residuals and another containing
the level-2 residuals, can I simply run --pwcorr-- on the two and assess
it that way, or does this require something more sophisticated?
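The --pwcorr-- idea above amounts to an ordinary Pearson correlation between each person's level-1 residual and the level-2 residual of that person's country. A minimal plain-Python sketch of that computation follows. The function names and toy data are hypothetical; in practice the two residual variables would come from --predict-- after --xtmixed--.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences (the statistic --pwcorr-- reports)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def level1_level2_corr(group_ids, level1_resid, level2_resid):
    """Correlate each observation's level-1 residual with its own group's level-2 residual.

    group_ids:    group (country) identifier for each level-1 observation
    level1_resid: level-1 residual for each observation
    level2_resid: dict mapping group id -> predicted level-2 residual
    """
    # Repeat each group's level-2 residual on every one of its level-1 rows,
    # which is the layout a --predict, reffects-- variable already has in Stata.
    expanded = [level2_resid[g] for g in group_ids]
    return pearson_r(level1_resid, expanded)


# Made-up toy data: two groups ("countries") of two observations each.
groups = ["A", "A", "B", "B"]
e1 = [1.0, -1.0, 2.0, -2.0]
u2 = {"A": 0.5, "B": -0.5}
print(level1_level2_corr(groups, e1, u2))  # prints 0.0
```

Two tentative caveats: predicted residuals are estimates rather than the true level-1 and level-2 errors, so they need not be exactly uncorrelated even when the model is correct; and the significance level from --pwcorr, sig-- treats all 23,759 observations as independent, although the level-2 residual takes only 14 distinct values.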

2. Balance: as I said, my dataset is quite unbalanced. Does this have
any implications for inference or standard errors? Could weighting play
any role here, and how and why might one go about weighting, say, for
countries that have relatively few cases? Or is this not even something
I need to worry about?
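One concrete place where group size enters a random-intercept model is the standard empirical Bayes shrinkage factor for each country's predicted level-2 residual, psi / (psi + theta / n_j), where psi is the between-country variance, theta the within-country variance and n_j the number of observations in country j. The sketch below evaluates it for the smallest and largest group sizes mentioned above. The variance values are hypothetical, not estimates from this dataset, and the function name is illustrative.

```python
def shrinkage_factor(psi, theta, n_j):
    """Empirical Bayes shrinkage factor for group j in a random-intercept model.

    psi:   level-2 (between-country) variance
    theta: level-1 (within-country) variance
    n_j:   number of level-1 observations in group j
    """
    return psi / (psi + theta / n_j)


# Hypothetical variance components, NOT estimates from this dataset.
psi, theta = 0.05, 1.0
for n_j in (300, 3000):
    print(n_j, round(shrinkage_factor(psi, theta, n_j), 4))
# prints "300 0.9375" and then "3000 0.9934"
```

Under these assumed values every country's factor is close to 1, so each predicted level-2 residual is shrunk only slightly toward zero; with a much smaller psi relative to theta, the gap between the small and large countries would widen.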

Many thanks for any clarification you can offer on any point here,
small or large.
Owen Corrigan.
PhD student
Trinity College Dublin.