


Re: st: Advice on xtmixed specification,pre/post two-group design

From   "Clyde Schechter" <>
Subject   Re: st: Advice on xtmixed specification,pre/post two-group design
Date   Thu, 28 Jul 2011 08:41:48 -0700


Without actually working with your data directly it is hard for me to say
too much more in specific terms.  But a couple of thoughts:

First, check that the three approaches (xtmixed, ANCOVA, regression for
change score) are actually being estimated on the same sample.  These
three approaches can differ in the way missing data affect case
inclusion, so be sure that the number of pairs being analyzed is the same
in each case.  If not, you are applying the models to different samples
and all bets are off.
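For illustration only, here is a minimal Python sketch of that check (not Stata code; the data layout and the function names are hypothetical). It assumes one (pre, post) score pair per subject, with None marking a missing score. A long-format mixed model such as xtmixed can use every non-missing score, including subjects observed only once, whereas ANCOVA and the change-score regression need both scores, so their samples can be smaller.

```python
from typing import List, Optional, Tuple

# Hypothetical toy data: one (pre, post) pair per subject; None marks a missing score.
pairs: List[Tuple[Optional[float], Optional[float]]] = [
    (50.0, 55.0),
    (48.0, None),   # post-test missing
    (None, 60.0),   # pre-test missing
    (52.0, 58.0),
    (47.0, 51.0),
]

def long_format_obs(data):
    """Score rows a long-format mixed model could use (every non-missing score)."""
    return sum((pre is not None) + (post is not None) for pre, post in data)

def subjects_in_long(data):
    """Subjects contributing at least one score to the long-format model."""
    return sum(1 for pre, post in data if pre is not None or post is not None)

def complete_pairs(data):
    """Subjects usable by ANCOVA or the change-score regression (both scores present)."""
    return sum(1 for pre, post in data if pre is not None and post is not None)

print(long_format_obs(pairs), subjects_in_long(pairs), complete_pairs(pairs))  # 8 5 3
```

Missing values in covariates can shrink each sample further, so in practice compare the number of observations reported by each estimation.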

Assuming that the estimation sample is indeed the same for all the
analyses, look next at the meaning of the three different models.  To
simplify the writing of equations, I'm going to ignore the teacher level
in your data; the point I'm making doesn't depend on it in any way.

Notationally, let y denote the score variable and X the vector of covariates
(including treatment group, interactions, etc.).  Let the subscript _i index
the i'th subject, and _j (j = 1, 2) index the pre- and post- observations,
respectively.

The Hierarchical Linear Model (xtmixed with score as dependent variable) is:

y_ij = a + bX_ij + u_i + e_ij

with the usual assumptions that the u's and e's are each iid with
expectation zero, independent of each other, and so on.  Note that X_ij
may include variables that change between pre- and post-, such as time
and time#control.  It is also permissible for other covariates to change
between pre- and post-.
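To make the long layout concrete, here is a minimal Python/NumPy simulation of the model above (an illustrative sketch, not Stata). It assumes X_ij = (treatment dummy, time dummy, their product) and made-up values for a, b, Var u and Var e; `simulate_long` and its parameter names are hypothetical. The teacher level is ignored, as in the text.

```python
import numpy as np

def simulate_long(n_subjects, a=50.0, b=(0.0, 3.0, 2.0), var_u=25.0, var_e=9.0, seed=0):
    """Simulate y_ij = a + b.X_ij + u_i + e_ij in long format (two rows per subject).

    X_ij = (treat_i, time_j, treat_i * time_j), with time_j = 0 for pre and 1 for post.
    u_i ~ N(0, var_u) is shared by both rows of subject i; e_ij ~ N(0, var_e) is not.
    """
    rng = np.random.default_rng(seed)
    subject = np.repeat(np.arange(n_subjects), 2)          # 0,0,1,1,2,2,...
    time = np.tile([0, 1], n_subjects)                     # pre = 0, post = 1
    treat = np.repeat(rng.integers(0, 2, n_subjects), 2)   # constant within subject
    X = np.column_stack([treat, time, treat * time]).astype(float)
    u = np.repeat(rng.normal(0.0, np.sqrt(var_u), n_subjects), 2)
    e = rng.normal(0.0, np.sqrt(var_e), 2 * n_subjects)
    y = a + X @ np.asarray(b) + u + e
    return subject, time, treat, y

subject, time, treat, y = simulate_long(4)
print(subject, time)  # [0 0 1 1 2 2 3 3] [0 1 0 1 0 1 0 1]
```

Each subject contributes two rows that share the same u_i; that shared term is what produces the pre/post correlation behind the ICC.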

ANCOVA is a somewhat different model (given that you have another level of
nesting you are not, strictly speaking, doing ANCOVA, but the idea is the
same for our present purposes):

y_i2 = a' + b'X'_i + cy_i1 + e'_i

Note that in this case X'_i cannot contain any variables that change
between pre- and post- conditions, because there is only one observation
per pre/post pair.  If there are such variables in your data, then in
setting up this analysis you must somehow have either excluded the
time-varying covariates or selected which of their values to enter into
the analysis.  Clearly that has to be done systematically and meaningfully,
in light of what the science says about which value better predicts y_i2;
alternatively, the pre- and post- values of such covariates may both enter
separately in the analysis.  Whatever the case may be in your situation,
double-check that you have set this up properly.  You can see, though,
that the covariate vector X'_i may look markedly different from the X_ij
vector in the HLM.  This may lead to different inferences about the effects
of particular covariates, even about covariates that are common to both
X and X'.
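As a sketch of the reshaping step (hypothetical Python, not Stata; the covariate and its values are invented), the function below collapses long-format rows to one row per subject and keeps both the pre- and post- values of a time-varying covariate as separate columns, which is one of the options mentioned above. Subjects without both scores drop out, which is also where the estimation samples can start to differ.

```python
# Hypothetical long-format records: (subject, time, score, covariate); time 0 = pre, 1 = post.
# "covariate" stands for any time-varying covariate; names and values are illustrative only.
long_rows = [
    (1, 0, 50.0, 0.8), (1, 1, 55.0, 0.9),
    (2, 0, 48.0, 0.6), (2, 1, 52.0, 0.7),
    (3, 0, 52.0, 0.9),                      # post-test missing: dropped from the wide file
]

def to_wide(rows):
    """Collapse to one row per subject with both scores and both covariate values.

    Keeping cov_pre and cov_post as separate columns is one option; entering only
    one of them is the other. Either way, the choice should be deliberate.
    """
    by_subject = {}
    for subj, time, score, cov in rows:
        by_subject.setdefault(subj, {})[time] = (score, cov)
    wide = []
    for subj, obs in sorted(by_subject.items()):
        if 0 in obs and 1 in obs:           # ANCOVA needs both y_i1 and y_i2
            (y1, cov_pre), (y2, cov_post) = obs[0], obs[1]
            wide.append({"subject": subj, "y1": y1, "y2": y2,
                         "cov_pre": cov_pre, "cov_post": cov_post})
    return wide

for row in to_wide(long_rows):
    print(row)
```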

Now, assuming that none of the foregoing complications are involved in
your situation, think about the intra-class correlation (ICC).  The HLM
as written forces a non-negative ICC = Var u/(Var u + Var e).  In fact,
if Var u is close to zero you will probably have problems getting the
estimation to converge, so for practical purposes the HLM forces the ICC
to be well above 0.  While this is more often than not the case, there are
situations where the ICC is negative, so consider whether that might be the
case in your situation.  If it is, you cannot use the HLM: it is
completely misspecified.  There is nothing exactly analogous to the ICC in
the ANCOVA model, but you can see that the magnitude and sign of the
coefficient c capture the same concept, except that c is not constrained to
be non-negative.  In fact, the ANCOVA model doesn't constrain c at all: it
is a freely estimated parameter of the model.
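A small NumPy simulation (an illustration under assumed values, not Stata output) shows the contrast. With data generated by the random-intercept model and no covariates, the pre/post correlation is close to ICC = Var u/(Var u + Var e), and the ANCOVA slope c estimates the same quantity because y_i1 and y_i2 have equal variance here. With negatively related pre/post scores, c simply comes out negative, which no non-negative ICC can reproduce. `icc` and `ancova_c` are hypothetical helper names.

```python
import numpy as np

def icc(var_u, var_e):
    """ICC = Var u / (Var u + Var e); non-negative whenever both variances are."""
    return var_u / (var_u + var_e)

def ancova_c(y1, y2):
    """OLS slope of y2 on y1 with an intercept: c in y_i2 = a' + c*y_i1 + e'_i (no covariates)."""
    X = np.column_stack([np.ones_like(y1), y1])
    coef, *_ = np.linalg.lstsq(X, y2, rcond=None)
    return coef[1]

rng = np.random.default_rng(1)
n = 20000

# Random-intercept data, no covariates: Var u = 25, Var e = 9.
u = rng.normal(0, 5, n)
y1 = 50 + u + rng.normal(0, 3, n)
y2 = 50 + u + rng.normal(0, 3, n)
print(round(icc(25, 9), 3), round(np.corrcoef(y1, y2)[0, 1], 3), round(ancova_c(y1, y2), 3))

# Negatively related pre/post scores: ANCOVA just estimates a negative c (true value -0.6).
y1n = rng.normal(50, 5, n)
y2n = 80 - 0.6 * y1n + rng.normal(0, 3, n)
print(round(ancova_c(y1n, y2n), 3))
```

With these assumed variances the ICC is 25/34, about 0.735, so the first printed line should show three values close to that.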

The change-score model, finally, is, in effect, ANCOVA with the
constraint c = 1.
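Written out with the symbols of the ANCOVA equation above (and assuming the change-score regression uses the same covariates X'_i), the equivalence is a one-line rearrangement:

```latex
\begin{aligned}
y_{i2} - y_{i1} &= a' + b' X'_i + e'_i \\
\iff \quad y_{i2} &= a' + b' X'_i + 1 \cdot y_{i1} + e'_i
\end{aligned}
```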

Now, if the reality is that the c = 1 constraint is far off the mark, this
misspecification will lead to biased estimates of the b' coefficients.
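A minimal NumPy sketch of that bias, on simulated data with invented values (`ols` is a hypothetical helper, not a Stata command). The true model is ANCOVA with c = 0.5 and a treatment effect of 2, and the treated group starts 5 points higher at baseline. The change-score model pushes the term (c - 1)y_i1 into its error, so the treatment coefficient absorbs (c - 1) times the baseline gap: 2 + (0.5 - 1)(5) = -0.5. The bias falls on coefficients of covariates related to y_i1; with truly balanced baselines the treatment estimate would not be biased this way.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, with an intercept prepended to the column(s) of X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 40000
treat = rng.integers(0, 2, n).astype(float)
# Non-randomized comparison: treated subjects start 5 points higher.
y1 = 50 + 5 * treat + rng.normal(0, 5, n)
# True data generating process: ANCOVA with c = 0.5 and treatment effect 2.
y2 = 20 + 2 * treat + 0.5 * y1 + rng.normal(0, 3, n)

ancova_effect = ols(np.column_stack([treat, y1]), y2)[1]   # coefficient on treat
change_effect = ols(treat, y2 - y1)[1]                     # change score forces c = 1
print(round(ancova_effect, 2), round(change_effect, 2))    # should be near 2.0 and -0.5
```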

So, if the science in your field doesn't make a clear a priori statement
about which (if any) of these models best reflects your data generating
process, looking at the coefficient c in the ANCOVA model may give you a
sense of whether the HLM or change-score models are bad specifications for
your data.
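One way to look at c, sketched in Python/NumPy on simulated data with an assumed true c of 0.5 (the helper name `ancova_c_with_ci` is hypothetical, and the interval is the ordinary OLS normal-approximation one): an interval far from 1 argues against the change-score constraint, and an interval below 0 would point to a negative pre/post relation that the HLM cannot represent.

```python
import numpy as np

def ancova_c_with_ci(y1, y2, X=None):
    """Fit y2 = a' + c*y1 + b'X + e' by OLS; return (c, lower, upper) with a
    normal-approximation 95% interval from the usual OLS standard error."""
    n = len(y2)
    cols = [np.ones(n), y1] if X is None else [np.ones(n), y1, X]
    D = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(D, y2, rcond=None)
    resid = y2 - D @ coef
    sigma2 = resid @ resid / (n - D.shape[1])        # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])  # column 1 holds y1
    c = coef[1]
    return c, c - 1.96 * se, c + 1.96 * se

rng = np.random.default_rng(3)
n = 5000
treat = rng.integers(0, 2, n).astype(float)
y1 = rng.normal(50, 5, n)
y2 = 25 + 2 * treat + 0.5 * y1 + rng.normal(0, 3, n)

c, lo, hi = ancova_c_with_ci(y1, y2, treat)
print(f"c = {c:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
print("interval excludes 1 (change-score constraint):", not (lo <= 1 <= hi))
print("interval lies above 0:", lo > 0)
```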

Anyway, after verifying that your data management has been done correctly,
the choice of which model is best for your situation depends on what the
science in your field tells you about how these different model
specifications match up with the underlying data generating process.
There is no uniform, generic answer to the question of which approach is
best.

Hope this helps.

Clyde Schechter, MA MD
Associate Professor of Family & Social Medicine


