Statalist



Re: st: non-nested random effects


From   rgutierrez@stata.com (Roberto G. Gutierrez, StataCorp)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: non-nested random effects
Date   Sat, 11 Apr 2009 07:13:12 -0500

In response to my previous post, Jacob Wegelin <jacob.wegelin@gmail.com>
asks:

> Thank you for the tip. With your suggested syntax (see below), Stata fit the
> model and returned coefficients and SEs very close to what I had obtained in
> R. But

> (1) Why is different syntax needed for the two different random effects?
> That is, why are the two random effects (group variables) not treated
> symmetrically?

Normally they would be treated symmetrically, but the large dimension required
that you use some creative nesting to get what you want.  The -xtmixed- syntax
is one of nested levels of random effects, with the nesting proceeding from
left to right as the levels get deeper.  Each successive level is separated
by ||.

In a crossed model there is no nesting.  As such, the standard syntax treats
all the data (_all:) as one big group and models the effects as random
coefficients on the whole data.

    xtmixed y x1 x2 || _all:R.year || _all:R.name

Say that year has 20 levels and name 100.  Then the above syntax treats the
model as one realization of a random-effects vector of dimension 120.  You can
identify the variance components because of the diagonal structure inherent in
the model.

When the dimension gets too large to make the above feasible, you can fit the
same model by taking the year effects precisely as above, and then "nesting"
the groups identified by name with the entire dataset:

    xtmixed y x1 x2 || _all:R.year || name:

Given the year effects at the _all: level, the crossed effects due to name can
be treated as nested within all the data.  The advantage of this method is
that you don't have to evaluate the likelihood on the whole data all at once.
You can calculate the likelihood for the first name group, then the second,
and so on, and then add them up to get the total.  You can do this because
conditional on the coefficients for years at the _all: level, the name groups
are statistically independent.

You get the same answer as the first syntax, but the dimension is reduced.
For every name group you do consider all 20 levels of year, but you only have
one "level" of name within that group.  The total matrix dimension you have to
work on is 21.
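
As a rough check of that equivalence, here is a sketch on the pig example
data loaded with -webuse-.  This is not the dataset discussed in this thread:
week stands in for year and id for name.  If both forms describe the same
model, the coefficients, standard errors, and log likelihoods should agree up
to convergence tolerance.

    * Illustration only: pig data, with week in the role of year, id of name
    webuse pig, clear

    * Symmetric crossed form: both factors enter at the _all: level
    xtmixed weight week || _all:R.week || _all:R.id
    estimates store crossed

    * "Nested" form: week effects at _all:, id as the grouping level
    xtmixed weight week || _all:R.week || id:
    estimates store nested

    * Side-by-side coefficients, standard errors, and log likelihoods
    estimates table crossed nested, se stats(ll)

With a dataset this small both forms should fit without trouble; the second
form matters when the symmetric one becomes too large to handle.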

> (2) Why does Stata's ability to fit the model depend on the order of the
> random effects in the -xtmixed- command?

In the above, we treat name as nested within all the data.  If we reverse the
order we would be treating year as nested within name, in which case for every
sub-likelihood evaluated at each year level we would have to process all the
levels of name simultaneously.  If the name dimension is much greater than the
year dimension, this can affect computation time and create numerical issues
due to large matrix inversions.
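
One way to choose the order is to count the levels of each grouping variable
first and put the larger one last.  The following is a minimal do-file sketch
under the assumption that the variables are named y, x1, x2, year, and name
as in the commands above; tag_year and tag_name are made-up helper variables.

    * Flag one observation per distinct level of each grouping variable
    egen byte tag_year = tag(year)
    egen byte tag_name = tag(name)

    * Count the flagged observations to get the number of levels
    count if tag_year
    local n_year = r(N)
    count if tag_name
    local n_name = r(N)
    display "year: `n_year' levels    name: `n_name' levels"

    * Smaller factor goes in the _all:R. term, larger factor goes last
    if `n_year' <= `n_name' {
        xtmixed y x1 x2 || _all:R.year || name:
    }
    else {
        xtmixed y x1 x2 || _all:R.name || year:
    }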

> (3) Where do you check the matrix dimension of your model in Stata? Do you
> refer to the number of columns of Z in the standard notation Y=X beta + Z b
> + epsilon?

See the above.  It is the dimension of "R.year" (20) plus that of a random
intercept at the name level (1).

> (4) How could Stata fit a model with 2930 + 22 BLUPs and a matrix dimension
> of only 23? (Using -xtmixed postestimation- I obtained the BLUPs for both
> random effects and confirmed that Stata computed 22 BLUPs for Year and 2930
> for NAME.)

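See the above.  Creative nesting: the 22 Year effects sit at the _all: level
and each NAME group adds only one random intercept, so the matrix dimension is
22 + 1 = 23 even though 2930 + 22 BLUPs are computed.  I assure you that Stata
did fit the model you wanted.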

> (5) Why does the output below state that the first group variable is "_all"
> rather than Year; why does it not plainly state that Year has 22 levels?

See the above.  The output is designed to describe the nesting structure of
the data.  When you have crossed effects instead, the first "nested" level is
all the data.

> (6) Finally, does this syntax extend to three or more non-nested grouping
> variables?

Yes.  You would then have two "_all:R.varname" levels, followed by a
"varname3:" level nested within all the data.  Be sure to put the big grouping
last.
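
A sketch of the three-way case, with var1, var2, and var3 as placeholder
names and var3 assumed to be the grouping variable with the most levels:

    * var1 and var2 are crossed at the _all: level; var3 (largest) goes last
    xtmixed y x1 x2 || _all:R.var1 || _all:R.var2 || var3:

By the same reasoning as above, the matrix dimension would then be the number
of levels of var1 plus the number of levels of var2 plus one.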

--Bobby
rgutierrez@stata.com