[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Hierarchical model

From (Roberto G. Gutierrez, StataCorp)
Subject   Re: st: Hierarchical model
Date   Fri, 02 Oct 2009 14:54:36 -0500

Evans Jadotte <> provides more detail:

> My original model is: 

> .xi: xtmixed logequiv i.mpsex mpage* i.sector non_farm
> small_cattle large_cattle s_s_cattle shock_cattle dumland1 tech_land
> electricity landline piped_water road_access death d_rem i.hhtype ||strata:
> ||cluster:, var

> Strata is my level-3 identifier and cluster (which is nested into strata) is
> my level-2 identifier. Then I have households nested into clusters.

> Predict stub* produces the empirical Bayes (EB) prediction of the random
> effects on strata ad clusters, which I hoped would coincide with the one
> calculated manually after estimation via "restricted maximum
> likelihood-REML".

> .egen REML_id = mean(resid), by(id)

> The residuals were calculated previously using the command: predict double
> resid, residuals and account for both random intercepts above.  They are
> supposed to be the total residuals, i.e. the raw residuals from level-1 for
> households, plus the residuals for the two random intercepts (strata and
> cluster). I understand that stub* should inferior or equal to REML_id, which
> is not for level-3 (strata). Specifically, Stata gives me an EB for strata
> that is more than 300 times greater than REML_strata

In the -egen- command above, I assume you mean -by(strata)- instead of
-by(id)-, because variable -id- is not part of your mixed model.

In any case, why would you expect the estimated random effects to be smaller
in magnitude than the group-averaged residuals?  Random effects can vary
significantly from group to group around their mean of zero.  Total residuals,
however, have mean zero over the entire dataset, and although they don't need
to have mean zero within each group, it would be weird for them to have a mean
significantly far from zero for any particular group.  After all, these
residuals do take into account the estimated random intercepts, and so all you
have left is observation-level variability.

As an example, consider 

. webuse pig, clear
(Longitudinal analysis of pig weights)

. xtmixed weight week || id:

(output omitted)

. predict r1, reffects

. predict resid, residuals

. egen reml_id = mean(resid), by(id)

. sum r1 reml_id

    Variable |       Obs        Mean    Std. Dev.       Min        Max
          r1 |       432   -1.98e-08    3.794276   -7.17375   9.079873
     reml_id |       432   -3.93e-09    .1223594  -.2313422   .2928116

This shows the estimated random effects to have larger magnitude (see the 
Std. Dev. column) than the group-averaged residuals, which is what you should

What I think you are trying to calculate for -reml_id- are group mean
residuals that DO NOT take into account random effects.  Try

. predict xbeta, xb			// No random effects

. gen resid2 = weight - xb 	        // Residual calculated manually

. egen reml_id2 = mean(resid2), by(id)

. sum r1 reml_id2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
          r1 |       432   -1.98e-08    3.794276   -7.17375   9.079873
    reml_id2 |       432   -7.02e-07    3.916635  -7.405093   9.372684

No you see the behavior that you expect.  The key is that REML-based estimates
of random effects should be based on residuals that do not take into account
random effects.  After all, you are trying to estimate the random effect
itself -- if you leave it in, there is nothing left to estimate.

> Additionally, the sum of the residuals provided by Stata in the random part
> of the output should be the total residual (residual level-1+ residual
> level-2 + residual level-3), whose variance I calculate as follows:

> .nl(resid2=exp({xb: $list one}))

> .predict sigma2, xb

> where $list is a skedasticity function. The estimate I get is less than the
> residual at level-1, which I did not expect. I do not see quite well what I
> am doing wrong here. I would really appreciate your feedback.

Given the above, my guess is that if you redefine your residuals to be those
that do not take into account random effects, you'll see behavior more in line
with your expectations.

*   For searches and help try:

© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index