From  Evans Jadotte <evans.jadotte@uab.es> 
To  statalist@hsphsun2.harvard.edu 
Subject  Re: st: Hierarchical model 
Date  Sat, 03 Oct 2009 12:18:18 +0200 
Roberto G. Gutierrez, StataCorp wrote:
Hi Bobby,Evans Jadotte <evans.jadotte@uab.es> provides more detail:My original model is:.xi: xtmixed logequiv i.education i.mpsex mpage* i.sector non_farm small_cattle large_cattle s_s_cattle shock_cattle dumland1 tech_land electricity landline piped_water road_access death d_rem i.hhtype strata: cluster:, varStrata is my level3 identifier and cluster (which is nested into strata) is my level2 identifier. Then I have households nested into clusters.Predict stub* produces the empirical Bayes (EB) prediction of the random effects on strata ad clusters, which I hoped would coincide with the one calculated manually after estimation via "restricted maximum likelihoodREML"..egen REML_id = mean(resid), by(id)The residuals were calculated previously using the command: predict double resid, residuals and account for both random intercepts above. They are supposed to be the total residuals, i.e. the raw residuals from level1 for households, plus the residuals for the two random intercepts (strata and cluster). I understand that stub* should inferior or equal to REML_id, which is not for level3 (strata). Specifically, Stata gives me an EB for strata that is more than 300 times greater than REML_strataIn the egen command above, I assume you mean by(strata) instead of by(id), because variable id is not part of your mixed model. In any case, why would you expect the estimated random effects to be smaller in magnitude than the groupaveraged residuals? Random effects can vary significantly from group to group around their mean of zero. Total residuals, however, have mean zero over the entire dataset, and although they don't need to have mean zero within each group, it would be weird for them to have a mean significantly far from zero for any particular group. After all, these residuals do take into account the estimated random intercepts, and so all you have left is observationlevel variability. As an example, consider . webuse pig, clear (Longitudinal analysis of pig weights) . xtmixed weight week  id: (output omitted) . predict r1, reffects . predict resid, residuals . egen reml_id = mean(resid), by(id) . sum r1 reml_id Variable  Obs Mean Std. Dev. Min Max + r1  432 1.98e08 3.794276 7.17375 9.079873 reml_id  432 3.93e09 .1223594 .2313422 .2928116 This shows the estimated random effects to have larger magnitude (see the Std. Dev. column) than the groupaveraged residuals, which is what you should expect. What I think you are trying to calculate for reml_id are group mean residuals that DO NOT take into account random effects. Try . predict xbeta, xb // No random effects . gen resid2 = weight  xb // Residual calculated manually . egen reml_id2 = mean(resid2), by(id) . sum r1 reml_id2 Variable  Obs Mean Std. Dev. Min Max + r1  432 1.98e08 3.794276 7.17375 9.079873 reml_id2  432 7.02e07 3.916635 7.405093 9.372684 No you see the behavior that you expect. The key is that REMLbased estimates of random effects should be based on residuals that do not take into account random effects. After all, you are trying to estimate the random effect itself  if you leave it in, there is nothing left to estimate.Additionally, the sum of the residuals provided by Stata in the random part of the output should be the total residual (residual level1+ residual level2 + residual level3), whose variance I calculate as follows:.nl(resid2=exp({xb: $list one})).predict sigma2, xbwhere $list is a skedasticity function. The estimate I get is less than the residual at level1, which I did not expect. I do not see quite well what I am doing wrong here. I would really appreciate your feedback.Given the above, my guess is that if you redefine your residuals to be those that do not take into account random effects, you'll see behavior more in line with your expectations. Bobby rgutierrez@stata.com * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ (Time difference inhibited prompt response). Indeed many thanks for your feedback. Yes the 'id' stands for either strata or cluster. I shall replace it now. And yes I have tried your procedure of calculating the REMLs based on the raw residuals, but since the results did not make sense to me (they do not have zero mean) so I thought I should proceed with calculations based on total residuals, which gave me a mean zero. My calculations for instance are, .xi: quietly xtmixed logequiv i.education i.mpsex mpage* i.sector non_farm small_cattle large_cattle s_s_cattle shock_cattle dumland1 tech_land electricity landline piped_water road_access death d_rem i.hhtype cluster:, var estimates store model1 .xi: quietly xtmixed logequiv i.education i.mpsex mpage* i.sector non_farm small_cattle large_cattle s_s_cattle shock_cattle dumland1 tech_land electricity landline piped_water road_access death d_rem i.hhtype strata: cluster:, var . estimates store model2The random part of the model2 look like this:  . lrtest model1 model2 Likelihoodratio
test
LR chibar2(01) = 135.49 The last test strongly suggests that cluster is nested into strata and that the latter should be in the model. I tried estimating the other way around (strata nested into cluster) just to check, and convergence failed. So, based on the threelevel model: . predict xbeta, xb . gen res_fix=logequivxbeta //raw residuals calculated manually . predict res_tot, residuals //get total residuals (which account for the random intercepts) . egen random_c=mean(res_fix), by(cluster) //get the random effect for cluster . egen random_s=mean(res_fix), by(strata) //get the random effect for strata . predict EB_cluster, reffects level(cluster) //get the empirical Bayes of re_cluster . predict EB_strata, reffects level(strata) //get the empirical Bayes of re_strata . sum res_fix res_tot EB_cluster random_c EB_strata random_s Variable

Obs Mean Std.
Dev.
Min Max
.
predict sigma2
+
sigma2  7157
.967807 .1937023 .2968854
4.334954
In any case, the nesting in the data seems to be
correct, but the results are not what I expected according to theory
even by trying to set Z to overlap totally the fixed part matrix of th
original model. I am quite puzzled and the reference books
(RabeHesketh ad Skrondal, 2005); (Raudenbush and Bryck, 2002) do not
help much. Well, this is a long thread Bobby and I do not
want to take too much of your time with this and I will understand if
you have better fish to fry. I will keep digging to see what I find but
any further suggestions would be very helpful! Thanks, Evans 
© Copyright 1996–2014 StataCorp LP  Terms of use  Privacy  Contact us  What's new  Site index 