From: "David M. Drukker, Stata Corp" <[email protected]>
To: [email protected]
Subject: Re: st: Within and between variances
Date: Tue, 13 May 2003 14:57:12 -0500

Ineta Sokolowski <[email protected]> wrote:

> Can anybody explain, why the one-way ANOVA (-loneway-) and -xtsum- gives
> different standard deviations for within and between effects of a
> variable?
>
> . loneway diagnosis gpnum
> . xtsum diagnosis, i(gpnum)
>
> where "diagnosis" is a dichotomous variable (0=no illness, 1=illness)
> and "gpnum" is the general practitioners (GP) number (38 different
> numbers). Each GP has different number of patients (between 44 and 111).
>
> How are the SD calculated in each procedure?

-xtsum- and -loneway- provide different summaries of the data.

-xtsum- summarizes the overall variable, the between-transformed variable, and the within-transformed variable. The reported standard deviations are the estimated standard deviations of the transformed variables.

In contrast, -loneway- provides a one-way analysis-of-variance decomposition of the specified variable. The formulas for computing these standard deviations are standard in the ANOVA literature and are documented in [R] loneway. It is interesting to note that the reported standard deviations correspond to the variance components in a constant-only model.

Let's consider the case of -xtsum- in more detail. Since the manual does not go into great detail, I will. Let

    \tilde{y}_{it} = y_{it} - \bar{y}_i + \bar{\bar{y}}

be the within-transformed variable, where y_{it} are the observations on the specified variable in group i at time t, \bar{y}_i is the mean of y_{it} over the observations in group i, and \bar{\bar{y}} is the overall mean of y. The reported within standard deviation is the estimated standard deviation of \tilde{y}_{it}.

For the between model, the reported standard deviation is the estimated standard deviation of the n group means \bar{y}_i.

Since the formulas for computing the standard deviations reported by -loneway- are given in [R] loneway, I will not repeat them here.
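The -xtsum- calculations described above can be sketched by hand. This is a minimal sketch, not the code that -xtsum- actually runs. It assumes the poster's variables diagnosis and gpnum are in memory. The names ybar, ybar_i, ytilde, and onepergp are illustrative and are not created by -xtsum- itself.

```stata
* Sketch under stated assumptions: rebuild the transformed variables that
* -xtsum- is described as summarizing, then summarize them directly.
quietly summarize diagnosis
scalar ybar = r(mean)                                // overall mean of y

egen double ybar_i = mean(diagnosis), by(gpnum)      // group (GP) means
generate double ytilde = diagnosis - ybar_i + ybar   // within-transformed variable
egen byte onepergp = tag(gpnum)                      // flags one observation per GP

summarize ytilde                 // Std. Dev. here: compare with the "within" row
summarize ybar_i if onepergp     // Std. Dev. over the n group means: compare with "between"

xtsum diagnosis, i(gpnum)        // the command from the original question
```

If the description above is right, the two standard deviations from -summarize- should agree with the within and between rows of the -xtsum- output, up to rounding. They need not agree with the standard deviations reported by -loneway-.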
Still, for those who think in -xt- terms, it is interesting to note that the between standard deviation that -loneway- reports is an estimate of the standard deviation of the individual-level effect in a random-effects model in which the only regressor is a constant. Furthermore, the reported within standard deviation is an estimate of the standard deviation of the idiosyncratic error in that same constant-only random-effects model.

I hope that this helps.

David
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

