From     "David M. Drukker, Stata Corp" <[email protected]>
To       [email protected]
Subject  Re: st: Within and between variances
Date     Tue, 13 May 2003 14:57:12 -0500

Ineta Sokolowski <[email protected]> wrote:

> Can anybody explain, why the one-way ANOVA (-loneway-) and -xtsum- gives
> different standard deviations for within and between effects of a
> variable?
> . loneway diagnosis gpnum
> . xtsum diagnosis, i(gpnum)
> where "diagnosis" is a dichotomous variable (0=no illness, 1=illness)
> and "gpnum" is the general practitioners (GP) number (38 different
> numbers). Each GP has different number of patients (between 44 and 111).
> How are the SD calculated in each procedure?

-xtsum- and -loneway- provide different summaries of the data.  -xtsum-
summarizes the overall variable, the between transformed variable, and the
within transformed variable.  The reported standard deviations are the
estimated standard deviations of the transformed variables.  In contrast,
-loneway- provides a one-way analysis of variance decomposition of the
specified variable.  The formulas for computing these standard deviations are
standard in the ANOVA literature and documented in [R] loneway.  It is
interesting to note that the standard deviations reported by -loneway-
correspond to the variance components in a constant-only random-effects
model.

Let's consider the case of -xtsum- in more detail.  Since the manual does not
go into great detail, I will.  Let

    ytilde_it = y_it - ybar_i + ybarbar

be the within transformed variable,

where
    y_it     are the observations on the specified variable in group i
             at time t,
    ybar_i   is the mean of y_it over the observations in group i, and
    ybarbar  is the overall mean of y.

The reported within standard deviation is the estimated standard deviation
of ytilde.

For the between model, the reported standard deviation is the estimated
standard deviation of the n group means ybar_i.
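
The two definitions above can be checked by hand.  What follows is only a
minimal sketch, not the code -xtsum- itself runs.  The data are simulated and
hypothetical (38 GPs with 44 to 111 patients each, loosely following the
question), and the names npat, p_i, ybar_i, ybarbar, ytilde, and onepergp are
made up for illustration.  It assumes a Stata version that has -xtset- and
runiform(), and that -xtsum- uses the same N-1 divisor as -summarize-.

```stata
* Hypothetical simulated data: 38 GPs, 44 to 111 patients each, 0/1 diagnosis
clear
set seed 12345
set obs 38
generate gpnum = _n
* GP-specific probability of illness (an arbitrary choice for illustration)
generate double p_i = 0.2 + 0.3*runiform()
* number of patients per GP, between 44 and 111
generate npat = 44 + floor(68*runiform())
expand npat
generate byte diagnosis = runiform() < p_i

* Reference output
xtset gpnum
xtsum diagnosis

* Within: ytilde_it = y_it - ybar_i + ybarbar
bysort gpnum: egen double ybar_i = mean(diagnosis)
egen double ybarbar = mean(diagnosis)
generate double ytilde = diagnosis - ybar_i + ybarbar
* compare Std. Dev. with the "within" row of -xtsum-
summarize ytilde

* Between: keep one group mean per GP, then take the SD over the n groups
egen byte onepergp = tag(gpnum)
* compare Std. Dev. with the "between" row of -xtsum-
summarize ybar_i if onepergp
```

Adding ybarbar back in the within transformation only shifts the mean of
ytilde to the overall mean; it does not change the standard deviation.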

Since the formulas for computing the standard deviations reported by
-loneway- are given in [R] loneway, I will not repeat them here.  Still, for
those who think in -xt- terms, it is interesting to note that the between
standard deviation is an estimate of the standard deviation of the
individual-level effect in a random-effects model in which the only
regressor is a constant.  Furthermore, the reported within standard
deviation is an estimate of the standard deviation of the idiosyncratic
error in a random-effects model in which the only regressor is a constant.
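
To see this correspondence on data, one can put the -loneway- results next to
a constant-only random-effects fit.  The sketch below is hedged in several
ways: the data are the same hypothetical simulated data as before, the choice
of the ML estimator (-xtreg, mle-) is an assumption made here rather than
something stated above, and because -loneway- uses ANOVA-type estimators the
two sets of numbers should be close but need not agree exactly, especially
with unbalanced groups.

```stata
* Hypothetical simulated data: 38 GPs, 44 to 111 patients each, 0/1 diagnosis
clear
set seed 12345
set obs 38
generate gpnum = _n
generate double p_i = 0.2 + 0.3*runiform()
generate npat = 44 + floor(68*runiform())
expand npat
generate byte diagnosis = runiform() < p_i

* One-way ANOVA decomposition; results are left in r(sd_b) and r(sd_w)
loneway diagnosis gpnum
display "loneway: SD of gpnum effect = " r(sd_b)
display "loneway: SD within gpnum    = " r(sd_w)

* Random-effects model in which the only regressor is a constant
xtset gpnum
xtreg diagnosis, mle
display "xtreg:   sigma_u = " e(sigma_u)
display "xtreg:   sigma_e = " e(sigma_e)
```

Here sigma_u plays the role of the between (individual-level effect) standard
deviation and sigma_e the role of the within (idiosyncratic error) standard
deviation.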

I hope that this helps.

David
[email protected]
