
Why don’t the decomposed variances in xtsum add up?

Title   Interpreting the within and between variances in xtsum
Author James Hardin, StataCorp

The within and between variances may not sum in the way that you expect for two reasons:

  1. The reported variance estimates are bias-corrected: each variance is multiplied by n/(n−1), and each printed standard deviation by the square root of that factor.
  2. The data are unbalanced, so the overall mean differs from the mean of the panel means.
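
For reference, the decomposition that does add up exactly is the uncorrected one, written here as a sketch. The notation (x_it for observation t of panel i, T_i for the number of observations in panel i, n for the number of panels) is introduced for illustration only and is not taken from the xtsum output.

```latex
\frac{1}{N}\sum_{i=1}^{n}\sum_{t=1}^{T_i}\left(x_{it}-\bar{x}\right)^2
  = \frac{1}{N}\sum_{i=1}^{n} T_i\left(\bar{x}_i-\bar{x}\right)^2
  + \frac{1}{N}\sum_{i=1}^{n}\sum_{t=1}^{T_i}\left(x_{it}-\bar{x}_i\right)^2,
\qquad
\bar{x}=\frac{1}{N}\sum_{i=1}^{n}\sum_{t=1}^{T_i}x_{it},\quad
\bar{x}_i=\frac{1}{T_i}\sum_{t=1}^{T_i}x_{it},\quad
N=\sum_{i=1}^{n}T_i
```

Each of the two reasons listed above moves the reported figures away from this identity: the first changes the divisors, and the second changes the mean around which the panel means are centered.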

For unbalanced data, the between variance is calculated using the mean of the panel means, which may differ from the overall mean. The overall mean can be calculated as a weighted mean of the panel means, where each panel's weight is its number of observations. The mean of the panel means is unweighted (equivalently, every weight equals one).
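
The two means can be compared directly. The following is a minimal sketch, assuming the full (unbalanced) nlswork dataset that this FAQ uses in its example; the variable names pmean and first are made up for illustration and are not created by xtsum.

```stata
* Sketch only: pmean and first are illustrative names, not part of xtsum
use http://www.stata-press.com/data/r14/nlswork, clear

* Panel mean of birth_yr, repeated on every observation of the panel
bysort idcode: egen double pmean = mean(birth_yr)

* Flag exactly one observation per panel
egen byte first = tag(idcode)

* Overall mean: every observation counts once, so larger panels weigh more
summarize birth_yr

* Mean of the panel means: every panel counts once
summarize pmean if first
```

Running summarize pmean without the if qualifier weights each panel mean by its panel size, so it should return the overall mean again.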

For balanced data, the only difference is the n/(n−1) factor: the overall variance uses n = the total number of observations, and the between variance uses n = the number of panels. For illustration, look at the following example for weakly balanced data.

. use http://www.stata-press.com/data/r14/nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. by idcode:keep if _N==10
(25,834 observations deleted)

. xtsum birth_yr

Variable         |      Mean   Std. dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
birth_yr overall |   48.4963    3.091477        42         53 |     N =    2700
         between |              3.096644        42         53 |     n =     270
         within  |                     0   48.4963    48.4963 |     T =      10

. display 3.091477*sqrt(2699/2700)
3.0909045

. display 3.096644*sqrt(269/270)
3.0909042
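
The same check can be run from stored results instead of retyped numbers. This is a hedged sketch, not the internal xtsum computation: it assumes the between standard deviation is the ordinary standard deviation of one panel mean per panel, as described above, and the names pmean and first are illustrative.

```stata
* Sketch only: pmean and first are illustrative names, not part of xtsum
use http://www.stata-press.com/data/r14/nlswork, clear
bysort idcode: keep if _N==10

* One panel mean per panel, flagged on a single observation
bysort idcode: egen double pmean = mean(birth_yr)
egen byte first = tag(idcode)

* Overall: summarize divides by N-1; rescale to the divide-by-N version
summarize birth_yr
display r(sd)*sqrt((r(N)-1)/r(N))

* Between: one value per panel, so summarize divides by n-1; rescale likewise
summarize pmean if first
display r(sd)*sqrt((r(N)-1)/r(N))
```

Because birth_yr is constant within each panel, the within variance is zero, so the two displayed values should agree up to rounding once the n/(n−1) factors are removed.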