Interpreting the within and between variances in xtsum
James Hardin, StataCorp
The within and between variances may not sum in the way that you expect for two reasons:
For unbalanced data, the between variance is calculated using the mean of the panel means, which may differ from the overall mean. The overall mean can be calculated as a weighted mean of the panel means, where each panel's weight is its number of observations. The mean of the panel means is unweighted (or, if you like, all weights equal one).
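The two means can be compared directly. The following is a minimal sketch, assuming the full (unbalanced) nlswork dataset is in memory before any panels are dropped; the variable names pmean and onepp are hypothetical and chosen only for illustration.

```stata
* Load the full, unbalanced data
use http://www.stata-press.com/data/r14/nlswork, clear

* Panel mean of birth_yr, repeated on every row of the panel
bysort idcode: egen double pmean = mean(birth_yr)

* Flag exactly one row per panel so that each panel counts once
egen byte onepp = tag(idcode)

* Overall mean: every observation counts, so large panels weigh more
summarize birth_yr

* Unweighted mean of the panel means: one value per panel
summarize pmean if onepp
```

If the panels differ in size, the two reported means will generally differ. The standard deviation from the second summarize is computed around the unweighted mean with the number of panels minus one in the denominator, which is the quantity the between row is described as reporting.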
For balanced data, the only difference is the n/(n−1) factor: the overall standard deviation uses n = total number of observations, and the between standard deviation uses n = number of panels. For illustration, consider the following example with weakly balanced data.
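The balanced case can be written out explicitly. The notation here is an assumption introduced for illustration and is not part of the original text: x_{it} is observation t in panel i, there are n panels with T observations each, N = nT, \bar{x}_i is the mean of panel i, and \bar{x} is the overall mean (which, for balanced data, equals the mean of the panel means). The sketch assumes the usual sample-variance formulas with the denominators described above.

```latex
\begin{align*}
s_O^2 &= \frac{1}{N-1}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{it}-\bar{x}\right)^2 ,
&
s_B^2 &= \frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{x}_i-\bar{x}\right)^2 ,
\\[4pt]
\sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{it}-\bar{x}\right)^2
 &= \sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{it}-\bar{x}_i\right)^2
  + T\sum_{i=1}^{n}\left(\bar{x}_i-\bar{x}\right)^2
\\[4pt]
\Longrightarrow\quad (N-1)\,s_O^2
 &= \sum_{i=1}^{n}\sum_{t=1}^{T}\left(x_{it}-\bar{x}_i\right)^2
  + T\,(n-1)\,s_B^2 .
\end{align*}
```

The sums of squares add up exactly, but the reported variances do not, because s_O^2 is scaled by N − 1 while s_B^2 is scaled by n − 1.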
. use http://www.stata-press.com/data/r14/nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. by idcode: keep if _N==10
(25,834 observations deleted)

. xtsum birth_yr
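Variable         |      Mean   Std. dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
birth_yr overall |   48.4963   3.091477         42         53 |     N =    2700
         between |             3.096644         42         53 |     n =     270
         within  |                    0    48.4963    48.4963 |     T =      10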
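As a rough check on the output above: birth_yr does not vary within a panel (the within Std. dev. is 0), so the overall sum of squares is T times the between sum of squares, and the two reported standard deviations should differ only through their denominators, N − 1 = 2699 versus n − 1 = 269. A minimal sketch using the printed values:

```stata
* Rescale the overall Std. dev. by sqrt((N - 1) / (T * (n - 1)))
display 3.091477 * sqrt((2700 - 1) / (10 * (270 - 1)))
```

The result should agree with the reported between Std. dev., 3.096644, up to rounding of the printed inputs.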
Last updated: 16 November 2022