Dirk said

    Very carefully I want to ask: Are the confidence intervals given by -mean- really correct? Below I compare the results of -mean- with the results of a different procedure:

and goes on to show that the -mean- CIs can be reproduced by collapsing the data while keeping the DF used in the confidence interval at that of the whole sample. The standard errors of the mean are the same as those reported by

    tabstat price, by(rep78) stat(mean sd n semean)

He wonders whether the DF used in calculating s.e.(mean) should be that of the full sample.

I think that -mean- and -tabstat- are both using the notion that you have a model y = mu + \epsilon, where var(\epsilon) is a population parameter. Thus the variance of \epsilon is a constant for all subsamples, and when you calculate s.e.(mean), you take the sqrt of that common variance and divide by the sqrt of the subpopulation's sample size. You can see that this is being done by -tabstat- by comparing the sd, n and semean columns.

What does surprise me is that the CIs generated by these methods differ so widely from those computed by

    reg price i.rep78
    margins rep78

The differences are not just a small-sample/large-sample adjustment of the Root MSE. If you take apart the VCE of a regression of price on all five dummies, with no constant term, you find that inv(X'X) is a diagonal matrix containing the inverses of the respective sample sizes, so the difference has to lie in the computation of \hat{\sigma}^2, which multiplies inv(X'X).

Kit Baum | Boston College Economics & DIW Berlin | http://ideas.repec.org/e/pba1.html
An Introduction to Stata Programming | http://www.stata-press.com/books/isp.html
An Introduction to Modern Econometrics Using Stata | http://www.stata-press.com/books/imeus.html
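For anyone who wants to reproduce the comparison discussed above, here is a minimal sketch. It assumes that price and rep78 come from the auto dataset shipped with Stata (the post does not name the dataset), and the names s2, V and XXinv are chosen here only for illustration. It runs the three sets of estimates side by side and then divides the regression VCE by the squared Root MSE, so that the inv(X'X) part can be inspected on its own.

```stata
* Assumption: price and rep78 are taken from Stata's shipped auto dataset
sysuse auto, clear

* Group means with the CIs reported by -mean-
mean price, over(rep78)

* Per-group mean, sd, n and s.e.(mean) for comparison
tabstat price, by(rep78) stat(mean sd n semean)

* Regression of price on all five dummies, no constant term
regress price ibn.rep78, noconstant

* The VCE is sigma^2-hat times inv(X'X); dividing by the squared
* Root MSE leaves inv(X'X), whose diagonal can be compared with 1/n
scalar s2 = e(rmse)^2
matrix V = e(V)
matrix XXinv = V / s2
matrix list XXinv

* Group means and CIs from the regression
margins rep78
```

Comparing the diagonal of XXinv with the n column from -tabstat-, and e(rmse) with the sd column, should show which ingredient of the standard error accounts for the difference in the CIs.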

