
# st: Re: correct confidence intervals of -mean- ?

From: Dirk Enzmann
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: correct confidence intervals of -mean- ?
Date: Sat, 06 Mar 2010 15:01:10 +0100

Kit,


it is not the s.e.(mean) that poses a problem here but the df used by invttail(). I could understand it if -mean- calculated a z-test, where the df are irrelevant (although then the CIs would not be useful for small samples). But it calculates a t-test, and using the correct df is the essence of a t-test.
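
A minimal sketch of the comparison at issue, assuming the auto dataset that ships with Stata and, purely for illustration, the subgroup rep78 == 3. The scalar names (grp_mean, grp_n, grp_se, all_n) are hypothetical, and taking the "full sample" to be all observations with nonmissing price and rep78 is an assumption. Both intervals use the same subgroup standard error; only the df passed to invttail() differs.

```stata
sysuse auto, clear

* Mean and standard error of price within one subgroup (rep78 == 3)
quietly summarize price if rep78 == 3
scalar grp_mean = r(mean)
scalar grp_n    = r(N)
scalar grp_se   = r(sd) / sqrt(grp_n)

* Observations with nonmissing price and rep78 (assumed "full sample")
quietly count if !missing(price, rep78)
scalar all_n = r(N)

* 95% CI using the subgroup's own df (grp_n - 1)
display "df = n - 1: " grp_mean - invttail(grp_n - 1, 0.025)*grp_se ///
    " to " grp_mean + invttail(grp_n - 1, 0.025)*grp_se

* 95% CI using the full-sample df (all_n - 1) with the same standard error
display "df = N - 1: " grp_mean - invttail(all_n - 1, 0.025)*grp_se ///
    " to " grp_mean + invttail(all_n - 1, 0.025)*grp_se

* For comparison: the intervals reported by -mean-
mean price, over(rep78)
```

The smaller the subgroup, the more the two critical values diverge, so the choice of df matters most for exactly the groups where a t-based interval is needed.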

Dirk

Kit Baum wrote:
> Dirk said
>
> Very carefully I want to ask: Are the confidence intervals given by
> -mean- really correct?
>
> Below I compare the results of -mean- with the results of a different
> procedure:
>
> and goes on to show that -mean- CIs can be reproduced by collapsing,
> but maintaining the DF in the confidence interval as that of the whole
> sample. These are the same standard errors of mean reported by
>
> tabstat price,by(rep78) stat(mean sd n semean)
>
> He wonders whether the DF used in calculating s.e.(mean) should be
> that of the full sample. I think that -mean- and -tabstat- are both
> using the notion that you have a model y = mu + \epsilon, where
> var(\epsilon) is a population parameter. Thus the variance of \epsilon
> is a constant for all subsamples, and when you calculate s.e. mean,
> you use the sqrt of that common variance and divide by the sqrt(sample
> size) of the subpopulation.  You can see that is being done by
> -tabstat- by comparing the sd, n and semean columns.
>
> What does surprise me is that the CIs generated by these methods
> differ so widely from those computed by
>
> reg price i.rep78
> margins rep78
>
> The differences are not just a small-sample/large-sample adjustment of
> the Root MSE. If you take apart the VCE of a regression of price on
> all five dummies, no constant term, you find a diagonal matrix
> containing the inverses of the respective sample sizes, so the
> difference has to lie in the computation of \hat{\sigma}^2 which
> multiplies inv(X'X).
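
The decomposition Kit describes can be inspected along these lines. This is only a sketch, assuming the auto dataset and using ibn.rep78 with noconstant to obtain one dummy per category; the matrix name V is hypothetical.

```stata
sysuse auto, clear

* Regress price on all five rep78 dummies, no constant term
regress price ibn.rep78, noconstant

* The VCE is \hat{\sigma}^2 * inv(X'X); with mutually exclusive dummies,
* inv(X'X) is diagonal with entries 1/n_g
matrix V = e(V)
matrix list V

* Pooled residual variance that multiplies inv(X'X)
display "Root MSE squared = " e(rmse)^2

* Group sizes n_g, so each diagonal entry can be checked as e(rmse)^2 / n_g
tabulate rep78 if e(sample)

* For comparison: subgroup standard deviations and standard errors
tabstat price, by(rep78) stat(mean sd n semean)
```

Setting the diagonal of V beside the sd, n and semean columns from -tabstat- shows which variance estimate each standard error is built from.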

*************************************************
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Schlueterstr. 28
D-20146 Hamburg
Germany

phone: +49-(0)40-42838.7498 (office)
+49-(0)40-42838.4591 (Mrs Billon)
fax:   +49-(0)40-42838.2344
email: dirk.enzmann@uni-hamburg.de

www: http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
*************************************************