Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: st: Difference of means and t-test

From: "Nick Cox"
Subject: RE: st: Difference of means and t-test
Date: Tue, 15 Jun 2010 17:59:27 +0100

As before, I see no difference in view here, despite what you say.

P-values based on imputed Gaussians will be correct only insofar as the
underlying distribution really was Gaussian, and otherwise dubious.
Naturally, that is impossible to check without the original data, which
in this circumstance we do not have. But we often have experience with
similar data, a point that is easily overlooked.

As Bertrand Russell said somewhere, the method of "postulating" what we
want has many advantages; they are the same as the advantages of theft
over honest toil.

Sure, the test statistics will be the same, but not their
interpretation.
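To make that concrete, here is a minimal Python sketch with made-up numbers (nothing here comes from the original poster's data). Two hypothetical treatment groups share the same n, mean and SD, one symmetric and one strongly skewed, and each is compared with the same hypothetical control group using the equal-variance two-sample t statistic (the default in Stata's ttest and ttesti). The statistic comes out identical in both cases; whether a Gaussian-based p-value for it can be trusted does not. The helper names are illustrative only.

```python
import math
import statistics

def rescale(values, target_mean, target_sd):
    """Linearly transform values so their sample mean and SD (n - 1 divisor) hit the targets."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]

def pooled_t(a, b):
    """Equal-variance two-sample t statistic for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def skewness(values):
    """Moment-based skewness (n divisor), used only to show that the shapes differ."""
    m = statistics.mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical control group: n = 8, mean 10, SD 2
control = rescale([1, 2, 3, 4, 5, 6, 7, 8], 10.0, 2.0)

# Two hypothetical groups with identical n = 8, mean 12, SD 3, but very different shapes
symmetric = rescale([1, 2, 3, 4, 5, 6, 7, 8], 12.0, 3.0)
skewed = rescale([0, 0, 0, 0, 0, 0, 1, 20], 12.0, 3.0)

t_sym = pooled_t(symmetric, control)
t_skew = pooled_t(skewed, control)
print(f"symmetric: t = {t_sym:.6f}, skewness = {skewness(symmetric):.3f}")
print(f"skewed:    t = {t_skew:.6f}, skewness = {skewness(skewed):.3f}")
```

Because the t statistic depends on the data only through n, the means and the SDs, the two printed t values agree; only the skewness column reveals that one comparison rests on a badly non-Gaussian sample.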

Nick
n.j.cox@durham.ac.uk

Richard Williams wrote:

At 02:25 PM 6/14/2010, Nick Cox wrote:
>I don't think our views are contradictory. It is clearly true that you
>can get results from summary statistics alone. But erecting fake
>Gaussians with those summaries is not equivalent to reconstructing the
>original data. That is my point, and no more. It is akin to arguments at
>a higher level about "sufficient statistics". If something is normal,
>then it is sufficient to know mean and sd, but there isn't a reverse
>argument.
>
>At 11:19 AM 6/14/2010, Nick Cox wrote:
> >-- except that will surely overstate the strength of the conclusions, in
> >so far as the real distributions are unlikely to be exactly Gaussian.

Still, it is incorrect to say that constructing fake Gaussians "will
surely overstate the strength of the conclusions."  The p values are
based on various assumptions, e.g. normally distributed,
homoskedastic errors.  If the assumptions are wrong, the p values are
wrong.  But whether or not the assumptions are correct, the
calculations of the test statistics and coefficients are the same.
For regression-type problems, for example, if you've got the means,
correlations and standard deviations, there are all sorts of things
you can compute without having the rest of the data.  Run a
regression or Anova on "fake" data that reproduce those summary
statistics and you'll get exactly the same coefficients, standard
errors and test statistics as with the real data.
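A minimal Python sketch of the regression case, assuming a simple bivariate regression and made-up summary statistics (n = 10, means 50 and 20, SDs 10 and 5, correlation 0.6). The OLS slope, intercept and root MSE depend on the data only through n, the means, the SDs and the correlation, so a "fake" data set constructed to match those moments reproduces them up to floating-point rounding. The construction in fake_bivariate (a Gram-Schmidt step on two arbitrary patterns) is just one convenient choice, and all function names are hypothetical.

```python
import math
import statistics

def standardize(values):
    """Center to mean 0 and scale to sample SD 1 (n - 1 divisor)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

def fake_bivariate(n, mean_x, sd_x, mean_y, sd_y, r):
    """Hypothetical (x, y) data whose sample means, SDs and correlation equal the inputs."""
    z1 = standardize([float(i) for i in range(n)])
    # A quadratic pattern (not collinear with z1 for n >= 3), made exactly orthogonal to z1
    raw = [float(i * i) for i in range(n)]
    raw_mean = statistics.mean(raw)
    c = [v - raw_mean for v in raw]
    proj = sum(a * b for a, b in zip(c, z1)) / sum(a * a for a in z1)
    z2 = standardize([a - proj * b for a, b in zip(c, z1)])
    # Mixing two orthogonal unit-SD patterns gives unit SD and correlation r with z1
    zy = [r * a + math.sqrt(1 - r ** 2) * b for a, b in zip(z1, z2)]
    x = [mean_x + sd_x * a for a in z1]
    y = [mean_y + sd_y * b for b in zy]
    return x, y

def ols(x, y):
    """Slope, intercept and root MSE computed from the raw data."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, intercept, math.sqrt(rss / (len(x) - 2))

def ols_from_summary(n, mean_x, sd_x, mean_y, sd_y, r):
    """The same three quantities computed from summary statistics alone."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    # Residual SS is (n - 1) * sd_y^2 * (1 - r^2) in simple regression
    rmse = sd_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))
    return slope, intercept, rmse

n, mx, sx, my, sy, r = 10, 50.0, 10.0, 20.0, 5.0, 0.6
x, y = fake_bivariate(n, mx, sx, my, sy, r)
print("from fake data:", ols(x, y))
print("from summaries:", ols_from_summary(n, mx, sx, my, sy, r))
```

Standard errors and t statistics for the coefficients are functions of these same quantities, which is why they match too; what the fake data cannot support is anything that depends on the real observations, such as residual diagnostics.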

Of course, without having the original data, you can't, say, do
diagnostic tests of assumptions, analyze subsets of the data, add an
x^2 term, etc.  So, yes, you would much rather have the real data!  But
if the real data aren't available, there is still a lot you can do.  I
don't know why the original poster was using ttesti instead of ttest,
but if it was because he only had summary statistics available to him,
then he could run an Anova the way I suggested, and the numbers he
would get would be the same as if he had the real data.  There
probably wouldn't be a whole lot else he could do, though: the predict
command and most other post-estimation commands won't be of much use
without the real data.
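As a rough illustration of the t-test case, the Python sketch below computes a one-way ANOVA F statistic directly from each group's n, mean and SD (a swapped-in direct calculation, not necessarily the exact procedure suggested earlier in the thread). With two groups, F equals the square of the equal-variance t statistic that ttesti reports by default. The group summaries are made up, and anova_from_summary and pooled_t_from_summary are hypothetical helpers, not Stata commands.

```python
import math

def anova_from_summary(groups):
    """One-way ANOVA F from (n, mean, sd) triples; sd uses the n - 1 divisor."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within

def pooled_t_from_summary(n1, m1, sd1, n2, m2, sd2):
    """Equal-variance two-sample t statistic from summary statistics."""
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical summary statistics: (n, mean, sd) for each group
g1 = (25, 103.0, 14.0)
g2 = (30, 96.5, 12.0)

F, df1, df2 = anova_from_summary([g1, g2])
t = pooled_t_from_summary(*g1, *g2)
print(f"F({df1}, {df2}) = {F:.4f}, t^2 = {t ** 2:.4f}")
```

The same anova_from_summary call accepts more than two groups, where a single t test no longer applies; the p-value would still rest on the usual normality and equal-variance assumptions.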
