RE: st: Difference of means and t-test

From   "Nick Cox" <>
To   <>
Subject   RE: st: Difference of means and t-test
Date   Tue, 15 Jun 2010 17:59:27 +0100

As before, I see no difference in view here, despite what you say. 

P-values based on imputed Gaussians will be correct only insofar as the
underlying distribution really was Gaussian, and otherwise dubious.
Naturally, that is impossible to check without the original data, which
in this circumstance we do not have. But often we have experience with
similar data, a point often overlooked. 

As Bertrand Russell said somewhere, the method of "postulating" what we
want has many advantages; they are the same as the advantages of theft
over honest toil.

Sure, the test statistics will be the same, but not their


Richard Williams

At 02:25 PM 6/14/2010, Nick Cox wrote:
>I don't think our views are contradictory. It is clearly true that you
>can get results from summary statistics alone. But erecting fake
>Gaussians with those summaries is not equivalent to reconstructing the
>original data. That is my point, and no more. It is akin to arguments
>at a higher level about "sufficient statistics". If something is normal,
>then it is sufficient to know mean and sd, but there isn't a reverse
>At 11:19 AM 6/14/2010, Nick Cox wrote:
> >-- except that will surely overstate the strength of the conclusions,
> >so far as the real distributions are unlikely to be exactly Gaussian.

Still, it is incorrect to say that constructing fake Gaussians "will 
surely overstate the strength of the conclusions."  The p values are 
based on various assumptions, e.g. normally distributed, 
homoskedastic errors.  If the assumptions are wrong, the p values are 
wrong.  But, whether the assumptions are correct or not, the 
calculation of the test statistics and coefficients is the same, 
i.e. for regression-type problems if you've got the means, 
correlations and standard deviations there are all sorts of things 
you can compute without having the rest of the data.  You run a 
regression or Anova with the "fake" data and you'll get the exact 
same results as with the real data.
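
To make the point about test statistics concrete, here is a minimal sketch in Python (not Stata, and not anything posted in this thread). It assumes an equal-variance two-sample t test; the function names and the data are made up for illustration. It computes the t statistic once from n, mean and SD alone and once from the raw observations, and the two agree even though the data are clearly not Gaussian.

```python
import math
import statistics


def pooled_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic and df from summary statistics only."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def pooled_t_from_raw(x, y):
    """Same statistic, computed from the raw observations via sums of squares."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    ss_within = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    df = len(x) + len(y) - 2
    se = math.sqrt(ss_within / df * (1 / len(x) + 1 / len(y)))
    return (mx - my) / se, df


# Made-up, deliberately skewed data (one large outlier per group).
group1 = [1.0, 1.2, 1.1, 0.9, 6.5, 1.3]
group2 = [2.0, 2.4, 2.1, 9.8, 2.2, 2.3, 2.5]

# Route 1: reduce each group to n, mean and sample SD first.
t_summary, df_summary = pooled_t_from_summary(
    len(group1), statistics.fmean(group1), statistics.stdev(group1),
    len(group2), statistics.fmean(group2), statistics.stdev(group2),
)

# Route 2: work from the raw values.
t_raw, df_raw = pooled_t_from_raw(group1, group2)

print(math.isclose(t_summary, t_raw), df_summary == df_raw)
```

The statistic is a function of n, mean and SD only, so the shape of the distribution never enters the calculation. Whether the p-value attached to it can be trusted is a separate question, and that one does depend on the shape.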

Of course, without having the original data, you can't, say, do 
diagnostic tests of assumptions, analyze subsets of the data, add an 
x^2 term, etc. So, yes, you greatly prefer having the real data!  But 
if the real data aren't available there is still a lot you can do.  I 
don't know why the original poster was using ttesti instead of ttest, 
but if it was because he only had summary statistics available to him 
then it would be possible for him to run an Anova the way I suggested 
and the numbers he would get would be the same as if he had the real 
data.  There probably wouldn't be a whole lot else he could do 
though, e.g. the predict command and most other post-estimation 
commands won't be of much use without the real data.
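
One possible way to build such "fake" data is sketched below in Python. It is not necessarily the approach suggested earlier in the thread. It draws arbitrary Gaussian values, rescales them so each group's sample mean and SD match the targets exactly, and then compares the one-way Anova F statistic from the fake data with the one from the made-up "real" data. The helper names `fake_sample` and `oneway_f` and all the numbers are assumptions for illustration.

```python
import math
import random
import statistics


def fake_sample(n, mean, sd, seed=0):
    """Return n values whose sample mean and sample SD equal the targets."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    m, s = statistics.fmean(z), statistics.stdev(z)
    # Standardize the draws, then shift and scale to the requested mean and SD.
    return [mean + sd * (v - m) / s for v in z]


def oneway_f(groups):
    """One-way Anova F statistic from raw observations."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        sum((v - statistics.fmean(g)) ** 2 for v in g) for g in groups
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up, skewed "real" data that we pretend is unavailable.
real = [
    [1.0, 1.2, 1.1, 0.9, 6.5, 1.3],
    [2.0, 2.4, 2.1, 9.8, 2.2, 2.3, 2.5],
]

# Fake data built from nothing but each group's n, mean and SD.
fake = [
    fake_sample(len(g), statistics.fmean(g), statistics.stdev(g), seed=i)
    for i, g in enumerate(real)
]

f_real = oneway_f(real)
f_fake = oneway_f(fake)
print(math.isclose(f_real, f_fake))
```

The individual fake values bear no resemblance to the real ones, which is why residual plots, subset analyses and the like would tell you nothing about the original data.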
