"Nick Cox" <n.j.cox@durham.ac.uk>

<statalist@hsphsun2.harvard.edu>

st: RE: RE: RE: highly skewed, highly zeroed data

Thu, 26 Nov 2009 10:34:10 -0000

Jay makes an interesting point, although in turn it can be restated to acknowledge that the central limit theorem comes in numerous different flavours depending on quite what assumptions are being made. (For example, there are flavours allowing various kinds of dependence.) Alternatively, purists might want to talk of a family of central limit theorems.

However, my guess is that this is not the central issue. (That pun was unintentional in my first draft and deliberate in my second.) Although with lots of zeros and strong skew the distribution concerned is awkward practically, I'd be surprised if it was pathological mathematically, or indicative of an underlying distribution that was. The point could be explored a little by e.g. bootstrapping.

The median in the sample data was clearly zero!

Nick
n.j.cox@durham.ac.uk

Verkuilen, Jay

Kieran McCaul wrote:

>The skew in the data does not stop you from calculating the mean, nor does it stop you from calculating a 95% CI around the mean. Regardless of the skew in the data, the sampling distribution of the mean will be Normal.<

Not true. It will tend towards normality (in the sense of convergence in distribution) assuming regularity conditions for the central limit theorem hold, which for highly skewed variables is often NOT the case. But that convergence may be VERY slow and the resulting confidence interval for the mean may be extremely poor (incredibly wide) or even ludicrous (e.g., below the lower bound of the data).

I would wonder whether the original poster might want to estimate a median instead of a mean?
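
As a rough illustration of the bootstrapping suggestion in Nick's reply, here is a minimal Python sketch (Python rather than Stata, purely so that it is self-contained). The data are simulated, not the original poster's: the 80% share of zeros, the lognormal shape of the positive values, the sample size of 200, the 2000 replications and all function names are assumptions made for illustration. It resamples with replacement, recomputes the mean each time, and reports a percentile interval next to the usual normal-theory interval, plus the skewness of the bootstrap means as a crude check on how far the sampling distribution is from normal.

```python
import math
import random
import statistics


def make_sample(n, p_zero, rng):
    """Hypothetical zero-inflated, right-skewed data (an assumption, not real data)."""
    return [0.0 if rng.random() < p_zero else rng.lognormvariate(0.0, 1.5)
            for _ in range(n)]


def bootstrap_means(data, reps, rng):
    """Resample with replacement and record the mean of each resample."""
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(reps)]


def percentile_ci(values, level=0.95):
    """Simple percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    alpha = (1.0 - level) / 2.0
    lower = ordered[int(alpha * len(ordered))]
    upper = ordered[int((1.0 - alpha) * len(ordered)) - 1]
    return lower, upper


def skewness(values):
    """Moment-based skewness; values near 0 suggest a roughly symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


rng = random.Random(2009)  # fixed seed so the run is repeatable
data = make_sample(200, 0.8, rng)
means = bootstrap_means(data, 2000, rng)

boot_lo, boot_hi = percentile_ci(means)

# Normal-theory interval for comparison: mean +/- 1.96 standard errors.
sample_mean = statistics.fmean(data)
std_err = statistics.stdev(data) / math.sqrt(len(data))
norm_lo, norm_hi = sample_mean - 1.96 * std_err, sample_mean + 1.96 * std_err

print("sample median:", statistics.median(data))
print("sample mean:", sample_mean)
print("normal-theory 95% CI:", (norm_lo, norm_hi))
print("bootstrap percentile 95% CI:", (boot_lo, boot_hi))
print("skewness of bootstrap means:", skewness(means))
```

Every bootstrap mean is an average of observed values, so a percentile interval cannot fall below the smallest observation. A normal-theory interval carries no such guarantee, which is the kind of ludicrous result Jay mentions. This is only one of several bootstrap intervals, and it says nothing about whether the mean is the right summary in the first place.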

