
From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: RE: RE: highly skewed, highly zeroed data
Date: Thu, 26 Nov 2009 09:52:18 -0500

Nick et al.--

The sample median might not necessarily be zero once sample weights are taken account of--for example if zeros tend to have very low relative weight and nonzero cases have relatively high weights--since we are not given weights, we cannot be sure. Depending on the weights, the data might look a lot less or a lot more skewed than the unweighted tab seems to imply!

But examples or simulations (to explore coverage and small-sample bias) should include weights and clusters if possible, whether estimating the overall mean or the proportion nonzero and mean or median of nonzero cases, as in
http://www.stata.com/statalist/archive/2009-11/msg01354.html

On Thu, Nov 26, 2009 at 5:34 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Jay makes an interesting point, although in turn it can be restated to
> acknowledge that the central limit theorem comes in numerous different
> flavours depending on quite what assumptions are being made. (For
> example, there are flavours allowing various kinds of dependence.)
> Alternatively, purists might want to talk of a family of central limit
> theorems.
>
> However, my guess is that this is not the central issue. (That pun was
> unintentional in my first draft and deliberate in my second.) Although
> with lots of zeros and strong skew the distribution concerned is awkward
> practically, I'd be surprised if it was pathological mathematically, or
> indicative of an underlying distribution that was. The point could be
> explored a little by e.g. bootstrapping.
>
> The median in the sample data was clearly zero!
>
> Nick
> n.j.cox@durham.ac.uk
>
> Verkuilen, Jay
>
> Kieran McCaul wrote:
>
> >>The skew in the data does not stop you from calculating the mean, nor
> does it stop you from calculating a 95% CI around the mean.
> Regardless of the skew in the data, the sampling distribution of the
> mean will be Normal.<
>
> Not true. It will tend towards normality (in the sense of convergence in
> distribution) assuming regularity conditions for the central limit
> theorem hold, which for highly skewed variables is often NOT the case.
> But that convergence may be VERY slow and the resulting confidence
> interval for the mean may be extremely poor (incredibly wide) or even
> ludicrous (e.g., below the lower bound of the data).
>
> I would wonder whether the original poster might want to estimate a
> median instead of a mean?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
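The point about weights can be illustrated with a minimal sketch. It is written in Python rather than Stata, and everything in it is an assumption for illustration: the ten data values, both weight vectors, and the helper name `weighted_median` are hypothetical and do not come from the original poster's data. The helper uses the "lower weighted median" convention (the smallest value whose cumulative weight reaches half the total weight); other conventions, and Stata's own handling of ties, may differ.

```python
def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches
    half of the total weight (lower weighted median)."""
    pairs = sorted(zip(values, weights))      # order observations by value
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= total / 2:
            return value
    raise ValueError("weighted_median() needs at least one observation")


# Hypothetical sample: 70% zeros plus a few positive, skewed values.
values = [0, 0, 0, 0, 0, 0, 0, 5, 20, 100]

# Equal weights reproduce the ordinary (unweighted) picture.
equal_weights = [1] * 10

# Hypothetical survey weights: zeros carry little weight, nonzeros a lot.
survey_weights = [1] * 7 + [10] * 3

print(weighted_median(values, equal_weights))   # 0
print(weighted_median(values, survey_weights))  # 20
```

With equal weights the median is zero, as in the unweighted tabulation. With the made-up survey weights the zeros account for only 7 of 37 units of weight, so the weighted median moves to a nonzero value.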

**Follow-Ups**:
- **RE: st: RE: RE: RE: highly skewed, highly zeroed data** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

**References**:
- **st: highly skewed, highly zeroed data** *From:* "Jason Ferris" <JasonF@TURNINGPOINT.ORG.AU>
- **st: RE: highly skewed, highly zeroed data** *From:* "Kieran McCaul" <Kieran.McCaul@uwa.edu.au>
- **st: RE: RE: highly skewed, highly zeroed data** *From:* "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
- **st: RE: RE: RE: highly skewed, highly zeroed data** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

