
From: "Nick Cox" <n.j.cox@durham.ac.uk>

To: <statalist@hsphsun2.harvard.edu>

Subject: RE: st: RE: RE: RE: highly skewed, highly zeroed data

Date: Thu, 26 Nov 2009 15:06:24 -0000

Agreed. No one in the thread that I can recall focused on the -svy- aspect that the original poster did flag. He did return but did not address most of the comments made. Meanwhile, the others in the thread continue to discuss intermittently using the information we do have.

Nick
n.j.cox@durham.ac.uk

Austin Nichols

Nick et al.--

The sample median might not necessarily be zero once sample weights are taken account of--for example if zeros tend to have very low relative weight and nonzero cases have relatively high weights--since we are not given weights, we cannot be sure. Depending on the weights, the data might look a lot less or a lot more skewed than the unweighted tab seems to imply!

But examples or simulations (to explore coverage and small-sample bias) should include weights and clusters if possible, whether estimating the overall mean or the proportion nonzero and mean or median of nonzero cases, as in
http://www.stata.com/statalist/archive/2009-11/msg01354.html

On Thu, Nov 26, 2009 at 5:34 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Jay makes an interesting point, although in turn it can be restated to
> acknowledge that the central limit theorem comes in numerous different
> flavours depending on quite what assumptions are being made. (For
> example, there are flavours allowing various kinds of dependence.)
> Alternatively, purists might want to talk of a family of central limit
> theorems.
>
> However, my guess is that this is not the central issue. (That pun was
> unintentional in my first draft and deliberate in my second.) Although
> with lots of zeros and strong skew the distribution concerned is awkward
> practically, I'd be surprised if it was pathological mathematically, or
> indicative of an underlying distribution that was. The point could be
> explored a little by e.g. bootstrapping.
>
> The median in the sample data was clearly zero!
>
> Nick
> n.j.cox@durham.ac.uk
>
> Verkuilen, Jay
>
> Kieran McCaul wrote:
>
> >>The skew in the data does not stop you from calculating the mean, nor
> does it stop you from calculating a 95% CI around the mean.
> Regardless of the skew in the data, the sampling distribution of the
> mean will be Normal.<
>
> Not true. It will tend towards normality (in the sense of convergence in
> distribution) assuming regularity conditions for the central limit
> theorem hold, which for highly skewed variables is often NOT the case.
> But that convergence may be VERY slow and the resulting confidence
> interval for the mean may be extremely poor (incredibly wide) or even
> ludicrous (e.g., below the lower bound of the data).
>
> I would wonder whether the original poster might want to estimate a
> median instead of a mean?
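Austin's point about weights can be illustrated with a small sketch. Everything in it is hypothetical: the data values, the weights, and the helper `weighted_median` are made up for illustration and are not taken from the original poster's data or from any Stata command. The weighted median is defined here as the smallest value at which the cumulative weight reaches half the total weight, which is one common convention among several.

```python
def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half the total weight.

    Illustrative helper (not from the thread); other conventions exist.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= total / 2:
            return value
    raise ValueError("empty input")


# Hypothetical sample: mostly zeros plus a few large positive values.
values = [0] * 8 + [5, 12, 40, 300]

# Equal weights reproduce the ordinary (unweighted) median.
equal_weights = [1.0] * 12

# Hypothetical sampling weights: zeros carry little weight, nonzeros a lot.
unequal_weights = [0.2] * 8 + [3.0] * 4

unweighted = weighted_median(values, equal_weights)
weighted = weighted_median(values, unequal_weights)

print("unweighted median:", unweighted)
print("weighted median:  ", weighted)
```

With these made-up weights the zeros account for only a small share of the total weight, so the weighted median moves off zero even though two thirds of the sampled cases are zero.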
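Jay's concern about slow convergence can be explored with a simulation along the lines suggested in the thread. The following is a minimal sketch under stated assumptions, not anyone's actual analysis: the zero-inflated lognormal population, the sample size, the share of zeros, and the function names `draw` and `simulate` are all hypothetical choices, and the sketch ignores weights and clusters, which Austin notes a realistic simulation should include.

```python
import math
import random
import statistics


def draw(rng, n, p_zero, mu, sigma):
    """One sample: zero with probability p_zero, otherwise lognormal(mu, sigma)."""
    return [0.0 if rng.random() < p_zero else rng.lognormvariate(mu, sigma)
            for _ in range(n)]


def simulate(reps=2000, n=50, p_zero=0.9, mu=0.0, sigma=2.0, seed=12345):
    """Coverage of the usual normal-theory 95% CI for the mean.

    All parameter values are assumptions chosen to give many zeros and
    strong right skew. Returns (share of intervals covering the true mean,
    share of intervals whose lower limit is below zero).
    """
    rng = random.Random(seed)
    # Mean of the mixture: P(nonzero) times the lognormal mean.
    true_mean = (1 - p_zero) * math.exp(mu + sigma ** 2 / 2)
    covered = 0
    negative_lower = 0
    for _ in range(reps):
        x = draw(rng, n, p_zero, mu, sigma)
        m = statistics.mean(x)
        se = statistics.stdev(x) / math.sqrt(n)
        lower, upper = m - 1.96 * se, m + 1.96 * se
        covered += lower <= true_mean <= upper
        negative_lower += lower < 0  # below the lower bound of the data
    return covered / reps, negative_lower / reps


coverage, share_negative = simulate()
print(f"coverage of nominal 95% interval: {coverage:.3f}")
print(f"share of intervals with lower limit < 0: {share_negative:.3f}")
```

Comparing the printed coverage with the nominal 0.95, and rerunning with larger `n`, gives a rough sense of how quickly the normal approximation becomes usable for data of this shape.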

**References**:

- **st: highly skewed, highly zeroed data** *From:* "Jason Ferris" <JasonF@TURNINGPOINT.ORG.AU>
- **st: RE: highly skewed, highly zeroed data** *From:* "Kieran McCaul" <Kieran.McCaul@uwa.edu.au>
- **st: RE: RE: highly skewed, highly zeroed data** *From:* "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
- **st: RE: RE: RE: highly skewed, highly zeroed data** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
- **Re: st: RE: RE: RE: highly skewed, highly zeroed data** *From:* Austin Nichols <austinnichols@gmail.com>
