Statalist



Re: st: RE: RE: RE: highly skewed, highly zeroed data


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: RE: RE: highly skewed, highly zeroed data
Date   Thu, 26 Nov 2009 09:52:18 -0500

Nick et al.--
The sample median is not necessarily zero once sample weights are
taken into account. For example, if zeros tend to have very low
relative weights and nonzero cases have relatively high weights, the
weighted median could be positive. Since we are not given weights, we
cannot be sure.  Depending on the weights, the data might look a lot
less or a lot more skewed than the unweighted tab seems to imply!  But
examples or simulations (to explore coverage and small-sample bias)
should include weights and clusters if possible, whether estimating
the overall mean or the proportion nonzero and the mean or median of
nonzero cases, as in
http://www.stata.com/statalist/archive/2009-11/msg01354.html
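
A toy illustration of the weighting point, as a minimal Python sketch
rather than Stata code. Everything in it is assumed for illustration:
the data (80 zeros, 20 cases equal to 10), the weights (1 for zeros,
5 for nonzero cases), and the helper name weighted_median, which is
hypothetical and not taken from any package. It uses one common
definition of a weighted median (the smallest value at which the
cumulative weight reaches half the total); other definitions exist.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    if not pairs:
        raise ValueError("weighted_median needs at least one observation")
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= total / 2:
            return value


# Made-up data: 80% zeros, 20% nonzero (all equal to 10 for simplicity).
values = [0] * 80 + [10] * 20

# Equal weights reproduce the ordinary (unweighted) median.
equal_weights = [1] * 100

# Assumed sampling weights: zeros carry low weight, nonzero cases high weight.
unequal_weights = [1] * 80 + [5] * 20

unweighted = weighted_median(values, equal_weights)
weighted = weighted_median(values, unequal_weights)

print("unweighted median:", unweighted)  # 0
print("weighted median:  ", weighted)    # 10
```

With these assumed weights the zeros hold only 80 of 180 units of
total weight, so the weighted median moves off zero even though 80%
of the sample cases are zero.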

On Thu, Nov 26, 2009 at 5:34 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Jay makes an interesting point, although in turn it can be restated to
> acknowledge that the central limit theorem comes in numerous different
> flavours depending on quite what assumptions are being made. (For
> example, there are flavours allowing various kinds of dependence.)
> Alternatively, purists might want to talk of a family of central limit
> theorems.
>
> However, my guess is that this is not the central issue. (That pun was
> unintentional in my first draft and deliberate in my second.) Although
> with lots of zeros and strong skew the distribution concerned is awkward
> practically, I'd be surprised if it was pathological mathematically, or
> indicative of an underlying distribution that was. The point could be
> explored a little by e.g. bootstrapping.
>
> The median in the sample data was clearly zero!
>
> Nick
> n.j.cox@durham.ac.uk
>
> Verkuilen, Jay
>
> Kieran McCaul wrote:
>
>>The skew in the data does not stop you from calculating the mean, nor
> does it stop you from calculating a 95% CI around the mean.
> Regardless of the skew in the data, the sampling distribution of the
> mean will be Normal.<
>
> Not true. It will tend towards normality (in the sense of convergence in
> distribution) assuming regularity conditions for the central limit
> theorem hold, which for highly skewed variables is often NOT the case.
> But that convergence may be VERY slow and the resulting confidence
> interval for the mean may be extremely poor (incredibly wide) or even
> ludicrous (e.g., below the lower bound of the data).
>
> I would wonder whether the original poster might want to estimate a
> median instead of a mean?
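
To make the quoted points about confidence intervals and
bootstrapping concrete, here is a minimal Python sketch on made-up
data (19 zeros and a single value of 100; not the original poster's
data). It compares a normal-theory 95% interval for the mean with a
simple percentile bootstrap interval. The seed and the number of
replications are arbitrary choices, and the sketch ignores weights
and clusters, which a serious simulation should include.

```python
import math
import random
import statistics

# Made-up sample: mostly zeros plus one large value, so it is highly skewed.
data = [0] * 19 + [100]
n = len(data)

# Normal-theory 95% interval: mean +/- 1.96 * standard error.
mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(n)
normal_lower = mean - 1.96 * se
normal_upper = mean + 1.96 * se

# Percentile bootstrap: resample with replacement, take the mean each time,
# then read off the 2.5th and 97.5th percentiles of the resampled means.
reps = 2000                  # arbitrary number of replications
rng = random.Random(12345)   # arbitrary seed, for reproducibility only
boot_means = sorted(
    statistics.mean(rng.choices(data, k=n)) for _ in range(reps)
)
boot_lower = boot_means[reps * 25 // 1000]
boot_upper = boot_means[reps * 975 // 1000 - 1]

print("normal-theory interval:", (normal_lower, normal_upper))
print("bootstrap interval:    ", (boot_lower, boot_upper))
```

The normal-theory lower bound here is about -4.8, below the smallest
possible value of the data, while the percentile bootstrap interval
cannot leave the observed range of 0 to 100. That does not by itself
show the bootstrap interval has good coverage in samples like this.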