Statalist



RE: st: RE: RE: RE: highly skewed, highly zeroed data


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: RE: RE: highly skewed, highly zeroed data
Date   Thu, 26 Nov 2009 15:06:24 -0000

Agreed. No one in the thread that I can recall focused on the -svy-
aspect that the original poster did flag. 

He did return but did not address most of the comments made. Meanwhile,
the others in the thread continue to discuss intermittently using the
information we do have. 

Nick 
n.j.cox@durham.ac.uk 

Austin Nichols

Nick et al.--
The sample median is not necessarily zero once sample weights are taken
into account: for example, zeros may tend to have very low relative
weights while nonzero cases have relatively high weights. Since we are
not given the weights, we cannot be sure.  Depending on the weights, the
data might look a lot less or a lot more skewed than the unweighted tab
seems to imply!  But examples or simulations (to explore coverage and
small-sample bias) should include weights and clusters if possible,
whether estimating the overall mean or the proportion nonzero and the
mean or median of nonzero cases, as in
http://www.stata.com/statalist/archive/2009-11/msg01354.html
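
To make the point about weights concrete, here is a minimal sketch in
Python (not Stata, and not the original poster's data). The values and
weights are entirely hypothetical, chosen so that zeros carry low weight
and nonzero cases carry high weight; weighted_median() is an
illustrative helper, not a library function.

```python
import statistics


def weighted_median(values, weights):
    """Return the smallest value at which the cumulative weight
    reaches at least half of the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if 2 * cumulative >= total:
            return value
    raise ValueError("values and weights must be non-empty")


# Hypothetical sample: 70 zeros and 30 positive observations.
values = [0] * 70 + [5] * 10 + [20] * 10 + [100] * 10
# Hypothetical weights: each zero counts once, each nonzero case four times.
weights = [1] * 70 + [4] * 30

unweighted = statistics.median(values)
weighted = weighted_median(values, weights)
print("unweighted median:", unweighted)
print("weighted median:  ", weighted)
```

With these made-up weights the zeros account for only 70 of 190 units
of total weight, so the weighted median moves off zero even though the
unweighted median is zero. With different weights it need not.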

On Thu, Nov 26, 2009 at 5:34 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Jay makes an interesting point, although in turn it can be restated to
> acknowledge that the central limit theorem comes in numerous different
> flavours depending on quite what assumptions are being made. (For
> example, there are flavours allowing various kinds of dependence.)
> Alternatively, purists might want to talk of a family of central limit
> theorems.
>
> However, my guess is that this is not the central issue. (That pun was
> unintentional in my first draft and deliberate in my second.) Although
> with lots of zeros and strong skew the distribution concerned is
> awkward practically, I'd be surprised if it was pathological
> mathematically, or indicative of an underlying distribution that was.
> The point could be explored a little by e.g. bootstrapping.
>
> The median in the sample data was clearly zero!
>
> Nick
> n.j.cox@durham.ac.uk
>
> Verkuilen, Jay
>
> Kieran McCaul wrote:
>
>>The skew in the data does not stop you from calculating the mean, nor
> does it stop you from calculating a 95% CI around the mean.
> Regardless of the skew in the data, the sampling distribution of the
> mean will be Normal.<
>
> Not true. It will tend towards normality (in the sense of convergence
> in distribution) assuming regularity conditions for the central limit
> theorem hold, which for highly skewed variables is often NOT the case.
> But that convergence may be VERY slow and the resulting confidence
> interval for the mean may be extremely poor (incredibly wide) or even
> ludicrous (e.g., below the lower bound of the data).
>
> I would wonder whether the original poster might want to estimate a
> median instead of a mean?
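
The kind of simulation suggested in this thread can be sketched as
follows, again in Python rather than Stata. Everything here is an
assumption made for illustration: the population (90% zeros, otherwise
lognormal with sigma = 2), the sample size, the seed, and the function
names. It also ignores weights and clusters, which a realistic check
for survey data would need to include.

```python
import math
import random
import statistics


def draw(rng, p_zero, sigma):
    """One draw from a hypothetical zero-heavy, right-skewed population."""
    if rng.random() < p_zero:
        return 0.0
    return rng.lognormvariate(0.0, sigma)


def normal_ci(sample, z=1.96):
    """Normal-theory 95% confidence interval for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * se, mean + z * se


def simulate(n=50, reps=2000, seed=12345, p_zero=0.9, sigma=2.0):
    """Return (coverage, share of intervals whose lower limit is < 0)."""
    rng = random.Random(seed)
    # Mean of the mixture: P(nonzero) times the lognormal mean exp(sigma^2/2).
    true_mean = (1 - p_zero) * math.exp(sigma ** 2 / 2)
    covered = 0
    below_zero = 0
    for _ in range(reps):
        sample = [draw(rng, p_zero, sigma) for _ in range(n)]
        lower, upper = normal_ci(sample)
        covered += lower <= true_mean <= upper
        below_zero += lower < 0
    return covered / reps, below_zero / reps


coverage, share_below_zero = simulate()
print(f"coverage of nominal 95% interval: {coverage:.3f}")
print(f"share with lower limit below 0:   {share_below_zero:.3f}")
```

Under these assumptions the sampling distribution of the mean is still
far from normal at n = 50, so the interval would be expected to cover
the true mean less often than 95% of the time and sometimes to extend
below zero, the lower bound of the data. Raising n in simulate() shows
how quickly, or slowly, that improves.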
