
Re: st: RE: RE: RE: highly skewed, highly zeroed data

From	Austin Nichols <[email protected]>
To	[email protected]
Subject	Re: st: RE: RE: RE: highly skewed, highly zeroed data
Date	Thu, 26 Nov 2009 09:52:18 -0500

Nick et al.--
The sample median need not be zero once sampling weights are taken
into account, for example if zeros tend to have very low relative
weight and nonzero cases relatively high weight. Since we are not
given the weights, we cannot be sure. Depending on the weights, the
data might look a lot less or a lot more skewed than the unweighted
tab seems to imply! In any case, examples or simulations (to explore
coverage and small-sample bias) should include weights and clusters
if possible, whether they estimate the overall mean or the proportion
nonzero together with the mean or median of nonzero cases, as in
http://www.stata.com/statalist/archive/2009-11/msg01354.html
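
A minimal Python sketch of the weighting point, using made-up values and weights (hypothetical, not the original poster's data): six zeros with weight 1 and three nonzero cases with weight 10. The unweighted median is 0, but the weighted median is not. The weighted median below is the smallest value whose cumulative weight reaches half the total weight, which is one common definition among several. The same weights also give the two-part summary: the weighted proportion nonzero times the weighted mean of the nonzero cases equals the overall weighted mean. Clusters are ignored here.

```python
# Weighted vs. unweighted summaries for zero-heavy data.
# The values and weights are hypothetical: zeros get low weight,
# nonzero cases get high weight.

def weighted_median(values, weights):
    """Lower weighted median: smallest value whose cumulative weight
    reaches at least half of the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= total / 2:
            return v
    raise ValueError("empty input")

def two_part_summary(values, weights):
    """Weighted proportion nonzero and weighted mean of the nonzero cases."""
    total = sum(weights)
    pos = [(v, w) for v, w in zip(values, weights) if v != 0]
    w_pos = sum(w for _, w in pos)
    p_nonzero = w_pos / total
    mean_pos = sum(v * w for v, w in pos) / w_pos
    return p_nonzero, mean_pos

values = [0, 0, 0, 0, 0, 0, 5, 10, 20]
weights = [1, 1, 1, 1, 1, 1, 10, 10, 10]

unweighted = sorted(values)[len(values) // 2]  # n is odd, so this is the median
print(unweighted)                          # 0
print(weighted_median(values, weights))    # 10

p, m = two_part_summary(values, weights)
overall = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(round(p, 4), round(m, 4), round(overall, 4))  # 0.8333 11.6667 9.7222
```

Here p * m reproduces the overall weighted mean (9.7222), so the two-part summary loses nothing relative to the overall mean while being more informative about the zeros.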

On Thu, Nov 26, 2009 at 5:34 AM, Nick Cox <[email protected]> wrote:
> Jay makes an interesting point, although in turn it can be restated to
> acknowledge that the central limit theorem comes in numerous different
> flavours depending on quite what assumptions are being made. (For
> example, there are flavours allowing various kinds of dependence.)
> Alternatively, purists might want to talk of a family of central limit
> theorems.
>
> However, my guess is that this is not the central issue. (That pun was
> unintentional in my first draft and deliberate in my second.) Although
> the distribution concerned, with lots of zeros and strong skew, is
> awkward in practice, I'd be surprised if it were mathematically
> pathological, or indicative of an underlying distribution that was.
> The point could be explored a little by, e.g., bootstrapping.
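
One way to try the bootstrap idea, sketched in Python on simulated data. The 70% share of exact zeros, the lognormal(0, 1.5) nonzero values, n = 200, 2000 resamples and the seed are all arbitrary assumptions, not the original poster's data, and weights and clusters are ignored. If the bootstrap distribution of the mean is still clearly skewed, a normal-theory interval is suspect.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness (moment estimator g1)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def bootstrap_means(data, reps, rng):
    """Resample with replacement and return the mean of each resample."""
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(reps)]

REPS = 2000
rng = random.Random(2009)
# Hypothetical zero-inflated, right-skewed sample: about 70% exact zeros,
# the rest lognormal. Not real data.
data = [0.0 if rng.random() < 0.7 else rng.lognormvariate(0.0, 1.5)
        for _ in range(200)]

means = sorted(bootstrap_means(data, REPS, rng))
# Simple percentile interval (approximate 2.5th and 97.5th percentiles).
lo, hi = means[int(0.025 * REPS)], means[int(0.975 * REPS)]

print("sample mean:", round(statistics.fmean(data), 3))
print("skewness of data:", round(skewness(data), 2))
print("skewness of bootstrap means:", round(skewness(means), 2))
print("95% percentile interval:", round(lo, 3), round(hi, 3))
```

The skewness of the bootstrap means is typically far smaller than that of the data (roughly the data skewness divided by the square root of n), which is the kind of evidence that would support treating the problem as awkward rather than pathological.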
>
> The median in the sample data was clearly zero!
>
> Nick
> [email protected]
>
> Verkuilen, Jay
>
> Kieran McCaul wrote:
>
>> The skew in the data does not stop you from calculating the mean, nor
>> does it stop you from calculating a 95% CI around the mean.
>> Regardless of the skew in the data, the sampling distribution of the
>> mean will be Normal.
>
> Not true. It will tend towards normality (in the sense of convergence in
> distribution) provided the regularity conditions for the central limit
> theorem hold, which for highly skewed variables is often NOT the case.
> Even when they hold, that convergence may be VERY slow, and the resulting
> confidence interval for the mean may be extremely poor (incredibly wide)
> or even ludicrous (e.g., with a lower limit below the lower bound of the data).
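
A small Python sketch of the "ludicrous" case. Part 1 uses a deliberately extreme, made-up sample (nine zeros and one value of 100): the usual t interval has a negative lower limit although the data cannot go below 0. Part 2 estimates the actual coverage of the nominal 95% t interval by simulation; the zero share 0.7, lognormal(0, 1.5) nonzero values, n = 10, 5000 replications and the seed are arbitrary assumptions, and weights and clusters are ignored. The constant 2.262 is the 97.5th percentile of Student's t with 9 degrees of freedom.

```python
import math
import random
import statistics

T_975_DF9 = 2.262  # two-sided 95% t critical value for 9 df (n = 10)

def t_interval(xs, tcrit):
    """Normal-theory (t) confidence interval for the mean."""
    m = statistics.fmean(xs)
    se = statistics.stdev(xs) / math.sqrt(len(xs))
    return m - tcrit * se, m + tcrit * se

# 1) Made-up extreme sample: mean 10, SD sqrt(1000), SE exactly 10.
sample = [0.0] * 9 + [100.0]
lo, hi = t_interval(sample, T_975_DF9)
print(round(lo, 2), round(hi, 2))  # -12.62 32.62 (lower limit below 0)

# 2) Coverage of the same interval for hypothetical zero-inflated lognormal data.
P_ZERO, MU, SIGMA, N, REPS = 0.7, 0.0, 1.5, 10, 5000
true_mean = (1 - P_ZERO) * math.exp(MU + SIGMA ** 2 / 2)

rng = random.Random(26)
covered = 0
for _ in range(REPS):
    xs = [0.0 if rng.random() < P_ZERO else rng.lognormvariate(MU, SIGMA)
          for _ in range(N)]
    a, b = t_interval(xs, T_975_DF9)
    covered += a <= true_mean <= b  # bool counts as 0 or 1
print("nominal 0.95, simulated coverage:", covered / REPS)
```

All-zero samples (probability 0.7**10, about 3%) give a zero-width interval at 0 and always miss, which is one concrete way the nominal 95% is not achieved at small n.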
>
> I wonder whether the original poster might want to estimate a median
> instead of a mean.
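
If a median is wanted, a distribution-free interval can be built from order statistics, since the number of observations below the population median is Binomial(n, 1/2). A minimal Python sketch follows, on a hypothetical zero-heavy sample of n = 20 (not real data). The coverage guarantee is stated for continuous data, so with many ties at zero treat it as approximate, and it ignores weights and clusters. With this many zeros the median and its interval sit at or near 0, as Nick observed, so the proportion nonzero and the median of the nonzero cases are printed too.

```python
import math
import statistics

def median_ci(xs, level=0.95):
    """Distribution-free confidence interval for the median.

    The count B of observations below the population median is
    Binomial(n, 1/2), so the order-statistic interval [x_(j), x_(n-j+1)]
    (1-based) covers the median with probability 1 - 2 * P(B <= j - 1)
    for continuous data. The largest j with P(B <= j - 1) <= (1 - level) / 2
    is used.
    """
    xs = sorted(xs)
    n = len(xs)

    def binom_cdf(k):
        return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

    j = 0
    while binom_cdf(j) <= (1 - level) / 2:
        j += 1
    if j == 0:
        raise ValueError("sample too small for this confidence level")
    return xs[j - 1], xs[n - j]

# Hypothetical zero-heavy sample: 14 zeros and 6 positive values.
data = [0] * 14 + [1, 2, 3, 5, 8, 13]
nonzero = [x for x in data if x != 0]

print(statistics.median(data))     # 0.0
print(median_ci(data))             # (0, 1)
print(len(nonzero) / len(data))    # 0.3
print(statistics.median(nonzero))  # 4.0
```

For n = 20 the rule picks the 6th and 15th order statistics, the usual textbook choice for a 95% interval.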

Follow-Ups:
- RE: st: RE: RE: RE: highly skewed, highly zeroed data
  - From: "Nick Cox" <[email protected]>

References:
- st: highly skewed, highly zeroed data
  - From: "Jason Ferris" <[email protected]>
- st: RE: highly skewed, highly zeroed data
  - From: "Kieran McCaul" <[email protected]>
- st: RE: RE: highly skewed, highly zeroed data
  - From: "Verkuilen, Jay" <[email protected]>
- st: RE: RE: RE: highly skewed, highly zeroed data
  - From: "Nick Cox" <[email protected]>
