# st: RE: highly skewed, highly zeroed data

 From "Kieran McCaul" <[email protected]> To <[email protected]> Subject st: RE: highly skewed, highly zeroed data Date Wed, 25 Nov 2009 13:52:32 +0800

```...

The skew in the data does not stop you from calculating the mean, nor
does it stop you from calculating a 95% CI around the mean.
Regardless of the skew in the data, with a sample of 647 the sampling
distribution of the mean will be approximately Normal (central limit
theorem).
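
* Sketch added for illustration (not part of the original reply): a
* small simulation of the claim above. The population is an ASSUMPTION:
* 80% zeros, the rest exponential with mean 8 hours, loosely mimicking
* the frequency table in the original message. Its mean is 1.6 and its
* skewness is about 4.5. Requires Stata 14+ for rexponential().
capture program drop onemean
program define onemean, rclass      // onemean is a hypothetical helper
    drop _all
    set obs 647                     // same n as the survey data
    generate double y = cond(runiform() < .8, 0, rexponential(8))
    summarize y, meanonly
    return scalar m = r(mean)
end
clear
set seed 2009
simulate m = r(m), reps(1000) nodots: onemean
* The 1,000 sample means should be centred near 1.6 and close to
* symmetric (theoretical skewness of the mean about 4.5/sqrt(647), i.e. ~0.2).
summarize m, detail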

clear *

* time = hours; wt = number of respondents reporting that value
input time wt
0  518
.25    2
.5    3
1   15
1.5    1
2   23
3   10
3.5    1
4   11
5   13
6    9
7    3
8   19
20   10
45    9
end

* fw= treats wt as a frequency weight, so N = sum(wt) = 647
mean time [fw=wt]

Mean estimation                     Number of obs    =     647

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        time |   1.605873   .2343624      1.145669    2.066077
--------------------------------------------------------------
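
* Sketch added for illustration (not in the original reply): the SE and
* CI above can be reproduced by hand from the frequency table, using
* N = sum(wt) = 647, sum(wt*time) = 1039 and sum(wt*time^2) = 24625.375.
* The CI uses the t distribution with N - 1 = 646 df, as -mean- does.
scalar n_obs = 647
scalar mbar  = 1039/n_obs                               // sample mean
scalar svar  = (24625.375 - n_obs*mbar^2)/(n_obs - 1)   // sample variance
scalar se    = sqrt(svar/n_obs)                         // SE of the mean
scalar tcrit = invttail(n_obs - 1, .025)                // upper 2.5% t value
display "mean " mbar "  SE " se "  95% CI " (mbar - tcrit*se) " to " (mbar + tcrit*se)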

______________________________________________
Kieran McCaul MPH PhD
WA Centre for Health & Ageing (M573)
University of Western Australia
Level 6, Ainslie House
48 Murray St
Perth 6000
Phone: (08) 9224-2701
Fax: (08) 9224 8009
email: [email protected]

______________________________________________
If you live to be one hundred, you've got it made.
Very few people die past that age - George Burns

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jason Ferris
Sent: Wednesday, 25 November 2009 10:07 AM
To: [email protected]
Subject: st: highly skewed, highly zeroed data

Hi,
I have tried to find my answer in the statalist repository but nothing
has quite hit the mark.

I would like to calculate a mean and 95% CI for these data, which are
highly skewed, with the majority being zeros.

I am aware of adding a constant and then transforming to the log scale
(with the antilog for interpretation). However, after adding a constant
to overcome the zero issue and then log-transforming, I am still left
with a highly skewed distribution, which gets me no closer to a mean and
CI.

PS. As these are survey data, I would be most keen for the 'right'
answer to be expressed in svy: terms.

Jason

 time (hrs) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        518       80.06       80.06
        .25 |          2        0.31       80.37
         .5 |          3        0.46       80.83
          1 |         15        2.32       83.15
        1.5 |          1        0.15       83.31
          2 |         23        3.55       86.86
          3 |         10        1.55       88.41
        3.5 |          1        0.15       88.56
          4 |         11        1.70       90.26
          5 |         13        2.01       92.27
          6 |          9        1.39       93.66
          7 |          3        0.46       94.13
          8 |         19        2.94       97.06
         20 |         10        1.55       98.61
         45 |          9        1.39      100.00
------------+-----------------------------------
      Total |        647      100.00

------------------------------------------
DISCLAIMER: This message (including any attachments) is intended solely
for the addressee(s) named and may contain confidential or privileged
information.
If you are not the intended recipient, please delete it and notify the
sender.
Views expressed in this message are those of the individual sender, and
are not necessarily the views of the Turning Point Alcohol and Drug
Centre (ABN: 68 223 819 017).

Turning Point Alcohol and Drug Centre: http://www.turningpoint.org.au

The whole or parts of this email may be subject to copyright of Turning
Point Alcohol and Drug Centre (ABN: 68 223 819 017), and/or third
parties.
You can only re-transmit, distribute or use the material if you are
authorised to do so.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
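The original question also asks for the answer in `svy:` terms, which the reply does not address. The `[fw=wt]` weights used above treat each of the 647 responses as an independent replicate, so they ignore any clustering or stratification in the survey design. Below is a minimal sketch of a design-based version. It assumes that the respondent-level file (not the collapsed frequency table) supplies a cluster (PSU) identifier, a stratum identifier and a probability weight. The names `psu`, `stratum` and `pw` are hypothetical placeholders, not variables from the original post.

```stata
* Hypothetical design variables: psu (cluster id), stratum (stratum id),
* pw (probability weight). Substitute the names supplied with the survey.
svyset psu [pweight=pw], strata(stratum)

* Design-based mean of the untransformed hours, zeros included;
* the CI uses the design degrees of freedom.
svy: mean time

* If only a weight is available, declare each respondent as its own PSU:
* svyset _n [pweight=pw]
```

As in the reply, no log transformation is involved. `svy: mean` estimates the mean of `time` directly and relies on the same large-sample argument for its CI.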