# st: R: highly skewed, highly zeroed data

 From "Carlo Lazzaro" <[email protected]> To <[email protected]> Subject st: R: highly skewed, highly zeroed data Date Wed, 25 Nov 2009 09:21:56 +0100

```As an alternative to Kieran's hint, and given the positive skewness of his data,
Jason may find it useful to calculate the desired 95% CI by fitting a Gamma
distribution and drawing 10,000 random values from it (for two interesting
references, see:
Briggs, A., Nixon, R., Dixon, S. and Thompson, S. (2005). Parametric
modelling of cost data: some simulation evidence. Health Economics 14(4);
Briggs A, Sculpher M, Claxton K. Decision Modelling for Health Economic
Evaluation. Oxford: Oxford University Press, 2006: 77-120).

............................begin example.................................
input time wt
mean time [fweight = wt]
Mean estimation                     Number of obs    =     647

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        time |   1.605873   .2343624      1.145669    2.066077
--------------------------------------------------------------

set obs 10000
g Gamma=(.2343624^2/1.605873)*invgammap((1.605873/.2343624)^2, uniform())
sum Gamma
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       Gamma |     10000    1.605746    .2343959   .8457972   2.601775

centile Gamma, centile (2.5 97.5)
                                                       -- Binom. Interp. --
    Variable |     Obs  Percentile      Centile        [95% Conf. Interval]
-------------+-------------------------------------------------------------
       Gamma |   10000        2.5      1.177285        1.170511    1.187588
             |               97.5       2.09881        2.083514    2.114182
............................end example....................................

HTH and Kind Regards,
Carlo

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jason Ferris
Sent: Wednesday, 25 November 2009 3:07
To: [email protected]
Subject: st: highly skewed, highly zeroed data

Hi,
I have tried to find my answer in the statalist repository but nothing
has quite hit the mark.

I would like to calculate a mean and 95% CI for these data, which are
highly skewed and mostly zeros.

I am aware of adding a constant and then transforming to the log scale
(with antilog) for interpretation.  However, after adding a constant to
overcome the zero issue and then transforming to the log scale, I am
still left with a highly skewed distribution, which gets me no closer to
a mean and CI.

PS. As this is survey data I would be most keen for the 'right' answer
to be addressed in svy: terms

Jason

 time (hrs) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        518       80.06       80.06
        .25 |          2        0.31       80.37
         .5 |          3        0.46       80.83
          1 |         15        2.32       83.15
        1.5 |          1        0.15       83.31
          2 |         23        3.55       86.86
          3 |         10        1.55       88.41
        3.5 |          1        0.15       88.56
          4 |         11        1.70       90.26
          5 |         13        2.01       92.27
          6 |          9        1.39       93.66
          7 |          3        0.46       94.13
          8 |         19        2.94       97.06
         20 |         10        1.55       98.61
         45 |          9        1.39      100.00
------------+-----------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
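The two expressions inside Carlo's `g Gamma=...` line come from matching the Gamma distribution's first two moments to the reported mean and standard error: shape = (mean/SE)^2 and scale = SE^2/mean, so that shape*scale equals the mean and shape*scale^2 equals SE^2. A minimal sketch of the same simulation with those steps spelled out follows. The scalar names, the variable name `gamma_draw`, and the seed are illustrative assumptions and not part of the original post, and `runiform()` is used on the assumption of a Stata version that provides it (the post itself uses the older `uniform()`).

```stata
* Sketch only: names and seed are hypothetical, values are copied from the post
scalar g_mean  = 1.605873
scalar g_se    = .2343624
* Method of moments: shape = mean^2/variance, scale = variance/mean
scalar g_shape = (g_mean/g_se)^2
scalar g_scale = g_se^2/g_mean

* -clear- drops the data in memory but keeps the scalars defined above
clear
set seed 12345
set obs 10000

* Inverse-CDF draw: invgammap(a, p) inverts the standard Gamma CDF with shape a
generate double gamma_draw = g_scale*invgammap(g_shape, runiform())

summarize gamma_draw
centile gamma_draw, centile(2.5 97.5)
```

The 2.5th and 97.5th centiles of `gamma_draw` play the role of the 95% interval limits. Because the draws are random, the exact figures will vary with the seed and will not reproduce the output in the post digit for digit.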