Stata The Stata listserver

Re: st: Log transform of skewed data


From   Roger Newson <roger.newson@kcl.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Log transform of skewed data
Date   Wed, 02 Jun 2004 21:56:18 +0100

At 14:53 02/06/04 -0400, you wrote:
I have data on the "cost" (actually transformed hours) of various types of
caretaking for Alzheimer's patients. I'm interested in a regression model to
test treatment effects in a multisite study. As is usual for cost data, it
is positively skewed. So, I contemplated a log transform, either through a
direct transformation of the response, or through a log link in a glm, gee,
or something similar. I actually am using "xt" commands to allow for
nonindependence among caretakers treated at the same site.

The problem is that the modal cost is $0, so that the distribution is
bimodal. This, of course, remains true if I do a log transform. Any ideas on
how to analyze such data would be appreciated.

Log-transformed data can often be understood in terms of geometric means and their ratios. If in Stata you type

findit gmratio

then you should be taken to my website, where you can download my Stata Tip on the -eform- option of -regress- (Newson, 2003), which shows how to use this to calculate confidence intervals for geometric means and their ratios.
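
As a rough illustration of the idea, the sketch below regresses a logged outcome on a treatment indicator and exponentiates the coefficients with eform(). The data are simulated and the variable names (hours, loghours, treat, baseline) are made up for the example; it is not the code from the Stata Tip, and it assumes a strictly positive outcome and a Stata version that has rnormal().

```stata
* Simulated, strictly positive outcome (hypothetical data and names)
clear
set seed 12345
set obs 200
generate byte treat = mod(_n, 2)
generate double hours = exp(3 + 0.4*treat + rnormal())
generate double loghours = ln(hours)

* Exponentiated slope = ratio of geometric means (treated / control)
regress loghours treat, eform(GM ratio)

* A unit variable plus noconstant displays the baseline geometric mean
* alongside the ratio in the same exponentiated table
generate byte baseline = 1
regress loghours treat baseline, noconstant eform(GM/Ratio)
```

In the second model the coefficient on baseline is the geometric mean for the control group, and the coefficient on treat is the treated-to-control ratio of geometric means.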

If there are zeros, however, then there is a problem, because the log of zero is not defined. In this case, you either have to transform the zeros to something else, or model arithmetic means instead of geometric means, using a log link function in a glm or gee, usually with the -eform- option. The parameters will then be arithmetic means and their ratios, instead of geometric means and their ratios. Arithmetic means are still defined if the outcome can be zero, as is the case with loglinear modelling of count data, and the principle is the same with non-count data such as your caretaker-hours. The trick with the -noconst- option, mentioned in Newson (2003), may still be useful if you want a baseline arithmetic mean for a baseline patient.
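
A minimal sketch of the log-link approach, again on simulated data with made-up names (site, treat, hours, baseline). The choice of family(poisson) with cluster-robust standard errors is my assumption for the example, borrowed from loglinear modelling of counts; other families may suit your data better. The outcome is left untransformed, so the zeros cause no difficulty.

```stata
* Simulated outcome with many exact zeros (hypothetical data and names)
clear
set seed 12345
set obs 200
generate int site = ceil(_n/20)
generate byte treat = mod(_n, 2)
generate double hours = cond(runiform() < 0.3, 0, exp(3 + 0.4*treat + rnormal()))
generate byte baseline = 1

* Log link on the untransformed outcome: exponentiated coefficients are
* the baseline arithmetic mean (baseline) and the ratio of arithmetic
* means (treat). Standard errors allow for clustering within site.
* Stata will note that hours has noninteger values; the fit still runs.
glm hours treat baseline, family(poisson) link(log) noconstant ///
    vce(cluster site) eform
```

With cluster-robust standard errors this corresponds to a GEE with an independence working correlation, so the same model could be set up with the -xt- commands if a different working correlation is wanted.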

Hope this helps.

Roger

References

Newson R. Stata tip 1: The eform() option of regress. The Stata Journal 2003; 3(4): 445.


--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Email: roger.newson@kcl.ac.uk
Website: http://www.kcl-phs.org.uk/rogernewson

Opinions expressed are those of the author, not the institution.
