
st: RE: dilemma of transformation to normal dist.


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: dilemma of transformation to normal dist.
Date   Sat, 17 May 2003 16:16:26 +0100

Yoshiro Nagao

> I want to do a multivariate regression analysis (of any sort)
> on a continuous dependent variable, which has, highly skewed, 
>  somewhat log-normal distribution.
> 
> To enable linear regression, this dependent 
> was logarithm-transformed.  Since a considerable number
> of records have 0 value, however, they could not be
> log-transformed.  Setting an very small positive arbitrary value
> to these records would enable log-transformation.
> However, the size of this arbitrary value would
> affect the result of regression analysis.
> 
> What is the best way to transform this variable to normal 
> distribution?
> Or else, are there multivariate regression method
> which can be applied to variable with non-normal distribution?

I assume that you are talking here about multiple regression, 
not multivariate regression, as evidently you have 
one response. 

Three short answers: 

0. Regression works best given normality of errors. 
The marginal distribution of each variable is not, 
itself, an issue. 

1. Without knowing anything about this variable, 
or your problem, or therefore about the underlying 
science, it is difficult to advise. Sometimes 
a variable is really a composite of lots of zeros 
and a skewed, possibly lognormal, set of positive 
values and there are various models for such 
cases. 

2. log(response + fudge) is, as you say, at best
an awkward and arbitrary solution. Arguably, 
it has long been superseded -- in problems to 
which it is applicable -- by using generalised linear
models with log link. 
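
As a rough illustration of point 2, here is a minimal sketch in plain 
Python. The data are made up for the purpose, and the choice of a Poisson 
family for the log-link fit is an assumption of the sketch, not something 
stated above. It shows that the slope from regressing log(y + c) on x 
moves with the fudge constant c, whereas a log-link model takes the zeros 
as they are. 

```python
import math

# Hypothetical toy data: a skewed, non-negative response with several zeros.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 1, 0, 2, 3, 2, 6, 9, 14]


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (one predictor plus an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def fudge_slope(c):
    """Slope from regressing log(y + c) on x for a given fudge constant c."""
    return ols_slope(x, [math.log(v + c) for v in y])


def poisson_log_link(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of E[y] = exp(b0 + b1 * x).

    Uses the Poisson score equations; the Poisson family is an assumption
    made for this sketch. Zeros in ys need no adjustment.
    """
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * a) for a in xs]
        # Score vector (gradient of the log-likelihood).
        u0 = sum(b - m for b, m in zip(ys, mu))
        u1 = sum(a * (b - m) for a, b, m in zip(xs, ys, mu))
        # Information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(a * m for a, m in zip(xs, mu))
        i11 = sum(a * a * m for a, m in zip(xs, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 system directly.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


fudge_slopes = {c: fudge_slope(c) for c in (1.0, 0.1, 0.001)}
b0, b1 = poisson_log_link(x, y)

for c, s in fudge_slopes.items():
    print(f"log(y + {c}): slope = {s:.3f}")
print(f"log-link model: slope = {b1:.3f}")
```

In these toy data the zeros sit at low x, so the smaller the fudge 
constant, the steeper the fitted slope on the log scale. In Stata the 
corresponding log-link fit would be along the lines of glm with the 
family() and link(log) options; which family is appropriate depends 
on the data. 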

This last issue comes up about once every few months 
on Statalist. See in the archives threads 
started on 9 December 2002 and 4 September 2002. 

There is a very good pedagogic paper 

Lane, P.W. 2002. Generalized linear models
in soil science. European Journal of Soil 
Science 53, 241-251

which focuses on the merits of transformation 
as compared with generalized linear models. 
I owe this reference to Stata user Allan Reese. 

You may be able to access it electronically. 
Don't be put off by the mention of soil science; 
you don't need to understand anything much about 
soil science. The examples are not intrusive. 


Nick 
n.j.cox@durham.ac.uk 






