# st: RE: dilemma of transformation to normal dist.

From: "Nick Cox" | Subject: st: RE: dilemma of transformation to normal dist. | Date: Sat, 17 May 2003 16:16:26 +0100

```
Yoshiro Nagao

> I want to do a multivariate regression analysis (of any sort)
> on a continuous dependent variable, which has a highly skewed,
> somewhat log-normal distribution.
>
> To enable linear regression, this dependent variable
> was log-transformed. However, since a considerable number
> of records have the value 0, they could not be
> log-transformed. Assigning a very small arbitrary positive value
> to these records would enable the log transformation.
> However, the size of this arbitrary value would
> affect the results of the regression analysis.
>
> What is the best way to transform this variable to a normal
> distribution?
> Or else, are there multivariate regression methods
> that can be applied to a variable with a non-normal distribution?

I assume that you are talking here about multiple
regression, not multivariate regression, as evidently
you have one response.

0. Regression works best given normality of the errors.
The marginal distribution of each variable, the response
included, is not in itself an issue.

1. [...] science, it is difficult to advise. Sometimes
a variable is really a composite of lots of zeros
and a skewed, possibly lognormal, set of positive
values, and there are various models for such
cases.

2. log(response + fudge) is, as you say, at best
an awkward and arbitrary solution. Arguably,
it has long been superseded, in problems to
which it is applicable, by generalized linear models.

This last issue comes up about once every few months
on Statalist. See the threads in the archives
started on 9 December 2002 and 4 September 2002.

There is a very good pedagogic paper

Lane, P.W. 2002. Generalized linear models
in soil science. European Journal of Soil
Science 53, 241-251.

which focuses on the merits of transformation
as compared with generalized linear models.
I owe this reference to Stata user Allan Reese.

You may be able to access it electronically.
Don't be put off by the mention of soil science;
you don't need to understand anything much about
soil science. The examples are not intrusive.

Nick
n.j.cox@durham.ac.uk

```
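Point 2 of the reply can be illustrated with a small sketch. It is written in Python rather than Stata only so that it is self-contained, and the data are invented for illustration; the helper names `ols_slope`, `log_fudge_slope` and `glm_log_link` are hypothetical. The first part regresses log(y + c) on x for several fudge constants c and shows that the estimated slope depends heavily on c. The second part fits E(y) = exp(b0 + b1 x) by iteratively reweighted least squares, assuming a Poisson-type variance (one possible generalized linear model with a log link; the post does not specify a family). Because log(y) is never taken, the zeros need no fudge. In Stata, a comparable fit would be something like `glm y x, family(poisson) link(log)`.

```python
import math

# Invented example data: a skewed, nonnegative response with several exact zeros.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0.0, 0.0, 0.8, 0.0, 2.1, 3.0, 0.0, 9.5]


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs in a simple regression with intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def log_fudge_slope(xs, ys, c):
    """Slope from regressing log(y + c) on x for a chosen fudge constant c."""
    return ols_slope(xs, [math.log(b + c) for b in ys])


def glm_log_link(xs, ys, iters=50):
    """Fit E(y) = exp(b0 + b1*x) by IRLS, assuming a Poisson-type variance.

    log(y) is never taken, so zeros in ys need no adjustment.
    """
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iters):
        eta = [b0 + b1 * a for a in xs]  # linear predictor
        mu = [math.exp(e) for e in eta]  # fitted means, always > 0
        # Working response; with a log link and variance = mean, weights are mu.
        z = [e + (b - m) / m for e, b, m in zip(eta, ys, mu)]
        w = mu
        # Weighted least squares of z on x gives the next estimates.
        sw = sum(w)
        mx = sum(wi * a for wi, a in zip(w, xs)) / sw
        mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxz = sum(wi * (a - mx) * (zi - mz) for wi, a, zi in zip(w, xs, z))
        sxx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, xs))
        b1 = sxz / sxx
        b0 = mz - b1 * mx
    return b0, b1


for c in (0.001, 0.1, 1.0):
    print(f"slope of log(y + {c}) on x: {log_fudge_slope(x, y, c):.2f}")

b0, b1 = glm_log_link(x, y)
print(f"log-link GLM: b0 = {b0:.3f}, b1 = {b1:.3f}")
```

With these made-up numbers the log(y + c) slope is about 0.90 for c = 0.001 but about 0.24 for c = 1, which is the arbitrariness the original question worries about. The log-link fit has no such tuning constant: at convergence it satisfies the score equations sum(y - mu) = 0 and sum(x * (y - mu)) = 0.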