



Re: st: Yeo-Johnson Power Transformation


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Yeo-Johnson Power Transformation
Date   Wed, 19 Dec 2012 11:53:51 +0000

I mostly agree with Maarten.

A short answer to the question here -- is there an archived program or
are there postings -- is that if they exist you should be able to find
them with -search- or a search of the archives.

On the whole, this posting would be clearer with any or all of

1. Yeo-Johnson transformation: reference, formula, explanation.

2. URL for the "old Statalist thread".

3. Details of what you tried and in what sense it did not work.
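For readers who want the formula in front of them, here is a minimal sketch of the transformation as it is usually stated (Yeo and Johnson 2000, Biometrika 87: 954-959). It is written in Python rather than Stata purely for illustration; the function name yeo_johnson and the tolerance 1e-12 are my own choices, not part of any archived program.

```python
import math

def yeo_johnson(y: float, lam: float) -> float:
    """Yeo-Johnson transform of a single value y with parameter lam.

    Illustrative sketch only; the name and the tolerance are assumptions.
    """
    tol = 1e-12  # assumed tolerance for treating lam as exactly 0 or 2
    if y >= 0:
        if abs(lam) < tol:
            return math.log1p(y)                  # limit as lam -> 0
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < tol:
        return -math.log1p(-y)                    # limit as lam -> 2
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)

# lam = 1 leaves the data unchanged, whatever the sign.
for value in (-3.0, -0.5, 0.0, 2.0):
    print(value, yeo_johnson(value, 1.0))
```

Note that zero maps to zero for every value of the parameter, so the sign of the original variable is preserved.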

In cases where the response can be negative, zero, or positive it is
my experience that

1. The sign has substantive (physical, biological, economic, ...)
meaning and it would not help to discard it by a translation (shift)
to all positive values.

2. There can be a mixture situation in which zeros and negatives arise
in specific circumstances. There can in particular be spikes at zero.
This is common, although far from universal, but when it happens it is
often decisive in guiding the modelling.

3. At best the distribution of the response is unimodal and
well-behaved, in which case modelling the response directly with a
suitable distribution is likely to be more productive than thinking
of a transformation parameter to be estimated. (Such distributions on
the entire real line seem in short supply, but the Gumbel is one
such.) If the parameter is a vector of parameters, the point gains
strength.

4. I am often more worried by outliers than by skewness, in which case
I might re-run analyses after a cube root or asinh transformation,
both of which pull in the tails while preserving sign.
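
To make point 4 concrete, here is a small sketch, in Python for illustration only, of the two transformations applied to some made-up values. The helper name cube_root and the data are my own; the only library call is math.asinh from the standard library.

```python
import math

def cube_root(x: float) -> float:
    """Real cube root that keeps the sign of x (hypothetical helper)."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

# Made-up values with long tails on both sides of zero.
values = [-1000.0, -8.0, -1.0, 0.0, 1.0, 8.0, 1000.0]

for v in values:
    # Both transforms are odd functions: they map 0 to 0, keep the sign,
    # and keep the ordering, while shrinking large magnitudes.
    print(f"{v:10.1f}  {cube_root(v):9.3f}  {math.asinh(v):9.3f}")
```

Neither transformation has a parameter to estimate, which is part of the appeal for a quick sensitivity check.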

I commented on Box-Cox in another old Statalist thread (*), so I need
not repeat that here.

"Cox", here and always, deserves an upper-case "C".

Nick

(*) See how needlessly vague such references are? Here is the reference:

http://www.stata.com/statalist/archive/2006-04/msg00471.html

On Wed, Dec 19, 2012 at 9:47 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
> On Wed, Dec 19, 2012 at 4:27 AM, Daniels, Joseph  wrote:
>> I need to use the Yeo-Johnson transformation as my data contains zero and negative values along with positive values. I used an old Statalist thread to try to copy and edit the boxcox.ado and boxco_l.ado files, with no success. Are ado files for Yeo-Johnson archived anywhere, or are there any recent instructions on how to edit the Boxcox files?
>
> I am usually not convinced by such methods. They are too often used
> to avoid making hard decisions. In the end, the choice of model must
> be made by the investigator and no computer program can take that
> responsibility away. Moreover, I would not apply a non-linear
> transformation to the variables unless I really really really have
> to. Often using link functions in a -glm- context is much better, as
> that way you can have the transformation and stick to effects in the
> original metric of the variables.
>
> If I still wanted to implement this, I would look at the
> boxcox.ado and boxco_l.ado files, but I would not start with them. I
> would start with recreating a maximum likelihood estimator for linear
> regression in a .do file without any bells and whistles, and step by
> step move up from there. I would be continuously looking things up in
> <http://www.stata.com/bookstore/maximum-likelihood-estimation-stata/>.
>
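
As an illustration of the first step Maarten describes, here is a bare-bones maximum likelihood fit of a linear regression with normal errors. It is a sketch in Python, not Stata -ml- code, and the function names and the data are made up for the example. For this model the maximum has a closed form, so no optimizer is needed; the check at the end only confirms that moving away from the estimates lowers the log likelihood.

```python
import math

def normal_loglik(a, b, sigma, x, y):
    """Log likelihood of y = a + b*x + e, with e ~ N(0, sigma^2)."""
    n = len(y)
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - rss / (2.0 * sigma ** 2)

def fit_ml(x, y):
    """Closed-form ML estimates; note sigma^2 = RSS/n, not RSS/(n - 2)."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(rss / n)

# Made-up data; the response takes negative and positive values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-1.2, 0.1, 0.9, 2.3, 2.8, 4.1]

a_hat, b_hat, sigma_hat = fit_ml(x, y)
best = normal_loglik(a_hat, b_hat, sigma_hat, x, y)
print(a_hat, b_hat, sigma_hat, best)

# Moving any parameter away from its estimate should not raise the log likelihood.
print(best >= normal_loglik(a_hat + 0.1, b_hat, sigma_hat, x, y))
```

Moving up from here to an estimated transformation parameter means transforming the response inside the likelihood and adding the log Jacobian of the transformation, which is where most of the work in the boxcox files lies.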


