Statalist The Stata Listserver



Re: st: RE: Yeo-Johnson Power Transformation


From   Rajiv Sabherwal <[email protected]>
To   [email protected]
Subject   Re: st: RE: Yeo-Johnson Power Transformation
Date   Tue, 23 Jan 2007 12:55:41 -0600

Kit, Nick, and others:

Thanks for your suggestions as well as your patience. I am a complete novice with Stata (and with this area of statistics; my earlier work has mainly been based on SEM and case studies), so please excuse my ignorance.

Based on how the percentages are computed [100*(x-y)/(w+x+y)], the Yeo-Johnson transformation does seem appropriate.

I follow most of what Kit suggested. Thanks again. However, based on the Weisberg paper on the Yeo-Johnson transformation (www.stat.umn.edu/arc/yjpower.pdf), I have a different interpretation on four points.

1. I believe I should be using 2-`theta' instead of 2*`theta' in both places toward the end of the code you suggested.
2. I believe Equation 2 on page 1 of the above PDF file is the one being modeled. This includes two possibilities for y<0: one when lambda <> 2 (I believe this is captured in the line two above else in your suggested code), and the other when lambda = 2 (which I don't think is captured).
3. There should be a negative sign prior to ( ( (abs($ML_y1)+1)^(2-`theta')-1)/(2-`theta') )
4. In the line after else, I believe there should be a +1 within the parentheses.
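
For reference, the four cases under discussion can be written out as follows. This is a sketch of the standard Yeo-Johnson definition, with \lambda corresponding to `theta' in the code; it is assumed, not verified here, to match Equation 2 of the PDF.

```latex
\psi(\lambda, y) =
\begin{cases}
\dfrac{(y+1)^{\lambda} - 1}{\lambda}       & y \ge 0,\ \lambda \ne 0 \\[1ex]
\ln(y+1)                                   & y \ge 0,\ \lambda = 0 \\[1ex]
-\dfrac{(1-y)^{2-\lambda} - 1}{2-\lambda}  & y < 0,\ \lambda \ne 2 \\[1ex]
-\ln(1-y)                                  & y < 0,\ \lambda = 2
\end{cases}
```

For y < 0, 1 - y equals abs(y) + 1, which is the form used in the code.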

Assuming I am right on the above points, should the last block of code be as follows?

qui gen double `yt' = .
if `diffL' > 1e-10 {
    * y >= 0
    qui replace `yt' = (($ML_y1 + 1)^`theta' - 1)/`theta' if $ML_y1 >= 0
    * y < 0, lambda != 2
    qui replace `yt' = -((abs($ML_y1) + 1)^(2 - `theta') - 1)/(2 - `theta') if $ML_y1 < 0 & `theta' != 2
    * y < 0, lambda == 2
    qui replace `yt' = -ln(abs($ML_y1) + 1) if $ML_y1 < 0 & `theta' == 2
}
else {
    qui replace `yt' = ln($ML_y1 + 1)
}
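
As a check outside the -ml- evaluator, the four cases of the transformation (y >= 0 or y < 0, crossed with the limiting values lambda = 0 and lambda = 2) can be tried on a few values for a fixed lambda. This is a minimal sketch, not Kit's code: the variable names y and yt, the scalars lambda and tol, and the toy data are all assumptions made for illustration.

```stata
* Sketch only: Yeo-Johnson transform of y for a fixed, user-chosen lambda.
* y, yt, lambda, tol and the data below are illustrative assumptions.
clear
input double y
-100
-2.5
0
2.5
100
end

scalar lambda = 0.5
scalar tol    = 1e-10

generate double yt = .

* y >= 0, lambda != 0
replace yt = ((y + 1)^scalar(lambda) - 1)/scalar(lambda) ///
    if y >= 0 & abs(scalar(lambda)) > scalar(tol)

* y >= 0, lambda == 0 (within tolerance)
replace yt = ln(y + 1) ///
    if y >= 0 & abs(scalar(lambda)) <= scalar(tol)

* y < 0, lambda != 2
replace yt = -((1 - y)^(2 - scalar(lambda)) - 1)/(2 - scalar(lambda)) ///
    if y < 0 & abs(scalar(lambda) - 2) > scalar(tol)

* y < 0, lambda == 2 (within tolerance)
replace yt = -ln(1 - y) ///
    if y < 0 & abs(scalar(lambda) - 2) <= scalar(tol)

list y yt
```

Setting lambda to 0 or 2 exercises the two logarithmic branches. Comparing against a tolerance rather than testing exact equality mirrors the 1e-10 check in the block above.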

Please advise.

Thanks, and best wishes,

Rajiv

On Jan 21, 2007, at 12:20 PM, Nick Cox wrote:


Kit Baum has already replied on the assumption that
you want to estimate the parameter in this transformation
by maximum likelihood, in which case his advice is, in
effect, to change the parts of -boxcox- that do not apply
to this transformation until they do apply.

On a quite different point: I think the assumption in
your post is dubious. Given the percent flavour, you
may need a generalisation of the logit-and-folded-power
family, not a generalisation of the power-and-logarithm family.
But that depends on your data generation process.

Nick
[email protected]

Rajiv Sabherwal

How can I perform the Yeo-Johnson power transformation in Stata? It is
similar to the Box-Cox transformation, but can be used with negative
values as well, unlike the Box-Cox transformation, which can only be
used for positive values. Please see
www.stat.umn.edu/arc/yjpower.pdf and
rweb.stat.umn.edu/R/library/alr3/html/powtran.html.
My dependent variable is a percentage that varies from -100 to +100,
and hence Box-Cox transformation would be inappropriate, but Yeo-
Johnson Power transformation would be perfect.
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


