
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Rescaling
Date: Thu, 12 Aug 2004 13:07:15 +0100

I think this needs a precise definition of rescaling.

If you rescale variables using a linear re-expression x -> ax + b, where a might be 1 and b might be 0, then lots of things remain unchanged (in a regression, for example, R-squared and t statistics) or are themselves predictable from this rescaling (coefficients change by known factors). So a and b might as well be chosen as a matter of convenience.

Your definition of rescaling appears elastic enough to include x -> log(x + b) (which I wouldn't call rescaling, but no matter). This re-expression is clearly not linear, so model results may differ substantially.

More importantly, you need a rationale for the transformation. I don't think that log(x + b) ever really fixes problems with negative x, but nor have I an alternative suggestion.

Nick
n.j.cox@durham.ac.uk

Cordula Stolberg

> I (again) have a question about rescaling. I have a panel data set in which
> two variables (the dependent variable and one independent variable) are
> expressed in thousands of dollars, and the other independent variables are
> all index numbers or percentages. As I'm taking logs but had several
> negative numbers, I rescaled the whole dataset by adding a value to all
> variables such that the most negative value equals 1. The problem there was
> that I had to add a very large value (over 23367 thousand) to all variables,
> which meant that after taking logs all variables were nearly the same, as
> the index numbers were mostly below 1. This meant that I could not test for
> the endogeneity of one of the variables because of collinearity problems.
>
> What I then did was to go back to the original dataset and convert the
> variables expressed in thousands into millions. This also meant that the
> most negative value occurred in a different variable, and I only had to add
> a value of 45 to each variable to get positive values for all variables (in
> order to log them). When I ran the regression then, I got different
> results. In particular, the suspected endogenous variable became
> insignificant, which kind of made the endogeneity test redundant (I did it
> anyhow, but the F-test for the overall regression became insignificant,
> which is not surprising, I guess).
>
> My question is whether my second approach to the rescaling is OK or whether
> I cannot do it like that.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
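To make the distinction between a linear re-expression and log(x + b) concrete, here is a minimal sketch. It is written in Python rather than Stata, uses made-up data with a single predictor (not the panel model in the question), and relies on two hypothetical helpers, `simple_ols` and `shifted_log`, written only for this illustration. It checks three things. First, x -> ax + b leaves R-squared and the slope's t statistic unchanged and divides the slope by a. Second, log(x + b) gives a fit that depends on b. Third, changing units inside the log only adds a constant: log(x/1000 + c) = log(x + 1000c) - log(1000).

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = alpha + beta*x; return (beta, r_squared, t_beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - rss / syy
    se_beta = math.sqrt(rss / (n - 2) / sxx)  # classical OLS standard error of beta
    return beta, r_squared, beta / se_beta


def shifted_log(values, shift):
    """Return log(v + shift) for each v; every v + shift must be positive."""
    return [math.log(v + shift) for v in values]


# Hypothetical data: a predictor in thousands of dollars (some negative) and an outcome.
x_thousands = [-120, -35, 10, 48, 95, 160, 240, 410]
y = [2.1, 2.9, 3.4, 3.2, 4.0, 4.6, 5.1, 6.3]

# 1. Linear re-expression x -> a*x + b (thousands -> millions, then add 5).
a, b = 0.001, 5.0
x_linear = [a * v + b for v in x_thousands]
beta_raw, r2_raw, t_raw = simple_ols(x_thousands, y)
beta_lin, r2_lin, t_lin = simple_ols(x_linear, y)
# The slope is divided by a; R-squared and the t statistic do not change.

# 2. Nonlinear re-expression x -> log(x + b): the fit depends on the choice of b.
r2_log_121 = simple_ols(shifted_log(x_thousands, 121), y)[1]  # smallest value maps to 1
r2_log_1000 = simple_ols(shifted_log(x_thousands, 1000), y)[1]

# 3. Changing units inside the log only adds a constant:
#    log(x/1000 + c) = log(x + 1000*c) - log(1000), here with c = 0.125.
x_millions = [v / 1000 for v in x_thousands]
log_in_millions = shifted_log(x_millions, 0.125)
log_in_thousands = shifted_log(x_thousands, 125)

print(f"raw x:         beta={beta_raw:.6f}  R2={r2_raw:.4f}  t={t_raw:.3f}")
print(f"a*x + b:       beta={beta_lin:.6f}  R2={r2_lin:.4f}  t={t_lin:.3f}")
print(f"log(x + 121):  R2={r2_log_121:.4f}")
print(f"log(x + 1000): R2={r2_log_1000:.4f}")
print("unit change only shifts the logs:",
      all(math.isclose(m, t - math.log(1000), rel_tol=1e-9)
          for m, t in zip(log_in_millions, log_in_thousands)))
```

Under the identity in step 3, adding 45 to a variable measured in millions is the same, up to a constant absorbed by the intercept, as adding 45,000 (not 23,367) to it in thousands. The index and percentage variables were also shifted by 45 in one run and by 23,367 in the other. So the two logged regressions in the question are different models, and differing results are to be expected.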

