# st: RE: Rescaling

From: "Nick Cox"; Subject: st: RE: Rescaling; Date: Thu, 12 Aug 2004 13:07:15 +0100

```
I think this needs a precise definition of rescaling.

If you rescale variables using a linear re-expression

x -> ax + b

where a (which must be nonzero) might be 1 and b might be 0,
then lots of things (e.g. t statistics and R-squared in a
regression with a constant) remain unchanged or are themselves
predictable from this rescaling, so a and b might as well be
chosen as a matter of convenience. Your definition of rescaling
appears elastic enough to include

x -> log(x + b)

(which I wouldn't call rescaling, but no matter):
this re-expression is clearly not linear, so model results
may differ substantially, and will depend on the choice of b.

More importantly, you need a rationale for the transformation.
I don't think that log(x + b) ever really fixes problems with negative
x, but nor have I an alternative suggestion.

Nick
n.j.cox@durham.ac.uk

Cordula Stolberg

> I (again) have a question about rescaling. I have a panel data set
> in which two variables (the dependent variable & one independent
> variable) are expressed in thousands of dollars and the other
> independent variables are all index numbers or percentages. As I'm
> taking logs but had several negative numbers, I rescaled the whole
> dataset by adding a value to all variables such that the most
> negative value equals 1. The problem there was that I had to add a
> very large value (over 23367 thousand) to all variables, which meant
> that after taking logs all variables were nearly the same, as the
> index numbers were mostly below 1. This meant that I could not test
> for the endogeneity of one of the variables because of collinearity
> problems.
>
> What I then did was to go back to the original dataset and convert
> the variables expressed in thousands into millions. This also meant
> that the most negative value occurred in a different variable and I
> only had to add a value of 45 to each variable to get positive values
> for all variables (in order to log them). When I then ran the
> regression, I got different results. In particular, the suspected
> endogenous variable became insignificant, which kind of made the
> endogeneity test redundant (I did it anyhow, but the F-test for the
> overall regression became insignificant, which is not surprising, I
> guess).
>
> My question is whether my second approach to the rescaling is OK to
> do or whether I cannot do it like that.

```
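The point about linear re-expression can be illustrated with a minimal Python sketch. It uses simulated data (not the poster's), and the helpers `pearson_r`, `slope_t` and `rescale` are hypothetical names introduced here. The sketch assumes a simple regression with an intercept. In that case the slope's t statistic is r * sqrt((n - 2) / (1 - r^2)). It is therefore unchanged by any re-expression x -> ax + b with a > 0. With a < 0, only its sign flips.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def slope_t(xs, ys):
    """t statistic of the slope in a simple OLS regression of ys on xs (with intercept)."""
    r = pearson_r(xs, ys)
    return r * math.sqrt((len(xs) - 2) / (1 - r * r))

def rescale(values, a, b):
    """Linear re-expression x -> a*x + b."""
    return [a * v + b for v in values]

random.seed(1)
# Simulated money variables "in thousands", including negative values (illustrative only).
x = [random.gauss(500, 3000) for _ in range(100)]
y = [0.8 * v + random.gauss(0, 2500) for v in x]

t_thousands = slope_t(x, y)
# Convert to "millions" (a = 1/1000) and add 45 (b = 45); no log is taken here.
t_millions = slope_t(rescale(x, 0.001, 45), rescale(y, 0.001, 45))
print(t_thousands, t_millions)  # equal up to rounding error
```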
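A companion sketch shows why x -> log(x + b) behaves differently. It again uses simulated data and hypothetical helper names. When every variable is shifted by a constant before logging, the slope's t statistic depends on which constant is chosen. An index variable lying between 0 and 1 also keeps almost none of its variation once a constant as large as 23367 is added and the result logged. This is because log(z + b) changes by at most about 1/b per unit change in z. That loss of variation is consistent with the collinearity the poster ran into.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def slope_t(xs, ys):
    """t statistic of the slope in a simple OLS regression of ys on xs (with intercept)."""
    r = pearson_r(xs, ys)
    return r * math.sqrt((len(xs) - 2) / (1 - r * r))

def shift_log(values, c):
    """log(v + c) for each value; c must make every v + c positive."""
    return [math.log(v + c) for v in values]

random.seed(7)
n = 200
# Simulated money variables with negative values (illustrative only).
x = [random.gauss(1000, 3000) for _ in range(n)]
y = [2.0 * v + random.gauss(0, 4000) for v in x]

# Smallest shift that makes the most negative value (across both variables) equal 1,
# and an arbitrary larger shift for comparison.
c_small = 1 - min(min(x), min(y))
c_large = c_small + 20000

t_small = slope_t(shift_log(x, c_small), shift_log(y, c_small))
t_large = slope_t(shift_log(x, c_large), shift_log(y, c_large))
print(t_small, t_large)  # differ: the result depends on the arbitrary constant

# An index variable below 1 barely varies once 23367 is added and the result logged.
z = [random.uniform(0.2, 1.0) for _ in range(n)]
sd_index = statistics.stdev(z)
sd_index_logged = statistics.stdev(shift_log(z, 23367))
print(sd_index, sd_index_logged)
```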