The Stata listserver

st: RE: Rescaling

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Rescaling
Date   Thu, 12 Aug 2004 13:07:15 +0100

I think this needs a precise definition of rescaling. 

If you rescale variables using a linear re-expression 

x -> ax + b 

where a might be 1 and b might be 0, then lots of things
remain unchanged (R-squared in a linear regression, for
example) or change in ways predictable from a and b (such
as the coefficients), so a and b might as well be chosen
as a matter of convenience. Your definition of rescaling
appears elastic enough to include 

x -> log(x + b) 

(which I wouldn't call rescaling, but no matter): 
this re-expression is clearly not linear, so model results
may differ substantially, and they will depend on the choice of b.
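To make the contrast concrete, here is a minimal sketch in Python rather than Stata. The toy data and the helper `pearson` are hypothetical, chosen only for illustration. It compares the correlation of y with x under a linear re-expression (a = 1000, b = 7) and under log(x + b) for two shifts, 45 and 23367, echoing the values mentioned in the question.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical toy data: x includes negative values, as in the question.
x = [-44.0, -10.0, -2.0, 0.5, 3.0, 8.0, 20.0, 60.0]
y = [1.0, 2.5, 2.0, 3.5, 4.0, 5.5, 6.0, 9.0]

r_raw = pearson(x, y)

# Linear re-expression x -> ax + b with a > 0: correlation is unchanged.
r_linear = pearson([1000 * xi + 7 for xi in x], y)

# Nonlinear re-expression x -> log(x + b): the result depends on b.
r_log_45 = pearson([math.log(xi + 45) for xi in x], y)
r_log_big = pearson([math.log(xi + 23367) for xi in x], y)

print(f"raw:            {r_raw:.4f}")
print(f"linear:         {r_linear:.4f}")
print(f"log(x + 45):    {r_log_45:.4f}")
print(f"log(x + 23367): {r_log_big:.4f}")
```

With the small shift, the point at x = -44 is mapped to log(1) = 0 and dominates, so the correlation moves a long way from the raw value. With the very large shift, log(x + b) is close to linear in x over this range, so the correlation barely changes. In both cases the answer is driven by an arbitrary choice of b.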

More importantly, you need a rationale for the transformation. 
I don't think that log(x + b) ever really fixes problems with negative 
x, but nor have I an alternative suggestion. 
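The first approach described in the question shows one reason why: when b is much larger than the spread of a variable v, log(v + b) is approximately log(b) + v/b, so the logged variable is nearly constant. The sketch below is a minimal Python illustration, assuming a hypothetical index-number variable with values mostly below 1. The function name `log_shift_summary` is also hypothetical.

```python
import math
import statistics

def log_shift_summary(values, b):
    """Mean and sample standard deviation of log(v + b) over values."""
    logged = [math.log(v + b) for v in values]
    return statistics.mean(logged), statistics.stdev(logged)

# Hypothetical index-number variable, mostly below 1 (as in the question).
index = [0.42, 0.55, 0.61, 0.73, 0.88, 0.95, 1.10]

mean_45, sd_45 = log_shift_summary(index, 45)
mean_big, sd_big = log_shift_summary(index, 23367)

# For b much larger than the spread of v, log(v + b) ~ log(b) + v/b,
# so the standard deviation shrinks roughly in proportion to 1/b.
print(f"b = 45:    mean = {mean_45:.4f}, sd = {sd_45:.2e}")
print(f"b = 23367: mean = {mean_big:.4f}, sd = {sd_big:.2e}")
```

Both shifts squeeze the variation of an index variable, but the larger one does so by a factor of roughly 500. That leaves a logged column that is almost constant and therefore nearly collinear with the intercept.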

[email protected] 

Cordula Stolberg
> I (again) have a question about rescaling. I have a panel data set in which
> two variables (the dependent variable and one independent variable) are
> expressed in thousands of dollars and the other independent variables are
> all index numbers or percentages. As I'm taking logs but had several
> negative numbers, I rescaled the whole dataset by adding a value to all
> variables such that the most negative value became 1. The problem there
> was that I had to add a very large value (over 23367 thousand) to all
> variables, which meant that after taking logs all variables were nearly
> the same, as the index numbers were mostly below 1. This meant that I
> could not test for the endogeneity of one of the variables because of
> collinearity problems.
> What I then did was to go back to the original dataset and convert the
> variables expressed in thousands into millions. This also meant that the
> most negative value occurred in a different variable, and I only had to
> add a value of 45 to each variable to get positive values for all
> variables (in order to log them). When I then ran the regression, I got
> different results. In particular, the suspected endogenous variable became
> insignificant, which more or less made the endogeneity test redundant (I
> did it anyhow, but the F-test for the overall regression became
> insignificant, which is not surprising, I guess).
> My question is whether my second approach to the rescaling is OK or
> whether I cannot do it like that.

