

st: RE: normalize variables

From   "Nick Cox" <>
To   <>
Subject   st: RE: normalize variables
Date   Sun, 11 Apr 2010 17:57:23 +0100

The word "normalize" here evidently means scale to a [0,1] range. 

Note first that using -egen- to do this is unnecessary unless you want
to do this panelwise. 

su x1, meanonly 
gen normal_x1 = (x1 - r(min)) / (r(max) - r(min)) 

If you want to do this panelwise, it does become convenient to use
-egen- as you say. 
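
As a minimal sketch of the panelwise version, assuming a panel
identifier called id (a hypothetical name; substitute your own
variable): 

egen min_x1 = min(x1), by(id) 
egen max_x1 = max(x1), by(id) 
gen normal_x1 = (x1 - min_x1) / (max_x1 - min_x1) 

Note that any panel in which x1 is constant will get missing values,
because the denominator is zero. 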

What I don't understand is how your main question can be answered
without knowing why you want to do this and why you think that you
"must" normalize. The best answer I can offer is that your indexes will
vary depending on whether they are calculated w.r.t. the entire dataset or
individual panels, and the choice between them is a scientific or
substantive one. 
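
To see how much the two scalings can differ, here is a toy
illustration with made-up data (the values and the variable names id,
norm_all and norm_panel are invented purely for this sketch): 

clear 
input id x1 
1 1 
1 3 
2 5 
2 9 
end 
* scaled over the entire dataset 
su x1, meanonly 
gen norm_all = (x1 - r(min)) / (r(max) - r(min)) 
* scaled within each panel 
egen min_x1 = min(x1), by(id) 
egen max_x1 = max(x1), by(id) 
gen norm_panel = (x1 - min_x1) / (max_x1 - min_x1) 
list 

Over the whole dataset the scaled values are 0, .25, .5, 1, whereas
within panels they are 0, 1, 0, 1: every panel is stretched to fill
[0,1], so differences in level between panels disappear. 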


I am using panel data analysis and I want to generate an index but first
must normalise the variables (x1,x2) contained in the index. I normalise
them by the following set of commands:

egen min_x1=min(x1)
egen max_x1=max(x1)
gen normal_x1=(x1-min_x1)/(max_x1-min_x1)

So, my question is whether I need to transform the commands to include
the "by(.)" option, i.e.

egen min_x1=min(x1), by(.)
egen max_x1=max(x1), by(.)
gen normal_x1=(x1-min_x1)/(max_x1-min_x1)

and if so, should I include the panel or time variable?
