
st: RE: normalize variables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: normalize variables
Date   Sun, 11 Apr 2010 17:57:23 +0100

The word "normalize" here evidently means scale to a [0,1] range. 

Note first that using -egen- to do this is unnecessary unless you want
to do this panelwise. 

su x1, meanonly 
gen normal_x1 = (x1 - r(min)) / (r(max) - r(min)) 

If you want to do this panelwise, it does become convenient to use
-egen- as you say. 
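
As a minimal sketch of the panelwise version: the variable names id and
year and the toy data below are assumptions made for illustration only;
substitute your own panel identifier. 

```stata
* Toy panel; id, year and the values are hypothetical
clear
input id year x1
1 1 2
1 2 4
1 3 6
2 1 10
2 2 30
2 3 20
end

* Minimum and maximum of x1 within each panel
egen min_x1 = min(x1), by(id)
egen max_x1 = max(x1), by(id)

* Scale to [0,1] within each panel
gen normal_x1 = (x1 - min_x1) / (max_x1 - min_x1)

list, sepby(id)
```

Note that any panel in which x1 is constant would get missing values,
as the denominator is then zero. 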

What I don't understand is how your main question can be answered
without knowing why you want to do this and why you think that you
"must" normalize. The best answer I can offer is that your indexes will
vary depending on whether they are calculated w.r.t. the entire dataset or
individual panels, and the choice between them is a scientific or
substantive one. 
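
To see that the two scalings really do differ, here is a small
illustration. The names id, year, scaled_all and scaled_panel and the
toy data are all hypothetical. 

```stata
* Toy panel; all names and values are hypothetical
clear
input id year x1
1 1 2
1 2 4
1 3 6
2 1 10
2 2 30
2 3 20
end

* Scaling w.r.t. the entire dataset
su x1, meanonly
gen scaled_all = (x1 - r(min)) / (r(max) - r(min))

* Scaling w.r.t. each panel separately
bysort id: egen pmin = min(x1)
bysort id: egen pmax = max(x1)
gen scaled_panel = (x1 - pmin) / (pmax - pmin)

list id year x1 scaled_all scaled_panel, sepby(id)
```

Here the observation with x1 = 10 scores 0 within its own panel but is
well above 0 relative to the whole dataset, whose minimum is 2. Which
of those is the appropriate reference is not something the code can
decide. 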

Nick 
n.j.cox@durham.ac.uk 

Evangelos.Constantinou@warwick.ac.uk

I am using panel data analysis and I want to generate an index, but first
I must normalise the variables (x1,x2) contained in the index. I
normalised them by the following set of commands:

egen min_x1=min(x1)
egen max_x1=max(x1)
gen normal_x1=(x1-min_x1)/(max_x1-min_x1)

So, my question is whether I need to transform the commands to include
the "by(.)" option, i.e.

egen min_x1=min(x1), by(.)
egen max_x1=max(x1), by(.)
gen normal_x1=(x1-min_x1)/(max_x1-min_x1)

and if so, should I include the panel or the time variable?

