Statalist The Stata Listserver



Re: st: RE: Rescaling variables


From   <suryadipta.roy@lawrence.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Rescaling variables
Date   Tue, 08 Aug 2006 18:42:22 -0500

Nick,

The following is what I am trying to do and the following are my concerns:

1. I am using two separate corruption indices in my regressions, one going from 0 to 6 and the other from 0 to 10, for a cross-section of countries over time. In regressions using these two separately as independent variables, I found that the latter series has larger between and within standard deviations than the former and is (hence?) giving higher values of t/z. (Cross-country corruption data tend to be very persistent for a country over time but vary more across countries in a given year.)

2. Consequently, I was thinking of converting both series so that they have the same standard deviation, i.e. one; hence the idea of converting them to standard normal variables, i.e. option (a).

(Kit Baum had earlier suggested the following commands to do this:

ssc install center

bys country: center variable, gen(newvariable) standardize
replace newvariable=0 if newvariable==.)
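For comparison, the same within-country standardization can be sketched with official -egen- functions only. This is a minimal sketch, not Kit Baum's code: the names country, corrupt, m_corrupt, sd_corrupt and z_corrupt are hypothetical placeholders. It makes the zero-variation case explicit and, unlike a blanket replace of all missings by 0, leaves observations whose original value is missing as missing.

```stata
* Hypothetical names: country (panel id), corrupt (the index), z_corrupt (result)
bysort country: egen double m_corrupt  = mean(corrupt)
bysort country: egen double sd_corrupt = sd(corrupt)

* Standardize only where the within-country sd is positive and defined
* (missing counts as larger than any number in Stata, hence the second condition)
generate double z_corrupt = (corrupt - m_corrupt) / sd_corrupt ///
    if sd_corrupt > 0 & !missing(sd_corrupt)

* No within variation: every value equals the country mean, so the deviation is 0.
* Observations with a missing original value stay missing.
replace z_corrupt = 0 if sd_corrupt == 0 & !missing(corrupt)

* Countries with a single nonmissing observation have a missing sd and stay missing
drop m_corrupt sd_corrupt
```

Whether 0 is a sensible value for countries with no within variation is a substantive choice; the sketch only separates that case from genuinely missing data.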

I understand the problem with transformations, but my idea was to make the two series comparable. Please let me know if I am wrong in doing this.

Thanks!
Suryadipta.


On Tue, 8 Aug 2006 23:50:26 +0100
"Nick Cox" <n.j.cox@durham.ac.uk> wrote:

There is some confusion here. There is a difference between
(a) converting to a scale on which mean is 0 and sd 1
and
(b) transforming to Gaussian (a.k.a. normal)

which is unaffected by the fact that people often want both.
I can use -egen, std()- for (a) but this is just a linear rescaling and has no effect on the degree of non-Gaussianity in the data. So, for example, skewness and kurtosis are invariant under linear rescalings.
Thus suppose I have a variable which is 42 most of the time and 3.14159 the rest of the time. In this example, two spikes will remain two spikes under any one-to-one mapping and a bell-shape will remain out of reach, indeed out of sight.
Your own distribution does not sound so refractory, but be aware of the principle that transformation often fails.
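The invariance can be checked directly. The following is an illustrative sketch using the auto dataset shipped with Stata (the choice of price and the name z_price are assumptions made for the example): it standardizes a variable with -egen, std()- and compares skewness and kurtosis before and after.

```stata
* Illustration on the shipped auto data; z_price is a hypothetical name
sysuse auto, clear
egen double z_price = std(price)

* Skewness and kurtosis of the raw variable
quietly summarize price, detail
display "raw: skewness = " r(skewness) "  kurtosis = " r(kurtosis)

* Same statistics after rescaling to mean 0 and sd 1
quietly summarize z_price, detail
display "std: skewness = " r(skewness) "  kurtosis = " r(kurtosis)
```

The two lines of output should agree up to rounding, which is point (a) in practice: the scale changes, the shape does not.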
Nick
n.j.cox@durham.ac.uk

suryadipta.roy@lawrence.edu wrote:


I had another question on rescaling variables. At this point, I am trying to convert some of my independent variables (running from 0 to 6) to standard normal variables. I have been able to do it by subtracting the group mean from each value and dividing by the group standard deviation (I have an unbalanced panel of countries), but the problem comes for countries which had no within variation for that variable, since dividing by zero gives missing values in such cases. The following command does not work, since the std() function of egen cannot be combined with by:

by code, sort: egen newvar=std(oldvar)

Can anyone point me towards the correct command for direct conversion of numbers into their standard normals, thereby circumventing the problem of dividing by zero? Thanks as usual!
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Suryadipta Roy, Ph.D.
Visiting Assistant Professor,
Department of Economics,
Lawrence University,
115 South Drew Street,
Appleton, WI-54911.
Phone: 920-832-7343.


