Nick,
Here is what I am trying to do, and these
are my concerns:
1. I am using two separate corruption indices in my
regressions, one running from 0 to 6 and the other
from 0 to 10, for a cross-section of countries over time.
In regressions using these two separately as
independent variables, I found that the latter series
has larger between and within standard deviations
than the former and (hence?) gives higher values of t/z.
(Cross-country corruption data tend to be very persistent
for a country over time but tend to vary more across
countries in a given year.)
2. Consequently, I was thinking of converting both
series so that they have the same standard
deviation, i.e. one; hence the idea of converting them to
standard normal variables, i.e. option (a).
(Kit Baum had earlier suggested the following commands to
do this:
ssc install center
bys country: center variable, gen(newvariable) standardize
replace newvariable=0 if newvariable==.)
I understand the problem with transformations, but my idea
was to make the two series comparable. Please let me know
if I am wrong in doing this.
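Item 1 compares the between and within standard deviations of the two indices. The sketch below illustrates that decomposition. It is written in Python rather than Stata, uses the usual xtsum-style definitions as I understand them, and the country data are invented. All three SDs are measured in the units of the index. Stretching a 0-6 index onto a 0-10 range, without adding any information, therefore multiplies each of them by 10/6.

```python
from statistics import mean, stdev


def panel_sd(panel):
    """Overall, between and within standard deviations of a panel.

    `panel` maps a group id (e.g. a country code) to its observations.
      between: SD of the group means (one value per group)
      within:  SD of x_it - xbar_i + xbar (deviation from own group
               mean, re-centred on the grand mean)
    """
    all_obs = [x for xs in panel.values() for x in xs]
    grand_mean = mean(all_obs)
    group_means = {g: mean(xs) for g, xs in panel.items()}
    overall = stdev(all_obs)
    between = stdev(list(group_means.values()))
    within = stdev(
        [x - group_means[g] + grand_mean for g, xs in panel.items() for x in xs]
    )
    return overall, between, within


# Hypothetical 0-6 index for three countries: persistent within a
# country, different across countries.
index_0_6 = {"A": [2, 2, 3], "B": [5, 5, 5], "C": [1, 2, 1]}
# The same information stretched onto a 0-10 range.
index_0_10 = {g: [x * 10 / 6 for x in xs] for g, xs in index_0_6.items()}

print(panel_sd(index_0_6))
print(panel_sd(index_0_10))  # each SD is 10/6 times larger
```

In Stata itself, -xtsum- reports these three figures for a panel dataset. Here the 0-10 version carries exactly the same information as the 0-6 one, so its larger SDs reflect the units alone.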
Thanks!
Suryadipta.
On Tue, 8 Aug 2006 23:50:26 +0100
"Nick Cox" <n.j.cox@durham.ac.uk> wrote:
There is some confusion here. There is
a difference between
(a) converting to a scale on which mean is 0 and sd 1
and
(b) transforming to Gaussian (a.k.a. normal)
which is unaffected by the fact that people
often want both.
You can use -egen, std()- for (a), but
this is just a linear rescaling and has no
effect on the degree of non-Gaussianity in
the data. So, for example, skewness and
kurtosis are invariant under linear rescalings.
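Here is a small numerical check of that point. It is a Python sketch rather than Stata, and the index values are invented. Z-scoring, the linear rescaling that -egen, std()- performs, gives mean 0 and sd 1. The skewness and kurtosis, computed here with simple moment-based definitions, come out unchanged.

```python
from statistics import mean, pstdev, stdev


def zscore(xs):
    """Linear rescaling to mean 0, sd 1 (sample SD in the denominator)."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def skewness(xs):
    """Moment-based skewness: mean cubed standardized deviation."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def kurtosis(xs):
    """Moment-based kurtosis (not excess kurtosis; a Gaussian gives 3)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs)


# Hypothetical right-skewed 0-6 index values.
x = [0, 0, 1, 1, 1, 2, 2, 3, 5, 6]
z = zscore(x)

print(mean(z), stdev(z))  # approximately 0 and 1
print(skewness(x), skewness(z))  # equal up to rounding error
print(kurtosis(x), kurtosis(z))  # equal up to rounding error
```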
Thus suppose I have a variable which is
42 most of the time and 3.14159 the rest
of the time. In this example, two spikes will
remain two spikes under any one-to-one mapping
and a bell-shape will remain out of reach, indeed
out of sight.
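The two-spike example can be made concrete with a short Python sketch. The 8/2 split between the two values is an arbitrary choice for illustration. Any one-to-one transformation maps each distinct value to exactly one distinct value. The result therefore still has two values in the same proportions, whichever transformation is tried.

```python
import math
from statistics import mean, stdev

# Nick's example: 42 most of the time, 3.14159 the rest
# (the 8/2 split is a hypothetical choice).
x = [42.0] * 8 + [3.14159] * 2

m, s = mean(x), stdev(x)
one_to_one = {
    "z-score": lambda v: (v - m) / s,
    "log": math.log,
    "square root": math.sqrt,
    "reciprocal": lambda v: 1 / v,
}

for name, f in one_to_one.items():
    y = [f(v) for v in x]
    # Distinct transformed values: still exactly two spikes.
    print(name, sorted(set(y)), "distinct values:", len(set(y)))
```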
Your own distribution does not sound so refractory,
but be aware of the principle that transformation
often fails.
Nick
n.j.cox@durham.ac.uk
suryadipta.roy@lawrence.edu
I had another question on rescaling variables. At this
point, I am trying to convert some of my independent
variables (running from 0 to 6) to standard normal
variables. I have been able to do it by subtracting the
group mean from each value and dividing by the standard
deviation for each group (I have an unbalanced panel of
countries), but the problem comes for countries which had
no within variation in that variable, since dividing by
zero gives missing values in such cases. The
following command does not work:
by code, sort: egen newvar=std(oldvar)
since egen's std() function cannot be combined with by.
Can anyone point me towards the correct command for
converting the numbers directly into their standard normal
values, thereby circumventing the problem of dividing by
zero? Thanks as usual!
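The sketch below shows one way around the division by zero. It is written in Python rather than Stata, and the panel is invented. It computes each country's mean and SD and standardizes as usual where the SD is positive. Where a country has no within variation, it sets the result to 0, the same choice as the -replace ... =0- line in Kit Baum's suggestion. A blanket -replace newvariable=0 if newvariable==.- also zero-fills observations whose original value is missing. The sketch instead keeps those missing (None here). In Stata, the group means and SDs can be built with egen's mean() and sd() functions, which, unlike std(), can be combined with by.

```python
from statistics import mean, stdev


def standardize_by_group(rows):
    """Standardize values within groups, avoiding division by zero.

    `rows` is a list of (group, value) pairs; value may be None (missing).
    Returns z-scores in the same order. Groups with no within variation
    (SD of 0, or only one non-missing value) get 0 rather than missing;
    missing inputs stay missing.
    """
    groups = {}
    for g, v in rows:
        if v is not None:
            groups.setdefault(g, []).append(v)

    stats = {}
    for g, vs in groups.items():
        sd = stdev(vs) if len(vs) > 1 else 0.0
        stats[g] = (mean(vs), sd)

    out = []
    for g, v in rows:
        if v is None:
            out.append(None)  # keep missing as missing, do not zero-fill
            continue
        m, sd = stats[g]
        out.append((v - m) / sd if sd > 0 else 0.0)
    return out


# Hypothetical unbalanced panel: country B never changes, C has one
# observed value and one missing.
rows = [("A", 1), ("A", 3), ("B", 4), ("B", 4), ("B", 4), ("C", 2), ("C", None)]
print(standardize_by_group(rows))
```

Standardizing within each country sets every country's mean to 0. The standardized variable therefore keeps only within-country variation, and the cross-country differences described earlier in the thread are removed.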
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
Suryadipta Roy, Ph.D.
Visiting Assistant Professor,
Department of Economics,
Lawrence University,
115 South Drew Street,
Appleton, WI-54911.
Phone: 920-832-7343.