There is some confusion here. There is
a difference between
(a) converting to a scale on which mean is 0 and sd 1
and
(b) transforming to Gaussian (a.k.a. normal)
which is unaffected by the fact that people
often want both.
I can use -egen, std()- for (a) but
this is just a linear rescaling and has no
effect on the degree of non-Gaussianity in
the data. For example, skewness and
kurtosis are unchanged by standardizing.
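
A minimal sketch of that invariance, assuming the auto dataset that
ships with Stata; the variable name zprice is made up for illustration:

```stata
* Sketch only: auto.dta ships with Stata; zprice is a made-up name
sysuse auto, clear
egen zprice = std(price)

* price keeps its original mean and sd; zprice has mean 0 and sd 1
summarize price zprice

* skewness and kurtosis are the same before and after standardizing
summarize price, detail
display "price:  skewness = " r(skewness) "  kurtosis = " r(kurtosis)
summarize zprice, detail
display "zprice: skewness = " r(skewness) "  kurtosis = " r(kurtosis)
```

The two -display- lines should agree up to rounding error, whatever
the shape of the original variable.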
Nor can (b) always be achieved. Suppose I have
a variable which is 42 most of the time and
3.14159 the rest of the time. Two spikes will
remain two spikes under any one-to-one mapping
and a bell-shape will remain out of reach, indeed
out of sight.
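
The two-spike case can be sketched as follows. The 70/30 split, the
seed and all variable names are assumptions made up for illustration:

```stata
* Sketch only: the 70/30 split, the seed and all names are made up
clear
set obs 1000
set seed 2803
generate y = cond(runiform() < 0.7, 42, 3.14159)

* three one-to-one mappings of y
egen zy = std(y)
generate lny = ln(y)
generate recy = 1/y

* each variable still takes just two distinct values
tabulate y
tabulate zy
tabulate lny
tabulate recy
```

Whichever mapping is tried, -tabulate- reports two rows with the
same two frequencies.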
Your own distribution does not sound so refractory,
but be aware of the principle that transformation
often fails.
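
On the mechanics of (a) within groups, here is a minimal sketch. It
assumes the names code and oldvar from the question below; the toy
data and the names gmean, gsd and z are made up. Where a country has
no within variation the standardized value is undefined, so the
sketch leaves it missing rather than inventing a number.

```stata
* Sketch only: toy data; gmean, gsd and z are made-up names
clear
input code oldvar
1 0
1 3
1 6
2 4
2 4
2 4
end

* group mean and sd; these egen functions can be used with by:
bysort code: egen gmean = mean(oldvar)
by code: egen gsd = sd(oldvar)

* z stays missing wherever the within-country sd is zero or missing
generate z = (oldvar - gmean) / gsd if gsd > 0 & !missing(gsd)

* country 1 has mean 3 and sd 3; country 2 has sd 0, so z is missing
list, sepby(code)
```

Whether missing is the right result for a country with no variation
is a substantive choice, not something the code can settle.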
Nick
n.j.cox@durham.ac.uk
suryadipta.roy@lawrence.edu
> I had another question on rescaling variables. At this
> point, I am trying to convert some of my independent
> variables (running from 0 to 6) to standard normal
> variables. I have been able to do it by subtracting the
> mean from the numbers and dividing by the standard
> deviation for each group (I have an unbalanced panel of
> countries), but the problem comes for countries which had
> no within variation for that variable, since dividing by
> zero is giving missing observations in such cases. The
> following command does not work:
>
> by code, sort: egen newvar=std(oldvar)
>
> since egen's std() function does not work with by.
>
> Can anyone point me towards the correct command for direct
> conversion of numbers into their standard normals thereby
> circumventing the problem of dividing by zeros? Thanks as
> usual!