Statalist The Stata Listserver



RE: st: RE: Rescaling variables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: Rescaling variables
Date   Wed, 9 Aug 2006 01:00:00 +0100

This is interesting, but does not much affect the 
point so far as I can see. On a terminological matter, 
many people refer to values with mean 0 and sd (or variance) 1
as standard scores or standardised values or some 
such, but referring to them as standard normal I take 
to be a severe abuse of terminology, and one that muddies 
the distinction I tried to make. Kit's suggestion is 
entirely consistent with this distinction: it is all to do 
with linear scaling and nothing to do with non-linear 
transformation. 
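
A minimal sketch of that point, using the auto data shipped with 
Stata purely as a stand-in (price is just a convenient skewed 
variable, and zprice is a name made up here): standard scores change 
the mean and sd, but should leave skewness and kurtosis where they were.

```stata
* Sketch only: price stands in for any skewed variable; zprice is a made-up name
sysuse auto, clear

* linear rescaling to mean 0 and sd 1
egen double zprice = std(price)

* mean and sd differ between the two versions ...
summarize price zprice

* ... but skewness and kurtosis are those of the original variable
quietly summarize price, detail
local skew_raw = r(skewness)
local kurt_raw = r(kurtosis)
quietly summarize zprice, detail
display "skewness: raw " %7.4f `skew_raw' "   standardised " %7.4f r(skewness)
display "kurtosis: raw " %7.4f `kurt_raw' "   standardised " %7.4f r(kurtosis)
```

If the two columns agree, the standardised variable is no closer to 
Gaussian than the original.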

As you will know, the crucial assumption 
concerning Gaussianity within regression is to do with 
error terms and not original variables. For that and other 
reasons my prejudice is that pursuit of Gaussianity 
is unlikely to be crucial here, but naturally I do not
know anything about your data or problem that you don't. 

As both corruption measures appear to have arbitrary 
units and to lack well-defined dimensionality I am not 
clear what problem is solved by rescaling, but if you 
find it simplifies things, so much the better. Scaling both 
measures to a 0 to 1 range sounds more natural to me than 
standard scores, but that's just another prejudice. 
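
For concreteness, scaling to a 0 to 1 range could be sketched as 
follows. The data are invented and the variable names (corrupt6, 
corrupt10 and the new variables) are hypothetical; the first approach 
assumes the stated limits of 0 to 6 and 0 to 10 are the true limits 
of each scale.

```stata
* Toy data; corrupt6 (0-6 scale) and corrupt10 (0-10 scale) are hypothetical names
clear
input corrupt6 corrupt10
0  1
2  4
3  5
5  8
6 10
end

* divide by the known upper limit, given that both scales start at 0
generate double corrupt6_01  = corrupt6 / 6
generate double corrupt10_01 = corrupt10 / 10

* alternative: rescale by the observed minimum and maximum instead
summarize corrupt10, meanonly
generate double corrupt10_obs = (corrupt10 - r(min)) / (r(max) - r(min))

summarize corrupt6_01 corrupt10_01 corrupt10_obs
```

The two approaches differ whenever the observed values do not reach 
the ends of the scale, as with corrupt10 in the toy data. Either way 
the rescaling is linear, so the shape of each distribution is unchanged.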

Nick 
n.j.cox@durham.ac.uk 

suryadipta.roy@lawrence.edu
 
> The following is what I am trying to do and the following 
> are my concerns:
> 
> 1. I am using two separate corruption indices in my 
> regressions, one going from 0 to 6 and the other going 
> from 0 to 10, for a cross-section of countries over time. 
> In my regressions using these two separately as 
> independent variables, I found that the latter series 
> has larger between and within standard deviations 
> than the former and (hence?) gives higher values of t/z. 
> (Cross-country corruption data tend to be very persistent 
> for a country over time but tend to vary more across 
> countries in a given year.)
> 
> 2. Consequently, I was thinking of converting both 
> series to ones with the same standard deviation, 
> i.e. one; hence the idea of converting them to 
> standard normal variables, i.e. option (a).
> 
> (Kit Baum had earlier suggested the following commands to 
> do this:
> 
> ssc install center
> 
> bys country: center variable, gen(newvariable) standardize
> replace newvariable=0 if newvariable==.)
> 
> I understand the problem with transformations, but my idea 
> was to make the two series comparable. Please let me know 
> if I am wrong in doing this.
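
As an alternative to the -center- commands quoted above, the same 
within-country standard scores can be sketched with official -egen- 
functions alone. The toy panel and the names country, year, corrupt, 
cmean, csd and zcorrupt are all made up for illustration, and setting 
the score to 0 for countries with no within variation is just one 
convention (the one in the quoted commands), not a recommendation.

```stata
* Toy unbalanced panel; all variable names are hypothetical
clear
input str1 country year corrupt
"A" 2000 2
"A" 2001 3
"A" 2002 4
"B" 2000 5
"B" 2001 5
"C" 2000 1
end

* within-country mean and sd
bysort country: egen double cmean = mean(corrupt)
bysort country: egen double csd   = sd(corrupt)

* standard scores only where the within-country sd is positive and not missing
generate double zcorrupt = (corrupt - cmean) / csd if csd > 0 & !missing(csd)

* countries with constant values or a single observation: every value equals
* the country mean, so 0 is assigned here (an assumption, not a rule)
replace zcorrupt = 0 if missing(zcorrupt) & !missing(corrupt)

list, sepby(country)
```

Observations with missing corrupt stay missing, which a blanket 
replacement of all missing values with 0 would not guarantee.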
   
> On Tue, 8 Aug 2006 23:50:26 +0100
>   "Nick Cox" <n.j.cox@durham.ac.uk> wrote:
> > There is some confusion here. There is 
> > a difference between 
> > 
> > (a) converting to a scale on which mean is 0 and sd 1 
> > 
> > and 
> > 
> > (b) transforming to Gaussian (a.k.a. normal)
> > 
> > which is unaffected by the fact that people 
> > often want both. 
> > 
> > I can use -egen, std()- for (a) but 
> > this is just a linear rescaling and has no 
> > effect on the degree of non-Gaussianity in
> > the data. So, for example, skewness and 
> > kurtosis are invariant under linear rescalings. 
> > 
> > Thus suppose I have a variable which is 
> > 42 most of the time and 3.14159 the rest 
> > of the time. In this example, two spikes will 
> > remain two spikes under any one-to-one mapping 
> > and a bell-shape will remain out of reach, indeed
> > out of sight. 
> > 
> > Your own distribution does not sound so refractory, 
> > but be aware of the principle that transformation
> > often fails. 
> > 
> > Nick 
> > n.j.cox@durham.ac.uk 
> > 
> > suryadipta.roy@lawrence.edu
> > 
> >> I had another question on rescaling variables. At this 
> >> point, I am trying to convert some of my independent 
> >> variables (running from 0 to 6) to standard normal 
> >> variables. I have been able to do it by subtracting the 
> >> mean from the numbers and dividing by the standard 
> >> deviation for each group (I have an unbalanced panel of 
> >> countries), but the problem comes for countries which 
> >> had no within variation for that variable, since dividing 
> >> by zero is giving missing observations in such cases. The 
> >> following command does not work:
> >> 
> >> by code, sort: egen newvar=std(oldvar), since egen's std() 
> >> does not work with by.
> >> 
> >> Can anyone point me towards the correct command for 
> >> direct conversion of numbers into their standard normals, 
> >> thereby circumventing the problem of dividing by zeros? 
> >> Thanks as usual!
> > 
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
> 
> Suryadipta Roy, Ph.D.
> Visiting Assistant Professor,
> Department of Economics,
> Lawrence University,
> 115 South Drew Street,
> Appleton, WI-54911.
> Phone: 920-832-7343.
> 



