re:Re: st: Normalize Variables by s.d. (programmatically)


From   Christopher Baum <kit.baum@bc.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   re:Re: st: Normalize Variables by s.d. (programmatically)
Date   Thu, 22 Dec 2011 09:12:48 -0500

Ryan said

Yes, I found the -beta- option of -regress- in my searches; however, as noted, I am using -xtreg-.  Further, I would like only to divide by the standard deviation, not subtract the mean, and I would like to leave dummies unnormalized.

The real problem with my program below is that I am trying to dynamically allocate backup variables for the original values and operate on the original variable names.  The reason is that I didn't want my regression output cluttered with temporary variable tags, e.g. std_`varname'.  However, this made my program much more complicated, because I couldn't rely on Stata's native handling of varlists, wildcards, and difference operators, and I would always have to test for the existence of variables.  A big mess.

I got it working quite nicely (as below) by simply generating temporary variables and using some sed magic to clean up my regression output.  However, the program given quickly reaches the string limit of 244 characters since I am fully expanding all wildcards.  By shortening my varnames (and relying on additional sed magic) I stay within the limit and am able to run my current regressions of interest, but the program is not robust.

I looked at -center-.  I don't think that is what I want unless there is something special in byable that I don't yet understand.  I think my real question is, how do I efficiently operate on the observations that would be included in a regression (e.g. that are not missing)?  I test if !missing(comma separated varlist).  The only other way I can think of is to run the regression twice; discard the first and use e(sample) for the second.  Is there a better way?

Thanks for your feedback.



Maybe something like this (does not deal with dummies, but that shouldn't be a difficult addition):

------------------
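prog drop _all
prog stdize
* varlist may carry time-series operators; any options pass through to xtreg
syntax varlist(ts) [if] [in] [, *]
* expand wildcards and operator ranges such as L(1/2).kstock
tsunab vl: `varlist'
marksample touse
* `rs' ends up missing wherever any (operated) variable is missing
tempvar rs
qui g `rs' = 0 if `touse'
foreach w of local vl {
	qui replace `rs' = `rs' + `w' if `touse'
}
preserve
foreach w of local vl {
	* s.d. computed over the estimation sample only
	qui summ `w' if `touse' & !mi(`rs')
	tempvar ww
	* divide by the s.d. without subtracting the mean
	qui g `ww' = `w' / r(sd) if `touse' & !mi(`rs')
	* e.g. D.mvalue becomes _D_mvalue, a legal variable name
	loc neww: subinstr loc w "." "_"
	rename `ww' _`neww'
	loc vl2 "`vl2' _`neww'"
}
xtreg `vl2', `options'
* restore discards the scaled copies
restore
end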
---------------------

With webuse grunfeld, this handles something like

stdize invest D.mvalue L(1/2).kstock if company < 7, fe

with no problem.
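
The dummy handling left open above could be sketched along the following lines. This is only one possible approach and not part of the original program: the name stdize2 and the variable early are hypothetical, and a "dummy" is assumed to mean a variable that takes only the values 0 and 1 in the estimation sample. Such variables are passed to -xtreg- unscaled, under their own names.

```stata
capture program drop stdize2
program stdize2
	syntax varlist(ts) [if] [in] [, *]
	tsunab vl: `varlist'
	marksample touse
	* same device as in stdize: `rs' is missing wherever any variable is missing
	tempvar rs
	quietly generate double `rs' = 0 if `touse'
	foreach w of local vl {
		quietly replace `rs' = `rs' + `w' if `touse'
	}
	preserve
	foreach w of local vl {
		* assumption: "dummy" means only 0/1 values in the estimation sample
		capture assert inlist(`w', 0, 1) if `touse' & !mi(`rs')
		if _rc == 0 {
			* leave the dummy unscaled and keep its own name
			local vl2 "`vl2' `w'"
			continue
		}
		quietly summarize `w' if `touse' & !mi(`rs')
		tempvar ww
		quietly generate double `ww' = `w' / r(sd) if `touse' & !mi(`rs')
		local neww : subinstr local w "." "_", all
		rename `ww' _`neww'
		local vl2 "`vl2' _`neww'"
	}
	* the explicit if keeps the sample fixed even though dummies are not set to missing
	xtreg `vl2' if `touse' & !mi(`rs'), `options'
	restore
end

* usage with a made-up dummy
webuse grunfeld, clear
generate byte early = year < 1945
stdize2 invest D.mvalue L(1/2).kstock early if company < 7, fe
```

Checking for 0/1 values with -capture assert- rather than comparing r(min) and r(max) avoids misclassifying a continuous variable that happens to lie between 0 and 1.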

Cheers
Kit

Kit Baum   |   Boston College Economics & DIW Berlin   |   http://ideas.repec.org/e/pba1.html
An Introduction to Stata Programming   |   http://www.stata-press.com/books/isp.html
An Introduction to Modern Econometrics Using Stata   |   http://www.stata-press.com/books/imeus.html

