


Re: st: Normalize Variables by s.d. (programmatically)

From   Ryan Turner <>
Subject   Re: st: Normalize Variables by s.d. (programmatically)
Date   Thu, 22 Dec 2011 08:36:23 -0500

On Dec 21, 2011, at 11:01 PM, Richard Williams wrote:

> At 04:49 PM 12/21/2011, Austin Nichols wrote:
>> Ryan Turner <>:
>> help regress
>> (see option beta) or
> That was my first impulse too, but if you wade through his code you see there is an -xtreg- command at the end of it, and xtreg doesn't have a beta option. (Ryan, regress can mean a lot of things, so it would be good to be explicit about what you mean at the beginning).
> I haven't worked my way through Ryan's code, but I wonder if Ben Jann's -center- command, available from SSC, could simplify the process.

Thanks Austin, Richard,

Yes, I found the beta option of -regress- in my searches; however, as noted, I am using -xtreg-.  Further, I would like only to divide by the standard deviation, not subtract the mean, and I would like to leave dummies unnormalized.

The real problem with my program below is that I was trying to dynamically allocate backup variables for the original values and operate on the original variable names.  The reasoning was that I didn't want my regression output cluttered with temporary variable tags, e.g. std_`varname'.  However, this made my program much more complicated, because I couldn't rely on Stata's native handling of varlists, wildcards, and difference operators, and I would always have to test for the existence of variables.  A big mess.

I got it working quite nicely (as below) by simply generating temporary variables and using some sed magic to clean up my regression output.  However, the program given quickly reaches the string limit of 244 characters since I am fully expanding all wildcards.  By shortening my varnames (and relying on additional sed magic) I stay within the limit and am able to run my current regressions of interest, but the program is not robust.
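
One possible way around the length limit, sketched here under assumptions: the limit applies to string expressions evaluated with -local name = exp-, while copying text into a macro, or using an extended macro function, should not go through expression evaluation. The variable names x1, x2, x3 are hypothetical and only for illustration.

```stata
// Sketch only: x1 x2 x3 are hypothetical variable names
local reg_list x1 x2 x3

// Extended macro function: replace every space with a comma,
// without evaluating a string expression
local commas : subinstr local reg_list " " ",", all

// Plain macro copy (no "=" sign), so no string expression is evaluated
local include_list "!missing(`commas')"
display "`include_list'"
```

The resulting macro can then be used as before in -if `include_list'-.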

I looked at -center-.  I don't think that is what I want unless there is something special in byable that I don't yet understand.  I think my real question is, how do I efficiently operate on the observations that would be included in a regression (e.g. that are not missing)?  I test if !missing(comma separated varlist).  The only other way I can think of is to run the regression twice; discard the first and use e(sample) for the second.  Is there a better way?
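
For reference, a minimal sketch of one alternative to building the !missing() test by hand: Stata's -marksample-, used after -syntax-, creates a temporary 0/1 variable marking the observations with no missing values in the varlist. The program name sdscale and the toy data are hypothetical, and this only accounts for missing values in the listed variables, not for anything else an estimation command might exclude.

```stata
capture program drop sdscale
program define sdscale
    syntax varlist
    // touse is 1 where every variable in varlist is nonmissing, 0 otherwise
    marksample touse
    foreach v of varlist `varlist' {
        quietly summarize `v' if `touse'
        // divide by the s.d. of the marked sample; the mean is not subtracted
        generate double std_`v' = `v' / r(sd) if `touse'
    }
end

// Toy data: z is missing in the first three observations
clear
set obs 10
generate x = _n
generate z = 2 * _n
replace z = . in 1/3
sdscale x z
summarize std_x std_z
```

Because the sample marker is a variable rather than a string, its construction does not depend on the length of the expanded varlist.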

Thanks for your feedback.

Ryan Turner

// divide regressors by std
capture program drop doreg
program define doreg
    syntax varlist [, *]
    local reg_list
    // expand wildcards and remove spaces
    foreach item of var `varlist' {
        local reg_list "`reg_list' `item'"
    }
    // strip the leading blank so the comma-separated list has no empty entry
    local reg_list : list retokenize reg_list
    //assert length("`reg_list'") <= 244
    // generate test of what observations are included in regression
    local include_list = "!missing(" + subinstr("`reg_list'"," ",",",.) + ")"
    assert length("`include_list'") <= 244

    local drop_list
    local dummies
    foreach item of var `reg_list' {
        if strpos("`item'","d_") == 1 {
            // add to dummies list
            local dummies "`dummies' `item'"
            assert length("`reg_list'") <= 244
            //display "`dummies'"
            // don't normalize dummies
        }
        else {
            // s.d. over the observations that enter the regression
            quietly: summ `item' if `include_list'
            gen std_`item' = `item' / r(sd)
            //gen std_`item' = `item'
            local drop_list "`drop_list' std_`item'"
        }
    }
    // do the actual regression
    xtreg `drop_list' `dummies', `options'
    // drop so that we don't accidentally reuse this one-time varlist
    drop `drop_list'
    quietly: summ
end

Ryan J. Turner <>
