



st: Normalize Variables by s.d. (programmatically)


From   Ryan Turner <rjturner@cmu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Normalize Variables by s.d. (programmatically)
Date   Wed, 21 Dec 2011 15:46:13 -0500

Hi all,

I want to normalize each variable in my regressions by its own standard deviation. That is simple enough, but I have many variables and many regressions, so I would like to do it programmatically to simplify the code and avoid mistakes. Further, the data have many missing values, and the number of observations usually differs from one regression to the next; therefore I need to calculate each variable's standard deviation over the subset of observations included in that particular regression.
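A minimal sketch of the sample logic described above, assuming the estimator's own e(sample) is the right definition of "observations included in that particular regression": fit once, save e(sample), divide each variable by its s.d. over exactly those observations, and fit again. It uses Stata's shipped auto dataset and regress in place of the poster's data and xtreg; the z_ prefix and the insample variable are illustrative choices.

```stata
* Sketch: let the estimator decide which observations it uses, then
* standardize over exactly those observations and fit again.
* auto data: rep78 has 5 missing values, so the sample is 69 of 74.
sysuse auto, clear
quietly regress price mpg rep78 weight
gen byte insample = e(sample)

foreach v of varlist price mpg rep78 weight {
    quietly summarize `v' if insample
    * scaled copy is missing outside the estimation sample
    gen double z_`v' = `v' / r(sd) if insample
}

regress z_price z_mpg z_rep78 z_weight
```

Because the scaled copies are missing outside insample, the second fit uses the same observations as the first.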

I have been searching for three days for a simple way to do this and, finding none, have spent significant time writing a program, doreg, that takes a varlist to regress on, backs up each variable in it, builds a rule that determines which observations would be included in the regression (e.g. if !missing(varlist)), and, for each variable in varlist, calculates the s.d. under that rule and replaces the original variable with the normalized value (the variable divided by its standard deviation). I more or less got it working, but the program is plagued by special cases: dummy variables, difference operators, and wildcards. When my program devolved into manually parsing wildcards, I knew it was time to ask for help.
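One way to sidestep the special cases, sketched under assumptions: let syntax expand the varlist (wildcards become real variable names, and the ts option admits operators such as D.x), let marksample build the "no missing values" rule, and write scaled copies under new names instead of renaming the originals. The program name stdreg, its cmd() option, and the z_ naming scheme are hypothetical; factor-variable notation such as i.year is not handled, and operators require tsset or xtset data.

```stata
* Sketch of a doreg replacement (hypothetical name: stdreg).
capture program drop stdreg
program define stdreg
    version 12
    * varlist(numeric ts) expands wildcards such as curr_year_* into
    * the matching variable names and accepts operators such as D.x
    syntax varlist(numeric ts) [if] [in] [, CMD(string) *]
    if `"`cmd'"' == "" local cmd regress

    * touse = 1 for observations inside if/in with no missing values
    * in any variable of varlist
    marksample touse

    local zlist
    foreach v of local varlist {
        * legal, readable name for the scaled copy: D.x becomes z_D_x
        local z = strtoname("z_`v'")
        * stop rather than overwrite an existing variable
        confirm new variable `z'
        quietly summarize `v' if `touse'
        quietly generate double `z' = `v' / r(sd) if `touse'
        local zlist `zlist' `z'
    }

    * run the estimator on the scaled copies, passing other options on
    `cmd' `zlist' if `touse', `options'

    * drop the one-time copies; results already stored (e.g. by
    * eststo) keep the z_ coefficient names
    drop `zlist'
end
```

Applied to the failing call below, this would read eststo: stdreg s_sem_grd_pts s_tot_time app_avg_grade cohort curr_year_* major_*, cmd(xtreg) robust (untested against the poster's data). One caveat of this design: marksample only excludes observations with missing values, so if the estimator drops further observations for other reasons, the s.d.s are computed over a slightly larger sample than the one actually used.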

So, what is the proper way to accomplish my goal? Do I just need to get my program working, or is there a fundamentally better way to do it? I have included my program doreg below for reference, but I am really looking for a higher-level response.

Thanks,
--
Ryan J. Turner <rjturner@cmu.edu>




// Reference program; fails when passing a wildcard in the varlist

// Program to obtain standardized betas: divide each variable by its s.d., then run xtreg
capture program drop doreg
program define doreg
    syntax anything [, *]
    local reg_list    // normalized variable names, filled in below
    local bak_reg_list
    
    // back up each `item' in `anything' and collect the backup names in `bak_reg_list'
    foreach item in `anything' {
        // generate backed up name of `item'
        local bak_item "bak_`item'"
        
        // check if `item' exists
        capture confirm var `item'
        if _rc == 0 {  // `item' exists
            // check that `bak_item' is empty
            capture confirm var `bak_item'
            assert _rc != 0
            
            // move `item' to `bak_item'
            rename `item' `bak_item'
        }
        else {         // `item' does not exist OR ITEM CONTAINS WILDCARD
            // check that `item' has already been backed up
            capture confirm var `bak_item'
            assert _rc == 0
        }
        // at this point `item' is stored as `bak_item'; after the loop,
        // none of the original names exists
        local bak_reg_list `bak_reg_list' `bak_item'
    }
    
    // generate test of what observations are included in regression
    local keep_list = "!missing(" + subinstr("`bak_reg_list'"," ",",",.) + ")"
    assert length("`keep_list'") < 244
        
    // generate normalized variables from `bak_reg_list'
    foreach bak_item of var `bak_reg_list' {
        // generate original item name
        local item = substr("`bak_item'",5,.)
        
        // Get s.d. of `bak_item' for observations included in the regression    
        quietly: summ `bak_item' if `keep_list'
                
        // divide by the s.d. and store under the original name `item'
        gen `item' = `bak_item' / r(sd)
        
        // add `item' to the list of regression variables
        local reg_list `reg_list' `item'
    }
    
    // do the actual regression
    xtreg `reg_list', `options'
    
    // drop so that we don't accidentally reuse this one-time varlist
    drop `reg_list'
end

eststo: doreg s_sem_grd_pts s_tot_time   , robust // success
eststo: doreg s_n_courses   s_tot_time   , robust // success
eststo: doreg s_sem_grd_pts s_tot_time app_avg_grade cohort curr_year_* major_*, robust // FAILS ON curr_year_*
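A different route avoids creating variables at all, so wildcards and operators are left to the estimator: fit once on the raw variables and rescale afterwards. Dividing y by its s.d. s_y and each x_j by s_j turns the coefficient b_j into b_j * s_j / s_y, with both s.d.s taken over e(sample). For regress, the beta option reports these standardized coefficients directly; the same arithmetic applies after xtreg because rescaling by constants commutes with the panel transformations. This sketch uses the auto dataset, and the std_ local names are illustrative.

```stata
* Sketch: fit on the original variables, then rescale each slope to
* b_j * sd(x_j) / sd(y), both s.d.s computed over e(sample).
sysuse auto, clear
regress price mpg rep78 weight

* keep the estimation sample in a variable so it can still be used
* after a later estimation command replaces e()
gen byte insample = e(sample)

quietly summarize price if insample
local sd_y = r(sd)

foreach v of varlist mpg rep78 weight {
    quietly summarize `v' if insample
    local std_`v' = _b[`v'] * r(sd) / `sd_y'
    display "`v'" _col(12) %9.4f `std_`v''
}
```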




