
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: a program to make dummy variables
Date: Thu, 9 Sep 2004 12:07:35 +0100

Sometimes it is best to work out how one would solve a problem oneself
before looking at why someone's solution doesn't work.

I can foresee lots of difficulties which such a program should tackle:
the value label might not exist, it might not be suitable as a variable
name, the putative name might already be in use, etc.

The second difficulty is to be seen in your example: "Native Indian"
certainly won't qualify. A work-around is to try replacing spaces
by underscores.

A trial program with some error traps might then be:

------------------------------------ mydummies
*! 1.0.0 NJC 9 Sept 2004
program mydummies, rclass
    version 8.2
    syntax varname(numeric) [if] [in]

    marksample touse
    qui count if `touse'
    if r(N) == 0 error 2000

    local label : value label `varlist'
    if "`label'" == "" {
        di as err "`varlist' not labelled"
        exit 182
    }

    qui levels `varlist' if `touse', local(levels)

    // test the names: exit if problem
    foreach l of local levels {
        local name : label `label' `l'
        local name : subinstr local name " " "_", all
        confirm new var `name'
        local names "`names' `name'"
    }

    // generate the variables
    local i = 1
    qui foreach l of local levels {
        local name : word `i++' of `names'
        gen `name' = `varlist' == `l' if `touse'
    }

    return local varlist "`names'"
end
-----------------------

This leaves the question of what's wrong with your program, apart
from the lack of any error trapping. I note your assumption that
values in your labelled variable run over the integers 1 up. This
no doubt is fine for your applications, but lacks generality.

Here is your program again.
I have removed the comments to save space:

program define my_dummy
    version 8

    tempvar max1
    egen `max1'=rmax(`1')
    tempvar max2
    egen `max2'=max(`max1')
    local maxval=`max2'

    forvalues i = 1/`maxval' {
        egen resp`i' = eqany(`1'), v(`i')
    }

    tokenize `1'
    local j = 1
    forvalues i = 1/`maxval' {
        local labval`j' : label `1' `i'
        local j = `j' + 1
    }

    local i 1
    local j 1
    while `i' == `j' & `i' <= `maxval' {
        rename resp`i' `labval`j''
        local i = `i' + 1
        local j = `j' + 1
    }
end

The first thing to note is the lack of any -syntax- statement. That is
not illegal, but it means that, in homespun terms, your door is wide
open and anything can walk in. There seem to be ambitions here of being
able to tackle several variables at once; I'd rather solve the case of
one variable first, knowing that I can always loop over variables with
-foreach-.

Then you use -egen- to generate a variable to hold the maximum of the
variable supplied. You can do that directly with -summarize- and avoid
the extra variable.

Similarly, -egen, eqany()- is an awkward beast which you don't need for
getting a dummy when -generate- will do it directly, and much faster.

Also, you are assuming that there are no variables in the dataset
called resp1, resp2, etc. That strictly calls for temporary variables.

Putting those together, your program becomes:

program define my_dummy
    version 8

    // cleaned up a bit from here on
    syntax varname(numeric)
    su `varlist', meanonly
    local maxval = r(max)

    forvalues i = 1/`maxval' {
        tempvar dummy
        gen `dummy' = `varlist' == `i'
        local dummies "`dummies' `dummy'"
    }

    // not yet touched
    tokenize `1'
    local j = 1
    forvalues i = 1/`maxval' {
        local labval`j' : label `1' `i'
        local j = `j' + 1
    }

    local i 1
    local j 1
    while `i' == `j' & `i' <= `maxval' {
        rename resp`i' `labval`j''
        local i = `i' + 1
        local j = `j' + 1
    }
end

Turning now to the remainder, "not yet touched", I see more loops
than seem necessary.
The code seems to boil down to

    forvalues i = 1/`maxval' {
        local labval : label `varlist' `i'
        local dummy : word `i' of `dummies'
        rename `dummy' `labval'
    }

I can't however see why you get the bizarre one-letter names.
Perhaps someone else can illuminate.

Nick
n.j.cox@durham.ac.uk

Lim, Nelson

> I am trying to create dummies variables from a categorical variable and
> want to have value labels of the categorical variable to be the names of
> the dummy variables.
>
> For example, I have a variable called race_n:
>
>       Numeric |
>    version of |
>          race |      Freq.     Percent        Cum.
> --------------+-----------------------------------
>         Asian |      1,692        3.19        3.19
>         White |     41,311       77.90       81.09
>      Hispanic |      2,237        4.22       85.30
>         Black |      6,770       12.77       98.07
> Native Indian |        272        0.51       98.58
>         Other |        752        1.42      100.00
> --------------+-----------------------------------
>         Total |     53,034      100.00
>
> I want to create 6 dummies whose names are the value labels of race_n.
> For example, I would like to have the first dummy variable to
> be called Asian.
>
> I wrote a program called my_dummy. It seems to work, but when I describe
> the data, I get the following. The dummies only take the
> first letter of the variable.
>
> . describe
>
> ------------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------------------
> A               byte   %8.0g                  race_n == 1
> C               byte   %8.0g                  race_n == 2
> H               byte   %8.0g                  race_n == 3
> N               byte   %8.0g                  race_n == 4
> T               byte   %8.0g                  race_n == 5
> X               byte   %8.0g                  race_n == 6
> ------------------------------------------------------------------------
>
>
> /* beginning of the program */
> program define my_dummy
>
> version 8
>
> /* computing the maximum value of the variable */
> tempvar max1
> egen `max1'=rmax(`1')
> tempvar max2
> egen `max2'=max(`max1')
> local maxval=`max2'
>
> /* generating the set of dummy variables */
> forvalues i = 1/`maxval' {
>     egen resp`i' = eqany(`1'), v(`i')
> }
>
>
> /* naming the value labels of the original variable */
> */ to the dummy variables
>
> tokenize `1'
> local j = 1
> forvalues i = 1/`maxval' {
>     local labval`j' : label `1' `i'
>     local j = `j' + 1
> }
>
> local i 1
> local j 1
> while `i' == `j' & `i' <= `maxval' {
>     rename resp`i' `labval`j''
>     local i = `i' + 1
>     local j = `j' + 1
> }
>
>
> end
>
> my_dummy race_n
>
>
> . describe
>
> ------------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------------------
> A               byte   %8.0g                  race_n == 1
> C               byte   %8.0g                  race_n == 2
> H               byte   %8.0g                  race_n == 3
> N               byte   %8.0g                  race_n == 4
> T               byte   %8.0g                  race_n == 5
> X               byte   %8.0g                  race_n == 6
> --------------------------------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

