
st: RE: a program to make dummy variables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: a program to make dummy variables
Date   Thu, 9 Sep 2004 12:07:35 +0100

Sometimes it is best to work out how one would 
solve a problem oneself before looking at 
why someone's solution doesn't work. 

I can foresee lots of difficulties which such a program 
should tackle: the value label might not exist, 
it might not be suitable as a variable name, the 
putative name might already be in use, etc. 

The second difficulty is to be seen in your example: 
"Native Indian" certainly won't qualify. A work-around
is to try replacing spaces by underscores. 
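
As a small sketch of that work-around, assuming the label text is held in a
local macro (the macro name and the example text here are purely illustrative):

```stata
// hypothetical label text containing a space
local name "Native Indian"

// replace every space by an underscore
local name : subinstr local name " " "_", all
display "`name'"        // shows Native_Indian

// exits with an error if the result is still not a legal name
confirm names `name'
```

Underscores only deal with spaces. A label that starts with a digit, contains
other punctuation, or runs past 32 characters would still fail the check.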

A trial program with some error traps might then be: 

------------------------------------ mydummies 
*! 1.0.0 NJC 9 Sept 2004 
program mydummies, rclass 
	version 8.2  
	syntax varname(numeric) [if] [in] 

	marksample touse 
	qui count if `touse' 
	if r(N) == 0 error 2000 

	local label : value label `varlist' 
	if "`label'" == "" { 
		di as err "`varlist' not labelled"
		exit 182
	}

	qui levels `varlist' if `touse', local(levels) 

	// test the names: exit if problem  
	foreach l of local levels { 
		local name : label `label' `l' 
		local name : subinstr local name " " "_", all 
		confirm new var `name' 
		local names "`names' `name'" 
	} 	

	// generate the variables 
	local i = 1 
	qui foreach l of local levels { 
		local name : word `i++' of `names' 
		gen `name' = `varlist' == `l' if `touse' 
	} 

	return local varlist "`names'" 
end 		
-----------------------
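
A possible way to try this out, sketched on the auto data shipped with Stata
rather than on your data (there -foreign- carries the value labels "Domestic"
and "Foreign"). Note that in later Stata releases -levels- was superseded by
-levelsof-, so the program may need that one word changed:

```stata
sysuse auto, clear

// should create variables named Domestic and Foreign
mydummies foreign

// the names of the new variables are left behind in r(varlist)
return list
describe `r(varlist)'
```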

This leaves the question of what's wrong with 
your program, apart from the lack of any error 
trapping. 

I note your assumption that values in your 
labelled variable run over the integers 1 up. 
This no doubt is fine for your applications, but 
lacks generality. 
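
A minimal sketch of the more general approach, assuming Stata 9 or later, where
the command is called -levelsof- (in Stata 8.2 it was -levels-). The auto data
are used only for illustration: -foreign- takes the values 0 and 1, so a loop
from 1 up to the maximum would miss the value 0.

```stata
sysuse auto, clear

// collect the distinct values actually present
levelsof foreign, local(levels)

// loop over whatever values occur, not over 1 to the maximum
foreach l of local levels {
	local text : label (foreign) `l'
	display "value `l' is labelled `text'"
}
```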

Here is your program again. I have removed the 
comments to save space: 

program define my_dummy
 	version 8
 
	tempvar max1
	egen `max1'=rmax(`1')
	tempvar max2
	egen `max2'=max(`max1')
	local maxval=`max2'
 
	forvalues i = 1/`maxval' {
		egen resp`i' = eqany(`1'), v(`i')
	}
 
	tokenize `1'
	local j = 1
	forvalues i = 1/`maxval' {
		local labval`j' : label `1' `i'
		local j = `j' + 1
	}
 
	local i 1
	local j 1
	while `i' == `j' & `i' <= `maxval' {
		rename resp`i' `labval`j''
		local i = `i' + 1
		local j = `j' + 1
	}
end

The first thing to note is the lack of any 
-syntax- statement. That is not illegal, but 
it means that, in homespun terms, your  
door is wide open and anything can walk in. 
There seem to be ambitions here of being 
able to tackle several variables at once; 
I'd rather solve the case of one variable 
first, knowing that I can always loop over 
variables with -foreach-. 
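
Such a loop might look like the following sketch, which assumes that a
one-variable program like -mydummies- above is installed. The variable sex_n is
hypothetical, standing for any second labelled variable.

```stata
// call the one-variable program once for each labelled variable
foreach v of varlist race_n sex_n {
	mydummies `v'
}
```

If two variables share a label text, say "Other", the second call stops with an
error at -confirm new var-, because the name is by then in use.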

Then you use -egen- to generate a variable
to hold the maximum of the variable supplied. 
You can do that directly with -summarize- 
and avoid the extra variable. Similarly, 
-egen, eqany()- is an awkward beast which you 
don't need for getting a dummy when -generate- 
will do it directly, and much faster. 
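
A brief sketch of both points, using the auto data purely as an illustration
(the new variable name is made up). The -if- qualifier is an addition of mine:
without it, observations with missing values would get 0 rather than missing.

```stata
sysuse auto, clear

// the maximum is left behind in r(max); no extra variable is needed
su rep78, meanonly
local maxval = r(max)
display "maximum is `maxval'"

// a dummy directly from -generate-: 1 if true, 0 if false,
// and missing wherever rep78 itself is missing
gen byte rep_is_3 = rep78 == 3 if !missing(rep78)
tab rep_is_3, missing
```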

Also, you are assuming 
that there are no variables in the dataset 
called resp1, resp2, etc. That strictly 
calls for temporary variables. 

Putting those together, your program becomes: 

program define my_dummy
 	version 8

	// cleaned up a bit from here on 
	syntax varname(numeric) 

	su `varlist', meanonly 
	local maxval = r(max)  
 
	forvalues i = 1/`maxval' {
		tempvar dummy
		gen `dummy' = `varlist' == `i'
		local dummies "`dummies' `dummy'" 
	}
 
	// not yet touched 
	tokenize `1'
	local j = 1
	forvalues i = 1/`maxval' {
		local labval`j' : label `1' `i'
		local j = `j' + 1
	}
 
	local i 1
	local j 1
	while `i' == `j' & `i' <= `maxval' {
		rename resp`i' `labval`j''
		local i = `i' + 1
		local j = `j' + 1
	}
end

Turning now to the remainder, "not yet touched", 
I see more loops than seem necessary. The code seems to boil down 
to: 

	forvalues i = 1/`maxval' {
		local labval : label `varlist' `i'
		local dummy : word `i' of `dummies' 
 		rename `dummy' `labval'
	}
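
Putting the pieces together gives something like the sketch below. This is an
untested assembly, not a definitive version: the name my_dummy2 is made up, and
I have made two changes beyond the fragments above. The value label is looked
up by its own name, in case it differs from the variable name, and spaces in
labels are replaced by underscores. The assumption that values run over the
integers 1 up is still there.

```stata
program define my_dummy2
	version 8
	syntax varname(numeric)

	// name of the value label attached to the variable
	local label : value label `varlist'
	if "`label'" == "" {
		di as err "`varlist' not labelled"
		exit 182
	}

	su `varlist', meanonly
	local maxval = r(max)

	// one temporary dummy for each value 1, ..., maximum
	forvalues i = 1/`maxval' {
		tempvar dummy
		gen byte `dummy' = `varlist' == `i'
		local dummies "`dummies' `dummy'"
	}

	// rename each dummy to the (cleaned) label text
	forvalues i = 1/`maxval' {
		local labval : label `label' `i'
		local labval : subinstr local labval " " "_", all
		local dummy : word `i' of `dummies'
		rename `dummy' `labval'
	}
end
```

Unlike -mydummies-, this does not test the names in advance, so a -rename- that
fails part-way leaves the earlier dummies in the dataset.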

I can't however see why you get the bizarre one-letter 
names. Perhaps someone else can illuminate. 

Nick 
n.j.cox@durham.ac.uk 

Lim, Nelson
 
> I am trying to create dummy variables from a categorical variable and
> want to have value labels of the categorical variable to be the names of
> the dummy variables.
> 
> For example, I have a variable called race_n:
> 
>       Numeric |
>    version of |
>          race |      Freq.     Percent        Cum.
> --------------+-----------------------------------
>         Asian |      1,692        3.19        3.19
>         White |     41,311       77.90       81.09
>      Hispanic |      2,237        4.22       85.30
>         Black |      6,770       12.77       98.07
> Native Indian |        272        0.51       98.58
>         Other |        752        1.42      100.00
> --------------+-----------------------------------
>         Total |     53,034      100.00
> 
> I want to create 6 dummies whose names are the value labels of race_n.
> For example, I would like to have the first dummy variable to be called
> Asian. 
> 
> I wrote a program called my_dummy. It seems to work, but when I describe
> the data, I get the following. The dummies only take the first letter of
> the variable.
> 
> . describe
> 
> ------------------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------------------------
> A               byte   %8.0g                  race_n == 1
> C               byte   %8.0g                  race_n == 2
> H               byte   %8.0g                  race_n == 3
> N               byte   %8.0g                  race_n == 4
> T               byte   %8.0g                  race_n == 5
> X               byte   %8.0g                  race_n == 6
> ------------------------------------------------------------------------------
>  
> 
> /* beginning of the program */
> program define my_dummy
> 
> version 8
> 
> /* computing the maximum value of the variable */
> tempvar max1
> egen `max1'=rmax(`1')
> tempvar max2
> egen `max2'=max(`max1')
> local maxval=`max2'
> 
> /* generating the set of dummy variables */
> forvalues i = 1/`maxval' {
> egen resp`i' = eqany(`1'), v(`i')
> }
> 
> 
> /* naming the value labels of the original variable */
>  */ to the dummy variables
> 
> tokenize `1'
> local j = 1
> forvalues i = 1/`maxval' {
> local labval`j' : label `1' `i'
> local j = `j' + 1
> }
> 
> local i 1
> local j 1
> while `i' == `j' & `i' <= `maxval' {
> rename resp`i' `labval`j''
> local i = `i' + 1
> local j = `j' + 1
> }
> 
> 
> end
> 
> my_dummy race_n
> 
> 
> . describe
> 
> ------------------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------------------------
> A               byte   %8.0g                  race_n == 1
> C               byte   %8.0g                  race_n == 2
> H               byte   %8.0g                  race_n == 3
> N               byte   %8.0g                  race_n == 4
> T               byte   %8.0g                  race_n == 5
> X               byte   %8.0g                  race_n == 6
> ------------------------------------------------------------------------------
