Re: st: Creating dummy variables


From   Matthew White <mwhite@poverty-action.org>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Creating dummy variables
Date   Wed, 16 Nov 2011 11:28:15 -0500

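I see. Maybe this, which loops only over the fips2 values that occur with each fips1 value, so a difference variable is created only for pairs that exist in the data: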

***BEGIN***
clear
input fips1 fips2
1001 1073
1001 1021
1001 1101
1003 12031
1003 1099
end

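levelsof fips1
foreach county1 in `r(levels)' {
	* dummy for this fips1 county
	generate fips1_`county1' = fips1 == `county1'
	label variable fips1_`county1' "fips1==`county1'"
	
	* loop only over the fips2 counties paired with this fips1 county
	levelsof fips2 if fips1 == `county1'
	foreach county2 in `r(levels)' {
		* the fips2 dummy may already exist from an earlier fips1 county;
		* -capture- skips the error, so the label is set only once
		capture generate fips2_`county2' = fips2 == `county2'
		if !_rc label variable fips2_`county2' "fips2==`county2'"
		
		* one difference variable for this observed pair
		generate diff_`county1'_`county2' = fips1_`county1' - fips2_`county2'
		label variable diff_`county1'_`county2' "fips1_`county1' - fips2_`county2'"
	}
}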

order fips1_* fips2_* diff*, after(fips2)
***END***

Best,
Matt
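
The `capture generate` / `if !_rc` lines in the loop above let the same fips2 dummy be shared across several fips1 counties: the first `generate` succeeds, later ones fail quietly, and the label is written only once. A minimal sketch of that behavior on a toy dataset, where `fips2_1021` and the two labels are made-up names for illustration:

```stata
* Toy data: one dummy already exists with a label
clear
set obs 3
generate byte fips2_1021 = 1
label variable fips2_1021 "first label"

* A second -generate- of the same name fails with r(110);
* -capture- suppresses the error and leaves the code in _rc
capture generate byte fips2_1021 = 0
local rc = _rc

* the label is only (re)set when -generate- succeeded, i.e. _rc == 0
if !_rc label variable fips2_1021 "second label"

local lab : variable label fips2_1021
display "return code: `rc'   label: `lab'"
```

Using `capture` here trades error checking for brevity: any other failure of `generate` (for example, running out of room for variables) would also be silenced and would likewise skip the label.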

On Wed, Nov 16, 2011 at 10:52 AM, Michael Betz
<betz.40@buckeyemail.osu.edu> wrote:
> Thanks, Matt,
>
> This is getting close, but there is still a hang-up. The program you wrote differences every "fips1" dummy with every "fips2" dummy. I need difference dummies only for the pairs that occur (i.e., 1001-1073, 1001-1021, and 1001-1101, but not 1001-12031). Because I have 3,000 levels for each "fips" variable, this program would create 3,000 x 3,000 difference variables, which is more than Stata can hold.
>
> Mike
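
Mike's point about scale can be checked before generating anything by counting how many new variables each version would add: the dummies plus either one difference per combination of levels or one per observed pair. A minimal sketch on the thread's example data; `pairtag` and the `n_*`, `all_pairs`, and `obs_pairs` locals are hypothetical names chosen for this illustration, and `c(maxvar)` is Stata's current limit on the number of variables:

```stata
* Sketch: count the new variables each approach would create
clear
input fips1 fips2
1001 1073
1001 1021
1001 1101
1003 12031
1003 1099
end

* number of distinct counties in each variable
foreach v in fips1 fips2 {
    quietly levelsof `v'
    local n_`v' : word count `r(levels)'
}

* number of distinct (fips1, fips2) pairs actually observed
* (pairtag is a hypothetical helper variable, dropped afterwards)
egen byte pairtag = tag(fips1 fips2)
quietly count if pairtag
local n_pairs = r(N)
drop pairtag

* all-pairs approach: every fips1 dummy minus every fips2 dummy
local all_pairs = `n_fips1' + `n_fips2' + `n_fips1' * `n_fips2'
* observed-pairs approach: one difference variable per observed pair
local obs_pairs = `n_fips1' + `n_fips2' + `n_pairs'

display "all pairs: `all_pairs'  observed pairs only: `obs_pairs'  maxvar: " c(maxvar)
```

On the five-row example this gives 17 versus 12 new variables; with about 3,000 levels in each variable, the all-pairs count is in the millions.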
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Matthew White
> Sent: Wednesday, November 16, 2011 9:14 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Creating dummy variables
>
> Hi Mike,
>
> There's probably a more efficient way to do this, but here's one way:
>
> ***BEGIN***
> clear
> input fips1 fips2
> 1001 1073
> 1001 1021
> 1001 1101
> 1003 12031
> 1003 1099
> end
>
> forvalues i = 1/2 {
>        * one dummy per county, first for fips1, then for fips2
>        levelsof fips`i'
>        foreach county in `r(levels)' {
>                generate fips`i'_`county' = fips`i' == `county'
>                label variable fips`i'_`county' "fips`i'==`county'"
>
>                * collect the dummy names in local macros dummies1 and dummies2
>                local dummies`i' `dummies`i'' fips`i'_`county'
>        }
> }
>
> * difference every fips1 dummy with every fips2 dummy
> foreach dummy1 of local dummies1 {
>        * county code = the text after the underscore in the variable name
>        local num1 = substr("`dummy1'", strpos("`dummy1'", "_") + 1, .)
>
>        foreach dummy2 of local dummies2 {
>                local num2 = substr("`dummy2'", strpos("`dummy2'", "_") + 1, .)
>
>                generate diff_`num1'_`num2' = `dummy1' - `dummy2'
>                label variable diff_`num1'_`num2' "`dummy1' - `dummy2'"
>        }
> }
> ***END***
>
> Best,
> Matt
>
> On Tue, Nov 15, 2011 at 9:44 PM, Michael Betz
> <betz.40@buckeyemail.osu.edu> wrote:
>> Hi all,
>>
>> I have two categorical variables, "fips1" and "fips2", that record the US county of the observation. For each "fips1" county there are many "fips2" counties, as below:
>>
>> fips1   fips2
>> 1001    1073
>> 1001    1021
>> 1001    1101
>> 1003    12031
>> 1003    1099
>>
>> I need to create dummy variables for each county in "fips1" and "fips2" and then create variables representing the difference between the two dummy variables as below:
>>
>> fips1   fips2   dum1_1  dum1_2  dum2_1  dum2_2  dum2_3  dum2_4  1_1-2_1  1_1-2_2  1_1-2_3
>> 1001    1003    1       0       1       0       0       0       0        1        1
>> 1001    1021    1       0       0       1       0       0       1        0        1
>> 1001    1101    1       0       0       0       1       0       1        1        0
>> 1003    1021    0       1       0       1       0       0       0        0        0
>> 1003    1001    0       1       0       0       0       1       0        0        0
>>
>> One added constraint is that "fips1" and "fips2" each create 3,000 dummies, so Stata cannot hold variables representing the difference between all pairs of dummy variables. I only need to calculate the difference in dummies for the pairs that are in the data (i.e., according to the example above, I would not need the difference between the dummies for "fips1"=1001 and "fips2"=1001, because that pair doesn't exist in my data).
>>
>> I've spent all day trying to come up with a solution, but to no avail. I appreciate any help or suggestions.
>>
>> Thanks,
>> Mike
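
Another way to restrict the work to the pairs that actually occur is to number them once with `egen, group()` and loop over that number instead of nesting one `levelsof` inside another. This is a swapped-in technique, not the one used in the replies; `pairid`, `npairs`, `c1`, and `c2` are hypothetical names, and the sketch only recovers and prints each pair's two county codes:

```stata
* Sketch: enumerate only the (fips1, fips2) pairs present in the data
clear
input fips1 fips2
1001 1073
1001 1021
1001 1101
1003 12031
1003 1099
end

* pairid numbers each distinct pair 1, 2, ... (hypothetical helper variable)
egen long pairid = group(fips1 fips2)
quietly summarize pairid, meanonly
local npairs = r(max)

forvalues p = 1/`npairs' {
    * recover the two county codes for this pair
    summarize fips1 if pairid == `p', meanonly
    local c1 = r(min)
    summarize fips2 if pairid == `p', meanonly
    local c2 = r(min)
    display "pair `p': fips1 = `c1', fips2 = `c2'"
}
drop pairid
```

Inside the loop, the dummies and the `diff_` variable could be generated from `c1` and `c2` as in the final reply. Each `summarize ... if` still scans the whole dataset, so with thousands of pairs this is not necessarily faster than the nested `levelsof` loops; its advantage is that the set of pairs is fixed in one place.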
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>
> --
> Matthew White
> Data Coordinator
> Innovations for Poverty Action
> 101 Whitney Avenue, New Haven, CT 06510 USA
> +1 434-305-9861
> www.poverty-action.org
>

© Copyright 1996–2014 StataCorp LP