
RE: st: splitting columns


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: splitting columns
Date   Thu, 28 Nov 2002 10:16:57 -0000

Katsuhide Isa wrote

> I have survey data with the following structure.
> Each column contains the result of a survey question
> that allows MA (multiple answers).
>
> * data1
> Q1 Q2 Q3 Q4
> 1 1 1
> 2 2 2
> 3 3 12
> 12 4 1
> 13 5 1
> 23 12 2
> 123 24 1
> 13 14 2
> 23 125 12
> 2 234 1
>
> In the example above, Q1 has three choices,
> Q2 five choices, and Q3 two choices.
> But it should have been typed as below:
>
> * data2
> Q1_1 Q1_2 Q1_3 Q2_1 ...
> 1 0 0 1
> 0 1 0 0
> 0 0 1 0
> 1 1 0 0
> 1 0 1 0
> 0 1 1 1
> 1 1 1 0
> 1 0 1 1
> 0 1 1 1
> 0 1 0 0

> Q1_1 indicates the 1st choice of Q1, Q1_2 the 2nd
> choice, and so on.
> That is, I'd like each variable to take on the value one
> if the relevant choice is selected.
>
> The question is, is it possible to convert the data1 into
> data2 from within Stata?
> One immediate solution would be to create new
> variables using -generate- as below (in the case of
> three choices):
>
> gen Q1_1 = 1 if Q1 == 1 | Q1 == 12 | Q1 == 13 | Q1 == 123
> gen Q1_2 = 1 if Q1 == 2 | Q1 == 12 | Q1 == 23 | Q1 == 123
> gen Q1_3 = 1 if Q1 == 3 | Q1 == 13 | Q1 == 23 | Q1 == 123
>
> But this method is rather troublesome (the maximum
> number of choices is seven, and there are more than
> 80 questions in the original data!) and easy
> to mistype.
> I'd like to know if there is some better way to do the
> same thing.
>

Ulrich Kohler wrote
> 
> My strategy would be to generate a string-variable and use the 
> string-function -index(s1,s2)-. The following might work 
> (not tested):
> 
> 
> foreach var of varlist Q1 Q2 Q3 {           /* Starts a loop over vars */
>   gen str1 `var's= ""                       /* empty string var */
>   replace `var's = string(`var')            /* Fill string with contents */
>   forval i=1/7 {
>     gen `var'_`i' = index(`var's,"1") ~= 0  /* var=1 if val exists, else 0 */
>   }
> }
> 
> (This works already for seven choices. You might replace Q1 Q2 Q3
> with the names of the 80 variables.)
> 

This would be my approach too. It can be corrected for a typo 
(the search string should be "`i'", not "1") and condensed a 
little further. 

foreach var of varlist Q1 Q2 Q3 {           
	forval i=1/7 {   
		 gen `var'_`i' = index(string(`var'),"`i'") ~= 0  
	}
}

That is, you don't need to create the string version 
of each variable: you can do the search on the fly. 
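
As a minimal sketch, the condensed loop can be tried on the example 
data from the original question. The three-column layout (Q1, Q2, Q3) 
is an assumption read off the listing quoted above, and the -byte- 
storage type is an addition here to save memory; neither is required. 

```stata
* Example data from the original question (three columns assumed)
clear
input Q1 Q2 Q3
1 1 1
2 2 2
3 3 12
12 4 1
13 5 1
23 12 2
123 24 1
13 14 2
23 125 12
2 234 1
end

* One 0/1 indicator per question and per possible digit 1..7
foreach var of varlist Q1 Q2 Q3 {
	forval i = 1/7 {
		gen byte `var'_`i' = index(string(`var'), "`i'") ~= 0
	}
}

list Q1 Q1_1 Q1_2 Q1_3
```

Note that string(.) is ".", so a missing response yields 0 on every 
indicator; adding -if !missing(`var')- to the -gen- line would leave 
those indicators missing instead. 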

A further extension would be something like this: 

foreach var of varlist Q* {           
	forval i=1/7 {   
		capture assert index(string(`var'),"`i'") == 0 
		if _rc { 
			gen `var'_`i' = index(string(`var'),"`i'") ~= 0  
		} 
	}
}

Suppose on Q42, for example, there are only 
5 choices, 1 2 3 4 5, so nobody answers 6 or 7, 
and we don't need an indicator variable for 
either 6 or 7. When we get to 

	assert index(string(Q42),"6") == 0 

this assertion will be true of all the data 
and the return code trapped by -capture- 
(and left in _rc) will be 0. But any assertion 
which is false for some of the data will result 
in a non-zero return code and generation of a 
new variable. 
On the other hand, this approach will also omit 
indicator variables for choices which were possible 
but happen to have been chosen by none 
of the sample. 
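
The return-code behaviour can be checked in isolation. The following 
is only a sketch: the three values of Q42 are made up for 
illustration, and the scalar names rc_six and rc_five are 
hypothetical. 

```stata
* Hypothetical responses: nobody chooses 6, one person chooses 5
clear
input Q42
1
25
34
end

* The assertion holds for every observation, so _rc is 0
capture assert index(string(Q42), "6") == 0
scalar rc_six = _rc

* The assertion fails for the second observation, so _rc is nonzero
capture assert index(string(Q42), "5") == 0
scalar rc_five = _rc

display "rc for 6: " rc_six "   rc for 5: " rc_five
```

The return code is saved to a scalar straight away because the next 
-capture- overwrites _rc. 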

For more, see the manual entries on -assert- 
and -capture-. 
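
If choices that nobody picked should still get an (all-zero) 
indicator, one alternative is to state the number of choices for 
each question explicitly rather than infer it from the data. This is 
a sketch under assumptions: the local macros k_Q1, k_Q2 and k_Q3 are 
hypothetical names, and their values come from the description in 
the original question. 

```stata
* Example data from the original question (three columns assumed)
clear
input Q1 Q2 Q3
1 1 1
2 2 2
3 3 12
12 4 1
13 5 1
23 12 2
123 24 1
13 14 2
23 125 12
2 234 1
end

* Hypothetical bookkeeping: number of choices per question
local k_Q1 3
local k_Q2 5
local k_Q3 2

* Create indicators 1..k for each question, chosen or not
foreach var of varlist Q1 Q2 Q3 {
	forval i = 1/`k_`var'' {
		gen byte `var'_`i' = index(string(`var'), "`i'") ~= 0
	}
}

describe Q1_* Q2_* Q3_*
```

With 80 questions this means typing 80 counts, so it trades the 
convenience of -capture assert- for a guarantee about which 
variables exist. 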

Nick 
n.j.cox@durham.ac.uk 


