# RE: st: splitting columns

From: "Nick Cox" | Subject: RE: st: splitting columns | Date: Thu, 28 Nov 2002 10:16:57 -0000

```
Katsuhide Isa wrote:

> I have survey data with the following structure.
> Each column contains the results of one survey question.
>
> * data1
>  Q1  Q2  Q3  Q4
>   1   1   1
>   2   2   2
>   3   3  12
>  12   4   1
>  13   5   1
>  23  12   2
> 123  24   1
>  13  14   2
>  23 125  12
>   2 234   1
>
> In the example above, Q1 has three choices,
> Q2 five choices, and Q3 two choices.
> But it should have been entered as below:
>
> * data2
> Q1_1 Q1_2 Q1_3 Q2_1 ...
>    1    0    0    1
>    0    1    0    0
>    0    0    1    0
>    1    1    0    0
>    1    0    1    0
>    0    1    1    1
>    1    1    1    0
>    1    0    1    1
>    0    1    1    1
>    0    1    0    0
>
> Q1_1 indicates the 1st choice of Q1, Q1_2 the 2nd
> choice, and so on.
> That is, I'd like each variable to take the value 1 if
> the relevant choice is selected (and 0 otherwise).
>
> The question is: is it possible to convert data1 into
> data2 within Stata?
> One immediate solution would be to create new
> variables using -generate- as below (in the case of
> three choices):
>
> gen Q1_1 = 1 if Q1 == 1 | Q1 == 12 | Q1 == 13 | Q1 == 123
> gen Q1_2 = 1 if Q1 == 2 | Q1 == 12 | Q1 == 23 | Q1 == 123
> gen Q1_3 = 1 if Q1 == 3 | Q1 == 13 | Q1 == 23 | Q1 == 123
>
> But this method is rather troublesome (the maximum
> number of choices is seven, and there are more than
> 80 questions in the original data!) and easy
> to mistype.
> I'd like to know if there is a better way to do the
> same thing.
>
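
* A sketch (not from the thread) of the explicit approach for Q1
* alone, run on the first seven rows of data1. -inlist()- returns
* 1 or 0, so an unselected choice gets 0 as in data2. By contrast,
* -gen ... = 1 if ...- would leave those cells missing.
clear
input Q1
  1
  2
  3
 12
 13
 23
123
end
gen Q1_1 = inlist(Q1, 1, 12, 13, 123)
gen Q1_2 = inlist(Q1, 2, 12, 23, 123)
gen Q1_3 = inlist(Q1, 3, 13, 23, 123)
list, noobs
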

Ulrich Kohler wrote:
>
> My strategy would be to generate a string variable and use the
> string function -index(s1,s2)-. The following might work
> (not tested):
>
>
> foreach var of varlist Q1 Q2 Q3 {            /* Starts a loop over vars */
>   gen str1 `var's = ""                       /* empty string var */
>   replace `var's = string(`var')             /* Fill string with contents */
>   forval i=1/7 {
>     gen `var'_`i' = index(`var's,"1") ~= 0   /* var=1 if val exists, else 0 */
>   }
> }
>
> (This already works for seven choices. You might replace
> Q1 Q2 Q3 with the names of the 80 variables.)
>

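This would be my approach too. It can be corrected for a typo
(the search string should be "`i'", not "1"; otherwise every
indicator just records whether the answer contains a 1) and
condensed a little further.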

foreach var of varlist Q1 Q2 Q3 {
    forval i=1/7 {
        gen `var'_`i' = index(string(`var'),"`i'") ~= 0
    }
}

That is, you don't need to create the string version
of each variable: you can do the search on the fly.
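
* A self-contained sketch (not part of the original advice) of the
* condensed loop run on data1, where only Q1-Q3 carry values. It
* assumes numeric Q variables and single-digit choices 1-7. It uses
* strpos(), the later name for index() (Stata 9 on). On older Stata,
* write index() instead. string(.) is ".", so a missing answer gets
* 0 on every indicator rather than missing.
clear
input Q1 Q2 Q3
  1   1   1
  2   2   2
  3   3  12
 12   4   1
 13   5   1
 23  12   2
123  24   1
 13  14   2
 23 125  12
  2 234   1
end
foreach var of varlist Q1 Q2 Q3 {
    forval i = 1/7 {
        * 1 if this digit occurs in the recorded answer, else 0
        gen `var'_`i' = strpos(string(`var'), "`i'") > 0
    }
}
* Compare with the first columns of data2. Q3_3 to Q3_7 are all 0,
* because nobody gave those answers.
list Q1 Q1_1 Q1_2 Q1_3 Q2 Q2_1, noobs
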

A further extension would be something like this:

foreach var of varlist Q* {
    forval i=1/7 {
        capture assert index(string(`var'),"`i'") == 0
        if _rc {
            gen `var'_`i' = index(string(`var'),"`i'") ~= 0
        }
    }
}

Suppose, for example, that Q42 has only
5 choices, 1 2 3 4 5, so nobody answers 6 or 7,
and we don't need an indicator variable for
either 6 or 7. When we get to

assert index(string(Q42),"6") == 0

this assertion will be true of all the data,
so the return code from -assert-, eaten by
-capture-, will be 0. But any assertion which is
false for some of the data will result in a
non-zero return code and generation of a new
variable. On the other hand, this approach will
also skip variables for choices which were
possible but happen to have been chosen by
none of the sample.
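
* A sketch of that trade-off and one way round it (not part of the
* original advice). In this toy example Q1 offers 3 choices but
* nobody picks 3. strpos() is the later name for index(). If
* indicators from an earlier run exist, drop them first, because
* Q* would match them too.
clear
input Q1
  1
  2
 12
end
* The -capture assert- version creates only Q1_1 and Q1_2
forval i = 1/7 {
    capture assert strpos(string(Q1), "`i'") == 0
    if _rc {
        gen Q1_`i' = strpos(string(Q1), "`i'") > 0
    }
}
ds Q1_*
* If the number of choices per question is known, e.g. from the
* questionnaire, a local per question (n_Q1 is a hypothetical name)
* can bound the loop instead, keeping Q1_3 as an all-zero indicator
drop Q1_*
local n_Q1 3
foreach var of varlist Q1 {
    forval i = 1/`n_`var'' {
        gen `var'_`i' = strpos(string(`var'), "`i'") > 0
    }
}
ds Q1_*
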

For more, see the manual entries on -assert-
and -capture-.

Nick
n.j.cox@durham.ac.uk
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```