
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: Organizing non-ranked multiple responses by generating new variable

From   JD Wright <>
Subject   st: Organizing non-ranked multiple responses by generating new variable
Date   Sat, 18 Feb 2012 13:27:29 -0800 (PST)


I am currently organizing a data set. One question in particular (see A22b
below) has an "Other (specify)" option that generated multiple responses.
The original data collectors coded these responses by assigning each one a
number between 1 and 98; each time a respondent mentioned a new “other,” it
was assigned the next available
The question that precedes the one at hand asks whether the respondent is
religious:

Q. A21: Do you identify yourself with any religion? [bq1_a21_rel_ind]
1. Yes
0. No
8. Don't Know
9. Refused 

If the respondent answered "Yes," a second question is asked:

Q. A22a. What religion? [BQ1_A22_REL_TYPE]
1. Buddhist
2. Christian
3. Hindu
4. Jewish
5. Muslim
6. Other (Specify): [this becomes a new variable in the data set,
bq1_a22_othrel, and is followed by a new variable for the coded response]
[There is no option "7"]
8. Don't Know
9. Refused

If option “6. Other (Specify)” was chosen, respondents were asked to fill in
the blank. This generated 88 responses. Although several “other” responses
were grouped under “Catholic” and “Mormon,” many responses were

I went through these options and further categorized them according to the
original question Q. A22a, since many of the new answers (e.g., Adventist,
Baptist) could easily be organized under the “Christian” category. This
makes more sense than leaving such options categorized under “other.”

So my question is: what is the most efficient way to organize these?

Here is the current approach I am using, but it seems too repetitive and

**[I am generating a new variable in order to organize all of these multiple
. generate bq1_a22b_rel_cd1_type = .

***Buddhist [Buddhism was mentioned in “Other” and coded as “03”
**So I proceeded to begin defining my new variable]
. replace bq1_a22b_rel_cd1_type = 1 if (bq1_a22_othrel==1 & bq1_a22b_rel_cd1==3 &

****Christian [Here is where it becomes more difficult 
****because there are 30 different codes that can be categorized 
*****as Christian] 
. replace bq1_a22b_rel_cd1_type = 2 if (bq1_a22_othrel==1 & bq1_a22b_rel_cd1!=.)
& (bq1_a22b_rel_cd1==67 | bq1_a22b_rel_cd1==43 | bq1_a22b_rel_cd1==13 [etc.,
i.e., using “ | bq1_a22b_rel_cd1==” with each of the following codes below])
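For comparison, here is a minimal sketch of the same two steps using Stata's inlist() function, which tests one expression against a list of values, so "| bq1_a22b_rel_cd1==" does not have to be repeated for every code. It assumes, as the code above implies, that bq1_a22_othrel is a numeric 0/1 flag. The five-row dataset is hypothetical toy data; only codes 3, 67, 43 and 13 come from the post. In the Stata versions I am aware of, numeric inlist() accepts up to 255 arguments, so 30 Christian codes should fit in one call. The replace commands write to the new variable bq1_a22b_rel_cd1_type, leaving the original codes intact for later conditions.

```stata
* Hypothetical toy data: othrel flags an "Other (specify)" answer,
* rel_cd1 holds the data collectors' code for that answer
clear
input byte bq1_a22_othrel byte bq1_a22b_rel_cd1
1  3
1 67
1 43
1 13
0  .
end

* Start the new variable as all missing
generate bq1_a22b_rel_cd1_type = .

* Buddhist: "other" code 3
replace bq1_a22b_rel_cd1_type = 1 if bq1_a22_othrel == 1 & bq1_a22b_rel_cd1 == 3

* Christian: one inlist() call replaces the chain of | conditions.
* Only 67, 43 and 13 come from the post; the remaining Christian codes
* would be appended to the argument list. inlist() returns 0 when
* bq1_a22b_rel_cd1 is missing, so no separate != . check is needed.
replace bq1_a22b_rel_cd1_type = 2 if bq1_a22_othrel == 1 & inlist(bq1_a22b_rel_cd1, 67, 43, 13)

list
```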


[These are the codes for Christians

I also planned to continue this for the other categories of Q. A22a (Hindu,
Jewish, Muslim, Other) using the remaining coded “other” responses.

Once that was done, I planned to generate another new variable that
combines the answers from Q. A22a and Q. A22b into one variable using the
same categories as the original question Q. A22a.
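A minimal sketch of that combining step, under stated assumptions: the A22a variable is written here in lowercase as bq1_a22_rel_type (Stata variable names are case-sensitive, so adjust to the actual name), bq1_a22b_rel_cd1_type already holds the recoded "other" answers, and rel_combined and rel_a22a are hypothetical names. The four rows are toy data. The idea is to copy A22a and overwrite "6. Other" only where the recoded "other" answer maps to a specific category.

```stata
* Hypothetical toy data: A22a answer plus the recoded "other" category
clear
input byte bq1_a22_rel_type byte bq1_a22b_rel_cd1_type
2 .
6 2
6 1
8 .
end

* Start from the original A22a answer
generate rel_combined = bq1_a22_rel_type

* Overwrite 6 (Other) with the recoded category where one exists;
* answers that are still unmapped stay as 6
replace rel_combined = bq1_a22b_rel_cd1_type if bq1_a22_rel_type == 6 & !missing(bq1_a22b_rel_cd1_type)

* Attach the A22a categories as value labels (label name is hypothetical)
label define rel_a22a 1 "Buddhist" 2 "Christian" 3 "Hindu" 4 "Jewish" 5 "Muslim" 6 "Other" 8 "Don't Know" 9 "Refused"
label values rel_combined rel_a22a

tabulate rel_combined
```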

Note: I have also come across posts about egen, forvalues, etc. for
organizing multiple responses or organizing data, but none address an
example quite like this one, where the assigned numbers follow no order or
logic and are not necessarily sequential.
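Because the codes follow no order, one alternative worth considering (a swapped-in technique, not the approach in the post) is Stata's recode command, whose rules accept arbitrary, non-sequential lists of values. The sketch below reuses the post's variable names and codes 3, 67, 43 and 13 on hypothetical toy data; further categories would be added as extra rules. The (else = .) rule matters: without it, unmatched raw codes would be copied unchanged and could be mistaken for A22a categories (e.g., a raw code 1 would look like "Buddhist"). With generate() and an if condition, observations outside the condition are set to missing in the new variable.

```stata
* Hypothetical toy data: othrel flags an "Other (specify)" answer
clear
input byte bq1_a22_othrel byte bq1_a22b_rel_cd1
1  3
1 67
1 43
1 13
0  .
end

* Each rule maps a list of raw "other" codes (in any order) to an A22a
* category; everything not listed, including missing, becomes .
recode bq1_a22b_rel_cd1 (3 = 1) (67 43 13 = 2) (else = .) if bq1_a22_othrel == 1, generate(bq1_a22b_rel_cd1_type)

list
```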

My knowledge of Stata is limited, so I am not sure whether my initial
inclination to handle such data by generating a new variable and then
replacing values is a typical approach.

I would appreciate any guidance. Thank you, Jaime 


