



Re: st: Organizing non-ranked multiple responses by generating new variable


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Organizing non-ranked multiple responses by generating new variable
Date   Sun, 19 Feb 2012 08:51:19 +0000

The details of the re-categorization are really up to you. My guess is
that there are several groupings of religions and denominations, each
with some claims to being standard, but no two agreeing completely.

But the mechanics in Stata can be simplified a bit, particularly by
using -inlist()-. See -help inlist()- and

SJ-6-4  dm0026  . . . . . . Stata tip 39: In a list or out? In a range or out?
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/06   SJ 6(4):593--595                                 (no commands)
        tip for use of inlist() and inrange()

The last is accessible to all readers. -search inlist- will bring up
clickable links.
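
As a minimal sketch of the -inlist()- approach, assuming the codes are
numeric and that the recoded category is meant to go into the new variable
bq1_a22b_rel_cd1_type (an assumption about the intent; the posted code
replaces bq1_a22b_rel_cd1 itself), the Buddhist and Christian steps could be
written in a do-file as:

```stata
* Sketch only: writes to the new variable so the original codes stay intact.
generate bq1_a22b_rel_cd1_type = .

* Buddhist: "other" code 3
replace bq1_a22b_rel_cd1_type = 1 if bq1_a22_othrel == 1 & bq1_a22b_rel_cd1 == 3

* Christian: the codes given in the question, each listed once
replace bq1_a22b_rel_cd1_type = 2 if bq1_a22_othrel == 1 & ///
    inlist(bq1_a22b_rel_cd1, 67, 43, 13, 4, 82, 59, 5, 37, 80, 6, 64, 46, 41, 10, 14, ///
    12, 16, 71, 76, 44, 29, 9, 57, 63, 34, 8, 7, 31, 69)
```

The code 12 appears twice in the posted list; listing it once is enough.
The /// continuation works in a do-file but not in the Command window.
-help inlist()- gives the limit on the number of arguments, which for
numeric arguments should be well above the 29 used here.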

In addition,

. replace bq1_a22b_rel_cd1 = 1 if bq1_a22_othrel==1 & bq1_a22b_rel_cd1==3 & bq1_a22b_rel_cd1!=.

could just be

. replace bq1_a22b_rel_cd1 = 1 if bq1_a22_othrel==1 & bq1_a22b_rel_cd1==3

because a value that is 3 is necessarily not missing.
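
A quick way to check this on the data, as a sketch: Stata treats numeric
missing as larger than any number, so an explicit test for missing matters
with > or >= but adds nothing to an == comparison.

```stata
* An equality test can never be satisfied by a missing value
count if bq1_a22b_rel_cd1 == 3 & missing(bq1_a22b_rel_cd1)   // always 0

* An inequality test can: missing counts as larger than 3
count if bq1_a22b_rel_cd1 > 3 & missing(bq1_a22b_rel_cd1)    // number of missing values
```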

Also, many people would use -recode- here. (I wouldn't, but that is a
matter of personal taste.)
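
For those who do prefer -recode-, a sketch for a do-file might look like
this. The name bq1_a22b_rel_cd1_type for the generated variable and the
choice to set all unlisted codes to missing are assumptions made for
illustration; rules for the remaining categories would go before the else
rule.

```stata
* Sketch only: map "other" codes to the categories of Q. A22a in a new variable.
* Observations excluded by the if qualifier are left missing in the new variable.
recode bq1_a22b_rel_cd1 ///
    (3 = 1) ///
    (67 43 13 4 82 59 5 37 80 6 64 46 41 10 14 12 16 71 76 44 29 9 57 63 34 8 7 31 69 = 2) ///
    (else = .) ///
    if bq1_a22_othrel == 1, generate(bq1_a22b_rel_cd1_type)
```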

Nick

On Sat, Feb 18, 2012 at 9:27 PM, JD Wright <jwright@ses.gtu.edu> wrote:

> I am currently organizing a data set. There is one question in particular
> (see A22b below) that has an "other: specify" option that has generated
> multiple responses. These responses have been coded by the original data
> collectors by assigning the response a number between 1-98; each time a
> respondent mentioned some new “other” it was assigned the next available
> number.
>
> The main question that begins the question at hand is about whether one is
> religious or not:
>
> Q. A21: Do you identify yourself with any religion? [bq1_a21_rel_ind]
> 1. Yes
> 0. No
> 8. Don't Know
> 9. Refused
>
> If the respondent answered "Yes"
>
> Then a second question is asked:
>
> Q. A22a. What religion? [BQ1_A22_REL_TYPE]
> 1. Buddhist
> 2. Christian
> 3. Hindu
> 4. Jewish
> 5. Muslim
> 6. Other (Specify): [this becomes a new variable in the data set:
> bq1_a22_othrel and is followed by a new variable for the coded response
> bq1_a22b_rel_cd1]
> [There is no option "7"]
> 8. Don't Know
> 9. Refused
>
> If option “6. Other (Specify)” was chosen then respondents were asked to
> fill in the blank. This generated 88 responses—albeit some multiple “other”
> responses were organized under “Catholic” and “Mormon,” many responses were
> unique.
>
> I went through these options and further categorized them according to the
> original question Q. A22a—since many of the new answers, e.g., Adventist,
> Baptist, etc. could be easily organized under the “Christian” category. This
> makes more sense than leaving such options categorized under “other.”
>
> So my question is “What is the most efficient way to organize these?”
>
> Here is the current approach I am using, but it seems too repetitive and
> cumbersome:
>
> _______________________________________________________________
> **[I am generating a new variable in order to organize all of these multiple
> responses]
> . generate bq1_a22b_rel_cd1_type = .
>
> ***Buddhist [Buddhism was mentioned in “Other” and coded as “03”
> **So I proceeded to begin defining my new variable]
> . replace bq1_a22b_rel_cd1 = 1 if (bq1_a22_othrel==1 & bq1_a22b_rel_cd1==3 &
> bq1_a22b_rel_cd1!=.)
>
> ****Christian [Here is where it becomes more difficult
> ****because there are 30 different codes that can be categorized
> *****as Christian]
> . replace bq1_a22b_rel_cd1 = 2 if (bq1_a22_othrel==1 & bq1_a22b_rel_cd1!=.)
> & (bq1_a22b_rel_cd1==67 | bq1_a22b_rel_cd1==43 | bq1_a22b_rel_cd1==13 [etc.,
> i.e., using “ | bq1_a22b_rel_cd1==” with each of the following codes below])
>
> ____________________________________________
>
> [These are the codes for Christians
> 67,43,13,04,82,59,05,37,80,06,64,46,41,10,14,12,16,71,76,44,29,09,57,63,34,08,07,31,12,69]
>
> I also planned on continuing this for the other categories of Q. A22a (Hindu,
> Jewish, Muslim, Other) using the other coded “other” responses.
>
> Once that was done I planned on generating another new variable so that I
> could combine answers from Q. A22a and Q. A22b in one variable that
> reflected the values of the original question Q. A22a.
>
> Note: I have also come across posts about egen and forvalues, etc. in terms
> of organizing multiple responses or organizing data, but none have addressed
> an example quite like this one where there really is no order or logic to
> numbers assigned and they are not necessarily sequential either.
>
> My knowledge of Stata is obviously limited … so I am not even sure if my
> initial inclination to deal with such data by generating a new variable,
> then replacing values, is even a typical approach.


