
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Cleaning Survey Data


From   Erika Kociolek <ekociole@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Cleaning Survey Data
Date   Wed, 5 Feb 2014 09:47:28 -0800

When working with survey data - specifically closed-ended, multiple
response questions - datasets are often structured like this:

Q1_R1   Q1_R2   Q1_R3   Q1_ROTHER
1       3       98      "lemons"
2
1       2

I ultimately want to know the number of respondents who selected each
of 1, 2, 3, and 98, so I write code that looks something like this:

local values 1 2 3 98

foreach x of local values {
    generate Q1_`x'_flag = `x' if (Q1_R1 == `x' | Q1_R2 == `x' | Q1_R3 == `x')
}

Is there a better way to reach the goal shown below?

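* label define needs a numeric value before each label text;
* the values here match what the loop above stores in each flag
label define Q1_1_label 1 "Milk"
label values Q1_1_flag Q1_1_label
label define Q1_2_label 2 "Bread"
label values Q1_2_flag Q1_2_label
label define Q1_3_label 3 "Apples"
label values Q1_3_flag Q1_3_label
label define Q1_98_label 98 "Other"
label values Q1_98_flag Q1_98_label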

Q1_1_flag   Q1_2_flag   Q1_3_flag   Q1_98_flag
1                       1           1
            1
1           1

It can be tedious to type out "if (Q1_R1 == | Q1_R2 == | Q1_R3 == |
...)" when different questions have different numbers of variables and
there are many possible responses to a given question (e.g. Q1_R1
through Q1_R17).
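
One possible way around the typing, sketched here rather than offered as
the definitive answer: let Stata expand the variable list and let
egen's anymatch() function do the comparison. The sketch rebuilds the
three example rows from the top of this post (leaving out Q1_ROTHER) so
that it runs on its own. The wildcard Q1_R? is an assumption about the
naming scheme; it matches only one-character suffixes, so a question
running to Q1_R17 would need a different pattern or an explicit list.
Note that anymatch() stores 0/1 indicators, not the response code and
missing that the loop above stores.

```stata
* Rebuild the three example rows from the post (Q1_ROTHER omitted)
clear
input Q1_R1 Q1_R2 Q1_R3
1 3 98
2 . .
1 2 .
end

* Expand the wildcard into the full list of response variables.
* Assumption: every response variable is named Q1_R plus one character.
unab responses : Q1_R?

local values 1 2 3 98
foreach x of local values {
    * 1 if any variable in `responses' equals `x', 0 otherwise
    egen Q1_`x'_flag = anymatch(`responses'), values(`x')

    * Number of respondents who selected `x'
    quietly count if Q1_`x'_flag == 1
    display "Selected `x': " r(N)
}
```

On the three example rows this counts 2, 2, 1, and 1 respondents for
the values 1, 2, 3, and 98. Only the unab line changes from question to
question, however many response variables there are.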

Thanks for any advice you have.

Best,
Erika

