
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: converting multiple choice (string) response options to numeric values


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: converting multiple choice (string) response options to numeric values
Date   Fri, 7 Feb 2014 11:30:41 +0000

This is quite a common problem, and it's easy to get bitten.

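* define the intended mapping once, before encoding anything
label define mylabels 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"

* apply the same value label to every variable
foreach v of varlist <varlist> {
     encode `v', gen(n_`v') label(mylabels)
}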

is a sketch of how to do it. You must replace <varlist> by an actual varlist.
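
To make the point concrete, here is a minimal sketch on toy data. The variable names q1 and q2 and the four observations are made up for illustration; q2 deliberately never takes the value "B". It first shows what default -encode- does with q2, then applies the predefined label. The noextend option is an addition to the sketch above: it makes -encode- stop with an error if a variable holds a value that is not in mylabels, rather than quietly adding a new code for it.

```stata
* Toy data (hypothetical): q2 never takes the value "B"
clear
input str1 q1 str1 q2
"A" "A"
"B" "C"
"C" "D"
"D" "A"
end

* Default -encode-: codes follow the sorted values present in the variable,
* so for q2 "C" becomes 2 and "D" becomes 3
encode q2, gen(bad_q2)

* Define the intended mapping once, then reuse it for every variable
label define mylabels 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"

foreach v of varlist q1 q2 {
    * noextend: error out instead of adding codes for unlisted values
    encode `v', gen(n_`v') label(mylabels) noextend
}

* Compare the underlying numeric codes
list q2 bad_q2 n_q2, nolabel
```

With the predefined label, "C" and "D" in q2 are coded 3 and 4, as intended. If noextend makes -encode- fail on one of your variables, that is a sign of a stray value (a leading space or a lower-case letter, say) that is worth inspecting with -tabulate- before going further.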

Alternatively, as said, look at -multencode- (SSC).

Nick
[email protected]


On 7 February 2014 09:22, Nick Cox <[email protected]> wrote:
> Applying -encode- to several variables is a little dangerous. If the
> values "A" to "D" occur for every variable and "E" occurs only for
> those variables for which it is possible, and for all of them, you
> should be fine. But suppose the only answers that occur for one
> variable are "A", "C", "D". Then those will be, by default, mapped to
> 1,2,3. -encode- has by default no intelligence that spots that "B" is
> missing and decides that the appropriate coding is 1, 3, 4. You would
> need to define value labels in advance and specify those as the labels
> to be used.
>
> Note also -multencode- (SSC).
>
> Nick
> [email protected]
>
>
> On 7 February 2014 08:04, Ronnie Babigumira <[email protected]> wrote:
>> -encode- worked just fine. What you see as the "exact same variable" is
>> just the value label; the values underneath are numeric
>>
>> *****
>> clear *
>> input id str1 qn1 str1 strqn3
>> 1 A D
>> 2 A A
>> 3 E B
>> 4 B C
>> end
>>
>> encode qn1, g(nqn1)
>> list
>> list, nolabel
>> *****
>>
>> PS: note the label() option of -encode-, which allows you to provide your own value label
>>
>> On Fri, Feb 7, 2014 at 1:59 AM, Katherine Picho <[email protected]> wrote:
>>> I have a huge dataset which has test data with multiple choice
>>> questions. Two questions have choices A-E, and the rest have four
>>> options, A-D.
>>>
>>> I was looking to convert these options to numeric values with A
>>> corresponding to 1, B=2, etc.
>>>
>>> I'm using Stata 12.
>>>
>>> I tried the command egen newvar = group(oldvar). It seems to
>>> work for some questions but not others. For instance, the first
>>> five students' answers to question 18 are AAAAA, which should
>>> translate to five consecutive 1s, but I get consecutive 2s instead.
>>>
>>> For another test question, question 10, a value of 6 is reported
>>> for one observation that actually has the letter C, which should
>>> correspond to a value of 3.
>>>
>>> I also tried encode oldvar, gen(newvar),
>>> but I get the exact same variable data as in the original (i.e.
>>> letters, not numbers), even though the data storage type now shows
>>> 'long'.
>>>
>>> I've checked to make sure there is consistency in data entry and there
>>> appears to be; i.e. all responses are entered in capital letters, and
>>> there is no mix of numeric and letters in the same variable/ column.
>>>
>>> What am I doing wrong? Any thoughts on this problem would be highly
>>> welcome as I dread the idea of having to manually convert these
>>> letters to numbers!