Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: converting multiple choice (string) response options to numeric values
Date: Fri, 7 Feb 2014 13:30:27 +0000
See also, on a different variant of the same problem:

    SJ-11-2 dm0057  Stata tip 99: Taking extra care with encode
            C. Schechter. Q2/11. SJ 11(2): 321-322 (no commands)
            Tip on safely using encode across datasets.

Nick
njcoxstata@gmail.com

On 7 February 2014 11:30, Nick Cox <njcoxstata@gmail.com> wrote:
> This is quite a common problem, and it's easy to get bitten.
>
> label def mylabels 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"
>
> foreach v of var <varlist> {
>     encode `v', gen(n_`v') label(mylabels)
> }
>
> is a sketch of how to do it. You must replace <varlist> by an actual varlist.
>
> Alternatively, as said, look at -multencode- (SSC).
>
> Nick
> njcoxstata@gmail.com
>
>
> On 7 February 2014 09:22, Nick Cox <njcoxstata@gmail.com> wrote:
>> Applying -encode- to several variables is a little dangerous. If the
>> values "A" to "D" occur for every variable and "E" occurs only for
>> those variables for which it is possible, and for all of them, you
>> should be fine. But suppose the only answers that occur for one
>> variable are "A", "C", "D". Then those will, by default, be mapped to
>> 1, 2, 3. -encode- has by default no intelligence that spots that "B" is
>> missing and decides that the appropriate coding is 1, 3, 4. You would
>> need to define value labels in advance and specify those as the labels
>> to be used.
>>
>> Note also -multencode- (SSC).
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 7 February 2014 08:04, Ronnie Babigumira <rb.glists@gmail.com> wrote:
>>> encode worked just fine.
>>> What you see as the "exact same variable" is
>>> just the label.
>>>
>>> *****
>>> clear *
>>> input id str1 qn1 str1 qn3
>>> 1 A D
>>> 2 A A
>>> 3 E B
>>> 4 B C
>>> end
>>>
>>> encode qn1, g(nqn1)
>>> list
>>> list, nolabel
>>> *****
>>>
>>> PS: note the label() option of encode, which allows you to provide
>>> your own value labels.
>>>
>>> On Fri, Feb 7, 2014 at 1:59 AM, Katherine Picho <thestatsbabe@gmail.com> wrote:
>>>> I have a huge dataset of test data with multiple-choice
>>>> questions. Two questions have choices A-E, and the rest have four
>>>> options, A-D.
>>>>
>>>> I was looking to convert these options to numeric values, with A
>>>> corresponding to 1, B = 2, etc.
>>>>
>>>> I'm using Stata 12.
>>>>
>>>> I tried using the -egen newvar = group(oldvar)- command. It seems to
>>>> work for some questions but not others. For instance, the first five
>>>> students' answers to question 18 are AAAAA, which should translate to
>>>> five consecutive 1s, but I get consecutive 2s instead.
>>>>
>>>> For test question 10, a value of 6 is reported for one
>>>> observation whose letter value is actually C, which should
>>>> correspond to 3.
>>>>
>>>> I also tried -encode oldvar, gen(newvar)-,
>>>> but I get the exact same variable data as in the original (i.e.,
>>>> letters, not numbers), even though the storage type now shows
>>>> 'long'.
>>>>
>>>> I've checked to make sure there is consistency in data entry, and there
>>>> appears to be; i.e., all responses are entered in capital letters, and
>>>> there is no mix of numbers and letters in the same variable/column.
>>>>
>>>> What am I doing wrong? Any thoughts on this problem would be highly
>>>> welcome, as I dread the idea of having to convert these letters to
>>>> numbers manually!

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
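Pulling the thread's advice together, here is a minimal do-file sketch (not code posted in the thread) contrasting default -encode- with encoding against a predefined value label. The data and the variable names q10, q18 and q20 are hypothetical, and the label name answers stands in for Nick's mylabels. The noextend option is an addition not discussed in the thread: it makes -encode- stop with an error if a variable holds a value (say a lowercase "a") that the predefined label does not cover, instead of silently appending it to the label.

```stata
* Hypothetical data: q10 happens to contain no "B" answers
clear
input id str1 q10 str1 q18 str1 q20
1 A A E
2 C A B
3 D A A
4 A A C
end

* Default encode numbers only the values present, in sorted order,
* so for q10 (A, C, D) the codes are 1, 2, 3: C becomes 2, not 3
encode q10, generate(bad_q10)

* Define the full answer scale once, then reuse it for every item
label define answers 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"
foreach v of varlist q10 q18 q20 {
    * noextend: error out rather than add unexpected values to the label
    encode `v', generate(n_`v') label(answers) noextend
}

* Show the underlying numeric codes rather than the labels
list id q10 n_q10 bad_q10 q20 n_q20, nolabel
```

In the nolabel listing, n_q10 holds 1, 3, 4, 1 for the answers A, C, D, A, whereas bad_q10 holds 1, 2, 3, 1 because default -encode- never sees a "B" in that variable.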