Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Eric A. Booth" <eric.a.booth@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Unanticipated behavior of -encode- |
Date | Mon, 19 Aug 2013 23:15:18 -0500 |
<>
It's because the automatic value label name created by -encode- is "temp" in both cases. So, the second time through the loop, -encode- adds more categories to your already defined label "temp". In your loop, add the command -label list- to see this in action.

To prevent this, add the command -label drop temp- to the end of your loop, or take advantage of the label() option of -encode- to create a custom label name for each encode (e.g., add "label(label`v')" to your existing -encode- command in the loop).

- Eric

On Mon, Aug 19, 2013 at 10:18 PM, Lacy, Michael <Michael.Lacy@colostate.edu> wrote:
> Under certain circumstances, -encode- will number the numeric version of a string variable starting where it left off at the last encode, rather
> than starting at 1. I encountered this while encoding a varlist of string variables in a large file, which gave me oddities such as
> a string variable with the values "male" and "female" being encoded with large consecutive numbers rather than with 1 and 2.
> This is hardly tragic, but it is inconvenient, and not behavior I could anticipate from the documentation of -encode-.
>
> Here's an example of code showing a mild version of this:
>
> clear
> version 13
> set seed 23456
> set obs 4
> gen str x = cond(runiform() > 0.5, "this", "that")
> gen str y = cond(runiform() > 0.5, "blue", "green ")
> //
> foreach v of varlist x y {
>     encode `v', gen(temp)
>     drop `v'
>     rename temp `v'
> }
> tab1 x y, nolab
> //
> -> tabulation of x
>
>           x |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |          2       50.00       50.00
>           2 |          2       50.00      100.00
> ------------+-----------------------------------
>       Total |          4      100.00
>
> -> tabulation of y
>
>           y |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           3 |          3       75.00       75.00
>           4 |          1       25.00      100.00
> ------------+-----------------------------------
>       Total |          4      100.00
>
>
> I would expect both x and y to be encoded with 1 and 2.
> This oddity can be avoided by not using "temp" repeatedly, but I'm curious if others can explain why this
> occurs
>
> Regards,
>
>
> Mike Lacy
> Dept. of Sociology
> Colorado State University
> Fort Collins CO 80523-1784
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
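
A minimal sketch of the label() fix suggested in the reply, applied to the example from the original post. The label names labelx and labely simply follow from the "label(label`v')" suggestion; the comments about what to expect are assumptions based on the documented behavior of the label() option, not output from an actual run.

```stata
clear
version 13
set seed 23456
set obs 4
gen str x = cond(runiform() > 0.5, "this", "that")
gen str y = cond(runiform() > 0.5, "blue", "green ")

foreach v of varlist x y {
    * label(label`v') asks -encode- for a separate value label per
    * variable (labelx, labely) instead of the default name "temp",
    * which would otherwise be reused and appended to on each pass
    encode `v', gen(temp) label(label`v')
    drop `v'
    rename temp `v'
}

label list          // each value label should now start its own numbering at 1
tab1 x y, nolab
```

Because each string variable gets its own value label, no iteration of the loop should find an existing label to add categories to, so the numbering should restart for every variable.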