
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Unanticipated behavior of -encode-


From   "Eric A. Booth" <[email protected]>
To   [email protected]
Subject   Re: st: Unanticipated behavior of -encode-
Date   Mon, 19 Aug 2013 23:15:18 -0500

It's because the value label that -encode- creates automatically takes
the name of the new variable, "temp" in both cases, and -rename-
renames the variable but not its value label.  So, the second time
through the loop, -encode- adds more categories to your already
defined label "temp".  In your loop, add the command -label list- to
see this in action.
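
As a rough sketch of that check, here is the example from the original post with -label list- added inside the loop. Nothing else is changed except that the strings are generated without an explicit storage type, which is only a convenience on my part.

```stata
clear
version 13
set seed 23456
set obs 4
gen x = cond(runiform() > 0.5, "this", "that")
gen y = cond(runiform() > 0.5, "blue", "green ")

foreach v of varlist x y {
    encode `v', gen(temp)
    // Show the contents of the value label "temp" after each encode.
    // On the second pass it still holds the categories of x,
    // with the categories of y appended after them.
    label list temp
    drop `v'
    rename temp `v'
}
```

The second listing should show the categories of y numbered after those of x, which is why y does not start at 1.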

To prevent this, add the command -label drop temp- to the end of
your loop, or use the label() option of -encode- to give each
variable its own value label name (e.g., add label(label`v') to the
existing -encode- command in your loop).
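
A minimal sketch of the label() route, again built on the example from the original post. The label names labelx and labely simply follow from the label`v' pattern suggested above; any other unused names would do.

```stata
clear
version 13
set seed 23456
set obs 4
gen x = cond(runiform() > 0.5, "this", "that")
gen y = cond(runiform() > 0.5, "blue", "green ")

foreach v of varlist x y {
    // label() gives each variable its own value label (labelx, labely),
    // so every encode starts from an empty label and numbers from 1.
    encode `v', gen(temp) label(label`v')
    drop `v'
    rename temp `v'
}
tab1 x y, nolab
label list
```

The -label drop temp- alternative also resets the numbering, but it discards the label definitions, so label() seems the better choice when the text labels should be kept.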

- Eric


On Mon, Aug 19, 2013 at 10:18 PM, Lacy,Michael
<[email protected]> wrote:
> Under certain circumstances, -encode- will number the numeric version of a string variable starting where it left off at the last encode, rather
> than starting at 1.  I encountered this while encoding a varlist of string variables in a large file, which gave me oddities such as
> a string variable with the values "male" and "female" being encoded with large consecutive numbers rather than with 1 and 2.
> This is hardly tragic, but it is inconvenient, and not behavior I could anticipate from the documentation of -encode-.
>
> Here's an example of code showing a mild version of this:
>
> clear
> version 13
> set seed 23456
> set obs 4
> gen str x = cond(runiform() > 0.5, "this", "that")
> gen str y = cond(runiform() > 0.5, "blue", "green ")
> //
> foreach v of varlist x y {
>    encode `v', gen(temp)
>    drop `v'
>    rename temp `v'
> }
> tab1 x y, nolab
> //
> -> tabulation of x
>
>           x |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |          2       50.00       50.00
>           2 |          2       50.00      100.00
> ------------+-----------------------------------
>       Total |          4      100.00
>
> -> tabulation of y
>
>           y |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           3 |          3       75.00       75.00
>           4 |          1       25.00      100.00
> ------------+-----------------------------------
>       Total |          4      100.00
>
>
> I would expect both x and y to be encoded with 1 and 2. This oddity can be avoided by not using "temp" repeatedly, but I'm curious whether others can explain why this
> occurs.
>
> Regards,
>
>
> Mike Lacy
> Dept. of Sociology
> Colorado State University
> Fort Collins CO 80523-1784
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

