The Stata listserver

Re: st: encode


From   Sarah Mustillo <[email protected]>
To   [email protected]
Subject   Re: st: encode
Date   Fri, 21 Jan 2005 20:16:30 -0500

Thank you Phil and Daniel for responding to my question on a Friday evening! Phil, you were absolutely right. I tried to drop the label associated with the variable I was encoding, and that didn't work, so I thought perhaps that wasn't my problem. But when I changed the name of the variable I was encoding, I got the correct results. Thank you, thank you! I appreciate your help!

Sarah



Phil Schumm wrote:


At 5:47 PM -0500 1/21/05, Sarah Mustillo wrote:

I'm recoding a substantial number of text responses into categorical variables. I find it easier to -encode- the variables holding the text responses first, and then replace the categorical variables with the correct values; this way I can avoid typing out every text response in the -replace- command and just type its encoded number. I have done this for 6 variables, and it worked fine for 5 of them. I cannot figure out what went wrong with the 6th.

The variable I am trying to encode has about 90 categories. When I encode it, though, the resulting variable begins at 8 and ends at 238: the first category (text response) gets an 8, the second gets a 12, and so forth. The manual states that -encode- alphabetizes before it encodes, but that doesn't explain my problem. I would still expect the numbers to run sequentially, as they have with the other 5 variables.
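For readers unfamiliar with the workflow Sarah describes, here is a minimal do-file sketch of it. The variable names (resp for the text responses, resp_n for the encoded copy, cat for the categorical variable), the response strings, and the category assignments are all hypothetical; the sketch also assumes no value label named resp_n exists beforehand, which -clear- guarantees here.

```stata
* Hypothetical free-text responses to be coded into categories
clear
input str12 resp
"walk"
"bus"
"car"
"bike"
"car"
end

* -encode- numbers the distinct strings 1, 2, ... in alphabetical order
* and stores the mapping in a new value label named after the new variable
encode resp, generate(resp_n)
label list resp_n                         // 1 bike, 2 bus, 3 car, 4 walk

* Recode by typing the encoded numbers instead of the text responses
generate byte cat = .
replace cat = 1 if inlist(resp_n, 1, 4)   // bike, walk
replace cat = 2 if inlist(resp_n, 2, 3)   // bus, car
label define catlab 1 "active" 2 "motorized"
label values cat catlab
```

The shortcut only works if the codes shown by -label list- are the ones you type, which is why an unexpected numbering like Sarah's matters.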


Sarah,

One possibility is that a value label already exists with the same name as the target variable you are encoding to, and that this label already contains some of the values in the variable you are trying to encode (your description of what you are doing suggests that this may be the case). For example:


. input str1 y

y
1. a
2. b
3. c
4. end

. encode y, gen(target)

. lab li target
target:
1 a
2 b
3 c

. drop target


(Note that although I have dropped the variable target, the corresponding value label still exists.)


. input str1 x

x
1. c
2. d
3. e

. encode x, gen(target)

. tab target, nol

     target |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          1       33.33       33.33
          4 |          1       33.33       66.67
          5 |          1       33.33      100.00
------------+-----------------------------------
      Total |          3      100.00


As you can see, this produces the same kind of result you observed: the codes do not start at 1. The reason is that -encode- used the pre-existing value label target, adding new values for the strings in x that were not already in it:


. lab li target
target:
1 a
2 b
3 c
4 d
5 e
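Phil's session can be replayed as a single do-file sketch. The -capture label list target- guard before the second -encode- is an illustrative addition, not part of Phil's message: -label list- exits with a nonzero return code when no value label of that name is defined, so _rc == 0 signals that -encode- is about to reuse an existing label. Entering y and x in one -input- block is a simplification of Phil's two separate -input- steps.

```stata
clear
input str1 y str1 x
a c
b d
c e
end

encode y, generate(target)   // creates variable target and value label target
drop target                  // removes the variable; value label target remains

* Illustrative guard: warn if a value label named target is already defined
capture label list target
if _rc == 0 {
    display as error "value label target already exists; -encode- will reuse it"
}

encode x, generate(target)   // c keeps code 3; d and e are appended as 4 and 5
tabulate target, nolabel
label list target
```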


It is easy to determine whether this is what happened: just look at the value label attached to your target variable (for example, with -label list-). If it is, the fix is simple: either encode to a different target, or use -label drop- to remove the old label first.
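A sketch of the check and both fixes, using the same toy data as Phil's example (the names y, x, target, and x_code are illustrative). It assumes the `: value label` extended macro function, which returns the name of the value label attached to a variable, so that label can be listed without knowing its name in advance.

```stata
* Recreate the problem: a stale value label named target gets reused
clear
input str1 y str1 x
a c
b d
c e
end
encode y, generate(target)
drop target                    // stale value label target is left behind
encode x, generate(target)     // reuses it, so target takes codes 3, 4, 5

* Diagnose: which value label is attached to target, and what does it hold?
local lbl : value label target
label list `lbl'               // still lists a and b from encoding y

* Fix 1: encode to a different target; its value label is named x_code,
* so the stale label target is never consulted
encode x, generate(x_code)

* Fix 2: drop the variable and the stale label, then encode again
drop target
label drop target
encode x, generate(target)
```

After either fix, the encoded values run 1, 2, 3 for c, d, e.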


-- Phil
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

--
Sarah A. Mustillo, Ph.D
Department of Psychiatry and Behavioral Sciences
Duke University School of Medicine
Box 3454
Durham NC 27710

919 687-4686 x231



© Copyright 1996–2024 StataCorp LLC