The Stata listserver

st: encode


From   Sarah Mustillo <[email protected]>
To   [email protected]
Subject   st: encode
Date   Fri, 21 Jan 2005 17:47:28 -0500

Hi -

I'm recoding a large number of free-text responses into categorical variables. I find it easier to -encode- each text variable first and then -replace- the categorical variable using the encoded numbers, so that I don't have to type out every text response in the -replace- command. I have done this for 6 variables. It worked fine for 5 of them, but I cannot figure out what went wrong with the 6th.
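
A minimal sketch of the workflow just described, written as a do-file on made-up data. The variable names (resp, resp_code, custody) and the category groupings are hypothetical and are not taken from the real dataset.

```stata
* Toy data standing in for a free-text variable (all names hypothetical)
clear
input str20 resp
"Guardian"
"aunt"
"DSS"
"Legal guardian"
"DSS"
end

* Step 1: encode the text; codes follow the sorted order of the strings
* (uppercase sorts before lowercase, so "aunt" comes last here)
encode resp, generate(resp_code)
label list resp_code

* Step 2: build the categorical variable from the numeric codes
* instead of typing each text response
generate byte custody = .
replace custody = 1 if inlist(resp_code, 2, 3)   // Guardian, Legal guardian
replace custody = 2 if resp_code == 4            // aunt
replace custody = 3 if resp_code == 1            // DSS
tabulate custody
```

Running -label list- right after -encode- shows which number was assigned to each string, which is what makes the short -replace- conditions possible.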

The variable I am trying to encode has about 90 categories. When I encode it, though, the new variable's values begin at 8 and end at 238: the first category (text response) gets an 8, the second gets a 12, and so forth. The manual states that -encode- alphabetizes before it encodes, but that doesn't explain my problem. I would still expect the numbers to run sequentially, as they did for the other 5 variables.

Here's what I mean:


. tab custoo_2

         custody status of child if #11 |
         indicated on previous question |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                              #1 and #8 |          1        0.02        0.02
                             2nd Cousin |          1        0.02        0.05
                                    666 |      3,961       95.19       95.24
                        8, Guardianship |          1        0.02       95.27
                                    888 |          1        0.02       95.29
                                    999 |         82        1.97       97.26
              BIO MOM & ADOPTIVE FATHER |          1        0.02       97.28
                  Bio Mom and Adop. Dad |          1        0.02       97.31
           Bio mother & adoptive father |          1        0.02       97.33
                                 COUSIN |          1        0.02       97.36
           County has temporary custody |          1        0.02       97.38
                                   DCSF |          1        0.02       97.40


-snip-

. encode custoo_2, gen(place2)

. tab place2

         custody status of child if #11 |
         indicated on previous question |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                                    666 |      3,961       95.19       95.19
                                    888 |          1        0.02       95.22
                                    999 |         82        1.97       97.19
                                 COUSIN |          1        0.02       97.21
                                    DSS |          5        0.12       97.33
                             Great Aunt |          1        0.02       97.36
                               Guardian |          1        0.02       97.38
                         Legal Guardian |          1        0.02       97.40
                         Legal guardian |          1        0.02       97.43
                                  Other |          3        0.07       97.50
                                   Self |          1        0.02       97.52
                                   aunt |          1        0.02       97.55
-snip-


. su place2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      place2 |      4161    12.72242    28.58735          8        238


. tab place2, nolabel

    custody |
  status of |
   child if |
        #11 |
  indicated |
on previous |
   question |      Freq.     Percent        Cum.
------------+-----------------------------------
          8 |      3,961       95.19       95.19
         12 |          1        0.02       95.22
         13 |         82        1.97       97.19
         26 |          1        0.02       97.21
         44 |          5        0.12       97.33
         57 |          1        0.02       97.36
         61 |          1        0.02       97.38
         68 |          1        0.02       97.40
         70 |          1        0.02       97.43
         79 |          3        0.07       97.50
         84 |          1        0.02       97.52
         99 |          1        0.02       97.55
        110 |          4        0.10       97.64
        114 |          2        0.05       97.69

So why would it skip from 8 to 12, then from 13 to 26, then to 44, 57, and so on?
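
One possible explanation, which the output above cannot confirm, is that a value label named place2 was already defined in the dataset when -encode- ran. The -encode- documentation says that an existing value label of that name is used and added to rather than rebuilt, so codes assigned by an earlier -encode- would be kept, leaving gaps for strings that no longer occur. The do-file sketch below tries to reproduce that situation on made-up data (resp and code are hypothetical names). The comments state what would be expected under that assumption, not verified output.

```stata
* Made-up data; resp and code are hypothetical names
clear
input str10 resp
"a"
"b"
"c"
"d"
end

* First pass: a fresh value label called code is created with four entries
encode resp, generate(code)
label list code

* Dropping the variable does not drop the value label that was attached to it
drop code
drop if inlist(resp, "a", "c")

* Second pass: the leftover label is reused, so gaps are expected
* (b and d would keep their old codes instead of becoming 1 and 2)
encode resp, generate(code)
tabulate code, nolabel

* Possible remedy: remove the stale label before encoding again
drop code
label drop code
encode resp, generate(code)
tabulate code, nolabel
```

Applied to the data in this post, -label list place2- would show whether the label holds more entries than the roughly 90 categories in custoo_2.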

I searched the archives for an answer, to no avail. I also tried -sencode- after reading about it in the archives, but didn't have much luck there either.

Thank you.

Sarah

--
Sarah A. Mustillo, Ph.D
Department of Psychiatry and Behavioral Sciences
Duke University School of Medicine
Box 3454
Durham NC 27710

919 687-4686 x231