The Stata listserver

Re: st: RE: still having problems with converting string variables


From   Suzy <[email protected]>
To   [email protected]
Subject   Re: st: RE: still having problems with converting string variables
Date   Sun, 17 Oct 2004 13:03:36 -0400

I need the frequencies of some of the categories within the string variables. For a numeric variable such as sex, I can get the frequencies easily:
. tab sex

        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    133,769       40.88       40.88
          2 |    193,485       59.12      100.00
------------+-----------------------------------
      Total |    327,254      100.00

For string variables such as dxcode1, I cannot obtain the frequency of either kind of code:

. tab dxcode1 if dxcode1==4010-
invalid syntax
r(198);

. tab dxcode1 if dxcode1==40200
type mismatch
r(109);

So I felt that I needed to destring the variable dxcode1. But because some observations are coded with dashes, it seems the only way to make it work is to use the -ignore()- or -force- option, and then I would lose the dashed observations, since Stata would treat them as missing. That has been my dilemma.
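An aside on the two errors above: in Stata, a string value inside an expression has to be written in double quotes. Without quotes, 4010- is not a valid expression (hence "invalid syntax"), and 40200 is read as a number, which cannot be compared with a string variable (hence "type mismatch"). So -tab- can be used on dxcode1 directly, with no -destring- at all. A minimal sketch, assuming a tiny made-up dataset (the codes are hypothetical illustrations, not the real data):

```stata
* Hypothetical toy data: string diagnosis codes, some with a trailing dash
clear
input str6 dxcode1
"4010"
"4010-"
"4010-"
"40120"
"4012-"
"40200"
end

* -tab- works on a string variable as it stands
tab dxcode1

* In an -if- condition, the string value must be in double quotes
tab dxcode1 if dxcode1 == "4010-"
count if dxcode1 == "40200"
```

The same quoting applies wherever a string variable appears in an -if- condition, for example with -count-, -list- or -keep-.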

Suzy




Nick Cox wrote:


-destring- is telling you what the problem is, and you already know it yourself: the "-" characters.


From what you say, you cannot ignore these characters, even though -destring- will do that with the -ignore()- option, because "4010" would then be mixed up with "4010-".
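To see the mix-up concretely, here is a sketch on hypothetical toy data: -destring- with -ignore("-")- strips the dash before converting, so "4010" and "4010-" both end up as the number 4010 and can no longer be told apart.

```stata
* Hypothetical toy data
clear
input str6 dxcode1
"4010"
"4010-"
"40120"
"4012-"
end

* Removing "-" before conversion makes "4010" and "4010-" identical
destring dxcode1, generate(dxnum) ignore("-")
list dxcode1 dxnum
```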
It is unclear why you want to -destring- here: your variables are not really numeric in content, and you just run the risk of losing information.
If, however, you need a numeric categorical variable for some purpose, e.g. -anova-, then -encode- not -destring- is the appropriate way to do it.
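A minimal sketch of the -encode- route, again on hypothetical toy data: each distinct string, dashed or not, gets its own integer, and a value label keeps the original text, so "4010" and "4010-" stay distinct. By default -encode- gives the value label the same name as the new variable (here dxcode1num). To select one code, compare against the label with the "text":labelname syntax rather than guessing which integer it was given.

```stata
* Hypothetical toy data
clear
input str6 dxcode1
"4010"
"4010-"
"40120"
"4012-"
"4010-"
end

* Each distinct string becomes an integer 1, 2, 3, ...;
* the value label dxcode1num holds the original text
encode dxcode1, generate(dxcode1num)

* Labels are shown by default; -nolabel- shows the underlying integers
tab dxcode1num
tab dxcode1num, nolabel
label list dxcode1num

* Select one code via its label
tab dxcode1num if dxcode1num == "4010-":dxcode1num
```

The integers are assigned in sorted order of the strings, so they carry no numeric meaning; they are only category identifiers, which is what -anova- and similar commands need.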
Nick
[email protected]
Suzy


I have a number of string variables in my dataset. The observations in these variables are coded either with numeric categorical values like 4010 and 40120, or with dashes like 4010- and 4012-, so these four codes represent four different things. Each string variable has hundreds of different categories coded in this manner. I'm trying to convert the string variables to numeric while still being able to analyse the observations that contain dashes.

I've tried various -destring- options, such as those below, with the following responses from Stata:

. destring dxcode1, generate(dxcode1num)
dxcode1 contains non-numeric characters; no generate

. destring dxcode1, replace
dxcode1 contains non-numeric characters; no replace
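Before any conversion, it can help to see exactly which values -destring- is objecting to. A sketch, assuming the same kind of hypothetical codes: -real()- returns missing for any string that is not a valid number, so listing those observations shows the offending values, and a 0/1 flag for a trailing dash (has_dash is an illustrative name) marks the dashed codes while leaving the string variable itself untouched.

```stata
* Hypothetical toy data
clear
input str6 dxcode1
"4010"
"4010-"
"40120"
"4012-"
end

* real() gives missing for strings that are not valid numbers,
* i.e. the values that -destring- complains about
list dxcode1 if missing(real(dxcode1))

* Flag codes ending in a dash; substr() with a negative start counts from the end
generate byte has_dash = substr(dxcode1, -1, 1) == "-"
tab dxcode1 has_dash
```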

I've e-mailed once before on this topic, so I apologize in advance for the repeated question, but I'm still having trouble and I'm not computer- or Stata-savvy. If anyone has user-friendly advice for a beginner, with explicit code, it would be much appreciated.

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/








© Copyright 1996–2024 StataCorp LLC