
From: "Christian Holz" <statalist@krueschan.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: AW: st: Re: destringing values led to Stata recoding them as missing
Date: Sat, 28 Aug 2004 12:26:58 +0200

Hi Suzy,

First, I think it is worth stressing that you should very carefully consider the point Daniel raised: although it is technically not very hard to remove the nonnumeric characters from your strings so that destring can produce numbers (see below), you should be sure that this is what you want. Think carefully about whether your data are really numeric in the sense of an interval or ratio (or at least ordinal) level of measurement. You can, for example, run a regression with a right-hand-side variable in which 1002=diabetes, 1003=malaria, 1004=hernia and so on, and Stata will give you estimates, but interpreting those coefficients will not be very meaningful.

These objections aside, you can use the following code to remove everything that is not a digit from your string variables:

    #delimit ;
    foreach varname of varlist xvar { ;
        local i 1 ;
        * loop over every observation ;
        while `i' <= _N { ;
            local digit 1 ;
            local tempstring "" ;
            * walk through the string one character at a time, keeping only digits ;
            while `digit' <= length(`varname'[`i']) { ;
                local s_digit = substr(`varname'[`i'], `digit', 1) ;
                if ("`s_digit'" >= "0" & "`s_digit'" <= "9") local tempstring = "`tempstring'`s_digit'" ;
                local digit = `digit' + 1 ;
            } ;
            replace `varname' = "`tempstring'" in `i' ;
            local i = `i' + 1 ;
        } ;
        * convert the digit-only strings to numbers ;
        destring `varname', replace ;
    } ;
    #delimit cr

Please note that you have to write the names of all the variables you want to convert to numeric in place of xvar in the first line of the loop. Please note further that the code replaces the values in your original variables with numeric ones, and that a value containing no digits at all becomes an empty string and therefore missing after destring.

Here is what the program does. Original data (from your message):

    . d

    Contains data
      obs:             4
     vars:             4
     size:           100 (99.9% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    patient         float  %9.0g
    var1            str5   %9s
    var2            str6   %9s
    var3            str6   %9s
    -------------------------------------------------------------------------------
    Sorted by:
         Note:  dataset has changed since last saved

    . l

         +-----------------------------------+
         | patient    var1     var2     var3 |
         |-----------------------------------|
      1. |    1001   1235-    V2347      456 |
      2. |    1002    1233   143135   E28950 |
      3. |    1003   38568   05476-    89076 |
      4. |    1004     126      333    v5678 |
         +-----------------------------------+

After running the code (which may take some time with your 300,000 observations, because it loops over every observation and every character), the data look as follows:

    . d

    Contains data
      obs:             4
     vars:             4
     size:            80 (99.9% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    patient         float  %9.0g
    var1            long   %10.0g
    var2            long   %10.0g
    var3            long   %10.0g
    -------------------------------------------------------------------------------
    Sorted by:
         Note:  dataset has changed since last saved

    . l

         +----------------------------------+
         | patient    var1     var2    var3 |
         |----------------------------------|
      1. |    1001    1235     2347     456 |
      2. |    1002    1233   143135   28950 |
      3. |    1003   38568     5476   89076 |
      4. |    1004     126      333    5678 |
         +----------------------------------+

Best wishes,
Christian
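A shorter way to get the same result without looping over observations is sketched below. It is only a sketch: it assumes Stata 14 or later (where `ustrregexra()` exists), and it re-creates the four example observations with `input` so that it runs on its own. Note what the listing above already shows: stripping letters turns V2347 into 2347 and E28950 into 28950, so a V or E code can become indistinguishable from a purely numeric code with the same digits.

```stata
* Hypothetical re-creation of the example data listed above
clear
input float patient str5 var1 str6 var2 str6 var3
1001 "1235-" "V2347"  "456"
1002 "1233"  "143135" "E28950"
1003 "38568" "05476-" "89076"
1004 "126"   "333"    "v5678"
end

* Delete every character that is not a digit, for all observations at once.
* Assumes Stata 14 or later, where ustrregexra() is available.
foreach v of varlist var1 var2 var3 {
    replace `v' = ustrregexra(`v', "[^0-9]", "")
}

* Convert the digit-only strings to numbers; a value left empty becomes missing
destring var1 var2 var3, replace
list
```

In older versions, `destring var1 var2 var3, ignore("-VvEe") replace` should do the same job, under the assumption that the dash and the letters V and E are the only nonnumeric characters in the data.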
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Suzy
Sent: 28 August 2004 05:26
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: destringing values led to Stata recoding them as missing

Dear Daniel,

I used destring because I wasn't able to analyze the data as is: I would get error messages saying that Stata could not analyze string variables. These values are codes that represent disorders, so you are correct. But since I am a fairly new Stata user, I figured that it couldn't read those values because of the dashes or the letters, since the data points that were purely numeric were read and analyzed with no problem.

Daniel Egan wrote:
>Hi Suzy,
>
>Just to be clear, are you sure you want to create numeric values? The usual
>reason for destringing a variable is that it IS a numeric variable that has
>typos which cause it to be regarded as text. Is this a continuous
>variable that does have a numeric (linear etc.) relationship? If each of
>these string variables represents different disorders, you should have a good
>methodological reason for making them numeric. Otherwise, keep them in an
>"apples and oranges" arrangement of strings, i.e. diabetes (1003) is not
>"one more than" malaria (1002)...
>
>In essence, if you want to use each of these variables as categoricals, they
>are fine as is - as strings. You will be able to analyze them as strings, in
>a categorical or dummy variable sense.
>
>I may be way off here, but just wanted to make sure you knew you could
>analyze them as is.
>
>Apologies if I am being obvious.
>
>Dan
>
>----- Original Message -----
>From: "Suzy" <scott_788@wowway.com>
>To: <statalist@hsphsun2.harvard.edu>
>Sent: Friday, August 27, 2004 5:44 PM
>Subject: st: destringing values led to Stata recoding them as missing
>
>
>| Dear Statalisters;
>|
>| I have seven variables of over 300,000 observations each. Within each
>| variable, I have over 2000 different values. These data points
>| represent specific codes - for example: (72200 = intervertebral disc
>| disorder). Within each of these seven variables, there are data points
>| (values) with dashes or letters (e.g. 4109- or V2389). The majority
>| of the values, though, are purely numeric (23405). I used destring
>| so that I could analyze the data, and Stata treated all the
>| data points that included dashes and letters as missing. Now there is a
>| period (.) where there used to be a value. I have two questions:
>|
>| 1. Will the restring option restore the data points?
>|
>| 2. How can I successfully "destring" these values so that I can include
>| them in my analysis?
>|
>| Any help and/or specific code would be very helpful, as I am only
>| marginally competent with Stata basics.
>|
>| Thank you!
>| Suzy
>|
>|
>| *
>| *   For searches and help try:
>| *   http://www.stata.com/support/faqs/res/findit.html
>| *   http://www.stata.com/support/statalist/faq
>| *   http://www.ats.ucla.edu/stat/stata/
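Daniel's suggestion to treat the codes as categories can be sketched with `encode`, which keeps every code distinct instead of stripping characters. This is a sketch under assumptions: the example data are re-created with `input`, the new variable names (`var1_cat` and so on) and the dummy prefix `d1_` are hypothetical, and `encode` numbers the codes 1, 2, 3, ... in alphabetical order of the strings, so the integers are only labels with no numeric meaning.

```stata
* Hypothetical re-creation of the example data from the thread
clear
input float patient str5 var1 str6 var2 str6 var3
1001 "1235-" "V2347"  "456"
1002 "1233"  "143135" "E28950"
1003 "38568" "05476-" "89076"
1004 "126"   "333"    "v5678"
end

* encode maps each distinct string code to an integer and stores the
* original code as the value label, so "V2347" and "2347" stay distinct
foreach v of varlist var1 var2 var3 {
    encode `v', generate(`v'_cat)
}

* The original codes are still shown through the value labels
list patient var1_cat var2_cat var3_cat

* One 0/1 dummy per distinct code of var1 (d1_1, d1_2, ...)
tabulate var1_cat, generate(d1_)
```

In Stata 11 or later, an encoded variable can also enter a model directly as a factor (for example `i.var1_cat` in a regression); in older versions the `xi:` prefix served that purpose.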

© Copyright 1996–2015 StataCorp LP