

From: Nick Cox <n.j.cox@durham.ac.uk>

To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>

Subject: RE: st: destring ignores more than what specified in ignore()

Date: Mon, 21 Nov 2011 10:36:19 +0000

On the information here,

    destring <varlist>, replace ignore("nas")

or

    destring <varlist>, replace force

should work. Note that you don't need to set up your own loop or a prior filter of numeric variables; -destring- will do both for you.

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 21 November 2011 08:22
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: destring ignores more than what specified in ignore()

-destring- ignores characters, not substrings. The problem is at most that this is not clear to you when you read the help. -destring- did what you told it to do, which was, among other things, to remove ".".

You need to fix your "n.a." and "n.s." first, e.g. within a loop

    replace `var' = subinstr(`var', "n.a.", ".", .)
    replace `var' = subinstr(`var', "n.s.", ".", .)

or as you did it.

-destring- is just a wrapper for -real()-, so -real()- is not really an alternative except in so far as -destring- is not understood. Your code is shorter and more efficient than -destring- as it can be tailored to your problem. In fact your last code segment can be shortened, as -real("n.a.")- for example results in numeric missing.

Nick

On Mon, Nov 21, 2011 at 1:51 AM, Impavido, Gregorio <GImpavido@imf.org> wrote:

> I looked at the many FAQs on destring but could not find an answer to my problem. Hence the post; hopefully it is not a duplicate.
>
> I have a dataset with an unknown (ex ante) number of string variables containing entries of the following three types: (i) "###.###"; (ii) "n.a."; and (iii) "n.s.".
>
> These variables should be numeric and I would like to destring them by coding:
>
>     foreach var of varlist * {
>         capture confirm numeric variable `var'
>         if _rc {
>             destring `var', replace ignore("n.a." "n.s.")
>         }
>     }
>
> This does not work as destring, for some inexplicable (to me) reason, treats "." as a separate non-numeric character from "n.a." or "n.s.".
>
> Therefore, it drops the "." in entries like "###.###", changing them into double numeric ######. The same happens if the option is specified as ignore("n.a" "n.s") (i.e., without the final ".").
>
> First question (of two): Why is destring ignoring more things than what is specified in the option ignore()?
>
> I found two ways around this odd behaviour of destring.
>
> The first option uses an extra line of code and it is:
>
>     foreach var of varlist * {
>         capture confirm numeric variable `var'
>         if _rc {
>             replace `var' = "na" if inlist(`var', "n.a.", "n.s.") // this gets rid of the "."
>             destring `var', replace ignore("na") // no "." here!!!
>         }
>     }
>
> This preserves both the order and the variable labels of my original string variables (which I need in subsequent code) but it uses again the dreaded destring command (after seeing how it treats "n.a.", I don't "trust" it anymore).
>
> The second option uses generate with the real() function but also more lines of code, as real() does not work with replace.
>
>     foreach var of varlist * {
>         capture confirm numeric variable `var'
>         if _rc {
>             replace `var' = "." if inlist(`var', "n.a.", "n.s.")
>             local lbl : variable label `var'
>             gen `var'r = real(`var')
>             label var `var'r `"`lbl'"'
>             order `var'r, after(`var')
>             drop `var'
>         }
>     }
>
> Both loops seem to end up with numeric-only variables in the same order and with the same variable labels as the original dataset. My second question is: should we use real() instead of destring when possible, as it is more "fool proof" (my third loop is much faster than the other two)?
>
> Finally, is there a more efficient way to get where I want without writing all this code (especially the last loop)?
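The point that -destring- ignores characters rather than substrings can be checked on a toy example. The following is only a minimal sketch: the variable names x, x_bad and x_ok and the four data values are made up for illustration and are not from the original dataset.

```stata
* Hypothetical toy data: a string variable mixing numbers with "n.a." and "n.s."
clear
input str8 x
"123.456"
"n.a."
"n.s."
"7.5"
end

* ignore() strips each listed CHARACTER (n, a, s and "."), wherever it occurs,
* so the decimal point in "123.456" is removed as well
destring x, generate(x_bad) ignore("n.a." "n.s.")

* Recode the two missing-value codes as whole strings first ...
replace x = "." if inlist(x, "n.a.", "n.s.")

* ... after which no ignore() is needed and decimal points are left alone
destring x, generate(x_ok)

* real() maps any non-numeric string to numeric missing
display real("n.a.")

list x x_bad x_ok
```

Comparing x_bad with x_ok in the listing shows the difference: recoding whole strings touches only the offending entries, while ignore() acts on every matching character in every value.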

